Exam Certified Data Engineer Professional topic 1 question 10 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 10
Topic #: 1

A Delta table of weather records is partitioned by date and has the below schema: date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT
To find all the records from within the Arctic Circle, you execute a query with the below filter: latitude > 66.3
Which statement describes how the Delta engine identifies which files to load?

  • A. All records are cached to an operational database and then the filter is applied
  • B. The Parquet file footers are scanned for min and max statistics for the latitude column
  • C. All records are cached to attached storage and then the filter is applied
  • D. The Delta log is scanned for min and max statistics for the latitude column
  • E. The Hive metastore is scanned for min and max statistics for the latitude column
Suggested Answer: D

Comments

taif12340
Highly Voted 1 year, 3 months ago
Answer D: In the transaction log, Delta Lake captures statistics for each data file of the table. These statistics indicate, per file:
- Total number of records
- Minimum value in each of the first 32 columns of the table
- Maximum value in each of the first 32 columns of the table
- Null value counts in each of the first 32 columns of the table
When a query with a selective filter is executed against the table, the query optimizer uses these statistics to identify data files that may contain records matching the filter, and skips the rest. For the SELECT query in the question, the transaction log is scanned for min and max statistics for the latitude column.
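The skipping decision described above can be illustrated with a minimal sketch in plain Python. It is not the Delta engine's actual implementation: the function name `files_to_read`, the file names, and the statistic values are all hypothetical, and only the field names `numRecords`, `minValues`, and `maxValues` follow what the transaction log records. For a filter of the form `latitude > 66.3`, a file can be skipped only when its recorded maximum is at or below the bound; a file with no recorded statistic cannot be ruled out.

```python
from typing import Any, Dict, List


def files_to_read(file_stats: Dict[str, Dict[str, Any]],
                  column: str,
                  lower_bound: float) -> List[str]:
    """Return the files that may hold rows where column > lower_bound."""
    selected = []
    for path, stats in file_stats.items():
        max_value = stats.get("maxValues", {}).get(column)
        # Keep the file if no max statistic exists (it cannot be ruled out)
        # or if the max exceeds the bound (a matching row may exist).
        if max_value is None or max_value > lower_bound:
            selected.append(path)
    return selected


# Hypothetical per-file statistics, shaped like the stats in the Delta log.
file_stats = {
    "part-0000.parquet": {
        "numRecords": 100,
        "minValues": {"latitude": -12.5},
        "maxValues": {"latitude": 40.1},
    },
    "part-0001.parquet": {
        "numRecords": 80,
        "minValues": {"latitude": 51.0},
        "maxValues": {"latitude": 71.9},
    },
    # A file written without statistics for this column.
    "part-0002.parquet": {"numRecords": 50, "minValues": {}, "maxValues": {}},
}

selected = files_to_read(file_stats, "latitude", 66.3)
print(selected)
```

Note that the statistics only rule files out. Every selected file still has to be read and filtered row by row, since a maximum above 66.3 does not mean every row in the file matches.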
upvoted 22 times
...
akashdesarda
Most Recent 1 month, 3 weeks ago
Selected Answer: D
The points mentioned above are correct. If this were a plain Parquet table, the Parquet file footers would have been used. But since this is a Delta table, the Delta log is used to scan and skip files. It uses the stats written in the transaction log.
upvoted 1 times
...
AndreFR
3 months, 1 week ago
Answer D: Delta data skipping automatically collects the stats (min, max, etc.) for the first 32 columns for each underlying Parquet file when you write data into a Delta table. Databricks takes advantage of this information (minimum and maximum values) at query time to skip unnecessary files in order to speed up the queries. https://www.databricks.com/discover/pages/optimize-data-workloads-guide#delta-data
upvoted 1 times
...
saravanan289
3 months, 1 week ago
Selected Answer: D
Delta table stores file statistics in transaction log
upvoted 1 times
...
03355a2
5 months ago
Selected Answer: D
No explanation needed, this is where the information is stored.
upvoted 2 times
...
imatheushenrique
5 months, 3 weeks ago
D. The Delta log is scanned for min and max statistics for the latitude column
upvoted 1 times
...
coercion
6 months, 1 week ago
Selected Answer: D
For each transaction on the table, the Delta log collects statistics such as min value, max value, number of records, and number of files, covering the first 32 columns (the default value).
upvoted 1 times
...
Tayari
6 months, 4 weeks ago
Selected Answer: D
D is the answer
upvoted 1 times
...
arik90
8 months ago
Selected Answer: D
Based on the documentation, the answer is D. I don't know why B is shown here.
upvoted 1 times
...
alexvno
8 months, 2 weeks ago
Selected Answer: D
Delta log first
upvoted 1 times
...
DavidRou
8 months, 2 weeks ago
Selected Answer: D
Statistics on first 32 columns of a table are computed and written in the Delta Log by default.
upvoted 1 times
...
vikram12apr
8 months, 4 weeks ago
Selected Answer: D
D is the right answer
upvoted 1 times
...
Curious76
9 months ago
Selected Answer: D
D is the answer
upvoted 1 times
...
kkravets
9 months, 1 week ago
Selected Answer: D
D is correct one
upvoted 1 times
...
RiktRikt007
9 months, 2 weeks ago
I checked the Delta log, and it does store stats: "stats":"{\"numRecords\":1,\"minValues\":{\"id\":1,\"name\":\"one\",\"age\":11},\"maxValues\":{\"id\":1,\"name\":\"one\",\"age\":11},\"nullCount\":{\"id\":0,\"name\":0,\"age\":0}}"
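For anyone who wants to repeat this check, here is a rough sketch of reading the stats out of one line of a Delta log commit file. The stats value is itself a JSON string nested inside the JSON action, so it has to be decoded twice. The stats content is the one quoted in this comment; the `path` value and the helper name `read_add_stats` are made up for illustration, and the log line is built in memory rather than read from a real `_delta_log` directory.

```python
import json
from typing import Any, Dict, Optional

# Stats content as quoted in the comment, written here as plain JSON text.
stats_string = (
    '{"numRecords":1,'
    '"minValues":{"id":1,"name":"one","age":11},'
    '"maxValues":{"id":1,"name":"one","age":11},'
    '"nullCount":{"id":0,"name":0,"age":0}}'
)

# One line of a commit file: an "add" action whose stats field is a string.
# The path is a hypothetical file name.
log_line = json.dumps(
    {"add": {"path": "part-00000.parquet", "stats": stats_string}}
)


def read_add_stats(line: str) -> Optional[Dict[str, Any]]:
    """Return the decoded stats of an add action, or None if there are none."""
    action = json.loads(line)
    add = action.get("add")
    if add is None or "stats" not in add:
        return None
    # Second decode: the stats field is JSON serialized as a string.
    return json.loads(add["stats"])


stats = read_add_stats(log_line)
print(stats["minValues"]["age"], stats["maxValues"]["age"])
```

Lines that hold other actions (for example commit information) carry no file statistics, which is why the helper returns None for them.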
upvoted 2 times
...
AziLa
10 months, 1 week ago
The correct answer is D
upvoted 1 times
...
Jay_98_11
10 months, 2 weeks ago
Selected Answer: D
D for sure
upvoted 1 times
...