Exam Certified Data Engineer Professional topic 1 question 8 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 8
Topic #: 1

An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.
If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?

  • A. Each write to the orders table will only contain unique records, and only those records without duplicates in the target table will be written.
  • B. Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table.
  • C. Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, these records will be overwritten.
  • D. Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, the operation will fail.
  • E. Each write to the orders table will run deduplication over the union of new and existing records, ensuring no duplicate records are present.
Suggested Answer: B

Comments

Eertyy
Highly Voted 1 year, 2 months ago
B. Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table. Explanation: In the provided code, the .dropDuplicates(["customer_id","order_id"]) operation is applied to the data loaded from the Parquet files. It ensures that only one record per composite key ("customer_id", "order_id") is retained in the DataFrame before it is written to the "orders" table. However, the operation only filters duplicates within the current batch; it never looks at rows already in the "orders" table. If an order in the current batch was also written by a previous batch, both copies will end up in the table. So newly written records are unique within the batch being written, but they may duplicate records that previous batches already wrote to the target table.
upvoted 11 times
...
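The behavior described in the comment above can be modeled without Spark. This is a minimal pure-Python sketch, not the code from the question: the helper names, rows, and amounts are hypothetical, and the two functions only imitate what the comment says about dropDuplicates on the key columns followed by an append write.

```python
# Pure-Python model of "deduplicate the batch, then append"; no Spark required.
# All rows, amounts, and helper names below are hypothetical.

KEY = ("customer_id", "order_id")

def drop_duplicates(rows, key=KEY):
    """Keep one row per key within the batch (this model keeps the first seen)."""
    seen = set()
    unique = []
    for row in rows:
        k = tuple(row[c] for c in key)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique

def append(target, rows):
    """Append mode: add every incoming row without inspecting the target."""
    return target + list(rows)

# Order (1, 100) was already written by an earlier nightly run.
orders = [{"customer_id": 1, "order_id": 100, "amount": 25}]

# Tonight's batch repeats (1, 100) and contains (2, 200) twice.
batch = [
    {"customer_id": 1, "order_id": 100, "amount": 25},
    {"customer_id": 2, "order_id": 200, "amount": 40},
    {"customer_id": 2, "order_id": 200, "amount": 40},
]

deduped = drop_duplicates(batch)
orders = append(orders, deduped)

keys = [(r["customer_id"], r["order_id"]) for r in orders]
print(len(deduped))          # 2: the batch itself is unique
print(keys.count((1, 100)))  # 2: the key now appears twice in the target
```

The batch is clean, yet the target ends up with two rows for (1, 100), which is the situation answer B describes.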
benni_ale
Most Recent 2 months ago
Selected Answer: B
Append mode does not take any key in the target table into consideration; it simply adds all rows of the input to the target table.
upvoted 1 times
...
panya
5 months ago
Yes it should be B
upvoted 1 times
...
imatheushenrique
5 months, 3 weeks ago
B. Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table. Using a merge instead of an append would avoid this problem.
upvoted 1 times
...
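To illustrate the merge suggestion above, here is a hedged pure-Python model of an insert-only merge on the composite key. Assuming orders is a Delta table, the corresponding statement would be a `MERGE INTO ... WHEN NOT MATCHED THEN INSERT *` on customer_id and order_id; the function name and sample rows below are hypothetical, and the batch is assumed to have been deduplicated already.

```python
# Pure-Python model of an insert-only merge; names and rows are hypothetical.

KEY = ("customer_id", "order_id")

def merge_insert_only(target, rows, key=KEY):
    """Insert only the rows whose key is not already present in the target."""
    existing = {tuple(r[c] for c in key) for r in target}
    result = list(target)
    for row in rows:
        if tuple(row[c] for c in key) not in existing:
            result.append(row)
    return result

# Order (1, 100) is already in the table from an earlier run.
orders = [{"customer_id": 1, "order_id": 100, "amount": 25}]

# Deduplicated batch that still repeats (1, 100) from the earlier run.
batch = [
    {"customer_id": 1, "order_id": 100, "amount": 25},
    {"customer_id": 2, "order_id": 200, "amount": 40},
]

merged = merge_insert_only(orders, batch)
merged_keys = [(r["customer_id"], r["order_id"]) for r in merged]
print(merged_keys.count((1, 100)))  # 1: the repeated order is skipped
print(len(merged))                  # 2
```

Because matching happens against the target's existing keys, the batch still needs its own deduplication step first; otherwise two unmatched source rows with the same key would both be inserted.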
DavidRou
8 months, 3 weeks ago
Selected Answer: B
B is the right answer. The above code only removes duplicates from the batch being processed; no logic is applied to records that were already saved.
upvoted 1 times
...
Jay_98_11
10 months, 2 weeks ago
Selected Answer: B
B is correct
upvoted 1 times
...
5ffcd04
10 months, 4 weeks ago
Selected Answer: B
Answer B
upvoted 1 times
...
kz_data
11 months, 1 week ago
Selected Answer: B
B is correct
upvoted 1 times
...
vivekla
12 months ago
correct B
upvoted 1 times
...
sturcu
1 year, 1 month ago
Selected Answer: B
Correct
upvoted 1 times
...
Starvosxant
1 year, 1 month ago
Correct. B
upvoted 1 times
...
thxsgod
1 year, 2 months ago
Selected Answer: B
Correct
upvoted 2 times
...