Exam Certified Data Engineer Professional topic 1 question 75 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 75
Topic #: 1

A data engineer is configuring a pipeline that will potentially see late-arriving, duplicate records.

In addition to de-duplicating records within the batch, which of the following approaches allows the data engineer to deduplicate data against previously processed records as it is inserted into a Delta table?

  • A. Set the configuration delta.deduplicate = true.
  • B. VACUUM the Delta table after each batch completes.
  • C. Perform an insert-only merge with a matching condition on a unique key.
  • D. Perform a full outer join on a unique key and overwrite existing data.
  • E. Rely on Delta Lake schema enforcement to prevent duplicate records.
Suggested Answer: C

Comments

aragorn_brego
1 year ago
Selected Answer: C
To deduplicate against previously processed records in a Delta table, use the MERGE INTO command with a matching condition on a unique key. If an incoming record matches an existing record on that key, the merge can update the existing record (if needed) or simply ignore the duplicate; an insert-only merge specifies only the WHEN NOT MATCHED clause, so matched records are left untouched. If there is no match (i.e., the record is new), the record is inserted.
upvoted 4 times
...
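The behaviour described in the comment above can be sketched in plain Python, with no Spark or Delta Lake dependency. The function name `insert_only_merge`, the `id` key, and the sample rows are all hypothetical, and a dict stands in for the Delta table. This only illustrates the semantics of a merge that has a WHEN NOT MATCHED insert clause and no WHEN MATCHED clause; it is not how Delta Lake implements it.

```python
def insert_only_merge(target, batch, key="id"):
    """Simulate an insert-only merge: insert rows whose key is new, skip the rest.

    `target` is a dict mapping key value -> row and stands in for the Delta table.
    Returns the list of rows that were actually inserted.
    """
    inserted = []
    seen_in_batch = set()
    for row in batch:
        k = row[key]
        # Within-batch de-duplication (a separate step in the question):
        # keep only the first row seen for each key.
        if k in seen_in_batch:
            continue
        seen_in_batch.add(k)
        # No WHEN MATCHED clause: a key already in the target is left untouched.
        if k in target:
            continue
        # WHEN NOT MATCHED THEN INSERT
        target[k] = row
        inserted.append(row)
    return inserted


# Hypothetical data: key 1 was processed earlier; the batch repeats key 2.
table = {1: {"id": 1, "value": "a"}}
batch = [
    {"id": 1, "value": "late duplicate"},
    {"id": 2, "value": "b"},
    {"id": 2, "value": "b"},
    {"id": 3, "value": "c"},
]
new_rows = insert_only_merge(table, batch)
```

In this sketch the late-arriving row for key 1 is skipped rather than overwriting the stored row, which is the difference between an insert-only merge and a full upsert.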
60ties
1 year ago
Selected Answer: C
Answer is C.
upvoted 2 times
...
Dileepvikram
1 year ago
Answer is C.
upvoted 2 times
...
hm358
1 year ago
Selected Answer: C
An insert-only merge will be more efficient than a full outer join followed by an overwrite (option D).
upvoted 2 times
...
sturcu
1 year, 1 month ago
Selected Answer: C
Merge with WHEN NOT MATCHED THEN INSERT.
upvoted 4 times
...
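The shorthand in the comment above corresponds to a MERGE statement whose only clause is WHEN NOT MATCHED THEN INSERT. As a minimal sketch, the hypothetical helper below renders such a statement as a string; the helper name, the table name `events`, the source name `new_events`, and the key `event_id` are assumptions for illustration only.

```python
def insert_only_merge_sql(target, source, key):
    """Render a MERGE statement with only a WHEN NOT MATCHED insert clause."""
    return (
        f"MERGE INTO {target}\n"
        f"USING {source}\n"
        f"ON {target}.{key} = {source}.{key}\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical target table, source view, and unique key.
statement = insert_only_merge_sql("events", "new_events", "event_id")
print(statement)
```

Where a SparkSession is available, a statement like this would typically be run with `spark.sql(statement)`, after the source has been de-duplicated within the batch.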
Crocjun
1 year, 1 month ago
C Reference: file:///C:/Users/yuen1/Downloads/databricks-certified-data-engineer-professional-exam-guide.pdf
upvoted 1 time
mouad_attaqi
1 year, 1 month ago
You are referencing a local PDF on your computer!
upvoted 9 times
...
...