Exam Certified Data Engineer Professional topic 1 question 149 discussion

Actual exam question from Databricks's Certified Data Engineer Professional

Question #: 149
Topic #: 1


A data pipeline uses Structured Streaming to ingest data from Apache Kafka to Delta Lake. Data is being stored in a bronze table, and includes the Kafka-generated timestamp, key, and value. Three months after the pipeline is deployed, the data engineering team has noticed some latency issues during certain times of the day.

A senior data engineer updates the Delta Table's schema and ingestion logic to include the current timestamp (as recorded by Apache Spark) as well as the Kafka topic and partition. The team plans to use these additional metadata fields to diagnose the transient processing delays.

Which limitation will the team face while diagnosing this problem?

A. New fields will not be computed for historic records.
B. Spark cannot capture the topic and partition fields from a Kafka source.
C. Updating the table schema requires a default value provided for each field added.
D. Updating the table schema will invalidate the Delta transaction log metadata.


Suggested Answer: A
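To make the limitation behind answer A concrete, here is a minimal sketch in plain Python. It is a toy model, not the Delta Lake or Spark API: the `schema` list, the `rows` list, the `append`/`read`/`latency_seconds` helpers, the column names `processing_time`, `topic`, and `partition`, and all sample values are assumptions made up for illustration. It mimics the behavior that adding columns to a table changes the schema only, so rows written before the change read back with nulls in the new columns.

```python
from datetime import datetime, timedelta

# Toy model (NOT the Delta Lake API): a table is a list of stored rows plus
# a list of column names. Adding a column changes only the schema; rows
# written earlier are left untouched.
schema = ["key", "value", "kafka_timestamp"]
rows = []


def append(row):
    """Store a row exactly as written; nothing is backfilled later."""
    rows.append(dict(row))


def read():
    """Project every stored row onto the current schema, returning None
    for any column the row was written without."""
    return [{col: row.get(col) for col in schema} for row in rows]


t0 = datetime(2024, 1, 1, 12, 0, 0)  # made-up Kafka timestamp

# Record ingested before the schema change.
append({"key": "k1", "value": "v1", "kafka_timestamp": t0})

# Schema change: add the hypothetical diagnostic columns.
schema += ["processing_time", "topic", "partition"]

# Record ingested three months later, after the change (made-up values).
append({
    "key": "k2",
    "value": "v2",
    "kafka_timestamp": t0 + timedelta(days=90),
    "processing_time": t0 + timedelta(days=90, seconds=5),
    "topic": "events",
    "partition": 3,
})


def latency_seconds(row):
    """Processing delay in seconds, or None when it cannot be computed."""
    if row["processing_time"] is None:
        return None
    return (row["processing_time"] - row["kafka_timestamp"]).total_seconds()


latencies = [latency_seconds(r) for r in read()]
print(latencies)
```

In this sketch the delay can be computed only for the record written after the schema change; the historic record has no `processing_time`, `topic`, or `partition`, so the first three months of data cannot contribute to the latency analysis.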

by m79590530 at Oct. 20, 2024, 5:51 p.m.

Comments


RandomForest

3 months ago

Selected Answer: A

A is correct: the new fields cannot be populated for the old records, because that metadata (including the topic) was never saved when they were ingested.

upvoted 1 times


m79590530

6 months ago

Selected Answer: A

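There is no way to reprocess the old records to populate these values after three months, because Kafka does not necessarily retain messages that long. Preserving the raw data is the job of the Raw (Bronze) table, but it never captured these fields for the existing records. The other answers do not hold up: Spark's Kafka source does expose the topic and partition, and adding columns to a Delta table neither requires default values nor invalidates the transaction log.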

upvoted 2 times

