Exam Certified Data Engineer Professional topic 1 question 72 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 72
Topic #: 1

A data team's Structured Streaming job is configured to calculate running aggregates for item sales to update a downstream marketing dashboard. The marketing team has introduced a new promotion, and they would like to add a new field to track the number of times this promotion code is used for each item. A junior data engineer suggests updating the existing query as follows. Note that proposed changes are in bold.

Original query:



Proposed query:

.start("/item_agg")

Which step must also be completed to put the proposed query into production?

  • A. Specify a new checkpointLocation
  • B. Increase the shuffle partitions to account for additional aggregates
  • C. Run REFRESH TABLE delta.`/item_agg`
  • D. Register the data in the "/item_agg" directory to the Hive metastore
  • E. Remove .option('mergeSchema', 'true') from the streaming write
Suggested Answer: A
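To make the intended semantics concrete, here is a minimal pure-Python sketch of the running aggregates the proposed query describes: per item, a total count, an average sale price, and a count of rows whose promo code is 'NEW MEMBER'. It is a toy model, not Spark code. The field names `item`, `sale_price`, and `promo_code` are taken from the query quoted in the discussion below; `ItemState`, `update_state`, `complete_output`, and the sample rows are hypothetical. In PySpark itself, a conditional count is usually written along the lines of `count(when(col("promo_code") == "NEW MEMBER", 1))`, since passing a plain string to `count` is interpreted as a column name.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class ItemState:
    """Hypothetical per-item running state for the proposed aggregation."""
    total_count: int = 0
    price_sum: float = 0.0
    new_member_promo: int = 0

    @property
    def avg_price(self) -> float:
        # Mean of sale_price over every row seen so far for this item.
        return self.price_sum / self.total_count if self.total_count else 0.0


def update_state(state: Dict[str, ItemState], batch: Iterable[dict]) -> Dict[str, ItemState]:
    """Fold one micro-batch of sales rows into the running state (toy model)."""
    for row in batch:
        s = state.setdefault(row["item"], ItemState())
        s.total_count += 1
        s.price_sum += row["sale_price"]
        # Conditional count: only rows whose promo_code is exactly 'NEW MEMBER'.
        if row.get("promo_code") == "NEW MEMBER":
            s.new_member_promo += 1
    return state


def complete_output(state: Dict[str, ItemState]) -> List[Tuple[str, int, float, int]]:
    """Emit the aggregates for every item, mirroring outputMode("complete")."""
    return [
        (item, s.total_count, round(s.avg_price, 2), s.new_member_promo)
        for item, s in sorted(state.items())
    ]


if __name__ == "__main__":
    state: Dict[str, ItemState] = {}
    update_state(state, [
        {"item": "mug", "sale_price": 10.0, "promo_code": "NEW MEMBER"},
        {"item": "mug", "sale_price": 14.0, "promo_code": None},
    ])
    update_state(state, [
        {"item": "hat", "sale_price": 20.0, "promo_code": "NEW MEMBER"},
        {"item": "mug", "sale_price": 12.0, "promo_code": "NEW MEMBER"},
    ])
    print(complete_output(state))
```

Each micro-batch updates the per-item state, and complete mode re-emits every item, which is why the dashboard always sees full running totals. The new promo count is one more field in that per-item state.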

Comments

f728f7f
Highly Voted 7 months, 2 weeks ago
This question is broken: the proposed query cannot be identified.
upvoted 23 times
AlejandroU
Highly Voted 1 month, 2 weeks ago
Selected Answer: A
Below is the proposed query:

    df.groupBy("item")
      .agg(count("item").alias("total_count"),
           mean("sale_price").alias("avg_price"),
           count("promo_code = 'NEW MEMBER'").alias("new_member_promo"))
      .writeStream
      .outputMode("complete")
      .option('mergeSchema', 'true')
      .option("checkpointLocation", "/item_agg/checkpoint")
      .start("/item_agg")

Answer A. When a new aggregate such as new_member_promo is added to a streaming aggregation, the query must use a new checkpoint location. The existing checkpoint stores the aggregation state using the old state schema, and Structured Streaming does not allow the number or type of aggregates to change between restarts from the same checkpoint, so reusing it would lead to a state schema mismatch.
upvoted 5 times
OnlyPraveen
1 month, 1 week ago
Thank you! Also check Question #114, which includes the proposed query image as well.
upvoted 1 times
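The reasoning behind answer A can be illustrated with a toy model. Spark checkpoints the state of a stateful streaming query, and the Structured Streaming guide treats any change in the number or type of aggregates in a streaming aggregation as unsupported when restarting from the same checkpoint. The sketch below is a hypothetical simplification, not Spark's checkpoint format or API: `ToyCheckpoint`, `start_query`, and `StateSchemaMismatch` are invented for illustration, the aggregate names stand in for the state layout, the original query is assumed to have computed only `total_count` and `avg_price`, and the path `/item_agg/checkpoint_v2` is a made-up example of a fresh location.

```python
from typing import Dict, Optional, Tuple


class StateSchemaMismatch(Exception):
    """Raised when a restart's state layout differs from the checkpointed one."""


class ToyCheckpoint:
    """Hypothetical stand-in for a checkpoint directory (not Spark's format)."""

    def __init__(self) -> None:
        self.state_schema: Optional[Tuple[str, ...]] = None

    def start(self, aggregates: Tuple[str, ...]) -> None:
        # First run against this location: record the state layout.
        if self.state_schema is None:
            self.state_schema = aggregates
            return
        # Restart: the state layout must match what was checkpointed.
        if aggregates != self.state_schema:
            raise StateSchemaMismatch(
                f"checkpoint has {self.state_schema}, query wants {aggregates}"
            )


# Hypothetical registry of checkpoints, keyed by checkpointLocation path.
checkpoints: Dict[str, ToyCheckpoint] = {}


def start_query(checkpoint_location: str, aggregates: Tuple[str, ...]) -> None:
    """Start (or restart) a toy streaming aggregation against a checkpoint."""
    cp = checkpoints.setdefault(checkpoint_location, ToyCheckpoint())
    cp.start(aggregates)


if __name__ == "__main__":
    ORIGINAL = ("total_count", "avg_price")           # assumed original aggregates
    PROPOSED = ORIGINAL + ("new_member_promo",)

    start_query("/item_agg/checkpoint", ORIGINAL)      # original job runs
    try:
        start_query("/item_agg/checkpoint", PROPOSED)  # reuse old checkpoint
    except StateSchemaMismatch as err:
        print("rejected:", err)
    start_query("/item_agg/checkpoint_v2", PROPOSED)   # answer A: new location
    print("started with new checkpoint")
```

The point of the model is only that the state layout is fixed per checkpoint location; the real engine also tracks offsets and other metadata there, which this sketch ignores.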
kino_1994
Most Recent 1 month, 3 weeks ago
Selected Answer: A
Since the new field is a count (an aggregation), it is non-nullable, making the change incompatible with the existing schema. This requires a new checkpointLocation to avoid schema mismatch issues. Additionally, the "mergeSchema=true" option must remain enabled to allow Spark to handle the schema evolution properly. However, if the field were nullable and not an aggregation, it would be a backward-compatible change, allowing the checkpoint to remain unchanged, as happens with schema evolution in Kafka. In this case, the correct answer is A.
upvoted 2 times
Sriramiyer92
1 month, 3 weeks ago
Selected Answer: A
The given answer is correct. When new columns are added (or the query otherwise changes), the checkpoint location also needs to change.
upvoted 1 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other