Certified Data Engineer Professional exam, topic 1, question 60 discussion

Actual exam question from Databricks' Certified Data Engineer Professional
Question #: 60
Topic #: 1

The data engineering team maintains a table of aggregate statistics through nightly batch updates. This includes total sales for the previous day alongside totals and averages for a variety of time periods, including the 7 previous days, year-to-date, and quarter-to-date. This table is named store_sales_summary and its schema is as follows (the schema exhibit is not reproduced here):

The table daily_store_sales contains all the information needed to update store_sales_summary. The schema for this table is: store_id INT, sales_date DATE, total_sales FLOAT
If daily_store_sales is implemented as a Type 1 table and the total_sales column might be adjusted after manual data auditing, which approach is the safest to generate accurate reports in the store_sales_summary table?

  • A. Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and overwrite the store_sales_summary table with each update.
  • B. Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and append new rows nightly to the store_sales_summary table.
  • C. Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.
  • D. Implement the appropriate aggregate logic as a Structured Streaming read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.
  • E. Use Structured Streaming to subscribe to the change data feed for daily_store_sales and apply changes to the aggregates in the store_sales_summary table with each update.
Suggested Answer: C
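To make the suggested answer concrete, here is a minimal PySpark sketch of the batch-read-plus-upsert approach in option C. It assumes a Databricks notebook where `spark` is predefined, and the summary column names (`total_sales_previous_day`, `avg_sales_7_days`) are illustrative guesses, since the store_sales_summary schema exhibit is not reproduced above.

```python
# Sketch of option C: nightly batch aggregate + MERGE upsert (Delta Lake).
# Summary column names are assumed, not taken from the exam exhibit.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Recompute aggregates from the full source table so that any manually
# audited (Type 1, updated-in-place) rows are reflected in the results.
daily = spark.table("daily_store_sales")

summary_updates = (
    daily.groupBy("store_id")
    .agg(
        F.sum(
            F.when(F.col("sales_date") == F.date_sub(F.current_date(), 1),
                   F.col("total_sales"))
        ).alias("total_sales_previous_day"),   # assumed column name
        F.avg(
            F.when(F.col("sales_date") >= F.date_sub(F.current_date(), 7),
                   F.col("total_sales"))
        ).alias("avg_sales_7_days"),           # assumed column name
        # ... year-to-date and quarter-to-date aggregates would follow
    )
)

# Upsert keyed on store_id: update existing summary rows, insert new stores.
(
    DeltaTable.forName(spark, "store_sales_summary").alias("t")
    .merge(summary_updates.alias("s"), "t.store_id = s.store_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

Note that because the aggregates are recomputed from the full source table on every run, options A and C as sketched differ mainly in how the result lands in the target table, not in what is computed.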

Comments

hammer_1234_h
Highly Voted 1 year, 2 months ago
The answer should be A. It is the safest way to generate accurate reports.
upvoted 11 times
alexvno
11 months, 1 week ago
Incorrect. Batch processing with OVERWRITE will give partial results.
upvoted 1 times
Def21
10 months ago
This is confusing: "overwrite the store_sales_summary table with each update" sounds like it is only doing updates, not inserting possible new stores.
upvoted 3 times
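Since the top-voted comment argues for A, here is a comparable sketch of the overwrite approach, under the same assumptions as the upsert sketch above (Databricks notebook, illustrative column names). A Delta overwrite is a single transaction, which is the usual counterargument to the "partial results" concern raised in this thread.

```python
# Sketch of option A: recompute everything, then atomically overwrite.
from pyspark.sql import functions as F

daily = spark.table("daily_store_sales")

summary = (
    daily.groupBy("store_id")
    .agg(
        F.sum(
            F.when(F.col("sales_date") == F.date_sub(F.current_date(), 1),
                   F.col("total_sales"))
        ).alias("total_sales_previous_day"),  # assumed column name
        # ... the remaining period totals/averages computed the same way
    )
)

# A Delta overwrite replaces the table in one transaction, so readers see
# either the old snapshot or the new one, never a half-written table.
(
    summary.write.format("delta")
    .mode("overwrite")
    .saveAsTable("store_sales_summary")
)
```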
shaojunni
Most Recent 1 month, 2 weeks ago
Selected Answer: A
daily_store_sales is a Type 1 table, so no history is maintained. You have to treat every record as a new record and redo the aggregation for every store. Overwrite is much more efficient than upsert.
upvoted 1 times
Ati1362
5 months ago
Selected Answer: C
I will go with C, upsert.
upvoted 3 times
MDWPartners
5 months, 4 weeks ago
Selected Answer: C
A is not correct because the table is daily. If you overwrite, you delete all history. You need to insert/update to keep history.
upvoted 4 times
fe3b2fc
3 months, 1 week ago
Incorrect. The daily store sales table contains all of the history needed to update the table. The summary table holds no historical records. Seeing as this is a nightly job, any manual changes made to daily store sales will be captured. A is the correct answer.
upvoted 2 times
ThoBustos
7 months, 1 week ago
Selected Answer: A
Not sure if that's right, but I would go for A. What do you think?
Type 1: data is overwritten. Type 2: history is maintained; new data is inserted as new rows. Type 3: stores two versions per record, a previous and a current value.
A. Batch + overwrite -> matches Type 1 requirements. YES
B. Batch + append new rows -> would be for Type 2. NO
C. Batch + upsert -> data is not being overwritten (which is required for Type 1). NO
D. readStream + upsert -> data is not being overwritten (which is required for Type 1). NO
E. Change Data Feed to update -> problem is manual edits + not overwriting (required for Type 1). NO
I have doubts around "which approach is the safest". Maybe because, due to some manual changes, it is hard to track changes or do an upsert, so overwriting is safer to make sure the stats are right.
upvoted 3 times
vikram12apr
8 months, 2 weeks ago
Selected Answer: C
Not A, because overwriting will only provide daily data, not the history of it. Not B, because it will not fix the issue of incorrect sales amounts. As this data fits batch processing, neither D nor E. C will upsert only the changes while making sure we update the records based on sales_date & store_id.
upvoted 2 times
Rinscy
9 months, 4 weeks ago
Definitely E, because it says that the total_sales column may be changed by manual auditing, not via a job, so streaming with CDF is the only option here!
upvoted 1 times
Somesh512
9 months, 4 weeks ago
Selected Answer: A
I would go with option A, because there is manual auditing, hence values can change. It uses Type 1, hence it replaces the original data.
upvoted 3 times
spaceexplorer
10 months ago
Selected Answer: E
It should be E, as Structured Streaming has built-in fault tolerance.
upvoted 1 times
Rinscy
10 months ago
It said Type 1, so A is the correct answer!
upvoted 1 times
divingbell17
10 months, 4 weeks ago
The question is unclear on whether the aggregated table needs to support a rolling history. Note that the aggregated table does not have a date column to distinguish which date the summary is generated for, so one could assume the table is maintained only as the current snapshot. Assuming the above, A would be the safest option, as all stores and aggregates would need to be refreshed nightly.
upvoted 2 times
dmov
11 months ago
Selected Answer: A
A is correct because it's a static table that is written nightly through a batch job. The summary table does not maintain history, so an upsert results in extra, unnecessary records. Overwrite it nightly with updated aggregates for the required time periods.
upvoted 3 times
Def21
10 months ago
"Safest" probably includes having Delta table. And history is maintained anyway.
upvoted 1 times
Luv4data
11 months ago
The answer is A. Note that the target table has columns which store quarter-to-date, previous-day sales, etc., which will result in daily updates, i.e. a large volume of records updated; hence it is better to overwrite than to update a large volume of records.
upvoted 3 times
alexvno
11 months, 1 week ago
Selected Answer: C
Batch processing, so you need to update and insert: C.
upvoted 3 times
Enduresoul
12 months ago
Selected Answer: C
Answer C is correct. Answer E would do the job too, but the table schema and the question indicate that only one update per day is needed. Therefore a Structured Streaming job is way too expensive to achieve the outcome.
upvoted 3 times
Gulenur_GS
11 months, 3 weeks ago
You are absolutely right!
upvoted 1 times
Syd
1 year ago
Correct answer: A. Type 1 data is overwritten. https://streamsets.com/blog/slowly-changing-dimensions-vs-change-data-capture/#:~:text=In%20Type%201%2C%20any%20new,change%20to%20maintain%20a%20history.
upvoted 1 times
sturcu
1 year, 1 month ago
Selected Answer: E
I would say that it is E. If the daily_store_sales table is implemented as a Type 1 table, this means that values are overwritten and we do not keep history. So we would need to create a stream from the CDF and apply those changes to the aggregated table.
upvoted 2 times
sturcu
1 year ago
Manual data auditing implies we do not know when a change is made; hence we do not know when to schedule the "batch update" for the aggregated table.
upvoted 1 times
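For completeness, here is a sketch of what the CDF-based option E that sturcu describes might look like. It assumes daily_store_sales was created with the table property delta.enableChangeDataFeed = true; the per-store recompute inside foreachBatch, the checkpoint path, and the summary column name are illustrative assumptions, not the exam's official solution.

```python
# Sketch of option E: stream the change data feed (CDF) of daily_store_sales
# and fold changes into store_sales_summary. Assumes the source table has
# TBLPROPERTIES (delta.enableChangeDataFeed = true).
from delta.tables import DeltaTable
from pyspark.sql import functions as F

def upsert_affected_stores(batch_df, batch_id):
    # Recompute aggregates only for stores touched in this micro-batch,
    # reading current state from the source so updates/deletes are handled.
    affected = batch_df.select("store_id").distinct()
    fresh = (
        spark.table("daily_store_sales")
        .join(affected, "store_id")
        .groupBy("store_id")
        .agg(F.sum("total_sales").alias("total_sales_ytd"))  # assumed column
    )
    (
        DeltaTable.forName(spark, "store_sales_summary").alias("t")
        .merge(fresh.alias("s"), "t.store_id = s.store_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

(
    spark.readStream
    .option("readChangeFeed", "true")
    .table("daily_store_sales")
    .writeStream
    .foreachBatch(upsert_affected_stores)
    .option("checkpointLocation", "/tmp/_checkpoints/store_sales_summary")  # illustrative path
    .trigger(availableNow=True)
    .start()
)
```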
Community vote distribution: A (35%), C (25%), B (20%), Other.