Exam Certified Data Engineer Professional topic 1 question 35 discussion

Actual exam question from Databricks's Certified Data Engineer Professional
Question #: 35
Topic #: 1

To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries.
The data engineering team has been made aware of new requirements from a customer-facing application, which is the only downstream workload they manage entirely. As a result, an aggregate table used by numerous teams across the organization will need to have a number of fields renamed, and additional fields will also be added.
Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?

  • A. Send all users notice that the schema for the table will be changing; include in the communication the logic necessary to revert the new table schema to match historic queries.
  • B. Configure a new table with all the requisite fields and new names and use this as the source for the customer-facing application; create a view that maintains the original data schema and table name by aliasing select fields from the new table.
  • C. Create a new table with the required schema and new fields and use Delta Lake's deep clone functionality to sync up changes committed to one table to the corresponding table.
  • D. Replace the current table definition with a logical view defined with the query logic currently writing the aggregate table; create a new table to power the customer-facing application.
  • E. Add a table comment warning all users that the table schema and field names will be changing on a given date; overwrite the table in place to the specifications of the customer-facing application.
Suggested Answer: B
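
The pattern described in option B can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Delta Lake on Databricks, and the table and column names (agg_sales, agg_sales_v2, region, total, sales_region, total_revenue, order_count) are hypothetical, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original aggregate table shared by many teams (hypothetical schema).
conn.execute("CREATE TABLE agg_sales (region TEXT, total REAL)")
conn.execute("INSERT INTO agg_sales VALUES ('EMEA', 100.0), ('APAC', 250.0)")

# New table with renamed fields plus one added field for the
# customer-facing application.
conn.execute(
    "CREATE TABLE agg_sales_v2 AS "
    "SELECT region AS sales_region, total AS total_revenue, 0 AS order_count "
    "FROM agg_sales"
)

# Retire the old table and reuse its name for a view that aliases the
# new column names back to the original ones.
conn.execute("DROP TABLE agg_sales")
conn.execute(
    "CREATE VIEW agg_sales AS "
    "SELECT sales_region AS region, total_revenue AS total FROM agg_sales_v2"
)

# A historic query written against the old schema still runs unchanged.
legacy_rows = conn.execute(
    "SELECT region, total FROM agg_sales ORDER BY region"
).fetchall()

# The customer-facing application reads the new schema directly.
app_rows = conn.execute(
    "SELECT sales_region, total_revenue, order_count "
    "FROM agg_sales_v2 ORDER BY sales_region"
).fetchall()

# Only one physical table remains; the old name now refers to a view.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(legacy_rows, app_rows, table_names)
```

In this sketch the number of physical tables stays at one, which is the reading of option B that several commenters below rely on. Whether the old table is actually dropped is not stated in the option itself.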

Comments

alexvno
Highly Voted 8 months, 2 weeks ago
Selected Answer: D
Create a view. It can't be B because of the requirement "without increasing the number of tables that need to be managed".
upvoted 7 times
...
guillesd
Highly Voted 9 months, 3 weeks ago
Selected Answer: B
B makes way more sense. The number of tables managed does not increase, since the old table won't be used anymore, and the view on top of the new table is not another table to manage; it just maintains the "original API" of the table to avoid breaking changes in downstream applications.
upvoted 5 times
...
vish007
Most Recent 1 week, 1 day ago
Selected Answer: B
Option D will increase the compute cost significantly, as all the downstream teams will run the view, which contains the logic for the aggregate table. Option B makes more sense, with less impact on storage and compute cost, which is the original ask for the data engineering team in the question.
upvoted 1 times
...
benni_ale
1 week, 6 days ago
Selected Answer: D
I am not sure whether it is B or D. I believe B increases the number of managed tables, as it states that a CREATE TABLE statement is run before a CREATE VIEW, and it does not really specify that the view will replace the current table. One could argue that it would be dumb not to do that, but at this point I would say that D is more precise.
upvoted 1 times
...
b.b.da.costa
2 weeks, 1 day ago
The problem with this question is whether the order of the steps in each option matters. B: create a table, then create a view. Teams are interrupted after the creation of the table. D: create a view, then create a table. Teams are not interrupted because they are consuming the view first.
upvoted 1 times
vish007
1 week, 1 day ago
Option D will increase the compute cost significantly, as all the downstream teams will run the view, which contains the logic for the aggregate table. Option B makes more sense, with less impact on storage and compute cost, which is the original ask for the data engineering team in the question.
upvoted 1 times
...
benni_ale
1 week, 6 days ago
Also, does B really not increase the number of written tables? It states that a CREATE TABLE is run and a CREATE VIEW is run. Nothing really indicates that the view will replace the table. So I would opt for D.
upvoted 1 times
...
...
kimberlyvsmith
2 weeks, 4 days ago
Selected Answer: B
B is correct. It does not create additional tables. The view mimics the old schema so as not to interrupt downstream consumers. It ensures the aggregates are persisted to save on compute. D is incorrect mostly because the aggregates are baked into the view, which is not optimal: each time downstream users query the view, the joins and aggregates have to be recomputed.
upvoted 1 times
...
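
The difference between a persisted aggregate (option B) and a view that carries the aggregation logic (option D) can be illustrated with a small sketch. It again uses Python's built-in sqlite3 module rather than Databricks, and the names raw_orders, agg_table, and agg_view are hypothetical. The view returns a different total after new raw rows arrive, which shows that its aggregation is evaluated at query time rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw source that the aggregate is built from.
conn.execute("CREATE TABLE raw_orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("EMEA", 40.0), ("EMEA", 60.0), ("APAC", 250.0)],
)

AGG_QUERY = "SELECT region, SUM(amount) AS total FROM raw_orders GROUP BY region"

# Option B style: the aggregation runs once and the result is stored.
conn.execute(f"CREATE TABLE agg_table AS {AGG_QUERY}")

# Option D style: the view stores only the query text, not the result.
conn.execute(f"CREATE VIEW agg_view AS {AGG_QUERY}")

# New raw data arrives after both objects were created.
conn.execute("INSERT INTO raw_orders VALUES ('EMEA', 5.0)")

# The stored table is unchanged until someone refreshes it.
table_total = conn.execute(
    "SELECT total FROM agg_table WHERE region = 'EMEA'"
).fetchone()[0]

# The view re-runs the GROUP BY against raw_orders on every query.
view_total = conn.execute(
    "SELECT total FROM agg_view WHERE region = 'EMEA'"
).fetchone()[0]

print(table_total, view_total)
```

The sketch only demonstrates when the aggregation runs. How much extra compute this costs on a real platform depends on factors such as caching and data volume, which are outside what the question specifies.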
shaojunni
1 month, 2 weeks ago
Selected Answer: D
D will not increase the number of tables. It will create a new table and replace the aggregation table with a view. B will create a new table and a new view matching the old table name and schema, while the aggregation table is still there.
upvoted 1 times
...
KB_Ai_Champ
2 months, 1 week ago
Option D is correct. Docs: https://docs.databricks.com/en/delta/update-schema.html. Also, the question specifically says that they don't want to increase the number of managed tables!
upvoted 2 times
KB_Ai_Champ
2 months, 1 week ago
Reasons:
No Increase in Managed Tables: By replacing the current table with a view, you maintain the same number of managed tables.
Backward Compatibility: The view can mimic the original table's schema, ensuring that existing queries and applications continue to function without modification.
Dedicated Table for New Requirements: The new table can be tailored to meet the specific needs of the customer-facing application without affecting other users.
upvoted 2 times
...
...
AndreFR
3 months ago
Selected Answer: B
B is correct: no new tables, and it minimally interrupts other teams in the organization. A and E are excluded because they interrupt other teams in the organisation; answers that require user communication are usually wrong. C is excluded because deep clone is used for table creation, not after creation. D is excluded because it increases the number of tables.
upvoted 1 times
...
fe3b2fc
3 months ago
Selected Answer: A
B, C, and D all state creating a new table, therefore increasing the number of tables to manage. This is exactly what the question says to avoid: "minimally interrupting other teams in the organization without increasing the number of tables that need to be managed". Answer A is the only one that makes sense and is a pretty standard operating procedure for databases. E is wrong because you would never update a column comment to inform users of anything.
upvoted 2 times
...
faraaz132
3 months, 3 weeks ago
Selected Answer: B
B is correct. Why not D? Because it will create an interruption when you replace the current table with a view, and the question asks for minimal interruption.
upvoted 2 times
...
pravieee
4 months, 1 week ago
Selected Answer: B
I would go for B. With option B you will run the aggregations once and store them in a table, then present these aggregations in the old schema in a view. With D the aggregations will be done twice, for the old schema view and for the new table.
upvoted 2 times
...
ThoBustos
7 months ago
Selected Answer: B
To me it's B, because by creating a new table plus the view that will substitute for the previous table, we still have 1 table. It seems to be the most efficient way to solve this. Not 100% sure though.
upvoted 1 times
...
hal2401me
8 months, 2 weeks ago
Selected Answer: D
In my exam today I chose D.
upvoted 2 times
...
IWantCerts
10 months, 2 weeks ago
Selected Answer: B
I think it's B. D replaces original table definition with a view, which will run up compute costs for queries using the table.
upvoted 2 times
...
aksand13
11 months, 1 week ago
Selected Answer: D
D. B has both a new table and a view created.
upvoted 4 times
...
Quadronoid
1 year ago
Selected Answer: B
B is definitely the best option
upvoted 1 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted