Exam Certified Data Engineer Associate topic 1 question 37 discussion

Actual exam question from Databricks's Certified Data Engineer Associate

Question #: 37
Topic #: 1

[All Certified Data Engineer Associate Questions]

A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task.
Which of the following approaches can the data engineer use to set up the new task?

A. They can clone the existing task in the existing Job and update it to run the new notebook.
B. They can create a new task in the existing Job and then add it as a dependency of the original task.
C. They can create a new task in the existing Job and then add the original task as a dependency of the new task.
D. They can create a new job from scratch and add both tasks to run concurrently.
E. They can clone the existing task to a new Job and then edit it to run the new notebook.

Suggested Answer: B

by 4be8126 at April 5, 2023, 1:06 p.m.

Comments

Submit Cancel

Redwings538

Highly Voted 2 years, 2 months ago

Selected Answer: B

It seems there is some confusion about what "dependency" means in this case. Option B is correct because adding the new task as a dependency of the original task means that the new task will run BEFORE the original task, which is the goal defined in the question.

upvoted 24 times
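
To make this reading concrete, here is a minimal sketch of how option B might look as job settings. It assumes the Databricks Jobs API 2.1 payload shape, where each task has a `task_key` and an optional `depends_on` list naming the tasks that must finish before it starts. The task keys, job name, and notebook paths are hypothetical, and cluster/compute settings are omitted. A topological sort from Python's standard `graphlib` stands in for the scheduler to show which task would run first.

```python
from graphlib import TopologicalSorter

# Hypothetical job settings shaped like a Databricks Jobs API 2.1 payload.
# Task keys and notebook paths are made up; compute settings are omitted.
job_settings = {
    "name": "morning_job",
    "tasks": [
        {
            "task_key": "original_task",
            "notebook_task": {"notebook_path": "/Jobs/original_notebook"},
            # Option B: the new task is a dependency of the original task,
            # so the original task waits for it to finish.
            "depends_on": [{"task_key": "fix_upstream"}],
        },
        {
            "task_key": "fix_upstream",
            "notebook_task": {"notebook_path": "/Jobs/new_notebook"},
        },
    ],
}


def run_order(settings: dict) -> list[str]:
    """Return an execution order where each task follows every task in its depends_on."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in settings["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())


print(run_order(job_settings))  # ['fix_upstream', 'original_task']
```

The `depends_on` entry lives on the task that waits (the original task), which is why "add the new task as a dependency of the original task" yields the new task first.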

loyik65509

3 months, 2 weeks ago

This means the original task must run first before the new task starts. The original task will wait for the new task. This is the wrong order because we need the new task to run first to fix upstream data issues before the original task executes.

upvoted 2 times

...

Data_4ever

Highly Voted 2 years, 2 months ago

Selected Answer: B

B is the right answer.

upvoted 15 times

...

Billybob0604

Most Recent 3 months, 2 weeks ago

Selected Answer: C

The new task should run before the original task, meaning the original task must depend on the new task.

upvoted 3 times

...

pint414

4 months, 1 week ago

Selected Answer: B

B, as the new task runs first.

upvoted 1 time

...

avidlearner

4 months, 2 weeks ago

Selected Answer: C

I think the confusion here is because it says "as a dependency," which in my opinion means "following." If we go by that wording, C is the correct answer, because we want the original task to run after the new task.

upvoted 1 time

...

Usaha1

5 months, 3 weeks ago

Selected Answer: B

B, because when we add a task that is supposed to run after a previous task, the dependency ("depends on") gets added to the second task, not the first.

upvoted 1 time

...

rohitrc8521

5 months, 4 weeks ago

Selected Answer: C

The answer is C, folks. Please pay close attention to the wording. They have deliberately constructed the wording of options B and C to confuse the audience.

upvoted 2 times

...

danishanis

5 months, 4 weeks ago

Selected Answer: C

I think the correct answer should be C and not B. Adding the new task as a dependency of the original task would mean that the original task runs first and then the new task runs. This is the opposite of what is desired in the question.

upvoted 3 times

...

brconejeros

6 months, 1 week ago

Selected Answer: C

Basically, because the question uses "prior": "they need to set up another task to run a new notebook prior to the original task." So the correct answer is C.

upvoted 1 time

...

Rifrif

6 months, 2 weeks ago

Selected Answer: B

The answer is B, as it needs to run before they start working.

upvoted 1 time

...

sam_chalvet

6 months, 2 weeks ago

Selected Answer: B

B. Even without knowing anything about Databricks, answer B is how I would want to handle this scenario; it makes the most sense.

upvoted 1 time

...

806e7d2

7 months, 2 weeks ago

Selected Answer: B

In Databricks Jobs, you can manage task dependencies within a single job. If you want to add a new task that needs to run before the original task because of an upstream issue, the appropriate approach is to create a new task that runs the notebook addressing the upstream data issue, and then add it as a dependency of the original task. By making the original task depend on the new task, you ensure that the new task runs first and that the original task runs only after the new task completes successfully. This approach keeps the sequence of tasks correctly managed in a single job, with the dependencies explicitly defined.

upvoted 1 time

...

Colje

9 months, 1 week ago

C. They can create a new task in the existing Job and then add the original task as a dependency of the new task. Why this is correct: In Databricks, you can set up a task dependency chain by adding a new task and specifying that the original task depends on the new one. This ensures that the new task will run first, followed by the original task.

upvoted 1 time

...

tangerine141

9 months, 2 weeks ago

Selected Answer: B

Both B and C involve dependencies between tasks, but the difference is in how the dependencies are structured:

B: "They can create a new task in the existing Job and then add it as a dependency of the original task." In this case, the new task is added as a prerequisite (dependency) of the original task. This means the new task will run first, and once it has completed, the original task will run.

C: "They can create a new task in the existing Job and then add the original task as a dependency of the new task." In this case, the original task is added as a dependency of the new task, meaning the new task will wait for the original task to finish before running.

The correct answer is B: you want the new task (the one handling the upstream issue) to run before the original task, so it should be set as a dependency of the original task.

upvoted 1 time
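
The B-versus-C difference described above can be checked with a small sketch, assuming `depends_on` semantics as in Databricks Jobs (a task starts only after every task it depends on has finished). The task names below are hypothetical, and the sketch models only ordering, not retries or run-if conditions. Each dictionary maps a task to the tasks it depends on, and `graphlib.TopologicalSorter` from the Python standard library computes the resulting order.

```python
from graphlib import TopologicalSorter


def order(depends_on: dict[str, list[str]]) -> list[str]:
    """depends_on maps each task to the tasks that must finish before it starts."""
    return list(TopologicalSorter(depends_on).static_order())


# Option B: the new task is added as a dependency OF the original task.
option_b = {"new_task": [], "original_task": ["new_task"]}

# Option C: the original task is added as a dependency OF the new task.
option_c = {"original_task": [], "new_task": ["original_task"]}

print("B:", order(option_b))  # B: ['new_task', 'original_task']
print("C:", order(option_c))  # C: ['original_task', 'new_task']
```

Under this model, only option B puts the new notebook ahead of the original task, which matches the requirement in the question.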

...

Stefan94

9 months, 2 weeks ago

Selected Answer: B

B is correct, as Redwings538 says.

upvoted 1 time

...

CID2024

10 months, 1 week ago

I think the correct answer is C, because the question states "they need to set up another task to run a new notebook prior to the original task," i.e., the original task should run AFTER the new task. So, by creating a new task in the existing job and setting the original task as a dependency of the new task, the data engineer ensures that the new notebook runs first, followed by the original task. This approach maintains the execution sequence required to address the upstream data issue.

upvoted 2 times

...

9d4d68a

10 months, 1 week ago

Below is what I am convinced of after checking with AI. Here's a breakdown of the differences between options B and C:

Option B: Create a new task in the existing Job and then add it as a dependency of the original task. Result: the new task will run after the original task.
Option C: Create a new task in the existing Job and then add the original task as a dependency of the new task. Result: the new task will run before the original task.

Summary:
Option B: Original task → New task
Option C: New task → Original task

In your case, Option C is the correct choice because you need the new task to run first to resolve the upstream data issue before the original task executes.

upvoted 2 times