Exam Certified Data Engineer Associate topic 1 question 37 discussion

Actual exam question from Databricks's Certified Data Engineer Associate
Question #: 37
Topic #: 1

A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task.
Which of the following approaches can the data engineer use to set up the new task?

  • A. They can clone the existing task in the existing Job and update it to run the new notebook.
  • B. They can create a new task in the existing Job and then add it as a dependency of the original task.
  • C. They can create a new task in the existing Job and then add the original task as a dependency of the new task.
  • D. They can create a new job from scratch and add both tasks to run concurrently.
  • E. They can clone the existing task to a new Job and then edit it to run the new notebook.
Suggested Answer: B
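The disagreement between B and C in the comments comes down to which task carries the dependency. The sketch below assumes the task-settings shape of the Databricks Jobs API 2.1, where each task lists its prerequisites in a `depends_on` array of `{"task_key": ...}` objects and runs only after those tasks finish. The task keys and notebook paths are hypothetical, and the run order is computed locally with Python's standard `graphlib` module to illustrate the semantics; nothing here calls Databricks.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_order(tasks):
    """Return task keys in an order that respects each task's depends_on list."""
    # TopologicalSorter expects {node: set_of_predecessors}; a task's
    # depends_on entries are exactly its predecessors.
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in tasks
    }
    return list(TopologicalSorter(graph).static_order())


# Option B: the ORIGINAL task lists the new task under depends_on.
# Task keys and notebook paths are hypothetical placeholders.
option_b = [
    {
        "task_key": "original_task",
        "notebook_task": {"notebook_path": "/Jobs/original_notebook"},
        "depends_on": [{"task_key": "fix_upstream"}],
    },
    {
        "task_key": "fix_upstream",
        "notebook_task": {"notebook_path": "/Jobs/fix_upstream_notebook"},
    },
]

# Option C: the NEW task lists the original task under depends_on.
option_c = [
    {
        "task_key": "original_task",
        "notebook_task": {"notebook_path": "/Jobs/original_notebook"},
    },
    {
        "task_key": "fix_upstream",
        "notebook_task": {"notebook_path": "/Jobs/fix_upstream_notebook"},
        "depends_on": [{"task_key": "original_task"}],
    },
]

print(run_order(option_b))  # ['fix_upstream', 'original_task']
print(run_order(option_c))  # ['original_task', 'fix_upstream']
```

Under this reading, B yields the order the question asks for (new notebook first), while C runs the new notebook only after the original task has finished.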

Comments

Redwings538
Highly Voted 1 year, 8 months ago
Selected Answer: B
It seems there is some confusion about what "dependency" means in this case. Option B is correct because adding the new task as a dependency of the original task means that the new task will run BEFORE the original task, which is the goal defined in the question.
upvoted 19 times
...
Data_4ever
Highly Voted 1 year, 8 months ago
Selected Answer: B
B is the right answer.
upvoted 15 times
...
sam_chalvet
Most Recent 1 day, 21 hours ago
Selected Answer: B
B. Even without knowing anything about Databricks, answer B is how I would want to be able to handle this scenario; it makes the most sense.
upvoted 1 times
...
806e7d2
1 month ago
Selected Answer: B
In Databricks Jobs, you can manage task dependencies within a single job. If you want to add a new task that needs to run before the original task because of an upstream issue, the appropriate approach would be to:
  • Create a new task: this new task runs the notebook that addresses the upstream data issue.
  • Add it as a dependency of the original task: by making the original task depend on the new task, you ensure that the new task runs first, and the original task runs only after the new task completes successfully.
This approach ensures that the sequence of tasks is correctly managed in a single job, with dependencies explicitly defined.
upvoted 1 times
...
Colje
2 months, 3 weeks ago
C. They can create a new task in the existing Job and then add the original task as a dependency of the new task. Why this is correct: In Databricks, you can set up a task dependency chain by adding a new task and specifying that the original task depends on the new one. This ensures that the new task will run first, followed by the original task.
upvoted 1 times
...
tangerine141
2 months, 4 weeks ago
Selected Answer: B
Both B and C involve dependencies between tasks, but the difference is in how the dependencies are structured:
B: "They can create a new task in the existing Job and then add it as a dependency of the original task." In this case, the new task is added as a prerequisite (dependency) for the original task. This means the new task will run first, and once it's completed, the original task will run.
C: "They can create a new task in the existing Job and then add the original task as a dependency of the new task." In this case, the original task is added as a dependency for the new task, meaning the new task will wait for the original task to finish before running.
The correct answer is B: you want the new task (the one handling the upstream issue) to run before the original task, so it should be set as a dependency of the original task.
upvoted 1 times
...
Stefan94
3 months ago
Selected Answer: B
B is correct as Redwings538 says
upvoted 1 times
...
CID2024
3 months, 3 weeks ago
I think the correct answer is C, because the question states "they need to set up another task to run a new notebook prior to the original task," i.e., the original task should run AFTER the new task. So, by creating a new task in the existing job and setting the original task as a dependency of the new task, the data engineer ensures that the new notebook runs first, followed by the original task. This approach maintains the sequence of execution required to address the upstream data issue.
upvoted 2 times
...
9d4d68a
3 months, 3 weeks ago
Below is what I concluded after checking with AI. Here's a breakdown of the differences between options B and C:
Option B: Create a new task in the existing Job and then add it as a dependency of the original task. Result: the new task will run after the original task.
Option C: Create a new task in the existing Job and then add the original task as a dependency of the new task. Result: the new task will run before the original task.
Summary: Option B: Original task → New task. Option C: New task → Original task.
In your case, Option C is the correct choice because you need the new task to run first to resolve the upstream data issue before the original task executes.
upvoted 2 times
...
7a22144
4 months, 1 week ago
C is correct because it correctly handles the sequence of execution. By creating a new task in the existing Job and adding the original task as a dependency of the new task, the new task will run first, and once it completes successfully, the original task will run. This ensures that the upstream data issue is addressed before the original task runs.
upvoted 1 times
...
kokosz
7 months ago
Selected Answer: B
B is the right answer.
upvoted 2 times
...
benni_ale
7 months, 3 weeks ago
Selected Answer: B
original depends on new
upvoted 1 times
...
Mircuz
9 months, 3 weeks ago
Selected Answer: C
C, because the new task has to run prior to the original one.
upvoted 3 times
...
Nika12
11 months ago
Selected Answer: B
Just got 100% on the test. B was correct.
upvoted 6 times
...
Shaxxie
11 months, 1 week ago
This has become more of an English grammar test, as the word "dependency" is confusing people. When the original task has a dependency on the new task, this means the original task needs to depend on the new task. So it's Option C.
upvoted 3 times
...
Garyn
11 months, 4 weeks ago
Selected Answer: C
The data engineer can create a new task in the existing Job and then add the original task as a dependency of the new task (Option C). This way, the new task will run first, and once it's completed, the original task will run. Here are the steps to do this:
  • Click Workflows in the sidebar, click New, and select Job. The Tasks tab appears with the create task dialog.
  • Replace Add a name for your job… with your job name.
  • Enter a name for the task in the Task name field.
  • In the Type drop-down menu, select the type of task to run.
  • Configure the cluster where the task runs.
  • To add dependent libraries, click + Add next to Dependent libraries.
  • You can pass parameters for your task.
Please note that the exact process may vary depending on the specific configurations and permissions set up in your workspace. It's always a good idea to consult with your organization's IT or data governance team to ensure the correct procedures are followed.
upvoted 4 times
...
Tinendra
12 months ago
Answer is C
upvoted 5 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other