Exam Associate Data Practitioner topic 1 question 9 discussion

Actual exam question from Google's Associate Data Practitioner
Question #: 9
Topic #: 1

You are designing a pipeline to process data files that arrive in Cloud Storage by 3:00 am each day. Data processing is performed in stages, where the output of one stage becomes the input of the next. Each stage takes a long time to run. Occasionally a stage fails, and you have to address the problem. You need to ensure that the final output is generated as quickly as possible. What should you do?

  • A. Design a Spark program that runs under Dataproc. Code the program to wait for user input when an error is detected. Rerun the last action after correcting any stage output data errors.
  • B. Design the pipeline as a set of PTransforms in Dataflow. Restart the pipeline after correcting any stage output data errors.
  • C. Design the workflow as a Cloud Workflow instance. Code the workflow to jump to a given stage based on an input parameter. Rerun the workflow after correcting any stage output data errors.
  • D. Design the processing as a directed acyclic graph (DAG) in Cloud Composer. Clear the state of the failed task after correcting any stage output data errors.
Suggested Answer: D

Comments

JAGLees
4 days, 21 hours ago
Selected Answer: D
Cloud Composer is the recommended orchestration tool for managing workloads in Google Cloud, and it meets these requirements well.
upvoted 1 times
n2183712847
1 month ago
Selected Answer: D
The best option is D: design the processing as a directed acyclic graph (DAG) in Cloud Composer, and clear the state of the failed task after correcting any stage output data errors. Cloud Composer (managed Apache Airflow) is built for orchestrating complex data pipelines: it provides task-level rerun capabilities for efficient recovery and a fully managed, scheduled environment. That makes it the most suitable choice for generating the final output as quickly as possible in the face of occasional stage failures. Options A, B, and C are less efficient because they require manual intervention, offer less granular error recovery, or lack dedicated workflow orchestration features.
upvoted 1 times
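To see why task-level recovery is faster, the comment above can be illustrated with a small simulation of Airflow-style "clear task state" semantics (this is a hypothetical plain-Python sketch, not Cloud Composer code; the stage names and `tasks_to_rerun` helper are invented for illustration). Clearing a failed task re-runs only that task and its downstream dependents, while already-successful upstream stages keep their results:

```python
def tasks_to_rerun(dag, failed):
    """Return the failed task plus every task downstream of it.

    dag maps each task id to the list of tasks that consume its output.
    Upstream tasks that already succeeded are deliberately excluded,
    which is what makes this recovery strategy fast for long stages.
    """
    rerun = {failed}
    frontier = [failed]
    while frontier:
        task = frontier.pop()
        for child in dag.get(task, []):
            if child not in rerun:
                rerun.add(child)
                frontier.append(child)
    return rerun


# Linear pipeline: each stage's output is the next stage's input.
dag = {"stage_1": ["stage_2"], "stage_2": ["stage_3"], "stage_3": []}

# If stage_2 fails, only stage_2 and stage_3 are re-run;
# the long-running stage_1 is not repeated.
print(sorted(tasks_to_rerun(dag, "stage_2")))  # ['stage_2', 'stage_3']
```

This mirrors why option D beats restarting the whole pipeline (option B) or hand-rolling stage-skipping logic (option C): the orchestrator already tracks per-task state, so recovery cost is proportional to the failed suffix of the DAG, not the whole run.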
Community vote distribution: A (35%), C (25%), B (20%), Other