Exam Professional Data Engineer topic 1 question 235 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 235
Topic #: 1

You want to schedule a number of sequential load and transformation jobs. Data files will be added to a Cloud Storage bucket by an upstream process. There is no fixed schedule for when the new data arrives. Next, a Dataproc job is triggered to perform some transformations and write the data to BigQuery. You then need to run additional transformation jobs in BigQuery. The transformation jobs are different for every table. These jobs might take hours to complete. You need to determine the most efficient and maintainable workflow to process hundreds of tables and provide the freshest data to your end users. What should you do?

  • A. 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators.
    2. Use a single shared DAG for all tables that need to go through the pipeline.
    3. Schedule the DAG to run hourly.
  • B. 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators.
    2. Create a separate DAG for each table that needs to go through the pipeline.
    3. Schedule the DAGs to run hourly.
  • C. 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators.
    2. Use a single shared DAG for all tables that need to go through the pipeline.
    3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.
  • D. 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators.
    2. Create a separate DAG for each table that needs to go through the pipeline.
    3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.
Suggested Answer: D 🗳️
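For illustration only, a minimal sketch of the per-table DAG described in option D, using the Dataproc and BigQuery operators from the Airflow Google provider package. The project, region, cluster, bucket, table, and stored-procedure names are placeholders, not part of the question.

```python
# Hypothetical sketch -- PROJECT_ID, CLUSTER_NAME, the bucket, and the stored
# procedure are placeholders, not taken from the question.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

PROJECT_ID = "my-project"      # placeholder
REGION = "us-central1"         # placeholder
CLUSTER_NAME = "etl-cluster"   # placeholder
TABLE = "orders"               # one DAG per table, as in option D

with DAG(
    dag_id=f"load_and_transform_{TABLE}",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,    # no fixed schedule; the DAG is triggered externally
    catchup=False,
) as dag:
    # Dataproc job that reads the newly arrived files and writes them to BigQuery
    dataproc_transform = DataprocSubmitJobOperator(
        task_id="dataproc_transform",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "placement": {"cluster_name": CLUSTER_NAME},
            "pyspark_job": {
                "main_python_file_uri": f"gs://my-bucket/jobs/{TABLE}_transform.py"
            },
        },
    )

    # Table-specific SQL transformation that runs after the Dataproc job completes
    bq_transform = BigQueryInsertJobOperator(
        task_id="bq_transform",
        configuration={
            "query": {
                "query": f"CALL my_dataset.transform_{TABLE}()",  # placeholder procedure
                "useLegacySql": False,
            }
        },
    )

    dataproc_transform >> bq_transform
```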

Comments

cuadradobertolinisebastiancami
Highly Voted 9 months ago
D
* The transformations run in Dataproc and BigQuery, so you don't need a Cloud Storage operator (A and B can be discarded).
* "There is no fixed schedule for when the new data arrives," so you trigger the DAG when a file arrives.
* "The transformation jobs are different for every table," so you need a DAG for each table.
Therefore, D is the most suitable answer.
upvoted 6 times
...
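As an aside on the trigger pattern described in the comment above (a Cloud Storage object-finalized event launching a Cloud Function that triggers the DAG), a rough sketch for Cloud Composer 2 might look like this. The web server URL and the "first path segment names the table" convention are assumptions made for illustration.

```python
# Hypothetical Cloud Function (2nd gen) sketch: triggered by a Cloud Storage
# "object finalized" event, it starts the matching per-table DAG through the
# Airflow 2 REST API exposed by Cloud Composer 2.
# WEB_SERVER_URL and the object-name-to-table convention are assumptions.
import functions_framework
import google.auth
from google.auth.transport.requests import AuthorizedSession

WEB_SERVER_URL = "https://example-webserver.composer.googleusercontent.com"  # placeholder
AUTH_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


@functions_framework.cloud_event
def trigger_dag(cloud_event):
    event = cloud_event.data
    object_name = event["name"]            # e.g. "orders/2024-05-01/part-000.csv"
    table = object_name.split("/")[0]      # assumed convention: first segment names the table
    dag_id = f"load_and_transform_{table}"

    credentials, _ = google.auth.default(scopes=[AUTH_SCOPE])
    session = AuthorizedSession(credentials)

    # Create a DAG run, passing the new object name in the run's conf
    response = session.post(
        f"{WEB_SERVER_URL}/api/v1/dags/{dag_id}/dagRuns",
        json={"conf": {"object_name": object_name, "bucket": event["bucket"]}},
    )
    response.raise_for_status()
```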
8ad5266
Most Recent 5 months ago
Selected Answer: C
This explains why it's not D: the question asks for a maintainable workflow to process hundreds of tables and provide the freshest data to your end users. How is creating a separate DAG for each of hundreds of tables maintainable?
upvoted 2 times
...
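On the maintainability concern raised above: a common way to keep a separate DAG per table manageable is to generate the DAGs dynamically from one template plus a configuration list, rather than hand-writing hundreds of files. A minimal sketch, with table names and transform SQL made up for illustration:

```python
# Hypothetical sketch of dynamic DAG generation: one DAG file emits a DAG per table.
# Table names and transform SQL below are made up for illustration.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# In practice this mapping could be loaded from a YAML/JSON config or a metadata table.
TABLES = {
    "orders": "CALL my_dataset.transform_orders()",
    "customers": "CALL my_dataset.transform_customers()",
}

for table, transform_sql in TABLES.items():
    with DAG(
        dag_id=f"load_and_transform_{table}",
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,   # externally triggered, as in option D
        catchup=False,
    ) as dag:
        BigQueryInsertJobOperator(
            task_id="bq_transform",
            configuration={"query": {"query": transform_sql, "useLegacySql": False}},
        )
    # Expose each generated DAG at module level so the Airflow scheduler discovers it
    globals()[dag.dag_id] = dag
```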
JyoGCP
9 months, 1 week ago
Selected Answer: D
Option D
upvoted 1 times
...
Matt_108
10 months, 2 weeks ago
Selected Answer: D
Option D, which gets triggered when the data comes in and accounts for the fact that each table has its own set of transformations
upvoted 3 times
...
Jordan18
10 months, 3 weeks ago
why not C?
upvoted 3 times
It says that the transformations for each table are very different
upvoted 2 times
...
AllenChen123
10 months, 2 weeks ago
Same question: why not use a single DAG, given that there are hundreds of tables to manage?
upvoted 5 times
...
...
raaad
10 months, 3 weeks ago
Selected Answer: D
- Option D: Tailored handling and scheduling for each table; triggered by data arrival for more timely and efficient processing.
upvoted 2 times
...
scaenruy
10 months, 4 weeks ago
Selected Answer: D
D. 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators.
2. Create a separate DAG for each table that needs to go through the pipeline.
3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.
upvoted 1 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other