Exam Associate Data Practitioner topic 1 question 3 discussion

Actual exam question from Google's Associate Data Practitioner
Question #: 3
Topic #: 1

Your company is building a near real-time streaming pipeline to process JSON telemetry data from small appliances. You need to process messages arriving at a Pub/Sub topic, capitalize letters in the serial number field, and write results to BigQuery. You want to use a managed service and write a minimal amount of code for underlying transformations. What should you do?

  • A. Use a Pub/Sub to BigQuery subscription, write results directly to BigQuery, and schedule a transformation query to run every five minutes.
  • B. Use a Pub/Sub to Cloud Storage subscription, write a Cloud Run service that is triggered when objects arrive in the bucket, performs the transformations, and writes the results to BigQuery.
  • C. Use the “Pub/Sub to BigQuery” Dataflow template with a UDF, and write the results to BigQuery.
  • D. Use a Pub/Sub push subscription, write a Cloud Run service that accepts the messages, performs the transformations, and writes the results to BigQuery.
Suggested Answer: C
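For context on why option C involves minimal code: the "Pub/Sub to BigQuery" Dataflow template accepts a JavaScript UDF that transforms each message payload before it is written to the table. A minimal sketch of such a UDF, assuming the telemetry JSON carries a `serial_number` field (the field name here is illustrative, not from the question):

```javascript
/**
 * UDF for the Pub/Sub-to-BigQuery Dataflow template.
 * The template calls this function with each message payload as a
 * JSON string and expects a JSON string matching the table schema.
 * (ES5-style `var` is used because template UDFs may run on an
 * older JavaScript engine.)
 */
function transform(messageJson) {
  var telemetry = JSON.parse(messageJson);
  // Capitalize the serial number field, if present.
  if (telemetry.serial_number) {
    telemetry.serial_number = telemetry.serial_number.toUpperCase();
  }
  return JSON.stringify(telemetry);
}
```

The UDF file is uploaded to Cloud Storage and referenced when launching the template; reading from the subscription, scaling, and streaming inserts into BigQuery are all handled by the managed service.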

Comments

JAGLees
4 days, 21 hours ago
Selected Answer: C
Cloud Run is not minimal code (nor recommended for data pipelines), and a scheduled job is not near real-time. So the answer is Dataflow with a UDF, which gives a scalable, managed solution with minimal code.
upvoted 1 times
n2183712847
4 weeks ago
Selected Answer: C
The best option is C: use the “Pub/Sub to BigQuery” Dataflow template with a UDF. Dataflow templates are managed, serverless, and designed for streaming Pub/Sub data to BigQuery, and a UDF allows minimal code for transformations within the pipeline.

Option A (Pub/Sub to BigQuery subscription + scheduled query) is incorrect because scheduled queries are not real-time transformations. Option B (Pub/Sub to Cloud Storage + Cloud Run) is incorrect because it adds unnecessary complexity with Cloud Storage as an intermediary and is not truly streaming. Option D (Pub/Sub push + Cloud Run) is real-time, but it requires more code in Cloud Run than a Dataflow UDF and is less purpose-built for data pipelines than Dataflow.

Therefore option C, the Dataflow template with a UDF, is the best balance of managed service, minimal code, and near real-time streaming.
upvoted 1 times
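For reference, launching the template the comment above describes could look like the following. All project, bucket, subscription, and table names are placeholders, and the parameter names should be verified against the current template documentation:

```shell
# Launch the Google-provided "Pub/Sub Subscription to BigQuery"
# Dataflow template with a JavaScript UDF stored in Cloud Storage.
# All resource names below are placeholders.
gcloud dataflow jobs run telemetry-to-bq \
  --gcs-location gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
  --region us-central1 \
  --parameters \
inputSubscription=projects/my-project/subscriptions/telemetry-sub,\
outputTableSpec=my-project:telemetry.appliances,\
javascriptTextTransformGcsPath=gs://my-bucket/udf/transform.js,\
javascriptTextTransformFunctionName=transform
```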
bc3f222
1 month ago
Selected Answer: A
Pub/Sub to BigQuery subscriptions are now a recommended solution; you no longer need Dataflow.
upvoted 1 times
rich_maverick
1 month ago
Selected Answer: C
I agree that C is the best answer. However, answer A is doable, is also low/no code, and could be considered acceptable.
upvoted 1 times
trashbox
2 months, 1 week ago
Selected Answer: C
A Dataflow UDF is a simpler coding option than a Cloud Run service.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other