Exam Associate Data Practitioner topic 1 question 3 discussion

Actual exam question from Google's Associate Data Practitioner
Question #: 3
Topic #: 1

Your company is building a near real-time streaming pipeline to process JSON telemetry data from small appliances. You need to process messages arriving at a Pub/Sub topic, capitalize letters in the serial number field, and write results to BigQuery. You want to use a managed service and write a minimal amount of code for underlying transformations. What should you do?

  • A. Use a Pub/Sub to BigQuery subscription, write results directly to BigQuery, and schedule a transformation query to run every five minutes.
  • B. Use a Pub/Sub to Cloud Storage subscription, write a Cloud Run service that is triggered when objects arrive in the bucket, performs the transformations, and writes the results to BigQuery.
  • C. Use the “Pub/Sub to BigQuery” Dataflow template with a UDF, and write the results to BigQuery.
  • D. Use a Pub/Sub push subscription, write a Cloud Run service that accepts the messages, performs the transformations, and writes the results to BigQuery.
Suggested Answer: C
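For context on why option C involves minimal code: the "Pub/Sub to BigQuery" Dataflow template accepts a JavaScript UDF that transforms each message payload before it is written to the table. A minimal sketch of such a UDF, assuming the telemetry JSON carries a `serial_number` field (the field name here is illustrative, not from the question):

```javascript
/**
 * UDF for the Pub/Sub-to-BigQuery Dataflow template.
 * The template calls this function with each message payload as a
 * JSON string and expects a JSON string matching the table schema.
 * (ES5-style `var` is used because template UDFs may run on an
 * older JavaScript engine.)
 */
function transform(messageJson) {
  var telemetry = JSON.parse(messageJson);
  // Capitalize the serial number field, if present.
  if (telemetry.serial_number) {
    telemetry.serial_number = telemetry.serial_number.toUpperCase();
  }
  return JSON.stringify(telemetry);
}
```

The UDF file is uploaded to Cloud Storage and referenced when launching the template; reading from the subscription, scaling, and streaming inserts into BigQuery are all handled by the managed service.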

Comments

JAGLees
4 days, 21 hours ago
Selected Answer: C
Cloud Run is not minimal code (nor recommended for data pipelines), and a scheduled job is not near real-time. So the answer is Dataflow with a UDF, which gives a scalable, managed solution with minimal code.
upvoted 1 times
n2183712847
4 weeks ago
Selected Answer: C
The best option is C: use the “Pub/Sub to BigQuery” Dataflow template with a UDF. Dataflow templates are managed, serverless, and designed for streaming Pub/Sub data to BigQuery, and a UDF allows minimal code for transformations within the pipeline.

Option A (Pub/Sub to BigQuery subscription + scheduled query) is incorrect because scheduled queries are not real-time transformations. Option B (Pub/Sub to Cloud Storage + Cloud Run) is incorrect because it adds unnecessary complexity with Cloud Storage as an intermediary and is not truly streaming. Option D (Pub/Sub push + Cloud Run) is real-time, but it requires more code in Cloud Run than a Dataflow UDF and is less purpose-built for data pipelines than Dataflow.

Therefore option C, the Dataflow template with a UDF, is the best balance of managed service, minimal code, and near real-time streaming.
upvoted 1 times
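For reference, launching the template the comment above describes could look like the following. All project, bucket, subscription, and table names are placeholders, and the parameter names should be verified against the current template documentation:

```shell
# Launch the Google-provided "Pub/Sub Subscription to BigQuery"
# Dataflow template with a JavaScript UDF stored in Cloud Storage.
# All resource names below are placeholders.
gcloud dataflow jobs run telemetry-to-bq \
  --gcs-location gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
  --region us-central1 \
  --parameters \
inputSubscription=projects/my-project/subscriptions/telemetry-sub,\
outputTableSpec=my-project:telemetry.appliances,\
javascriptTextTransformGcsPath=gs://my-bucket/udf/transform.js,\
javascriptTextTransformFunctionName=transform
```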
bc3f222
1 month ago
Selected Answer: A
Pub/Sub to BigQuery subscriptions are now a recommended solution; you no longer need Dataflow.
upvoted 1 times
rich_maverick
1 month ago
Selected Answer: C
I agree that C is the best answer. However, answer A is doable, is also low/no code, and could be considered acceptable.
upvoted 1 times
trashbox
2 months, 1 week ago
Selected Answer: C
A Dataflow UDF is a simpler coding option than a Cloud Run service.
upvoted 1 times
Community vote distribution: A (35%), C (25%), B (20%), Other