exam questions

Exam Professional Data Engineer All Questions

View all questions & answers for the Professional Data Engineer exam

Exam Professional Data Engineer topic 1 question 271 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 271
Topic #: 1
[All Professional Data Engineer Questions]

You are monitoring your organization’s data lake hosted on BigQuery. The ingestion pipelines read data from Pub/Sub and write the data into tables on BigQuery. After a new version of the ingestion pipelines is deployed, the daily stored data increased by 50%. The volumes of data in Pub/Sub remained the same and only some tables had their daily partition data size doubled. You need to investigate and fix the cause of the data increase. What should you do?

  • A. 1. Check for duplicate rows in the BigQuery tables that have the daily partition data size doubled.
    2. Schedule daily SQL jobs to deduplicate the affected tables.
    3. Share the deduplication script with the other operational teams to reuse if this occurs to other tables.
  • B. 1. Check for code errors in the deployed pipelines.
    2. Check for multiple writing to pipeline BigQuery sink.
    3. Check for errors in Cloud Logging during the day of the release of the new pipelines.
    4. If no errors, restore the BigQuery tables to their content before the last release by using time travel.
  • C. 1. Check for duplicate rows in the BigQuery tables that have the daily partition data size doubled.
    2. Check the BigQuery Audit logs to find job IDs.
    3. Use Cloud Monitoring to determine when the identified Dataflow jobs started and the pipeline code version.
    4. When more than one pipeline ingests data into a table, stop all versions except the latest one.
  • D. 1. Roll back the last deployment.
    2. Restore the BigQuery tables to their content before the last release by using time travel.
    3. Restart the Dataflow jobs and replay the messages by seeking the subscription to the timestamp of the release.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
raaad
Highly Voted 1 year, 1 month ago
Selected Answer: C
- Detailed Investigation of Logs and Jobs Checking for duplicate rows targets the potential immediate cause of the issue. - Checking the BigQuery Audit logs helps identify which jobs might be contributing to the increased data volume. - Using Cloud Monitoring to correlate job starts with pipeline versions helps identify if a specific version of the pipeline is responsible. - Managing multiple versions of pipelines ensures that only the intended version is active, addressing any versioning errors that might have occurred during deployment. ======= Why not B While it addresses the symptom (excess data), it doesn't necessarily stop the problem from recurring. (The questions asked to investigate and fix)
upvoted 12 times
...
mi_yulai
Most Recent 3 months, 1 week ago
Why not D?
upvoted 1 times
...
SamuelTsch
3 months, 1 week ago
Selected Answer: B
No idea which one to choose. Option C miss a step - to restore the tables.
upvoted 2 times
Ryannn23
1 week, 6 days ago
" You need to investigate and fix the cause of the data increase. " - fixing the target tables was not required.
upvoted 1 times
...
...
Matt_108
1 year ago
Selected Answer: C
Option C - agree with Raaad on the reasons
upvoted 1 times
...
task_7
1 year ago
Selected Answer: B
B. Check for code errors in the deployed pipelines, multiple writing to pipeline BigQuery sink, errors in Cloud Logging, and if necessary, restore tables using time travel. Check for code errors Check for multiple writes Check Cloud Logging Restore tables if necessary:
upvoted 2 times
RenePetersen
11 months, 4 weeks ago
This does not fix the error, it basically assumes that the error is not really there.
upvoted 3 times
...
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...
exam
Someone Bought Contributor Access for:
SY0-701
London, 1 minute ago