Welcome to ExamTopics


Exam Professional Data Engineer topic 1 question 237 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 237
Topic #: 1

You are planning to load some of your existing on-premises data into BigQuery on Google Cloud. You want to either stream or batch-load data, depending on your use case. Additionally, you want to mask some sensitive data before loading into BigQuery. You need to do this in a programmatic way while keeping costs to a minimum. What should you do?

  • A. Use Cloud Data Fusion to design your pipeline, use the Cloud DLP plug-in to de-identify data within your pipeline, and then move the data into BigQuery.
  • B. Use the BigQuery Data Transfer Service to schedule your migration. After the data is populated in BigQuery, use the connection to the Cloud Data Loss Prevention (Cloud DLP) API to de-identify the necessary data.
  • C. Create your pipeline with Dataflow through the Apache Beam SDK for Python, customizing separate options within your code for streaming, batch processing, and Cloud DLP. Select BigQuery as your data sink.
  • D. Set up Datastream to replicate your on-premises data into BigQuery.
Suggested Answer: C 🗳️
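The Dataflow approach in option C can be sketched as follows. This is a minimal, hypothetical illustration: the regexes stand in for the Cloud DLP `deidentify_content` API (which option C would use in practice), and the Beam pipeline wiring is shown only in comments since it requires a GCP project to run.

```python
import re

# Illustrative patterns standing in for Cloud DLP infoType detectors
# (assumption: real pipelines would call the DLP API instead).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def mask_record(record: dict) -> dict:
    """Mask sensitive substrings in every string field of a record."""
    masked = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = SSN_RE.sub("***-**-****", value)
            value = EMAIL_RE.sub("[EMAIL REDACTED]", value)
        masked[key] = value
    return masked

# Sketch of the Beam wiring (assuming the Apache Beam Python SDK):
#   options = PipelineOptions(streaming=True)   # False for batch loads
#   (p | beam.io.ReadFromPubSub(...)            # or ReadFromText for batch
#      | beam.Map(mask_record)                  # de-identify before loading
#      | beam.io.WriteToBigQuery(table, ...))   # BigQuery as the sink
```

Because the same pipeline code can run in either streaming or batch mode via `PipelineOptions`, and masking happens before the data reaches BigQuery, this matches the question's programmatic and cost requirements.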

Comments

raaad
Highly Voted 10 months, 3 weeks ago
Selected Answer: C
- Programmatic flexibility: Apache Beam provides extensive control over pipeline design, allowing customization of data transformations, including integration with Cloud DLP for sensitive data masking.
- Streaming and batch support: Beam seamlessly supports both streaming and batch data processing modes, enabling flexibility in data loading patterns.
- Cost-effective processing: Dataflow offers a serverless model, scaling resources as needed and charging only for resources used, helping optimize costs.
- Integration with Cloud DLP: Beam integrates well with Cloud DLP for sensitive data masking, ensuring data privacy before loading into BigQuery.
upvoted 10 times
qq589539483084gfrgrgfr
10 months, 2 weeks ago
The incorrect option is A, because you want a programmatic way whereas Data Fusion is a codeless solution; Dataflow is also cost-effective.
upvoted 2 times
AllenChen123
10 months, 2 weeks ago
You are saying Option C
upvoted 2 times
JyoGCP
Most Recent 9 months, 1 week ago
Selected Answer: C
Option C
upvoted 1 times
tibuenoc
10 months ago
Selected Answer: C
C is correct: use Dataflow with Python as the programming language and BigQuery as the sink. A is incorrect because Data Fusion's main purpose is code-free pipeline building.
upvoted 2 times
scaenruy
10 months, 3 weeks ago
Selected Answer: A
A. Use Cloud Data Fusion to design your pipeline, use the Cloud DLP plug-in to de-identify data within your pipeline, and then move the data into BigQuery.
upvoted 1 times
ggg24
3 weeks, 4 days ago
Data Fusion supports only batch, while streaming is also required.
upvoted 1 times
chrissamharris
6 months, 1 week ago
Incorrect, that's a low-code solution. It doesn't meet this specific requirement: "You need to do this in a programmatic way."
upvoted 1 times
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
