Exam Associate Data Practitioner topic 1 question 6 discussion

Actual exam question from Google's Associate Data Practitioner

Question #: 6
Topic #: 1


You work for a healthcare company that has a large on-premises data system containing patient records with personally identifiable information (PII) such as names, addresses, and medical diagnoses. You need a standardized managed solution that de-identifies PII across all your data feeds prior to ingestion to Google Cloud. What should you do?

A. Use Cloud Run functions to create a serverless data cleaning pipeline. Store the cleaned data in BigQuery.
B. Use Cloud Data Fusion to transform the data. Store the cleaned data in BigQuery.
C. Load the data into BigQuery, and inspect the data by using SQL queries. Use Dataflow to transform the data and remove any errors.
D. Use Apache Beam to read the data and perform the necessary cleaning and transformation operations. Store the cleaned data in BigQuery.

Suggested Answer: B

by rich_maverick at Feb. 26, 2025, 7:37 p.m.

Comments


JAGLees

4 days, 21 hours ago

Selected Answer: B

A isn't right because Cloud Run isn't well suited to data pipelines and generally isn't recommended for them. It *can* process data, but not large volumes of batch data, and it isn't intended as an ETL tool. C isn't right because ingesting into BigQuery first doesn't meet the requirement to cleanse the data BEFORE ingestion. D isn't right because Apache Beam on its own isn't a managed solution (although if the data had to be cleansed before leaving the on-prem network, it might be a suitable option to install on-prem). Therefore B (Data Fusion) is the best option. If Dataflow were listed, it would also be a good choice, perhaps a better one, but without it, Data Fusion is the obvious winner.

upvoted 1 times


n2183712847

4 weeks ago

Selected Answer: B

The best option is B, Cloud Data Fusion, because it is a managed, visual, standardized data integration service well suited to building de-identification pipelines. Option A (Cloud Run functions) is incorrect because it requires more custom code and is less inherently standardized for pipelines. Option C (load to BigQuery first) is incorrect because it violates the requirement to de-identify before ingestion, creating a security risk. Option D (Apache Beam/Dataflow) is incorrect because, while powerful, it is more code-centric and less of a pre-built managed solution than Data Fusion. Therefore, Option B is the best managed and standardized solution for pre-ingestion PII de-identification.

upvoted 1 times


rich_maverick

1 month ago

Selected Answer: B

Cloud Data Fusion can use Sensitive Data Protection to de-identify feeds. However, Dataflow (Google's managed service for running Apache Beam pipelines) is the more general approach. I am only selecting Data Fusion over Beam because it is a named Google service. Had the option said Dataflow, I would have chosen it instead.

upvoted 1 times
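To make "de-identify before ingestion" concrete, here is a minimal, stdlib-only Python sketch of the kind of per-field transformations a de-identification step applies before records are loaded into BigQuery. It is NOT the Sensitive Data Protection API or a Data Fusion plugin; it only illustrates, under assumptions, keyed-hash pseudonymization (similar in spirit to the cryptographic-hash transformation Sensitive Data Protection offers) and plain redaction. The field names, the `RULES` table, and the key are hypothetical, and key management is out of scope.

```python
import hashlib
import hmac

# Hypothetical per-field rules; a real feed's schema and policy will differ.
# Fields not listed here pass through unchanged.
RULES = {
    "name": "tokenize",     # deterministic token, so joins still work
    "address": "redact",    # drop the value entirely
    "diagnosis": "redact",
}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a deterministic keyed-hash (HMAC-SHA256) token.

    The same value and key always yield the same token, so records can be
    joined on the tokenized field without exposing the original value.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "TOKEN_" + digest[:16]


def deidentify_record(record: dict, key: bytes) -> dict:
    """Return a de-identified copy of one record; the input is not mutated."""
    out = dict(record)
    for field, rule in RULES.items():
        if field not in out:
            continue
        if rule == "tokenize":
            out[field] = pseudonymize(str(out[field]), key)
        elif rule == "redact":
            out[field] = "[REDACTED]"
    return out


def deidentify_feed(records, key: bytes) -> list:
    """Apply the same rules to every record, before any load step runs."""
    return [deidentify_record(r, key) for r in records]


if __name__ == "__main__":
    key = b"example-key-do-not-use"  # hypothetical; use a managed secret in practice
    feed = [{"name": "Jane Doe", "address": "1 Main St",
             "diagnosis": "J45", "admission_year": 2024}]
    for row in deidentify_feed(feed, key):
        print(row)
```

Applying one shared rule table to every feed is what "standardized" means here; a managed service such as Data Fusion provides that centrally instead of each team hand-rolling code like this.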
