Exam Associate Data Practitioner topic 1 question 6 discussion

Actual exam question from Google's Associate Data Practitioner
Question #: 6
Topic #: 1

You work for a healthcare company that has a large on-premises data system containing patient records with personally identifiable information (PII) such as names, addresses, and medical diagnoses. You need a standardized managed solution that de-identifies PII across all your data feeds prior to ingestion to Google Cloud. What should you do?

  • A. Use Cloud Run functions to create a serverless data cleaning pipeline. Store the cleaned data in BigQuery.
  • B. Use Cloud Data Fusion to transform the data. Store the cleaned data in BigQuery.
  • C. Load the data into BigQuery, and inspect the data by using SQL queries. Use Dataflow to transform the data and remove any errors.
  • D. Use Apache Beam to read the data and perform the necessary cleaning and transformation operations. Store the cleaned data in BigQuery.
Suggested Answer: B

Comments

JAGLees
4 days, 21 hours ago
Selected Answer: B
A isn't right because Cloud Run isn't well suited to data pipelines and generally isn't recommended for them. It *can* process data, but not large volumes of batch data, and it isn't intended as an ETL tool. C isn't right because ingesting into BigQuery first doesn't meet the requirement to cleanse the data BEFORE ingestion. D isn't right because Apache Beam isn't a managed solution (although if the data had to be cleansed before leaving the on-prem network, it might be a suitable option to install on-prem). Therefore B (Data Fusion) is the best option. If Dataflow were listed, it would also be a good choice, perhaps a better one, but without it, Fusion is the obvious winner.
upvoted 1 times
n2183712847
4 weeks ago
Selected Answer: B
The best option is B. Cloud Data Fusion. Option B is best because Data Fusion is a managed, visual, standardized data integration service ideal for building de-identification pipelines. Option A (Cloud Run functions) is incorrect because it requires more coding and is less inherently standardized for pipelines. Option C (Load to BigQuery first) is incorrect because it violates the requirement to de-identify before ingestion, creating a security risk. Option D (Apache Beam/Dataflow) is incorrect because while powerful, it's more code-centric and less of a pre-built managed solution compared to Data Fusion. Therefore, Option B, Cloud Data Fusion, is the best managed and standardized solution for pre-ingestion PII de-identification.
upvoted 1 times
rich_maverick
1 month ago
Selected Answer: B
Cloud Data Fusion can be used with Sensitive Data Protection to de-identify feeds. However, Cloud Dataflow (a Google-managed service for running Beam pipelines) is the more general approach. I am only selecting Data Fusion over Beam because it is a named Google service. Had they said Dataflow, I would have gone there instead.
upvoted 1 times
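
For anyone unsure what "de-identify before ingestion" means in practice, here is a minimal sketch of the idea in plain Python. It is not the Data Fusion or Sensitive Data Protection API, just an illustration under assumptions: the field names (`name`, `address`, `diagnosis`), the `[REDACTED]` placeholder, and the helper names `pseudonymize` and `deidentify` are all hypothetical. Direct identifiers are replaced with a keyed hash so records can still be joined, and the diagnosis is redacted outright.

```python
import hashlib
import hmac

# Hypothetical field names; a real feed would define its own schema.
DIRECT_IDENTIFIERS = {"name", "address"}
REDACTED_FIELDS = {"diagnosis"}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a deterministic keyed token (HMAC-SHA256, hex)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, key: bytes) -> dict:
    """Return a copy of the record with PII fields tokenized or redacted."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            # Same input and key always give the same token, so joins still work.
            cleaned[field] = pseudonymize(str(value), key)
        elif field in REDACTED_FIELDS:
            cleaned[field] = "[REDACTED]"
        else:
            # Non-sensitive fields pass through unchanged.
            cleaned[field] = value
    return cleaned
```

In a managed setup this logic would live inside the pipeline tool rather than in hand-written code, which is the point of the "standardized managed solution" wording in the question.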
