Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 21 discussion

A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII.
Which solution will meet this requirement with the LEAST operational effort?

  • A. Use an Amazon Kinesis Data Firehose delivery stream to process the dataset. Create an AWS Lambda transform function to identify the PII. Use an AWS SDK to obfuscate the PII. Set the S3 data lake as the target for the delivery stream.
  • B. Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
  • C. Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
  • D. Ingest the dataset into Amazon DynamoDB. Create an AWS Lambda function to identify and obfuscate the PII in the DynamoDB table and to transform the data. Use the same Lambda function to ingest the data into the S3 data lake.
Suggested Answer: B
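The orchestration part of option B can be sketched as a one-state Step Functions definition that starts a Glue job and waits for it to finish. This is only an illustration: the job name `detect-and-mask-pii-job` is hypothetical, and the sketch assumes the Glue job itself contains the Detect PII transform and writes to the S3 data lake.

```python
import json

# Hypothetical Glue job name (an assumption, not taken from the question).
GLUE_JOB_NAME = "detect-and-mask-pii-job"


def build_state_machine_definition(job_name: str) -> dict:
    """Return an Amazon States Language definition with a single Glue task."""
    return {
        "Comment": "Run a Glue job that detects and obfuscates PII, then writes to S3",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # Step Functions service integration for Glue;
                # the .sync suffix makes the state wait for the job run to complete.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            }
        },
    }


definition = build_state_machine_definition(GLUE_JOB_NAME)
print(json.dumps(definition, indent=2))
```

The printed JSON could be pasted into the Step Functions console or passed to an infrastructure-as-code tool; the state machine's IAM role would also need permission to start and monitor the Glue job.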

Comments

milofficial
Highly Voted 1 year, 1 month ago
Selected Answer: B
How does Data Quality obfuscate PII? You can do this directly in Glue Studio: https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html
upvoted 12 times
Eleftheriia
4 months, 2 weeks ago
Yes. As for "Create a rule in AWS Glue Data Quality to obfuscate the PII," which is part of answer C: it cannot be done that way. The AWS Glue console has a "Detect sensitive data" section with "Types of sensitive information to detect," and that is where you can obfuscate PII. Relevant tutorial: https://www.youtube.com/watch?v=-TZZBfcnxBw
upvoted 1 times
...
...
Khooks
Highly Voted 10 months ago
Selected Answer: B
Option C involves additional steps and complexity with creating rules in AWS Glue Data Quality, which adds more operational effort compared to directly using AWS Glue Studio's capabilities.
upvoted 5 times
...
Kalyso
Most Recent 2 weeks, 6 days ago
Selected Answer: B
Actually it is B. No need to create a rule in AWS Glue.
upvoted 1 times
...
plutonash
3 months, 1 week ago
Selected Answer: C
B. Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake. The Detect PII transform only detects. "Obfuscate the PII" is fine, but how? Answer C explains how.
upvoted 1 times
...
Udyan
3 months, 1 week ago
Selected Answer: C
Why C is better than B: Obfuscation clarity: Option C explicitly mentions using a Glue Data Quality rule to obfuscate PII, while option B does not specify how obfuscation is implemented. Accuracy: Glue Data Quality provides a more structured way to handle obfuscation compared to relying solely on Glue Studio's PII detection. Thus, C is the most accurate and operationally efficient solution.
upvoted 1 times
...
markill123
7 months, 1 week ago
The keyt
upvoted 1 times
...
antun3ra
8 months, 2 weeks ago
Selected Answer: B
B provides a streamlined, mostly visual approach using purpose-built tools for data processing and PII handling, making it the solution with the least operational effort.
upvoted 2 times
...
portland
8 months, 3 weeks ago
Selected Answer: C
https://aws.amazon.com/blogs/big-data/automated-data-governance-with-aws-glue-data-quality-sensitive-data-detection-and-aws-lake-formation/
upvoted 1 times
portland
8 months, 3 weeks ago
Actually it is B
upvoted 1 times
...
...
qwertyuio
9 months, 1 week ago
Selected Answer: B
https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html
upvoted 2 times
...
bakarys
9 months, 3 weeks ago
Selected Answer: C
Answer is C
upvoted 1 times
...
bigfoot1501
10 months, 1 week ago
I don't think we need to use much more services to fulfill these requirements. Just AWS Glue is enough, it can detect and obfuscate PII data already. Source: https://docs.aws.amazon.com/glue/latest/dg/detect-PII.html#choose-action-pii
upvoted 3 times
...
VerRi
11 months ago
Selected Answer: C
We cannot directly handle PII with Glue Studio, and Glue Data Quality can be used to handle PII.
upvoted 3 times
...
Just_Ninja
11 months, 2 weeks ago
Selected Answer: A
A very easy way is to use the SDK to identify PII. https://docs.aws.amazon.com/code-library/latest/ug/comprehend_example_comprehend_DetectPiiEntities_section.html
upvoted 1 times
...
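A minimal sketch of the SDK route mentioned above, assuming boto3 and Amazon Comprehend's `detect_pii_entities` call, which returns entities with `BeginOffset` and `EndOffset` character positions. The helper names `redact_pii` and `detect_and_redact` are made up for this example, and the sample entity is hand-written rather than real Comprehend output.

```python
from typing import Any, Dict, List


def redact_pii(text: str, entities: List[Dict[str, Any]], mask_char: str = "*") -> str:
    """Replace each detected span with mask characters of the same length."""
    chars = list(text)
    for entity in entities:
        start, end = entity["BeginOffset"], entity["EndOffset"]
        # Same-length replacement keeps the offsets of later entities valid.
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


def detect_and_redact(text: str, language_code: str = "en") -> str:
    """Ask Amazon Comprehend for PII spans and mask them (needs AWS credentials)."""
    import boto3  # imported lazily so redact_pii can be used without AWS access

    comprehend = boto3.client("comprehend")
    response = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)
    return redact_pii(text, response["Entities"])


# Offline illustration with a hand-written entity in the shape Comprehend returns.
sample_text = "Contact Jane at jane@example.com"
sample_entities = [{"Type": "EMAIL", "BeginOffset": 16, "EndOffset": 32, "Score": 0.99}]
masked = redact_pii(sample_text, sample_entities)
print(masked)
```

In a Firehose transform Lambda, as in option A, this logic would have to be written, deployed, and maintained by the team, which is the operational effort other commenters weigh against the built-in Glue transform.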
kairosfc
11 months, 2 weeks ago
Selected Answer: C
The Detect PII transform in AWS Glue Studio is specifically used to identify personally identifiable information (PII) within the data. It can detect and flag this information, but on its own, it does not perform the obfuscation or removal of these details. To effectively obfuscate or alter the identified PII, an additional transformation would be necessary. This could be accomplished in several ways, such as:
  • Writing a custom script within the same AWS Glue job using Python or Scala to modify the PII data as needed.
  • Using AWS Glue Data Quality, if available, to create rules that automatically obfuscate or modify the data identified as PII.
AWS Glue Data Quality is a newer tool that helps improve data quality through rules and transformations, but whether it's needed will depend on the functionality's availability and the specificity of the obfuscation requirements.
upvoted 3 times
...
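The "custom script" route described above could look roughly like the following plain-Python sketch. The function names, the column names, and the action labels (`hash`, `partial`, `redact`) are illustrative assumptions; in a real Glue job the same logic would typically be applied to DataFrame columns rather than to dictionaries.

```python
import hashlib
from typing import Dict


def hash_value(value: str) -> str:
    """Return the SHA-256 hex digest of the value (one-way obfuscation)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def partial_redact(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask everything except the last `visible` characters."""
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def obfuscate_record(record: Dict[str, str], actions: Dict[str, str]) -> Dict[str, str]:
    """Apply the chosen action to each PII column; other columns pass through."""
    result = dict(record)
    for column, action in actions.items():
        if column not in result:
            continue
        if action == "hash":
            result[column] = hash_value(result[column])
        elif action == "partial":
            result[column] = partial_redact(result[column])
        elif action == "redact":
            result[column] = "*" * len(result[column])
        else:
            raise ValueError(f"Unknown action: {action}")
    return result


# Hypothetical row and column-to-action mapping.
row = {"name": "Jane Doe", "ssn": "123-45-6789", "city": "Lisbon"}
obfuscated = obfuscate_record(row, {"name": "hash", "ssn": "partial"})
print(obfuscated)
```

Writing and maintaining a script like this is the extra effort that the built-in actions in Glue Studio are meant to avoid.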
okechi
1 year ago
Answer is option C. Period
upvoted 2 times
...
arvehisa
1 year ago
Selected Answer: B
B is correct. C: Glue Data Quality cannot obfuscate the PII. D: you need to write code, but the question asks for the "LEAST operational effort".
upvoted 4 times
...
certplan
1 year, 1 month ago
In Python:

    import sys

    import boto3
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import SparkSession

    # Initialize Spark session
    spark = SparkSession.builder.appName("Example Glue Job").getOrCreate()

    # Initialize Glue context
    glueContext = GlueContext(SparkContext.getOrCreate())

    # Retrieve Glue job arguments
    args = getResolvedOptions(sys.argv, ['JOB_NAME'])

    # Define your EMR step (defined here only; this script does not submit it)
    emr_step = [
        {
            "Name": "My EMR Step",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "s3://your-bucket/emr-scripts/your_script.jar",
                "Args": ["arg1", "arg2"]
            }
        }
    ]

    # Start a Glue job run through boto3 (GlueContext has no start_job_run method)
    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=args['JOB_NAME'],
        Arguments={'--extra-py-files': 'your_script.py'}
    )
    print(response)
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other