A Data Engineer needs to build a model using a dataset containing customer credit card information. How can the Data Engineer ensure the data remains encrypted and the credit card information is secure?
A. Use a custom encryption algorithm to encrypt the data and store the data on an Amazon SageMaker instance in a VPC. Use the SageMaker DeepAR algorithm to randomize the credit card numbers.
B. Use an IAM policy to encrypt the data on the Amazon S3 bucket and Amazon Kinesis to automatically discard credit card numbers and insert fake credit card numbers.
C. Use an Amazon SageMaker launch configuration to encrypt the data once it is copied to the SageMaker instance in a VPC. Use the SageMaker principal component analysis (PCA) algorithm to reduce the length of the credit card numbers.
D. Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card numbers from the customer data with AWS Glue.
https://aws.amazon.com/blogs/big-data/detect-and-process-sensitive-data-using-aws-glue-studio/
AWS Glue can be used for detecting and processing sensitive data.
D. Use AWS KMS for encryption and AWS Glue to redact credit card numbers.
Reasoning:
AWS KMS (Key Management Service) keys encrypt data at rest in Amazon S3 and on the storage volumes and output artifacts that Amazon SageMaker uses during processing.
AWS Glue can be used to redact sensitive data before processing, ensuring that credit card numbers are removed from datasets before being used for ML.
Supports PCI DSS requirements for handling payment information securely, since the stored data is encrypted and the card numbers are removed before modeling.
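To make the redaction step concrete, here is a minimal sketch in plain Python. It is not the AWS Glue API. It only illustrates the kind of transform a Glue job could apply. The regex, the Luhn check, the `[REDACTED]` mask, and the function names are all assumptions for illustration.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by
# single spaces or hyphens (an assumed pattern, not Glue's detector).
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_card_numbers(text: str, mask: str = "[REDACTED]") -> str:
    """Replace Luhn-valid card-like digit runs with a mask."""

    def _replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        # Leave digit runs that fail the checksum (order IDs, etc.) alone.
        return mask if luhn_valid(digits) else match.group()

    return CARD_PATTERN.sub(_replace, text)
```

The Luhn check is there to cut down false positives, so long numeric IDs that are not card numbers are left untouched.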
The reason for this choice is that AWS KMS lets you create and manage encryption keys and control the use of encryption across a wide range of AWS services and in your applications. By using AWS KMS, you can encrypt the data on Amazon S3, a durable, scalable, and secure object storage service, and on Amazon SageMaker, a fully managed service for building, training, and deploying machine learning models. This way, you can protect the data at rest; protection in transit comes from TLS rather than from KMS itself.
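For anyone wondering what "use KMS on S3 and SageMaker" looks like in practice, here is a rough sketch of the request parameters involved. It only builds the dictionaries, so it runs without AWS credentials. The helper names, bucket, key, and KMS key ID are hypothetical. The parameter names (`ServerSideEncryption`, `SSEKMSKeyId`, `OutputDataConfig.KmsKeyId`, `ResourceConfig.VolumeKmsKeyId`) are the ones used by boto3's S3 `put_object` and SageMaker `create_training_job` calls.

```python
def sse_kms_put_args(bucket: str, key: str, body: bytes, kms_key_id: str) -> dict:
    """Arguments for an S3 put_object call using SSE-KMS (hypothetical helper)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",  # ask S3 to encrypt with a KMS key
        "SSEKMSKeyId": kms_key_id,
    }


def training_job_encryption(kms_key_id: str, output_s3_uri: str) -> dict:
    """Encryption-related fragment of a create_training_job request.

    This is only a fragment: a real request also needs the instance type,
    instance count, volume size, role, algorithm, and input channels.
    """
    return {
        "OutputDataConfig": {"S3OutputPath": output_s3_uri, "KmsKeyId": kms_key_id},
        "ResourceConfig": {"VolumeKmsKeyId": kms_key_id},
    }


# Assumed usage, with boto3 installed and credentials configured:
#   boto3.client("s3").put_object(**sse_kms_put_args(...))
```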
IMHO, the problem with the question is that it is not clear whether the credit card number is used in the model. If it is, discarding it is never a good option. Hashing should be a safe way to keep it in the learning path.
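A minimal sketch of the hashing idea, assuming a keyed hash (HMAC-SHA256) rather than a bare hash. Card numbers come from a small enough space that an unkeyed hash can be brute-forced. The function name and the way the secret key is supplied are assumptions. In practice the key would live in a secrets store.

```python
import hashlib
import hmac


def pseudonymize_card(card_number: str, secret_key: bytes) -> str:
    """Map a card number to a stable token using HMAC-SHA256.

    The same card always yields the same token, so it can still act as a
    join key or categorical feature without exposing the number itself.
    """
    # Normalize formatting so "4111 1111..." and "4111-1111..." match.
    digits = "".join(ch for ch in card_number if ch.isdigit())
    return hmac.new(secret_key, digits.encode("ascii"), hashlib.sha256).hexdigest()
```

This keeps the column usable for grouping transactions by card while removing the raw number from the training data.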
It's gotta be D but C is a clever fake answer. Use PCA to reduce the length of the credit card number? That's a clever joke, as if reducing the length of a character string is the same as reducing dimensionality in a feature set.