Exam AWS Certified Machine Learning - Specialty topic 1 question 206 discussion

A company wants to deliver digital car management services to its customers. The company plans to analyze data to predict the likelihood of users changing cars. The company has 10 TB of data that is stored in an Amazon Redshift cluster. The company's data engineering team is using Amazon SageMaker Studio for data analysis and model development. Only a subset of the data is relevant for developing the machine learning models. The data engineering team needs a secure and cost-effective way to export the data to a data repository in Amazon S3 for model development.

Which solutions will meet these requirements? (Choose two.)

  • A. Launch multiple medium-sized instances in a distributed SageMaker Processing job. Use the prebuilt Docker images for Apache Spark to query and plot the relevant data and to export the relevant data from Amazon Redshift to Amazon S3.
  • B. Launch multiple medium-sized notebook instances with a PySpark kernel in distributed mode. Download the data from Amazon Redshift to the notebook cluster. Query and plot the relevant data. Export the relevant data from the notebook cluster to Amazon S3.
  • C. Use AWS Secrets Manager to store the Amazon Redshift credentials. From a SageMaker Studio notebook, use the stored credentials to connect to Amazon Redshift with a Python adapter. Use the Python client to query the relevant data and to export the relevant data from Amazon Redshift to Amazon S3.
  • D. Use AWS Secrets Manager to store the Amazon Redshift credentials. Launch a SageMaker extra-large notebook instance with block storage that is slightly larger than 10 TB. Use the stored credentials to connect to Amazon Redshift with a Python adapter. Download, query, and plot the relevant data. Export the relevant data from the local notebook drive to Amazon S3.
  • E. Use SageMaker Data Wrangler to query and plot the relevant data and to export the relevant data from Amazon Redshift to Amazon S3.
Suggested Answer: CE
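As context for the suggested answer, here is a minimal sketch of the option C workflow: retrieve the Redshift credentials from AWS Secrets Manager in a SageMaker Studio notebook, query only a subset with a Python adapter for exploration, and have Redshift UNLOAD the relevant data directly to Amazon S3 so the full 10 TB never passes through the notebook. The secret name, secret key layout, table, columns, S3 path, and IAM role below are hypothetical placeholders.

```python
import json

import boto3
import redshift_connector  # Amazon Redshift Python driver

# Hypothetical placeholders -- substitute real values.
SECRET_ID = "redshift/ml-cluster-credentials"
UNLOAD_PATH = "s3://example-ml-bucket/car-churn/relevant-data/"
IAM_ROLE = "arn:aws:iam::123456789012:role/RedshiftUnloadRole"

# 1. Retrieve the Redshift credentials stored in AWS Secrets Manager.
secrets = boto3.client("secretsmanager")
creds = json.loads(secrets.get_secret_value(SecretId=SECRET_ID)["SecretString"])

# 2. Connect to the cluster with the Python adapter (assumed secret key names).
conn = redshift_connector.connect(
    host=creds["host"],
    port=int(creds.get("port", 5439)),
    database=creds.get("dbname", "dev"),
    user=creds["username"],
    password=creds["password"],
)
cur = conn.cursor()

# 3. Pull only the relevant subset into the notebook for analysis and plotting.
cur.execute(
    "SELECT customer_id, car_age_years, service_visits, churned "
    "FROM vehicle_usage WHERE last_activity_date >= '2023-01-01'"
)
sample_df = cur.fetch_dataframe()  # pandas DataFrame

# 4. Export the relevant subset straight from Redshift to S3 with UNLOAD,
#    so the 10 TB never has to move through the notebook instance.
cur.execute(
    f"""
    UNLOAD ('SELECT customer_id, car_age_years, service_visits, churned
             FROM vehicle_usage WHERE last_activity_date >= ''2023-01-01''')
    TO '{UNLOAD_PATH}'
    IAM_ROLE '{IAM_ROLE}'
    FORMAT AS PARQUET
    """
)
conn.commit()
conn.close()
```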

Comments

solution123
Highly Voted 2 years, 2 months ago
Selected Answer: CE
Option A: Launching multiple medium-sized instances in a distributed SageMaker Processing job and using the prebuilt Docker images for Apache Spark to query and plot the relevant data is a possible solution, but it may not be the most cost-effective, as it requires spinning up multiple instances.
Option B: Launching multiple medium-sized notebook instances with a PySpark kernel in distributed mode is another solution, but it may not be the most secure, as the data would be stored on the instances and not in a centralized data repository.
Option D: Using AWS Secrets Manager to store the Amazon Redshift credentials and launching a SageMaker extra-large notebook instance is a solution, but block storage slightly larger than 10 TB could be costly and may not be necessary.
upvoted 7 times
...
VinceCar
Highly Voted 2 years, 4 months ago
Selected Answer: CE
C and E. Option A includes no secure credential control.
upvoted 5 times
...
MultiCloudIronMan
Most Recent 6 months, 1 week ago
Selected Answer: CE
Option A could work as well, but it would be more expensive and not as easy as option E.
upvoted 3 times
MultiCloudIronMan
6 months ago
Changed my mind to AC because Data Wrangler may struggle with 10 TB. By using distributed SageMaker Processing jobs with Apache Spark and securely managing credentials with AWS Secrets Manager, the data engineering team can efficiently and securely export the relevant data from Amazon Redshift to Amazon S3.
upvoted 1 times
...
...
endeesa
1 year, 4 months ago
Selected Answer: AE
As soon as I see "download" and "Python client", I worry about speed and efficiency. So I would say A and E.
upvoted 4 times
...
loict
1 year, 7 months ago
Selected Answer: CE
A. NO - a SageMaker Processing job is a self-contained feature using the sagemaker.processing API; it does not rely on invoking Spark directly
B. NO - you want to identify the relevant slice of data without having to download everything first
C. YES - minimizes data movement
D. NO - you want to identify the relevant slice of data without having to download everything first
E. YES - a built-in tool specifically designed for this use case
upvoted 1 times
...
Mickey321
1 year, 8 months ago
Selected Answer: CE
E for sure, but I was a bit confused between A and C; based on the link, I would go for C: https://aws.amazon.com/blogs/big-data/using-the-amazon-redshift-data-api-to-interact-from-an-amazon-sagemaker-jupyter-notebook/
upvoted 1 times
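The linked blog post uses the Amazon Redshift Data API, which avoids opening a direct database connection from the notebook; a rough sketch of that pattern follows, with the cluster name, secret ARN, S3 path, and IAM role as hypothetical placeholders.

```python
import time

import boto3

# Hypothetical placeholders -- substitute real values.
CLUSTER_ID = "ml-redshift-cluster"
DATABASE = "dev"
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds"
SQL = """
UNLOAD ('SELECT customer_id, churned FROM vehicle_usage')
TO 's3://example-ml-bucket/car-churn/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
FORMAT AS PARQUET
"""

# Submit the UNLOAD through the Redshift Data API; credentials stay in Secrets Manager.
client = boto3.client("redshift-data")
statement = client.execute_statement(
    ClusterIdentifier=CLUSTER_ID,
    Database=DATABASE,
    SecretArn=SECRET_ARN,
    Sql=SQL,
)

# The Data API is asynchronous, so poll until the statement completes.
while True:
    status = client.describe_statement(Id=statement["Id"])["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(5)
print("UNLOAD status:", status)
```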
...
drcok87
2 years, 2 months ago
E is the obvious choice; the other is C: https://aws.amazon.com/blogs/big-data/using-the-amazon-redshift-data-api-to-interact-from-an-amazon-sagemaker-jupyter-notebook/
upvoted 2 times
...
BoroJohn
2 years, 4 months ago
C & E seems right - https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler.html
upvoted 3 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other