Exam AWS Certified Machine Learning - Specialty All Questions

View all questions & answers for the AWS Certified Machine Learning - Specialty exam

Exam AWS Certified Machine Learning - Specialty topic 1 question 35 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 35
Topic #: 1


A Data Science team is designing a dataset repository where it will store a large amount of training data commonly used in its machine learning models. As Data Scientists may create an arbitrary number of new datasets every day, the solution has to scale automatically and be cost-effective. Also, it must be possible to explore the data using SQL.
Which storage scheme is MOST adapted to this scenario?

A. Store datasets as files in Amazon S3.
B. Store datasets as files in an Amazon EBS volume attached to an Amazon EC2 instance.
C. Store datasets as tables in a multi-node Amazon Redshift cluster.
D. Store datasets as global tables in Amazon DynamoDB.


Suggested Answer: A

by rsimham at Dec. 9, 2019, 9:04 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments


rsimham

Highly Voted 3 years, 1 month ago

Answer: A. S3 is the most cost-effective option.

upvoted 15 times


sonalev419

Highly Voted 2 years, 12 months ago

A: S3 is cost-effective and can be queried with Athena. (Not C: Redshift doesn't support unstructured data.)

upvoted 7 times


JonSno

Most Recent 2 months, 1 week ago

Selected Answer: A

Amazon S3 (Simple Storage Service) is the best choice because it:

- Scales automatically to store an arbitrary number of datasets.
- Is cost-effective, as S3 charges only for storage used, unlike provisioned databases.
- Supports querying datasets with SQL using Amazon Athena.
- Is highly durable (99.999999999% durability) and optimized for large datasets.

How it works in this scenario:

- Store datasets in S3 as files in Parquet, ORC, or CSV format.
- Use AWS Glue Data Catalog to create table metadata.
- Query the datasets using Amazon Athena (serverless SQL querying on S3).
- Automatically scale without worrying about storage limits.
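The store, catalog, and query steps in this comment can be made a bit more concrete with the table definition Athena needs. This is only a sketch: the helper `build_external_table_ddl`, the database `ml_datasets`, the table `churn_training`, its columns, and the bucket `example-dataset-repo` are all made up for illustration. It only builds the DDL string and does not call AWS.

```python
def build_external_table_ddl(database, table, columns, s3_location):
    """Build an Athena CREATE EXTERNAL TABLE statement for Parquet files in S3.

    `columns` is a list of (name, type) pairs. Every name used in the example
    below is hypothetical.
    """
    # Athena expects LOCATION to point at a folder, so make sure it ends with "/".
    if not s3_location.endswith("/"):
        s3_location += "/"
    column_defs = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_defs}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical dataset: one folder of Parquet files per dataset in the bucket.
ddl = build_external_table_ddl(
    database="ml_datasets",
    table="churn_training",
    columns=[
        ("customer_id", "string"),
        ("tenure_months", "int"),
        ("churned", "boolean"),
    ],
    s3_location="s3://example-dataset-repo/churn_training",
)
print(ddl)
```

Running the statement once in Athena registers the table in the Glue Data Catalog; the files themselves stay in S3, so adding a new dataset is just a new folder plus a new table definition.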

upvoted 1 times


james2033

7 months, 3 weeks ago

Selected Answer: A

'cost effective' --> AWS S3

upvoted 1 times


loict

1 year, 1 month ago

Selected Answer: A

A. YES - S3 + Athena/Presto
B. NO - no SQL support
C. NO - expensive to scale
D. NO - DynamoDB is NoSQL

upvoted 1 times


DavidRou

1 year, 1 month ago

Selected Answer: A

AWS S3 + Athena will do it

upvoted 1 times


AjoseO

1 year, 8 months ago

Selected Answer: A

The most appropriate storage scheme for this scenario is option A: Store datasets as files in Amazon S3. Amazon S3 is a highly scalable and cost-effective object storage service that can store a large amount of data. S3 can scale automatically to accommodate a large number of datasets, making it a good option for storing the training data used in machine learning models. Additionally, S3 supports SQL querying through Amazon Athena or Amazon Redshift Spectrum, allowing data scientists to easily explore the data.
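For the Athena route, submitting a SQL query from Python might look roughly like the sketch below. `QueryString`, `QueryExecutionContext`, and `ResultConfiguration` are the request fields of boto3's Athena `start_query_execution` call; the helper names, the database `ml_datasets`, the table `churn_training`, and the results bucket are assumptions for illustration. A stub client stands in for `boto3.client("athena")` so the snippet runs without AWS credentials.

```python
def build_query_request(sql, database, output_location):
    """Assemble the keyword arguments for Athena's start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results as files under this S3 prefix.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(athena_client, sql, database, output_location):
    """Submit the query and return its execution ID (the query runs asynchronously)."""
    request = build_query_request(sql, database, output_location)
    response = athena_client.start_query_execution(**request)
    return response["QueryExecutionId"]


class StubAthenaClient:
    """Offline stand-in for boto3.client("athena"); records requests instead of sending them."""

    def __init__(self):
        self.requests = []

    def start_query_execution(self, **kwargs):
        self.requests.append(kwargs)
        return {"QueryExecutionId": "stub-query-id"}


# Hypothetical names throughout; swap in boto3.client("athena") to run against AWS.
client = StubAthenaClient()
query_id = submit_query(
    client,
    sql="SELECT churned, COUNT(*) AS n FROM churn_training GROUP BY churned",
    database="ml_datasets",
    output_location="s3://example-athena-results/",
)
print(query_id)
```

With a real client, the returned ID would then be polled with `get_query_execution` until the query finishes. There is no cluster to size or keep running, which is the cost argument for A over C.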

upvoted 2 times


harmanbirstudy

3 years ago

"store a large amount of training data commonly used in its machine learning models"... well, it cannot be anything other than S3. Athena can query S3 data registered in a catalog using SQL. The answer is A.

upvoted 2 times


Stephen_C

3 years ago

Amazon Redshift is not cost-effective.

upvoted 1 times


syu31svc

3 years ago

I would say C. From https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html: "For workloads that require ever-growing storage, managed storage lets you automatically scale your data warehouse storage capacity without adding and paying for additional nodes."

upvoted 3 times

HaiHN

3 years ago

A data warehouse is not needed. For exploring data using SQL, you can use Athena.

upvoted 5 times


kwangje

3 years ago

Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against petabytes of structured data using sophisticated query optimization, columnar storage on high-performance storage, and massively parallel query execution. Most results come back in seconds.

upvoted 1 times


roytruong

3 years ago

S3 is right.

upvoted 1 times


cybe001

3 years, 1 month ago

A. S3 is the most cost-effective.

upvoted 3 times

