Exam AWS Certified Machine Learning - Specialty topic 1 question 35 discussion

A Data Science team is designing a dataset repository where it will store a large amount of training data commonly used in its machine learning models. As Data Scientists may create an arbitrary number of new datasets every day, the solution has to scale automatically and be cost-effective. Also, it must be possible to explore the data using SQL.
Which storage scheme is MOST adapted to this scenario?

  • A. Store datasets as files in Amazon S3.
  • B. Store datasets as files in an Amazon EBS volume attached to an Amazon EC2 instance.
  • C. Store datasets as tables in a multi-node Amazon Redshift cluster.
  • D. Store datasets as global tables in Amazon DynamoDB.
Suggested Answer: A

Comments

rsimham
Highly Voted 3 years, 1 month ago
Ans: A (S3) is most cost effective
upvoted 15 times
...
sonalev419
Highly Voted 2 years, 12 months ago
A: S3 is cost-effective and can be queried with Athena (not C: Redshift doesn't support unstructured data).
upvoted 7 times
...
JonSno
Most Recent 2 months, 1 week ago
Selected Answer: A
Amazon S3 (Simple Storage Service) is the best choice because it:
  • Scales automatically to store an arbitrary number of datasets.
  • Is cost-effective, as S3 charges only for storage used, unlike provisioned databases.
  • Supports querying datasets with SQL using Amazon Athena.
  • Is highly durable (99.999999999% durability) and optimized for large datasets.
How it works in this scenario:
  • Store datasets in S3 as files in Parquet, ORC, or CSV format.
  • Use AWS Glue Data Catalog to create table metadata.
  • Query the datasets using Amazon Athena (serverless SQL querying on S3).
  • Automatically scale without worrying about storage limits.
upvoted 1 times
...
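The steps in the comment above (files in S3, table metadata in the catalog, SQL through Athena) could be sketched roughly as follows. This is only an illustration: the helper name `build_external_table_ddl`, the database and table names, and the bucket path are all made up for the example. It only builds the DDL text that registers existing S3 files as a table; it does not call AWS.

```python
def build_external_table_ddl(database, table, columns, s3_location, file_format="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement for files already stored in S3.

    Hypothetical helper for illustration; `columns` is a list of
    (column_name, hive_type) pairs.
    """
    fmt = file_format.upper()
    if fmt in ("PARQUET", "ORC"):
        storage = f"STORED AS {fmt}"
    elif fmt == "CSV":
        # Plain comma-delimited text files.
        storage = "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\nSTORED AS TEXTFILE"
    else:
        raise ValueError(f"unsupported file format: {file_format}")

    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must start with s3://")
    # The location should point at a folder (prefix), not a single file.
    location = s3_location if s3_location.endswith("/") else s3_location + "/"

    column_defs = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_defs}\n"
        f")\n"
        f"{storage}\n"
        f"LOCATION '{location}'"
    )


# Example with made-up names: one new dataset means one new prefix and one new table.
ddl = build_external_table_ddl(
    "ml_datasets",
    "churn_training",
    [("customer_id", "string"), ("label", "int")],
    "s3://example-bucket/datasets/churn_training",
)
print(ddl)
```

Because the table is only metadata pointing at an S3 prefix, adding a new dataset does not require provisioning any storage or cluster capacity, which is the point the answers above are making.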
james2033
7 months, 3 weeks ago
Selected Answer: A
'cost effective' --> AWS S3
upvoted 1 times
...
loict
1 year, 1 month ago
Selected Answer: A
A. YES - S3 + Athena/Presto
B. NO - no SQL support
C. NO - expensive to scale
D. NO - DynamoDB is NoSQL
upvoted 1 times
...
DavidRou
1 year, 1 month ago
Selected Answer: A
AWS S3 + Athena will do it
upvoted 1 times
...
AjoseO
1 year, 8 months ago
Selected Answer: A
The most appropriate storage scheme for this scenario is option A: Store datasets as files in Amazon S3. Amazon S3 is a highly scalable and cost-effective object storage service that can store a large amount of data. S3 can scale automatically to accommodate a large number of datasets, making it a good option for storing the training data used in machine learning models. Additionally, S3 supports SQL querying through Amazon Athena or Amazon Redshift Spectrum, allowing data scientists to easily explore the data.
upvoted 2 times
...
harmanbirstudy
3 years ago
"store a large amount of training data commonly used in its machine learning models": it cannot be anything other than S3. Athena can query S3 data registered in the catalog using SQL. The answer is A.
upvoted 2 times
...
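For anyone wondering what "Athena can query S3 data with SQL" looks like in practice, here is a minimal sketch using the boto3 Athena client calls `start_query_execution`, `get_query_execution` and `get_query_results`. The function name `run_athena_query`, the database name, and the bucket paths are assumptions for illustration, and the sketch only reads the first page of results.

```python
import time


def run_athena_query(athena_client, sql, database, output_location, poll_seconds=1.0):
    """Run one SQL statement through Athena and return the first page of rows.

    `athena_client` is expected to behave like boto3.client("athena").
    """
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    result = athena_client.get_query_results(QueryExecutionId=query_id)
    # Each row is a list of cells; for SELECT queries the first row holds column names.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]


# Usage with real AWS credentials (names below are placeholders):
#   import boto3
#   rows = run_athena_query(
#       boto3.client("athena"),
#       "SELECT label, COUNT(*) FROM churn_training GROUP BY label",
#       "ml_datasets",
#       "s3://example-bucket/athena-results/",
#   )
```

There is no cluster to size or keep running here, which is why the thread treats S3 plus Athena as both the scalable and the cost-effective option.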
Stephen_C
3 years ago
Amazon Redshift is not cost-effective.
upvoted 1 times
...
syu31svc
3 years ago
I would say C. From https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html: "For workloads that require ever-growing storage, managed storage lets you automatically scale your data warehouse storage capacity without adding and paying for additional nodes."
upvoted 3 times
HaiHN
3 years ago
Data warehouse is not needed. For exploring data using SQL, you can use Athena
upvoted 5 times
...
kwangje
3 years ago
Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against petabytes of structured data using sophisticated query optimization, columnar storage on high-performance storage, and massively parallel query execution. Most results come back in seconds.
upvoted 1 times
...
...
roytruong
3 years ago
S3 is right
upvoted 1 times
...
cybe001
3 years, 1 month ago
A, S3 is most cost effective
upvoted 3 times
...