Exam AWS Certified Machine Learning - Specialty topic 1 question 235 discussion

Exam question from Amazon's AWS Certified Machine Learning - Specialty

Question #: 235
Topic #: 1

A company is training machine learning (ML) models on Amazon SageMaker by using 200 TB of data that is stored in Amazon S3 buckets. The training data consists of individual files that are each larger than 200 MB in size. The company needs a data access solution that offers the shortest processing time and the least amount of setup.

Which solution will meet these requirements?

A. Use File mode in SageMaker to copy the dataset from the S3 buckets to the ML instance storage.
B. Create an Amazon FSx for Lustre file system. Link the file system to the S3 buckets.
C. Create an Amazon Elastic File System (Amazon EFS) file system. Mount the file system to the training instances.
D. Use FastFile mode in SageMaker to stream the files on demand from the S3 buckets.

Suggested Answer: D

by sevosevo at March 18, 2023, 2:16 p.m.

Comments

mawsman

Highly Voted 1 year, 8 months ago

Selected Answer: D

When to use fast file mode: For larger datasets with larger files (more than 50 MB per file), the first option is to try fast file mode, which is more straightforward to use than FSx for Lustre because it doesn't require creating a file system, or connecting to a VPC. Fast file mode is ideal for large file containers (more than 150 MB), and might also do well with files more than 50 MB. https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html#model-access-training-data-best-practices

upvoted 13 times
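
To illustrate how little setup this involves, here is a minimal sketch of a single input channel that requests fast file mode, written as the plain dictionary shape used by the `InputDataConfig` parameter of the SageMaker `CreateTrainingJob` API. The helper name `fast_file_channel`, the channel name, and the bucket URI are hypothetical placeholders, not values from the question.

```python
from typing import Any, Dict


def fast_file_channel(channel_name: str, s3_uri: str) -> Dict[str, Any]:
    """Build one InputDataConfig channel that requests FastFile mode.

    Hypothetical helper for illustration; it only builds the dictionary
    and does not call AWS.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    return {
        "ChannelName": channel_name,
        # The channel-level input mode; the other values are "File" and "Pipe".
        "InputMode": "FastFile",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# Placeholder bucket and prefix, assumed for the example.
channel = fast_file_channel("train", "s3://example-bucket/training-data/")
print(channel["InputMode"])
```

Compared with option B, nothing here creates a file system or references a VPC; the only change from File mode is the value of `InputMode`.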

...

Gmishra

Most Recent 9 months, 1 week ago

Selected Answer: D

https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-sagemaker-fast-file-mode/

upvoted 1 times

...

endeesa

1 year, 1 month ago

Selected Answer: D

D requires the least setup. B could work but requires more setup.

upvoted 1 times

...

geoan13

1 year, 1 month ago

D. Amazon SageMaker now supports Fast File Mode for accessing data in training jobs. This enables high performance data access by streaming directly from Amazon S3 with no code changes from the existing File Mode. For example, training a K-Means clustering model on a 100 GB dataset took 28 minutes with File Mode but only 5 minutes with Fast File Mode (82% decrease). https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-sagemaker-fast-file-mode/

upvoted 1 times
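
The 82% figure in the quoted announcement can be checked with a short calculation. This is only a sketch of the arithmetic; the function name `percent_decrease` is made up for the example, and the two timings are the ones quoted above.

```python
def percent_decrease(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return (before - after) / before * 100.0


file_mode_minutes = 28.0  # File mode timing quoted in the announcement
fast_file_minutes = 5.0   # Fast File mode timing quoted in the announcement

decrease = percent_decrease(file_mode_minutes, fast_file_minutes)
print(f"{decrease:.1f}% decrease")  # prints "82.1% decrease"
```

The result, (28 - 5) / 28, is about 82.1%, which matches the rounded 82% in the announcement.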

...

jopaca1216

1 year, 3 months ago

B. Yes. Please look at this link: https://aws.amazon.com/blogs/aws/enhanced-amazon-s3-integration-for-amazon-fsx-for-lustre/

upvoted 2 times

...

loict

1 year, 4 months ago

Selected Answer: D

A. NO - the files are too big and would fill the instance storage for no reason.
B. NO - Lustre stripes each file across different drives to maximize throughput; our challenge is more about the volume of data to be made available on the training instance than about throughput.
C. NO - EFS supports file semantics but does not change any system property.
D. YES - FastFile allows training to start before the full file has been downloaded (like Pipe mode) but does not require code changes.

upvoted 2 times

...

Mickey321

1 year, 4 months ago

Selected Answer: D

Changing to D.

upvoted 1 times

...

Mickey321

1 year, 4 months ago

Selected Answer: B

Although D is very tempting, I am leaning towards B.

upvoted 1 times

...

dkx

1 year, 8 months ago

Selected Answer: D

https://aws.amazon.com/blogs/machine-learning/ensure-efficient-compute-resources-on-amazon-sagemaker/

upvoted 2 times

...

daidaidai

1 year, 8 months ago

Selected Answer: D

When to use fast file mode: For larger datasets with larger files (more than 50 MB per file), the first option is to try fast file mode, which is more straightforward to use than FSx for Lustre because it doesn't require creating a file system, or connecting to a VPC. Fast file mode is ideal for large file containers (more than 150 MB), and might also do well with files more than 50 MB. Because fast file mode provides a POSIX interface, it supports random reads (reading non-sequential byte-ranges). However, this is not the ideal use case, and your throughput might be lower than with sequential reads. However, if you have a relatively large and computationally intensive ML model, fast file mode might still be able to saturate the effective bandwidth of the training pipeline and not result in an IO bottleneck.

upvoted 2 times
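
The file-size thresholds in the quoted guidance can be written down as a small lookup. This is a rough sketch of that rule of thumb only, not an official AWS decision procedure; the function name `fast_file_guidance` and the returned labels are invented for the example, and the quoted text says nothing about files of 50 MB or less.

```python
def fast_file_guidance(file_size_mb: float) -> str:
    """Map a typical per-file size (in MB) to the wording of the quoted guidance."""
    if file_size_mb < 0:
        raise ValueError("file size cannot be negative")
    if file_size_mb > 150:
        return "ideal"            # "ideal for large file containers (more than 150 MB)"
    if file_size_mb > 50:
        return "might do well"    # "might also do well with files more than 50 MB"
    return "not covered by the quoted guidance"


# The question states that each file is larger than 200 MB.
print(fast_file_guidance(200))
```

With files above 200 MB, the dataset in the question falls in the range the quoted guidance calls ideal for fast file mode.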

...

Mllb

1 year, 9 months ago

Selected Answer: B

Option D, FastFile mode, streams files on demand from S3 buckets to the training instance, which can be efficient for small datasets but may not be optimal for large datasets. Moreover, this solution does not provide a file system that is optimized for high performance, and it may require additional development effort to set up.

upvoted 3 times

Mllb

1 year, 9 months ago

B, because we have 200 TB: https://saturncloud.io/blog/using-aws-sagemaker-input-modes-amazon-s3-efs-or-fsx/

upvoted 2 times

...

blanco750

1 year, 9 months ago

Selected Answer: D

Fast File Mode combines the ease of use of the existing File Mode with the performance of Pipe Mode. This provides convenient access to data as if it were downloaded locally, while offering the performance benefit of streaming the data directly from Amazon S3. No code changes or lengthy setup are required.

upvoted 4 times

...

oso0348

1 year, 9 months ago

Selected Answer: B

The solution that meets the requirements of the company is B, which involves creating an Amazon FSx for Lustre file system and linking it to the S3 buckets. Amazon FSx for Lustre is a fully managed, high-performance file system optimized for compute-intensive workloads, such as machine learning training. It is designed to provide low latencies and high throughput for processing large data sets, and it can directly access data from S3 buckets without any data movement or copying. This solution requires minimal setup and provides the shortest processing time since the data can be accessed in parallel by multiple instances.

upvoted 4 times

...

austinoy

1 year, 9 months ago

I will go with D: https://sagemaker.readthedocs.io/en/stable/api/utility/inputs.html

upvoted 3 times

...

sevosevo

1 year, 9 months ago

Selected Answer: B

https://aws.amazon.com/blogs/machine-learning/speed-up-training-on-amazon-sagemaker-using-amazon-efs-or-amazon-fsx-for-lustre-file-systems/

upvoted 3 times

...