Exam AWS Certified Machine Learning - Specialty topic 1 question 235 discussion

A company is training machine learning (ML) models on Amazon SageMaker by using 200 TB of data that is stored in Amazon S3 buckets. The training data consists of individual files that are each larger than 200 MB in size. The company needs a data access solution that offers the shortest processing time and the least amount of setup.

Which solution will meet these requirements?

  • A. Use File mode in SageMaker to copy the dataset from the S3 buckets to the ML instance storage.
  • B. Create an Amazon FSx for Lustre file system. Link the file system to the S3 buckets.
  • C. Create an Amazon Elastic File System (Amazon EFS) file system. Mount the file system to the training instances.
  • D. Use FastFile mode in SageMaker to stream the files on demand from the S3 buckets.
Suggested Answer: D
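As a rough sketch of what option D might look like in practice, the input channel of a training job can request FastFile mode directly. The snippet below only builds a channel dictionary in the shape used by the `InputDataConfig` field of the `CreateTrainingJob` API; the helper name `make_channel`, the bucket, the prefix, and the channel name are hypothetical, and no AWS call is made.

```python
# Sketch: an input channel that asks SageMaker to stream from S3 (FastFile).
# The bucket, prefix, and channel name are placeholders, not from the question.

def make_channel(s3_uri: str, channel_name: str = "training",
                 input_mode: str = "FastFile") -> dict:
    """Build one InputDataConfig entry for a training job request."""
    allowed = {"File", "FastFile", "Pipe"}
    if input_mode not in allowed:
        raise ValueError(f"input_mode must be one of {sorted(allowed)}")
    return {
        "ChannelName": channel_name,
        "InputMode": input_mode,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }

channel = make_channel("s3://example-bucket/train/")
print(channel["InputMode"])
```

Compared with File mode, the only difference in this sketch is the `InputMode` value, which is consistent with the "least amount of setup" requirement in the question.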

Comments

mawsman
Highly Voted 1 year, 6 months ago
Selected Answer: D
When to use fast file mode: For larger datasets with larger files (more than 50 MB per file), the first option is to try fast file mode, which is more straightforward to use than FSx for Lustre because it doesn't require creating a file system, or connecting to a VPC. Fast file mode is ideal for large file containers (more than 150 MB), and might also do well with files more than 50 MB. https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html#model-access-training-data-best-practices
upvoted 13 times
...
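The file-size thresholds quoted in the comment above can be written down as a small helper. This is only an illustrative reading of the quoted guidance (more than 50 MB: try fast file mode; more than 150 MB: ideal), not an official decision rule; the function name `suggest_input_mode` and its return strings are made up for this sketch.

```python
def suggest_input_mode(file_size_mb: float) -> str:
    """Rough reading of the quoted guidance; not an official decision rule."""
    if file_size_mb > 150:
        # "Fast file mode is ideal for large file containers (more than 150 MB)"
        return "FastFile (ideal range per the quoted guidance)"
    if file_size_mb > 50:
        # "might also do well with files more than 50 MB"
        return "FastFile (worth trying first)"
    # The quoted passage does not cover smaller files.
    return "outside the quoted guidance; evaluate other options"

# The question states that each file is larger than 200 MB.
print(suggest_input_mode(200))
```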
Gmishra
Most Recent 6 months, 3 weeks ago
Selected Answer: D
https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-sagemaker-fast-file-mode/
upvoted 1 times
...
endeesa
11 months ago
Selected Answer: D
Least setup is D, B could work but requires more setup!
upvoted 1 times
...
geoan13
11 months, 2 weeks ago
D. Amazon SageMaker now supports Fast File Mode for accessing data in training jobs. This enables high-performance data access by streaming directly from Amazon S3 with no code changes from the existing File Mode. For example, training a K-Means clustering model on a 100 GB dataset took 28 minutes with File Mode but only 5 minutes with Fast File Mode (82% decrease). https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-sagemaker-fast-file-mode/
upvoted 1 times
...
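The 82% figure in the comment above follows from the two quoted training times. A quick check, using only the numbers given in the comment:

```python
# Training times quoted in the comment (minutes).
file_mode_minutes = 28
fast_file_minutes = 5

# Relative decrease in training time when moving from File mode to FastFile mode.
decrease = (file_mode_minutes - fast_file_minutes) / file_mode_minutes
print(f"{decrease:.0%}")
```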
jopaca1216
1 year, 1 month ago
B. Yes. Please see this link: https://aws.amazon.com/blogs/aws/enhanced-amazon-s3-integration-for-amazon-fsx-for-lustre/
upvoted 1 times
...
loict
1 year, 1 month ago
Selected Answer: D
A. NO - the files are too large and would fill the instance storage for no reason.
B. NO - Lustre stripes each file across different drives to maximize throughput; our challenge is more about the volume of data to be made available on the training instance than about throughput.
C. NO - EFS supports file semantics but does not change any system property.
D. YES - FastFile allows training to start before the full file has been downloaded (like Pipe mode) but does not require code changes.
upvoted 2 times
...
Mickey321
1 year, 2 months ago
Selected Answer: D
Changing to D.
upvoted 1 times
...
Mickey321
1 year, 2 months ago
Selected Answer: B
Although D is very tempting, I am leaning towards B.
upvoted 1 times
...
dkx
1 year, 5 months ago
Selected Answer: D
https://aws.amazon.com/blogs/machine-learning/ensure-efficient-compute-resources-on-amazon-sagemaker/
upvoted 2 times
...
daidaidai
1 year, 5 months ago
Selected Answer: D
When to use fast file mode: For larger datasets with larger files (more than 50 MB per file), the first option is to try fast file mode, which is more straightforward to use than FSx for Lustre because it doesn't require creating a file system, or connecting to a VPC. Fast file mode is ideal for large file containers (more than 150 MB), and might also do well with files more than 50 MB. Because fast file mode provides a POSIX interface, it supports random reads (reading non-sequential byte-ranges). However, this is not the ideal use case, and your throughput might be lower than with the sequential reads. However, if you have a relatively large and computationally intensive ML model, fast file mode might still be able to saturate the effective bandwidth of the training pipeline and not result in an IO bottleneck.
upvoted 2 times
...
Mllb
1 year, 7 months ago
Selected Answer: B
Option D, FastFile mode, streams files on demand from S3 buckets to the training instance, which can be efficient for small datasets but may not be optimal for large datasets. Moreover, this solution does not provide a file system that is optimized for high performance, and it may require additional development effort to set up
upvoted 3 times
Mllb
1 year, 6 months ago
B, because we have 200 TB: https://saturncloud.io/blog/using-aws-sagemaker-input-modes-amazon-s3-efs-or-fsx/
upvoted 2 times
...
...
blanco750
1 year, 7 months ago
Selected Answer: D
Fast File Mode combines the ease of use of the existing File Mode with the performance of Pipe Mode. This provides convenient access to data as if it were downloaded locally, while offering the performance benefit of streaming the data directly from Amazon S3. No code changes or lengthy setup are required.
upvoted 4 times
...
oso0348
1 year, 7 months ago
Selected Answer: B
The solution that meets the requirements of the company is B, which involves creating an Amazon FSx for Lustre file system and linking it to the S3 buckets. Amazon FSx for Lustre is a fully managed, high-performance file system optimized for compute-intensive workloads, such as machine learning training. It is designed to provide low latencies and high throughput for processing large data sets, and it can directly access data from S3 buckets without any data movement or copying. This solution requires minimal setup and provides the shortest processing time since the data can be accessed in parallel by multiple instances.
upvoted 4 times
...
austinoy
1 year, 7 months ago
I will go with D https://sagemaker.readthedocs.io/en/stable/api/utility/inputs.html
upvoted 3 times
...
sevosevo
1 year, 7 months ago
Selected Answer: B
https://aws.amazon.com/blogs/machine-learning/speed-up-training-on-amazon-sagemaker-using-amazon-efs-or-amazon-fsx-for-lustre-file-systems/
upvoted 3 times
...