Exam AWS Certified Machine Learning - Specialty topic 1 question 225 discussion

A data scientist is training a large PyTorch model by using Amazon SageMaker. It takes 10 hours on average to train the model on GPU instances. The data scientist suspects that training is not converging and that resource utilization is not optimal.

What should the data scientist do to identify and address training issues with the LEAST development effort?

  • A. Use CPU utilization metrics that are captured in Amazon CloudWatch. Configure a CloudWatch alarm to stop the training job early if low CPU utilization occurs.
  • B. Use high-resolution custom metrics that are captured in Amazon CloudWatch. Configure an AWS Lambda function to analyze the metrics and to stop the training job early if issues are detected.
  • C. Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.
  • D. Use the SageMaker Debugger confusion and feature_importance_overweight built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.
Suggested Answer: C

Comments

Amit11011996
Highly Voted 1 year, 2 months ago
It has to be C.
upvoted 5 times
Mickey321
Most Recent 8 months, 1 week ago
Selected Answer: C
Option C uses the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected. This option is the most efficient because it leverages the existing features of SageMaker Debugger to monitor and troubleshoot your training job without requiring any additional development effort. You can use the following steps to implement this option.
upvoted 1 time
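A minimal sketch of how option C might be configured with the SageMaker Python SDK, assuming the `sagemaker` package is installed. The helper name `build_debugger_rules` is hypothetical. In this sketch the stop action is attached only to the vanishing_gradient rule, since `Rule.sagemaker` accepts an `actions` argument; stopping on the profiler rule is left out.

```python
def build_debugger_rules():
    """Return built-in Debugger rules for a SageMaker training job (sketch)."""
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from sagemaker.debugger import ProfilerRule, Rule, rule_configs

    # Built-in action that stops the training job when a rule fires.
    stop_on_issue = rule_configs.ActionList(rule_configs.StopTraining())

    return [
        # Debugger rule: flags gradients that become extremely small.
        Rule.sagemaker(
            base_config=rule_configs.vanishing_gradient(),
            actions=stop_on_issue,
        ),
        # Profiler rule: flags low or fluctuating GPU utilization.
        ProfilerRule.sagemaker(rule_configs.LowGPUUtilization()),
    ]
```

The returned list would typically be passed as the `rules` argument of a `sagemaker.pytorch.PyTorch` estimator before calling `fit`.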
oso0348
1 year, 1 month ago
Selected Answer: C
Answer is C. The best option for the data scientist to identify and address training issues with the least development effort is option C: use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected. SageMaker Debugger is a tool for debugging machine learning training jobs. It provides several built-in rules that detect and diagnose common training issues. In this case, the data scientist suspects that training is not converging and that resource utilization is not optimal. The vanishing_gradient rule targets the first suspicion and the LowGPUUtilization rule targets the second.
upvoted 3 times
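To make the two conditions concrete, here is a simplified, dependency-free sketch of the kind of check each rule performs. It is not the Debugger implementation: the function names are hypothetical, and the thresholds (1e-7 for the mean absolute gradient, 70 for the 95th percentile of GPU utilization) are assumed, illustrative values.

```python
from statistics import mean, quantiles


def gradients_vanished(gradients, threshold=1e-7):
    """True if the mean absolute gradient falls below the threshold."""
    return mean([abs(g) for g in gradients]) < threshold


def gpu_underutilized(utilization_samples, threshold_p95=70.0):
    """True if the 95th percentile of GPU utilization (%) is below the threshold."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = quantiles(utilization_samples, n=100)[94]
    return p95 < threshold_p95


# Hypothetical readings from a struggling training job.
print(gradients_vanished([1e-9, -2e-9, 0.0]))      # gradients are near zero
print(gpu_underutilized([10, 20, 30, 40] * 5))     # GPU never gets busy
```

A managed rule evaluates conditions like these while the job runs, which is why no custom metric pipeline is needed.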
AjoseO
1 year, 2 months ago
Selected Answer: C
C. Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected. SageMaker Debugger is a built-in tool for debugging and profiling machine learning models trained in SageMaker. In this scenario, the data scientist suspects issues with the training process, so SageMaker Debugger is the most appropriate solution. The vanishing_gradient and LowGPUUtilization built-in rules detect common training issues, such as vanishing gradients or low GPU utilization, that can affect convergence and resource utilization. Launching the StopTrainingJob action when an issue is detected stops the training job early, which saves resources and time. This approach requires the least development effort because it is built into SageMaker and does not require the data scientist to create custom metrics or configure CloudWatch alarms.
upvoted 3 times
Jerry84
1 year, 2 months ago
Selected Answer: C
https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html
upvoted 4 times