You need to specify the InputDataConfig, but it does not need to be "S3".
I think A and B are wrong not because the data location is not required, but because it doesn't need to be S3: it can be an Amazon S3, Amazon EFS, or Amazon FSx for Lustre location.
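To illustrate the point about non-S3 locations, here is a rough sketch of two InputDataConfig channels written as the plain dicts that boto3's create_training_job accepts: one reads from an S3 prefix, the other from Amazon EFS. The helper names, bucket, file-system ID and directory path are hypothetical. A real job that reads from EFS or FSx for Lustre also needs a VpcConfig, which is left out here.

```python
def s3_channel(name, s3_uri):
    """Build a CreateTrainingJob Channel that reads from an S3 prefix."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


def file_system_channel(name, fs_id, fs_type, path):
    """Build a Channel that reads from Amazon EFS ("EFS") or FSx for Lustre ("FSxLustre")."""
    if fs_type not in ("EFS", "FSxLustre"):
        raise ValueError(f"unsupported FileSystemType: {fs_type}")
    return {
        "ChannelName": name,
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": fs_id,
                "FileSystemType": fs_type,
                "FileSystemAccessMode": "ro",  # read-only access to the training data
                "DirectoryPath": path,
            }
        },
    }


# Hypothetical bucket, file-system ID and path, for illustration only.
input_data_config = [
    s3_channel("train", "s3://my-bucket/train/"),
    file_system_channel("validation", "fs-0123456789abcdef0", "EFS", "/data/validation"),
]
```

Both channels end up in the same InputDataConfig list, so the "where is my data" question is answered by the channel's DataSource, not by S3 specifically.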
From https://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/API_CreateTrainingJob.html, the only attributes marked "Required: Yes" are:
1. AlgorithmSpecification (within it, TrainingInputMode is required, i.e. File or Pipe)
2. OutputDataConfig (within it, S3OutputPath is required: where the model artifacts are stored)
3. ResourceConfig (within it, the EC2 InstanceType and VolumeSizeInGB are required)
4. RoleArn ("The Amazon Resource Name (ARN) of an IAM role that Amazon SageMaker can assume to perform tasks on your behalf... the caller of this API must have the iam:PassRole permission.")
5. StoppingCondition
6. TrainingJobName (The name of the training job. The name must be unique within an AWS Region in an AWS account.)
Of the options given in the question, we have 2, 3, and 4 above, so the answer is CEF.
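A minimal sketch of what a CreateTrainingJob request looks like when it contains only the six required top-level parameters and no InputDataConfig. The job name, image URI, role ARN, bucket, instance type and limits are placeholders I made up; only the parameter names come from the API reference.

```python
def minimal_training_job_request(job_name, image_uri, role_arn, output_s3_path):
    """Return a CreateTrainingJob request with only the top-level parameters
    marked "Required: Yes". InputDataConfig is omitted on purpose because the
    API reference lists it as "Required: No"."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_path},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Placeholder values only.
request = minimal_training_job_request(
    "my-linear-learner-job",
    "<linear-learner image URI for your region>",
    "arn:aws:iam::123456789012:role/MySageMakerRole",
    "s3://my-bucket/output/",
)
# With credentials configured, this dict could be sent with:
# boto3.client("sagemaker").create_training_job(**request)
```

The API schema accepts this shape, but a built-in algorithm would very likely still refuse to train without a train channel, which is exactly the objection raised in the replies.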
This is the best explanation that CEF is the right answer, IMO. The document at that url is very informative. It also specifically states that InputDataConfig is NOT required. Having said that, I have no idea how the model will train if it doesn't know where to find the training data, but that is what the document says. If someone can explain that, I'd like to hear the explanation.
If I see this question on the actual exam, I'm going with AEF. The model absolutely must know where the training data is. I have seen other documentation that does confirm that you need the location of the input data, the compute instance and location to output the model artifacts.
But you also need to specify the service role SageMaker should use; otherwise it will not be able to perform actions on your behalf, like provisioning the training instances.
The question is asking about built-in algorithms. It should be ADE. See https://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/API_CreateTrainingJob.html
For "3. ResourceConfig", only VolumeSizeInGB is required, so it's not about the instance type.
Check: https://docs.aws.amazon.com/zh_tw/sagemaker/latest/APIReference/API_ResourceConfig.html
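Following that ResourceConfig page, here is a sketch of the two ways I understand the compute can be described. VolumeSizeInGB is the only field marked required; the instance type goes either directly into InstanceType/InstanceCount or inside InstanceGroups (heterogeneous clusters). The helper names, sizes and instance types are hypothetical.

```python
def resource_config(volume_gb, instance_type, instance_count=1):
    """ResourceConfig for a homogeneous cluster: one instance type for all nodes."""
    return {
        "VolumeSizeInGB": volume_gb,  # the only field marked "Required: Yes"
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
    }


def heterogeneous_resource_config(volume_gb, groups):
    """ResourceConfig using InstanceGroups instead of InstanceType/InstanceCount.

    groups is a list of (group_name, instance_type, instance_count) tuples.
    """
    return {
        "VolumeSizeInGB": volume_gb,
        "InstanceGroups": [
            {"InstanceGroupName": name, "InstanceType": itype, "InstanceCount": count}
            for name, itype, count in groups
        ],
    }


# Hypothetical sizes and instance types, for illustration only.
single = resource_config(30, "ml.c4.xlarge")
mixed = heterogeneous_resource_config(
    30, [("cpu-group", "ml.c5.xlarge", 2), ("gpu-group", "ml.p3.2xlarge", 1)]
)
```

As far as I can tell, the instance type is still supplied somewhere in both variants, so InstanceType being optional in the schema does not by itself mean a job can run without choosing an instance.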
Options B, D, and E are important but not always mandatory for every training job. For example, validation data (Option B) is not always required, and hyperparameters (Option D) and instance types (Option E) can have default values or be optional depending on the specific algorithm and setup.
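import sagemaker

sess = sagemaker.Session()
# Hypothetical IAM role ARN: replace with your own SageMaker execution role
role = "arn:aws:iam::123456789012:role/MySageMakerRole"
# Registry path of the built-in Linear Learner image for this session's region
container = sagemaker.image_uris.retrieve("linear-learner", sess.boto_region_name)
# S3 location where the model artifacts will be written (hypothetical prefix)
output_location = f"s3://{sess.default_bucket()}/linear-learner/output"

# Example for the linear learner
linear = sagemaker.estimator.Estimator(
    container,
    role,  # role (C)
    instance_count=1,
    instance_type="ml.c4.xlarge",  # instance type (E)
    output_path=output_location,  # output path (F)
    sagemaker_session=sess,
)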
Answer is CEF.
Here is what the Amazon docs say:
InputDataConfig
An array of Channel objects. Each channel is a named input source. InputDataConfig describes the input data and its location.
Required: No
OutputDataConfig
Specifies the path to the S3 location where you want to store model artifacts. SageMaker creates subfolders for the artifacts.
Required: Yes
ResourceConfig
Identifies the resources, ML compute instances, and ML storage volumes to deploy for model training. In distributed training, you specify more than one instance.
Required: Yes
Based on https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html
Required parameters are:
- AlgorithmSpecification (registry path of the Docker image with the training algorithm)
- OutputDataConfig (path to the S3 location where you want to store model artifacts)
- ResourceConfig (resources, including the ML compute instances and ML storage volumes, to use for model training)
- RoleArn
- StoppingCondition (time limit for training job)
- TrainingJobName
Thus, the answer is: C E F
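To check a request against that list, here is a small sketch that reports which of the six required parameters are missing. The OPTION_TO_PARAM mapping from exam options C, E and F to API parameters is my own reading of this thread, not something from the AWS docs, and the request values are placeholders.

```python
# The six top-level CreateTrainingJob parameters marked "Required: Yes".
REQUIRED_TOP_LEVEL = (
    "AlgorithmSpecification",
    "OutputDataConfig",
    "ResourceConfig",
    "RoleArn",
    "StoppingCondition",
    "TrainingJobName",
)

# Hypothetical mapping of the exam options to API parameters (my reading).
OPTION_TO_PARAM = {"C": "RoleArn", "E": "ResourceConfig", "F": "OutputDataConfig"}


def missing_required(request):
    """Return the required top-level parameters absent from a request dict."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in request]


def request_from_options(options):
    """Build a skeleton request holding only the parameters the chosen options map to."""
    return {OPTION_TO_PARAM[opt]: "<placeholder>" for opt in options if opt in OPTION_TO_PARAM}


print(missing_required(request_from_options("CEF")))
# prints ['AlgorithmSpecification', 'StoppingCondition', 'TrainingJobName']
```

The three that come back are not among the answer options; when you use the SageMaker Python SDK, they are filled in from the estimator's image, its default max_run, and an auto-generated job name.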
The wording of option E ("EC2 instance class specifying whether training will be run using CPU or GPU") is inaccurate, but they do that on purpose.
The input channel and output channel are mandatory, as the training job needs to know where to get the input data from and where to publish the model artifacts. An IAM role is also needed to access AWS services. The others are not mandatory: the validation channel is not needed in unsupervised learning, for instance; hyperparameters can be auto-tuned; and the EC2 instance type can fall back to a default.
As they narrowed it to S3, A is incorrect. BUT when submitting Amazon SageMaker training jobs using one of the built-in algorithms, it is a MUST to identify the location of the training data. While Amazon S3 is commonly used for storing training data, other sources such as Amazon EFS or Amazon FSx for Lustre can also be used. Therefore, specifying the location of the training data is essential for SageMaker to know where to access the data during training.
So the right answer for me in this case is CEF. However, if A said "identify the location of training data", I think option A would be among the MUST parameters.
InputDataConfig is optional in create_training_job. Please check which parameters are required.
So the answer is CEF: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html
E is not important; some models could simply run on the default CPU instances.
A is a must and E is a must too.
C is important for permission handling on S3 etc.
It has to be A, C, F