Exam AWS Certified Machine Learning - Specialty topic 1 question 207 discussion

A company is building an application that can predict spam email messages based on email text. The company can generate a few thousand human-labeled examples, each consisting of an email message and a label of "spam" or "not spam". A machine learning (ML) specialist wants to use transfer learning with a Bidirectional Encoder Representations from Transformers (BERT) model that is trained on English Wikipedia text data.

What should the ML specialist do to initialize the model so that it can be fine-tuned with the custom data?

  • A. Initialize the model with pretrained weights in all layers except the last fully connected layer.
  • B. Initialize the model with pretrained weights in all layers. Stack a classifier on top of the first output position. Train the classifier with the labeled data.
  • C. Initialize the model with random weights in all layers. Replace the last fully connected layer with a classifier. Train the classifier with the labeled data.
  • D. Initialize the model with pretrained weights in all layers. Replace the last fully connected layer with a classifier. Train the classifier with the labeled data.
Suggested Answer: D
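For concreteness, here is a minimal sketch of the suggested answer's approach using the Hugging Face Transformers library. The "bert-base-uncased" checkpoint, the sample text, and the label mapping are illustrative assumptions, not part of the question: all layers are initialized with pretrained weights, the pretraining head is replaced by a randomly initialized two-class classifier, and that classifier is trained on the labeled data.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Pretrained weights in all layers; the masked-language-modeling head is
# dropped and a randomly initialized 2-class classification head is attached.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # assumed checkpoint; the question only says Wikipedia-trained BERT
    num_labels=2,         # "spam" / "not spam"
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# One labeled example (text and label mapping are illustrative).
inputs = tokenizer("Win a free prize now!!!", return_tensors="pt")
labels = torch.tensor([1])  # assume 1 = spam
loss = model(**inputs, labels=labels).loss
loss.backward()  # fine-tune on the custom labeled data
```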

Comments

2eb8df0
1 week ago
Selected Answer: B
It's B. The [CLS] token (first position) represents the embedding of the entire sentence; doing classification on top of this token makes the most sense.
upvoted 1 times
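Answer B's approach, stacking a classifier on the first output position (the [CLS] token) while keeping pretrained weights in all layers, can be sketched as follows. This is an illustrative sketch with Hugging Face Transformers; the checkpoint name, head size, and sample text are assumptions.

```python
import torch
from transformers import BertModel, BertTokenizer

bert = BertModel.from_pretrained("bert-base-uncased")      # pretrained weights in all layers
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(bert.config.hidden_size, 2)   # new head: spam / not spam

inputs = tokenizer("Claim your reward today", return_tensors="pt")
hidden = bert(**inputs).last_hidden_state  # shape: (batch, seq_len, hidden_size)
cls_vector = hidden[:, 0, :]               # first output position = [CLS] token
logits = classifier(cls_vector)            # train this head on the labeled data
```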
giustino98
4 months, 2 weeks ago
Selected Answer: B
I don't see why everyone is voting for D. To fine-tune BERT you should add a classifier on top of the [CLS] token's hidden state, so it's not clear to me what the question means by "last fully connected layer".
upvoted 3 times
teka112233
6 months ago
Selected Answer: D
D is the right option. By initializing the model with pretrained weights, the model can leverage the knowledge learned from a large corpus of text data, such as English Wikipedia, to improve its performance on a specific task such as spam email classification. Replacing the last fully connected layer with a classifier is necessary because the last layer of BERT is designed for predicting masked words in a sentence, which is a different task from spam email classification.
upvoted 2 times
loict
6 months, 1 week ago
Selected Answer: B
A. NO - the last fully connected layer will not do softmax classification.
B. YES - the output of BERT (the embeddings) can be used as input to a classifier.
C. NO - random weights would discard what was learned in pretraining.
D. NO - we don't want to lose the pretrained embeddings; "cutting the head off" (replacing the last layer) is for learning different classes than the model was trained on, but here we want to augment the model.
upvoted 1 times
teka112233
6 months ago
You should consider that stacking a classifier on top of the first output position and training it with labeled data is not recommended, because it does not take advantage of the knowledge learned from pretraining on a large corpus of text data.
upvoted 3 times
Mickey321
7 months ago
Selected Answer: D
D, although I was leaning towards B.
upvoted 1 times
Mickey321
7 months ago
On second thought, I'm going for B.
upvoted 2 times
kaike_reis
7 months, 1 week ago
Selected Answer: D
Cut the Head Off
upvoted 1 times
blanco750
1 year ago
Selected Answer: D
D seems correct
upvoted 1 times
rrshah83
1 year, 2 months ago
Selected Answer: D
D is a best practice
upvoted 4 times
BoroJohn
1 year, 3 months ago
Is B correct? See https://www.analyticsvidhya.com/blog/2020/07/transfer-learning-for-nlp-fine-tuning-bert-for-text-classification/ : "Freeze the entire architecture: we can even freeze all the layers of the model, attach a few neural network layers of our own, and train this new model. Note that the weights of only the attached layers will be updated during model training."
upvoted 2 times
kaike_reis
7 months, 1 week ago
You would have two classifiers stacked, so your predictions would be based on the other classifier.
upvoted 1 times
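The freezing strategy quoted above (train only the attached layers) would look roughly like this in PyTorch with Hugging Face Transformers; the checkpoint name and learning rate are assumptions.

```python
import torch
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
for param in bert.parameters():
    param.requires_grad = False  # freeze every pretrained layer

classifier = torch.nn.Linear(bert.config.hidden_size, 2)
# Only the attached classifier's weights receive gradient updates.
optimizer = torch.optim.AdamW(classifier.parameters(), lr=1e-3)
```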
dunhill
1 year, 3 months ago
I think the answer is D.
upvoted 4 times
Community vote distribution: A (35%), C (25%), B (20%), Other