Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 159 discussion

Actual exam question from Google's Professional Machine Learning Engineer

Question #: 159
Topic #: 1

[All Professional Machine Learning Engineer Questions]

Your organization manages an online message board. A few months ago, you discovered an increase in toxic language and bullying on the message board. You deployed an automated text classifier that flags certain comments as toxic or harmful. Now some users are reporting that benign comments referencing their religion are being misclassified as abusive. Upon further inspection, you find that your classifier's false positive rate is higher for comments that reference certain underrepresented religious groups. Your team has a limited budget and is already overextended. What should you do?

A. Add synthetic training data where those phrases are used in non-toxic ways.
B. Remove the model and replace it with human moderation.
C. Replace your model with a different text classifier.
D. Raise the threshold for comments to be considered toxic or harmful.

Show Suggested Answer

Suggested Answer: A 🗳️

by PST21 at July 20, 2023, 4:10 p.m.

Comments

Submit Cancel

Omi_04040

1 week, 4 days ago

Selected Answer: A

Main issue is to address the bias in the model

upvoted 1 times

...

Laur_C

1 week, 4 days ago

Selected Answer: A

I chose A - least expensive/time consuming way to actually solve the problem. D sacrifices model quality and does not change the inherent bias of the model, meaning that the biases would still remain if this solution was chosen. Does not seem like the ethical/best practices solution

upvoted 2 times

...

rajshiv

2 weeks, 5 days ago

Selected Answer: A

D is not a good answer. Raising the threshold would reduce the number of toxic comments flagged (perhaps lowering false positives), but it would also increase the number of actual toxic comments being missed (higher false negatives). This exacerbates the problem and do not address the bias in the model. I think A is the best answer.

upvoted 2 times

...

f084277

1 month, 1 week ago

Selected Answer: D

The answer is D. Of course A would be ideal, but it totally ignores the constraints presented in the question that your team is already overextended.

upvoted 2 times

...

Dirtie_Sinkie

3 months, 1 week ago

Selected Answer: A

Gonna go with A on this one. Some toxic comments will still make it through if you choose D, whereas A addresses the problem fully and directly. Therefore I think A is a more complete answer than D.

upvoted 1 times

Dirtie_Sinkie

3 months, 1 week ago

Even though in the question it says "Your team has a limited budget and is already overextended" I still think A is the better answer because it doesn't take much effort to create synthetic data and add it to train. The outcome will be more accurate than D.

upvoted 2 times

f084277

1 month, 1 week ago

Of course A is "better", but it ignores the constraints of the question and is therefore wrong.

upvoted 1 times

...

baimus

3 months, 1 week ago

Selected Answer: A

A is better than D, because D means that more geniunely toxic comments will make it through. A will teach the model to acknowledge the small subset of mislabelled comments, without exposing the customers to additional toxicity.

upvoted 1 times

...

AzureDP900

6 months ago

option A (Add synthetic training data where those phrases are used in non-toxic ways) directly addresses the specific issue of bias and improves the model's accuracy by providing more contextually relevant training examples. This approach is more targeted and has a lower risk of introducing new biases or negatively impacting other aspects of comment moderation. I hope this additional explanation helps clarify why option D might not be the best choice in this scenario!

upvoted 2 times

AzureDP900

6 months ago

Raising the threshold would mean increasing the minimum score required for a comment to be classified as toxic or harmful. This could potentially reduce the number of false positives (benign comments being misclassified as toxic) by making it harder for the model to classify a comment as toxic.

upvoted 1 times

...

Simple_shreedhar

6 months, 4 weeks ago

A option directly addresses the bias issue without incurring significant ongoing costs or burdening the moderation team. By augmenting the training dataset with synthetic examples where phrases related to underrepresented religious groups are used in non-toxic ways, the classifier can learn to distinguish between toxic and benign comments more accurately.

upvoted 2 times

...

gscharly

8 months ago

Selected Answer: D

agree with daidai75

upvoted 1 times

...

pinimichele01

8 months, 2 weeks ago

Selected Answer: D

Your team has a limited budget and is already overextended

upvoted 2 times

...

7cb0ab3

8 months, 2 weeks ago

Selected Answer: A

I went fo A because it directly tackels the issue of misclassification and improving the models unterstanding of religious references. B and C don't make sense. D would generally reduce the number of comments flagged as toxic, which could decrease the false positive rate. However, this approach risks allowing genuinely harmful comments to go unflagged. It addresses the symptom (high false positive rate) rather than the underlying cause

upvoted 2 times

...

edoo

9 months, 3 weeks ago

Selected Answer: A

B and C are non sense, I don't want to risk potentially increasing the FNR by reducing the FPR (Raise the threshold). Thus A.

upvoted 1 times

...

daidai75

10 months, 3 weeks ago

Selected Answer: D

Your team has a limited budget and is already overextended, that means the re-training is hardly possible.

upvoted 2 times

...

tavva_prudhvi

1 year, 5 months ago

In the long run, usually we go with A, but Option D could be a temporary solution to reduce false positives, while being aware that it may allow some genuinely toxic comments to go unnoticed. However, this may be a necessary trade-off until your team has the resources to improve the classifier or find a better solution.

upvoted 1 times

...

powerby35

1 year, 5 months ago

Selected Answer: D

"Your team has a limited budget and is already overextended"

upvoted 2 times

...

[Removed]

1 year, 5 months ago

Selected Answer: D

By raising the threshold for comments to be considered toxic or harmful, you will decrease the number of false positives. B is wrong because we are taking a Google MLE exam :) A and C are wrong because both of them involve a good amount of additional work, either for extending the dataset or training/experimenting with a new model. Considering your team is already over the budget and has too many tasks on their plate (overextended), these two options are not available for you.

upvoted 2 times

tavva_prudhvi

1 year, 1 month ago

But, by raising the threshold, we might be allowing some genuinely toxic comments to pass through without being flagged. This could potentially lead to an increase in the false negative rate, right?

upvoted 1 times

...

PST21

1 year, 5 months ago

Selected Answer: A

A. Add synthetic training data where those phrases are used in non-toxic ways. In this situation, where your automated text classifier is misclassifying benign comments referencing certain underrepresented religious groups as toxic or harmful, adding synthetic training data where those phrases are used in non-toxic ways can be a cost-effective solution to improve the model's performance.

upvoted 1 times

...

Exam Professional Machine Learning Engineer All Questions

View all questions & answers for the Professional Machine Learning Engineer exam

Exam Professional Machine Learning Engineer topic 1 question 159 discussion

Comments

Omi_04040

Laur_C

rajshiv

f084277

Dirtie_Sinkie

Dirtie_Sinkie

f084277

baimus

AzureDP900

AzureDP900

Simple_shreedhar

gscharly

pinimichele01

7cb0ab3

edoo

daidai75

tavva_prudhvi

powerby35

[Removed]

tavva_prudhvi

PST21

SY0-701