Exam AWS Certified AI Practitioner AIF-C01 topic 1 question 106 discussion

A company is introducing a mobile app that helps users learn foreign languages. The app makes text more coherent by calling a large language model (LLM). The company collected a diverse dataset of text and supplemented the dataset with examples of more readable versions. The company wants the LLM output to resemble the provided examples.

Which metric should the company use to assess whether the LLM meets these requirements?

  • A. Value of the loss function
  • B. Semantic robustness
  • C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score
  • D. Latency of the text generation
Suggested Answer: C

Comments

Jessiii
2 weeks, 6 days ago
Selected Answer: C
The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score is widely used to measure the similarity between generated text and a set of reference texts. Since the company wants the LLM's output to resemble the provided readable examples, ROUGE is the most appropriate metric. ROUGE compares the LLM-generated text with the human-provided reference texts by evaluating n-gram overlap, precision, recall, and F1 score, making it a great choice for text coherence and readability assessment.
upvoted 2 times
...
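For anyone who wants to see this concretely, below is a minimal sketch of scoring an LLM rewrite against a human reference with ROUGE. It assumes the open-source rouge-score Python package (pip install rouge-score); the example texts are made up and not from the exam scenario.

```python
# Minimal sketch: comparing an LLM rewrite to a reference text with ROUGE.
# Assumes the open-source rouge-score package: pip install rouge-score
from rouge_score import rouge_scorer

# Hypothetical example texts (illustration only).
reference = "The cat sat quietly on the warm windowsill and watched the rain."
llm_output = "The cat sat on the warm windowsill, quietly watching the rain."

# ROUGE-1 measures unigram overlap; ROUGE-L uses the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, llm_output)

for name, score in scores.items():
    # Each result exposes precision, recall, and F1 (fmeasure).
    print(f"{name}: P={score.precision:.2f} R={score.recall:.2f} F1={score.fmeasure:.2f}")
```

Higher overlap with the provided readable examples means a higher ROUGE score, which is exactly what the question is asking the company to measure.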
may2021_r
2 months ago
Selected Answer: C
The correct answer is C. ROUGE score measures how well generated text matches reference examples.
upvoted 1 times
...
aws_Tamilan
2 months ago
Selected Answer: C
Since the company wants the LLM output to resemble the provided examples in terms of coherence and readability, ROUGE score is the best metric for this evaluation.
upvoted 1 times
...
26b8fe1
2 months, 1 week ago
Selected Answer: C
The most suitable metric to assess whether the LLM output resembles the provided examples of more readable text is C, the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score. ROUGE is commonly used to evaluate the quality of text summarization and machine-generated text by comparing it to a set of reference texts. It measures how well the generated text matches the provided examples in terms of content and coherence. Specifically, ROUGE focuses on the overlap of n-grams, word sequences, and word pairs between the generated text and the reference texts, making it well suited to this use case.
upvoted 1 times
...
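To make the n-gram overlap idea concrete, here is a small hand-rolled ROUGE-1 calculation. This is a simplified sketch, not the official implementation; tokenization is deliberately naive and the texts are invented for illustration.

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    """Naive ROUGE-1: unigram overlap between a candidate and a reference text."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Each word counts at most as many times as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / sum(ref_counts.values())      # share of reference words covered
    precision = overlap / sum(cand_counts.values())  # share of candidate words that match
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

print(rouge1("the cat sat on the mat", "the cat lay on the mat"))
# -> high overlap: 5 of 6 unigrams match, so precision, recall, and F1 are all about 0.83
```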
Community vote distribution: A (35%), C (25%), B (20%), Other