A company has fine-tuned a large language model (LLM) to answer questions for a help desk. The company wants to determine if the fine-tuning has enhanced the model's accuracy.
Which metric should the company use for the evaluation?
C: F1 score
Explanation:
The F1 score is the harmonic mean of precision and recall: F1 = 2 × precision × recall / (precision + recall). Precision measures correctness (how much of what the model answered is right), and recall measures completeness (how much of the expected answer the model covered). Because the harmonic mean is pulled toward the lower of the two values, a model cannot score well by being accurate but incomplete, or complete but padded with irrelevant content. This makes F1 well suited to question answering, where both correctness and completeness matter. It is also particularly useful when classes are unevenly distributed, a case where plain accuracy can be misleading.
In this help desk scenario, the company can compute F1 on the model's answers to the same set of user queries before and after fine-tuning. A higher F1 after fine-tuning indicates that the model more consistently gives correct and relevant answers, which is the evidence the company needs that fine-tuning improved accuracy.
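As a rough illustration, the before/after comparison could be sketched as follows. This assumes the company has a held-out set of help desk questions with reference answers, and it uses SQuAD-style token-overlap F1, which is one common way to apply F1 to free-text answers. The explanation above does not say how F1 should be computed, so treat that choice as an assumption. The function names (`normalize`, `token_f1`, `mean_f1`) and the sample answers are hypothetical.

```python
import re
import string
from collections import Counter

# Hypothetical sketch: SQuAD-style token-overlap F1 for free-text answers.


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and the articles a/an/the, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """F1 over the tokens shared by one model answer and one reference answer."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # correctness: share of the answer that matches
    recall = overlap / len(ref_tokens)      # completeness: share of the reference covered
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: list[str], references: list[str]) -> float:
    """Average token F1 over an evaluation set of help desk questions."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need one prediction per reference, and at least one pair")
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


# Hypothetical evaluation data: reference answers plus outputs from the
# base model and the fine-tuned model for the same two questions.
references = [
    "Restart the router and wait two minutes",
    "Reset your password from the login page",
]
base_answers = ["Restart your computer", "Contact support to reset it"]
tuned_answers = [
    "Restart the router, then wait two minutes",
    "Reset the password from the login page",
]

print(f"base  F1: {mean_f1(base_answers, references):.3f}")   # base  F1: 0.202
print(f"tuned F1: {mean_f1(tuned_answers, references):.3f}")  # tuned F1: 0.871
```

The fine-tuned answers score higher because they share more tokens with the references. Token overlap rewards wording close to the reference, so a correct paraphrase can still score low, and a real comparison would need far more than two questions to be meaningful.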