The best option is B: upload a .zip file that contains a collection of audio files in the .wav format and a corresponding text transcript file. This pairs good audio quality (.wav files) with clear organization (audio and transcripts kept together), which matters for efficient and accurate training of speech recognition models.
B is the answer.
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-voice-training-data#types-of-training-data
A voice training dataset includes audio recordings, and a text file with the associated transcriptions. Each audio file should contain a single utterance (a single sentence or a single turn for a dialog system), and be less than 15 seconds long.
- Individual utterances + matching transcript
A collection (.zip) of audio files (.wav) as individual utterances. Each audio file should be 15 seconds or less in length, paired with a formatted transcript (.txt).
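For anyone who wants to sanity-check an archive before uploading, here is a rough sketch of the requirements quoted above (one .wav per utterance, 15 seconds or less, each with a transcript entry). It uses only the Python standard library. The function name `check_dataset` and the transcript layout (one `<wav name><TAB><text>` entry per line) are my own assumptions for illustration, so check the linked docs for the exact format the service expects.

```python
import io
import wave
import zipfile

MAX_SECONDS = 15.0  # "15 seconds or less" per the quoted docs


def check_dataset(zip_file, transcript_text):
    """Return a list of problems found in an utterance archive.

    zip_file: path or file-like object for the .zip of .wav files.
    transcript_text: transcript contents, ASSUMED to hold one
    "<wav name><TAB><text>" entry per line (hypothetical layout).
    """
    # Map each audio file name to its transcript text.
    transcripts = {}
    for line in transcript_text.splitlines():
        if not line.strip():
            continue
        name, _, text = line.partition("\t")
        transcripts[name.strip()] = text.strip()

    problems = []
    seen = set()
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".wav"):
                continue
            seen.add(name)
            # Read the member fully so the wave module gets a seekable stream.
            with archive.open(name) as member:
                data = member.read()
            with wave.open(io.BytesIO(data), "rb") as wav:
                seconds = wav.getnframes() / wav.getframerate()
            if seconds > MAX_SECONDS:
                problems.append(
                    f"{name}: {seconds:.1f}s exceeds {MAX_SECONDS:.0f}s"
                )
            if not transcripts.get(name):
                problems.append(f"{name}: no transcript entry")

    # Transcript lines that point at audio missing from the archive.
    for name in sorted(transcripts.keys() - seen):
        problems.append(f"{name}: transcript entry without audio")
    return problems
```

An empty list back from `check_dataset` means every .wav is within the length limit and has a matching transcript line. It only reads uncompressed PCM .wav files, since that is what the `wave` module supports.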
B is the correct answer.
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-test-and-train