3Play Media Study Reveals Automatic Speech Recognition (ASR) Engines Are Fine-Tuning After a Year of Massive Improvement
June 20, 2024 - 9:59AM
Business Wire
After a year of significant developments,
research finds Artificial Intelligence (AI) speech recognition
tools are homing in on differentiation, but human-in-the-loop
workflows remain critical for ASR captioning and transcription use
cases.
After a year of profound improvement in accuracy, ASR providers
are doubling down on refining their solutions and focusing on
differentiation, according to the latest State of ASR report,
released today by 3Play Media, the leading media accessibility
provider in North America.
“The ASR market continues to evolve and is fiercely competitive.
It is clearly reaching a maturation stage in its evolution,” Josh
Miller, co-CEO and co-Founder, 3Play Media, said. “After a year of
revolutionary changes in the accuracy of the technology, the 2024
report finds vendors working on their differentiation based on
specific use cases and fine-tuning their technologies
accordingly.
“This year, it has become clear that not all errors are equal,
challenging the standalone metric of accuracy rate. Ultimately, ASR
alone is still insufficient for the captioning use case, especially
regarding formatting and hallucinations. Human-in-the-loop
captioning and transcription workflows remain critical for
accuracy, quality, and accessibility.”
The annual study analyzes the general state of speech-to-text
technology as it applies to the task of captioning and
transcription. In addition to a surge in new advancements, 2023
brought several new players, such as AssemblyAI and Whisper, whose
ASR engines rivaled top competitors such as Speechmatics.
The new report investigates errors like hallucinations, where
the engine generates incorrect words not present in the input.
Whisper, a fast gainer in last year’s study, continues to be a
competitive engine, but its hallucinations remain a cause for
concern. These hallucinations appear more common than initially
believed, and the consequences for accessibility – and ultimately a
brand – are profound.
This year’s State of ASR report additionally highlights the need
for a more nuanced evaluation framework that considers metrics such
as Word Error Rate (WER), Formatted Error Rate (FER), and the
Canadian NER Model. The top engines were found to have different
strengths and weaknesses, and each prioritizes different types of
content or styles of transcription.
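For readers unfamiliar with the first of these metrics, the sketch
below shows how Word Error Rate is conventionally computed: the
number of word substitutions, deletions, and insertions needed to
turn the engine output into the reference transcript, divided by
the number of words in the reference. This is a minimal
illustration, not the scoring method used in the report. The
function name and the lowercase, whitespace-based tokenization are
assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / reference length.

    Assumption for this sketch: words are compared after lowercasing
    and splitting on whitespace, so punctuation attached to a word
    still counts as part of that word.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One inserted word against a two-word reference.
    print(word_error_rate("hello world", "hello big world"))
```

Because every error carries the same weight in this calculation, an
inserted word that changes the meaning of a sentence scores the same
as a harmless one, which is one reason a single accuracy figure can
be misleading.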
To obtain a free copy of The 2024 State of ASR report, please
visit: https://go.3playmedia.com/rs-2024-asr.
About 3Play Media
3Play Media is an integrated media accessibility platform with
patented solutions for closed captioning, transcription, live
captioning, audio description, and subtitling. 3Play Media combines
machine learning (ML), artificial intelligence (AI), and automatic
speech recognition (ASR) with human review to provide innovative,
highly accurate services. Customers span multiple industries,
including media & entertainment, corporate, e-commerce,
fitness, higher education, government, and eLearning.
View source version on businesswire.com: https://www.businesswire.com/news/home/20240620085571/en/
Media Contact
Phil LeClare
phil.leclare@3playmedia.com
617-209-9406
www.3playmedia.com
@3playmedia