An Israeli startup that uses AI voice capture to automate business workflows has released a new open-source model that it says outstrips OpenAI’s own speech recognition model, running 50 percent faster with no loss of accuracy.
Herzliya-based aiOla says the model, Whisper-Medusa, is considerably faster than OpenAI’s Whisper because it changes the way the model predicts tokens, the small units of data used by AI models.
Unlike OpenAI’s Whisper, which predicts one token at a time, Whisper-Medusa can predict up to 10 at once, which the company says yields a 50 percent improvement in speech prediction speed and generation runtime.
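To see why predicting several tokens per step can cut runtime, consider the toy sketch below. It is not aiOla's code and does not load Whisper or Whisper-Medusa: the "model" is a hypothetical stand-in that simply reads from a fixed transcript, and every name in it (`TARGET`, `toy_next_token`, `toy_next_block`, and the two decode functions) is invented for illustration. It also assumes every extra guess is correct, which is an idealization; a real system would presumably need to check those guesses, so its gain would be smaller than the pass counts here suggest.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"
# Hypothetical transcript the toy "model" reproduces: nine words plus an end marker.
TARGET: List[str] = "the quick brown fox jumps over the lazy dog".split() + [EOS]


def toy_next_token(prefix: List[str]) -> str:
    """Stand-in for a decoder that emits one token per pass."""
    return TARGET[len(prefix)]


def toy_next_block(prefix: List[str], k: int) -> List[str]:
    """Stand-in for a decoder that emits up to k tokens per pass (assumed all correct)."""
    return TARGET[len(prefix):len(prefix) + k]


def decode_one_at_a_time(next_token: Callable[[List[str]], str]) -> Tuple[List[str], int]:
    """Call the model once per token and count the passes."""
    out: List[str] = []
    passes = 0
    while True:
        tok = next_token(out)
        passes += 1
        if tok == EOS:
            return out, passes
        out.append(tok)


def decode_in_blocks(next_block: Callable[[List[str], int], List[str]], k: int) -> Tuple[List[str], int]:
    """Call the model once per block of up to k tokens and count the passes."""
    out: List[str] = []
    passes = 0
    while True:
        block = next_block(out, k)
        passes += 1
        for tok in block:
            if tok == EOS:
                return out, passes
            out.append(tok)


words_single, passes_single = decode_one_at_a_time(toy_next_token)
words_block, passes_block = decode_in_blocks(toy_next_block, k=10)
```

Both loops produce the same nine-word transcript, but the first needs ten passes through the toy model while the second needs one when `k` is 10. The 50 percent figure the company reports is far more modest than that ideal, as would be expected once incorrect guesses and the cost of each pass are taken into account.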
“Creating Whisper-Medusa was not an easy task, but its significance to the community is profound,” said aiOla VP Research Gill Hetz.
“Improving the speed and latency of [large language models] is much easier to do than with automatic speech recognition systems,” he said.
“The encoder and decoder architectures present unique challenges due to the complexity of processing continuous audio signals and handling noise or accents. We addressed these challenges by employing our novel multi-head attention approach, which resulted in a model with nearly double the prediction speed while maintaining Whisper’s high levels of accuracy. It’s a major feat, and we are very proud to be the first in the industry to successfully leverage multi-head attention architecture for automatic speech recognition systems and bring it to the public.”