ElevenLabs launches its own speech-to-text model

ElevenLabs, an AI startup that recently raised a $180 million funding round, is known primarily for audio generation. The company has now stepped in another technical direction by launching its first standalone speech-to-text model, called Scribe.

The startup, valued at $3.3 billion, has helped many other companies provide text-to-speech services through a large library of voices. Now it is moving into speech recognition and will compete with Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper models.

ElevenLabs’ Scribe model supports more than 99 languages at launch. The company places more than 25 languages in its “excellent” accuracy category, where the model’s word error rate is below 5%. This list includes English (97% accuracy rate), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish and Vietnamese. Other languages fall into the “high” (5%-10% word error rate), “good” (10%-20% word error rate), and “moderate” (25%-50% word error rate) accuracy categories.
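For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the model's transcript into a reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not ElevenLabs' evaluation code, the function name `word_error_rate` is made up for this example, and the lowercase-and-split-on-whitespace normalization is an assumption (published benchmarks usually apply more careful text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are compared after lowercasing and splitting on
    whitespace; punctuation is not stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on a mat'):.1%}")
```

If "accuracy" is read as one minus the word error rate, which the article does not state explicitly, a 97% accuracy rate would correspond to a word error rate of about 3%, inside the sub-5% band.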

The company said the model outperforms Google’s Gemini 2.0 Flash and OpenAI’s Whisper Large V3 in multiple languages on the FLEURS and Common Voice benchmarks.

ElevenLabs had already developed speech-to-text components for its conversational AI agent platform, released last year. However, this is the first time the company has released a standalone speech recognition model. In a conversation with TechCrunch last month, CEO Mati Staniszewski talked about improving its speech recognition model.

“We want to better understand what you’re saying in a conversation. We’re working on ways to go beyond content generation alone and to understand and transcribe speech,” Staniszewski said at the time. “A lot of people say that speech-to-text is a solved problem. But in many languages, it’s pretty bad. I think we can build a better speech recognition model because we have an internal team that annotates the data and provides quick feedback.”

The model also offers smart speaker diarization to identify who is talking, word-level timestamps for accurate subtitles, and automatic tagging of sound events such as audience laughter. The startup also offers a way for customers to upload video content directly and add subtitles or captions in its studio.

Currently, Scribe works only with pre-recorded audio. The company said it will soon release a low-latency, real-time version of the model. This means it is not yet suited to live transcription or voice memos.

ElevenLabs prices Scribe at $0.40 per hour of transcribed audio. The rate is competitive, but some of its rivals currently offer lower prices for audio transcription, with differences in certain features.
