Two co-founders without extensive AI expertise say they have created an openly available AI model that can generate podcast-style clips similar to Google’s NotebookLM.
The market for synthetic speech tools is vast and growing. ElevenLabs is one of the biggest players, but there is no shortage of challengers (see PlayAI, Sesame, and others). Investors see great potential in these tools: according to PitchBook, startups developing voice AI tech raised more than $398 million in VC funding last year.
Toby Kim, one of the co-founders of Nari Labs, the group behind the newly released model, said he and his fellow co-founders began learning about speech AI three months ago. Inspired by NotebookLM, they wanted to build a model that offered more control over generated voices and “freedom of scripts.”
Kim says Nari used Google’s TPU Research Cloud program, which gives researchers free access to the company’s TPU AI chips, to train Nari’s model, Dia. Dia weighs in at 1.6 billion parameters and generates dialogue from a script, letting users customize speakers’ tones and insert disfluencies, coughs, laughs, and other nonverbal cues.
Parameters are the internal variables a model uses to make predictions. Generally speaking, models with more parameters perform better.
Available through the AI dev platform Hugging Face and on GitHub, Dia can run on most modern PCs with at least 10GB of VRAM. It generates a random voice unless prompted with a description of the intended style, but it can also clone a person’s voice.
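For readers curious what that script-driven control looks like in practice, here is a minimal sketch adapted from the quick-start example in Nari’s GitHub repository. The package layout (dia.model), the Dia class, its from_pretrained and generate methods, and the nari-labs/Dia-1.6B checkpoint ID are assumptions based on the repo at the time of writing and may change.

```python
# Illustrative sketch of script-driven dialogue generation with Dia,
# adapted from the quick-start in Nari's GitHub repo. Treat the API
# surface (dia.model.Dia, from_pretrained, generate) and the checkpoint
# ID as assumptions, not a canonical reference.
import soundfile as sf
from dia.model import Dia

# Download the 1.6B-parameter weights from Hugging Face.
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# [S1]/[S2] tags mark alternating speakers; parenthesized cues like
# (laughs) or (coughs) insert nonverbal sounds into the dialogue.
script = (
    "[S1] Welcome back to the show. Today we're talking about open "
    "speech models. [S2] Thanks for having me. (laughs) It's a fun topic."
)

# Generate a waveform from the script and save it as a WAV file.
audio = model.generate(script)
sf.write("dialogue.wav", audio, 44100)
```

Notably, the speaker tags and nonverbal cues live in the script itself, which is where the “freedom of scripts” Kim describes comes from.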
In TechCrunch’s brief test of Dia via Nari’s web demo, the model worked well, readily generating two-way chats on any subject. The quality of the voices seems competitive with other tools, and the voice cloning feature is among the easiest to use this reporter has tried.
But like many voice generators, Dia offers few safeguards. It would be trivially easy to craft disinformation or scam recordings. On Dia’s project page, Nari discourages abuse of the model to impersonate, deceive, or otherwise engage in illegitimate campaigns, but the group says it is “not responsible” for misuse.
Nari also has not revealed which data it scraped to train Dia, raising the possibility that the model was developed using copyrighted content; a Hacker News commenter points out that one sample sounds like the hosts of NPR’s “Planet Money” podcast. Training models on copyrighted content is a widespread but legally fraught practice. Some AI companies argue that fair use shields them from liability, while rights holders counter that fair use doesn’t apply to training.
In any case, Kim says Nari’s plan is to build a synthetic speech platform with a “social aspect” on top of Dia and larger future models. Nari also intends to release a technical report for Dia and to expand the model’s support to languages beyond English.