The quality of AI-generated voice is now good enough for uses such as creating audiobooks and podcasts, reading articles aloud, and basic customer support. However, many companies still don’t consider AI voice technology reliable enough to deploy.
To address that, two MIT alumni, Moin Nadeem and Nikhil Murthy (pictured above), founded Phonic, which provides an end-to-end voice stack to increase the reliability of synthetic voices while reducing latency.
Nadeem and Murthy met at MIT and have known each other for over seven years. When the duo began building Phonic last year, they felt that few companies were creating a complete voice technology solution.
“Voice AI is in a place where you can tie together and integrate intelligence from automated speech recognition (and text to speech), and more,” Murthy told TechCrunch. “But when I spoke to real customers, I found out there was a lack of reliable (solutions) at scale.”
Nadeem, who previously worked for MosaicML, a company that Databricks acquired for $1.3 billion in 2023, said many companies building in the voice AI space, such as Vapi, are stitching together separate AI models to create workflows.
Phonic takes a different approach: it trains its models end-to-end in-house. Murthy said this has several benefits.
“Owning a model allows you to deeply integrate some (…) reliability into (the model itself),” he said. “If you don’t own that layer (…), you just tie different pieces together that don’t actually fit seamlessly.”
Murthy added that Phonic’s method also allows the company to cost-effectively host and run the models. He claims that Phonic trains the model with a variety of recordings, including accented speech recordings, to make it very robust.
Phonic is currently working with a limited set of partners, including companies in the insurance and medical spaces, but plans to launch a wide range of products in a few months. Prospective clients will soon be able to try out Phonic’s technology on its website, Nadeem said.
Phonic raised $4 million in a seed round led by Lux Capital. The round also included Replit co-founder Amjad Masad, Hugging Face co-founder Clem Delangue, Applied Intuition co-founder Qasar Younis, and Modal Labs founder Erik Bernhardsson.
Lux Capital partner Grace Isford said the company’s approach of training its models in-house appealed to the investment firm.
“I think both Moin and Nikhil are incredible technicians,” she said. “They founded a machine learning club at MIT. And they have been working on training models for a while. What’s more, their approach of combining the spreading model of the voice AI sector with a proprietary model is novel.”