Google’s latest video generation AI model, Veo 3, can create audio to accompany the clips it generates.
Announced Tuesday at the Google I/O 2025 developer conference, Veo 3 can generate sound effects, background noise, and even dialogue to accompany the videos it creates, Google claims. The company also says Veo 3 improves on its predecessor, Veo 2, in terms of the quality of the footage it can generate.
Veo 3 is available starting Tuesday in Google’s Gemini chatbot app for subscribers to the company’s $249.99-per-month AI Ultra plan.
“For the first time, we’re emerging from the silent era of video generation,” Demis Hassabis, CEO of Google DeepMind, Google’s AI R&D division, said during a press briefing. “You can give Veo 3 a prompt describing characters and an environment, and suggest dialogue with a description of how you want it to sound.”
The wide availability of tools for building video generators has led to an explosion of providers, leaving the space saturated. Startups like Runway, Lightricks, Genmo, Pika, Higgsfield, Kling, and Luma, as well as tech giants like OpenAI and Alibaba, have been releasing models at a rapid clip. In many cases, little distinguishes one model from another.
If Google can deliver on its promise, audio output could be a major differentiator for Veo 3. AI-powered sound generation tools are not novel, nor are models that create sound effects for video. But according to Google, Veo 3 can understand the raw pixels of its videos and automatically sync the sounds it generates with the clips.
Here is a sample clip from the model:
Veo 3 was likely made possible by DeepMind’s earlier research into “video-to-audio” AI. Last June, DeepMind revealed it was developing AI technology to generate soundtracks for videos by training a model on combinations of sounds, dialogue transcripts, and video clips.
DeepMind won’t say exactly where it sourced the content to train Veo 3, but YouTube is a strong possibility. Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo “may” be trained on some YouTube material.
To mitigate the risk of deepfakes, DeepMind says it uses its proprietary watermarking technology, SynthID, to embed invisible markers in the frames Veo 3 generates.
Companies like Google pitch Veo 3 as a powerful creative tool, but many artists are understandably wary of it. A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, estimates that more than 100,000 US-based film, television, and animation jobs will be disrupted by AI by 2026.
Google also rolled out new capabilities for Veo 2 today, including a feature that lets users give the model reference images of characters, scenes, objects, and styles for better consistency. The latest Veo 2 can understand camera movements such as rotations, dollies, and zooms, and it lets users add or erase objects in a video or widen a clip’s frame, for example to turn it from portrait into landscape.
Google says all of these new Veo 2 features will come to its Vertex AI API platform in the coming weeks.