On Wednesday, Google rolled out several updates to its first-party media-generating AI models available through its Vertex AI cloud platform.
Lyria, Google’s text-to-music model, is now available in preview for select customers. The company’s Veo 2 video creation model has been enhanced with new editing and visual effects customization options. The company has also launched a voice cloning feature, powered by Chirp 3, Google’s audio understanding model, for “allow-listed” users. And the Imagen 3 image generator now offers what the company describes as “significantly” better performance.
The updates, timed to Google’s Cloud Next conference, are the company’s latest push to win the enterprise market for generative AI. Google arguably competes most directly with Amazon, which offers a comparable cloud AI platform called Bedrock with its own set of generative AI models.
Google is pitching Lyria as an alternative to royalty-free music libraries. Using the model, customers can create songs in a variety of styles and genres, from jazzy piano solos to lo-fi tracks, the company said.
Chirp 3, on the other hand, can synthesize speech in about 35 languages. First previewed earlier this year, the Chirp 3-powered Instant Custom Voice feature is now generally available. The model also powers a new tool launching in preview: transcription with speaker diarization, which separates and identifies individual speakers in recordings with multiple participants.
To prevent abuse, Instant Custom Voice is subject to a due-diligence process to verify that proper voice usage permissions have been obtained, Google says.
As for Veo 2, the model can now remove background images, logos, and objects from existing videos and expand the frame of video footage (e.g., convert landscape video to portrait). It can also adjust the camera angle and pacing in AI-generated scenes to create time-lapses, drone-style clips, and more, and interpolate between specified starting and ending frames.
These Veo features are available in preview for now.
As for the aforementioned Imagen 3 upgrade, Google said it has improved the model’s ability to remove objects and reconstruct missing or damaged parts of an image.
All media generated by Imagen, Veo, and Lyria (but not Chirp) is watermarked using Google’s SynthID technology. The company said all of its generative AI models have “built-in protection” against the creation of harmful content.
Google has historically not disclosed which specific data it uses to train its models, and the tech giant stuck to that precedent today. Training data tends to be a controversial subject for IP-related reasons. Some companies train models on copyrighted works without first obtaining permission from the rights holders. These companies argue that the U.S. doctrine of fair use protects the practice, but some creators naturally disagree, and many are fighting vendors in court.
Google previously told TechCrunch that it offers an opt-out mechanism for model training, as well as an indemnification policy that protects Google Cloud and Vertex AI customers in AI-related copyright disputes.