Google DeepMind CEO Demis Hassabis, speaking on a podcast co-hosted by LinkedIn co-founder Reid Hoffman, said Google plans to eventually combine its Gemini AI models with its Veo video-generating models to improve the former's understanding of the physical world.
“We’ve always built Gemini, our foundation model, to be multimodal from the start,” Hassabis said.
The broader AI industry is gradually moving toward “omni” models that can understand and generate many forms of media. Google’s newest Gemini models can generate audio as well as images and text, while OpenAI’s default model in ChatGPT can natively create images, including, of course, Studio Ghibli-style art. Amazon has also announced plans to launch an “any-to-any” model later this year.
These omni models require a lot of training data: images, video, audio, text, and so on. Hassabis implied that Veo’s video data comes largely from YouTube, a platform Google owns.
“Basically, by watching YouTube videos, a lot of YouTube videos, [Veo 2] can figure out, you know, the physics of the world,” Hassabis said.
Google previously told TechCrunch that its models “may be” trained on “some” YouTube content, in accordance with its agreement with YouTube creators. Google reportedly broadened its terms of service last year in part so the company could tap more data to train its AI models.