On Wednesday, Meta unveiled V-JEPA 2, a new AI “world model” designed to help AI agents understand the world around them.
V-JEPA 2 is an extension of the V-JEPA model Meta released last year, and was trained on more than a million hours of video. That training is meant to help robots and other AI agents operate in the physical world by understanding and predicting how concepts like gravity will affect what happens next.
These are the kinds of common-sense connections that young children and animals form as their brains develop. When playing fetch, for example, a dog understands that a ball bounced off the ground will rebound upward, and it knows to run toward where the ball will land.
Meta showed examples of this in action: a robot holding a plate and a spatula walks toward a stove with cooked eggs, and the AI predicts that a likely next action is to use the spatula to move the eggs onto the plate.
According to Meta, V-JEPA 2 is 30 times faster than Nvidia’s Cosmos model, which also aims to build intelligence about the physical world. However, Meta may be evaluating its model against different benchmarks than Nvidia uses.
“We believe that world models will lead a new era of robotics, allowing real-world AI agents to assist with chores and physical tasks without the need for astronomical amounts of robot training data,” said Yann LeCun, Meta’s chief AI scientist, in a video.