Google’s most expensive AI model appears to have passed a major milestone: beating a video game from 29 years ago.
Last night, Google CEO Sundar Pichai celebrated the win in a post on X.
To be clear, the Gemini Plays Pokémon livestream is run by Joel Z, who describes himself as a “30-year-old software engineer with no connection to Google.”
Still, the stream has drawn cheers from inside Google. For example, Google AI Studio product lead Logan Kilpatrick posted last month that Gemini had “made great strides in completing Pokémon” and had “earned the fifth badge.”
Why Pokémon? In February, Anthropic highlighted the progress its Claude AI models were making in “Pokémon Red,” writing that Claude’s “extended thinking and agent training” gave it a “big boost” on “more unexpected” tasks, such as playing classic games. (“Pokémon Red” and “Blue” are versions of the Game Boy title first released in 1996, which launched the long-running Pokémon franchise.) There is also a Claude Plays Pokémon channel on Twitch, which Joel Z cited as inspiration.
Despite that progress, Claude still doesn’t seem to have beaten “Pokémon Red.” Does that mean Gemini is objectively better at the game? Joel Z urged viewers: “Don’t think of this as a benchmark of how well an LLM plays Pokémon. You can’t compare them directly. Gemini and Claude have different tools and receive different information.”
Both AI models also need help to play the game. That’s where an agent harness comes in: it gives the model a screenshot of the game overlaid with additional information, lets the model decide how to respond (which may involve calling specialized agents), and then presses the buttons that correspond to the AI’s instructions.
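To make that observe, decide, act loop a little more concrete, here is a minimal Python sketch of how such a harness might be wired together. It is not Joel Z’s code or any real harness: the emulator is a stand-in, the overlay contents and every name (`FakeEmulator`, `build_observation`, `run_harness`, `decide`) are hypothetical, and the calls to specialized agents are left out.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical button set for a Game Boy-style controller.
BUTTONS = {"UP", "DOWN", "LEFT", "RIGHT", "A", "B", "START", "SELECT"}


@dataclass
class Observation:
    screenshot: bytes  # raw frame captured from the (fake) emulator
    overlay: dict      # extra information the harness adds on top of the frame


@dataclass
class FakeEmulator:
    """Hypothetical stand-in for a real emulator; it only records button presses."""
    pressed: list = field(default_factory=list)

    def capture_frame(self) -> bytes:
        # A real harness would grab the current screen image here.
        return b"frame-%d" % len(self.pressed)

    def press(self, button: str) -> None:
        self.pressed.append(button)


def build_observation(emu: FakeEmulator, step: int) -> Observation:
    # The overlay is invented for illustration; real harnesses add their own data.
    return Observation(screenshot=emu.capture_frame(), overlay={"step": step})


def run_harness(emu: FakeEmulator, decide: Callable[[Observation], str], steps: int) -> list:
    """Show the model an observation, read its reply, and press the matching button."""
    for step in range(steps):
        obs = build_observation(emu, step)
        action = decide(obs).strip().upper()  # normalize the model's reply, e.g. "a" -> "A"
        if action in BUTTONS:                 # replies that are not valid buttons are ignored
            emu.press(action)
    return emu.pressed
```

In this sketch `decide` stands in for the language model; for example, a toy policy that answers `"a"` on even steps and `"jump"` on odd ones results in only the two valid `"A"` presses being sent over four steps. Keeping the model behind a plain function makes it easy to see that the harness, not the model, controls what actually reaches the game.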
Joel Z acknowledged that there were other “developer interventions” to help Gemini complete the game, but argued that they did not amount to cheating.
“My interventions improve Gemini’s overall decision-making and reasoning abilities,” he said. “I don’t give any specific hints. There are no walkthroughs or direct instructions for particular challenges like Mt. Moon.”
He added that “Gemini Plays Pokémon is still under active development, and the framework continues to evolve.”