AI companies are fighting to dominate the industry, but sometimes they are also fighting at Pokémon gyms.
As both Google and Anthropic study how modern AI models navigate early Pokémon games, the results can be as entertaining as they are enlightening. Most recently, Google DeepMind noted in a report that Gemini 2.5 Pro resorts to panic when its Pokémon are near death. In that state, the report says, the model can show “qualitatively observable degradation in the model’s reasoning ability.”
AI benchmarking – the practice of comparing the performance of different AI models – is a dubious art that often says little about how a particular model actually behaves. Some researchers, however, believe that studying how AI models play video games may be useful (or at least kind of funny).
Over the past few months, two developers unaffiliated with Google and Anthropic have set up their own Twitch streams, “Gemini Plays Pokémon” and “Claude Plays Pokémon.”
Each stream displays the AI’s “reasoning” process, a natural-language account of how the model evaluates a problem and arrives at a response, which gives some insight into how these models work.

These AI models’ progress is impressive, but they are still not very good at playing Pokémon: it takes them hundreds of hours to get through a game that a human child could complete in a fraction of the time.
What’s interesting about watching AI navigate a Pokémon game isn’t the completion time, though, but how the model behaves along the way.
“Over the course of the playthrough, Gemini 2.5 Pro gets into a variety of situations that cause the model to simulate ‘panic,’” the report states.
This “panic” state can degrade the model’s performance: the AI may abruptly stop using certain tools it has at its disposal for playing the game. These AIs don’t think or experience emotions, but their behavior can mimic the way humans make poor, hurried decisions under stress.
“This behavior occurred in enough separate instances that members of the Twitch chat actively noticed when it was happening,” the report states.
Claude also showed some strange behavior on its journey across Kanto. In one example, the AI picked up on the mechanic whereby, once all of a player’s Pokémon have run out of health, the character “whites out” and is returned to a Pokémon Center.
When Claude got stuck in Mt. Moon, it mistakenly hypothesized that if it intentionally let all of its Pokémon faint, it would be transported past the cave to the Pokémon Center in the next town.
But that’s not how the game works. When all of your Pokémon faint, you return to the Pokémon Center you most recently used, not the one that is geographically closest. Viewers watched in horror as the AI essentially tried to kill itself in-game.
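A toy sketch makes the failure obvious. Everything below is illustrative only (a made-up Player class and stand-in place names, not the game’s actual code): a whiteout simply warps the player back to the last Pokémon Center it visited, with no regard for whatever town lies ahead.

```python
# Toy model of the "white out" rule, for illustration only.
# The key point: a party wipe sends the player back to the Pokémon Center
# they last visited, not the nearest one further along the route.
from dataclasses import dataclass


@dataclass
class Player:
    location: str
    last_center: str  # updated every time the player heals at a Center

    def heal_at_center(self, center: str) -> None:
        self.location = center
        self.last_center = center

    def white_out(self) -> None:
        # No warp to the closest or next town, just back to the last Center.
        self.location = self.last_center


trainer = Player(location="Pewter City", last_center="Pewter City Pokémon Center")
trainer.heal_at_center("Pewter City Pokémon Center")
trainer.location = "Mt. Moon"   # wander into the cave...
trainer.white_out()             # ...and faint the whole party on purpose
print(trainer.location)         # back at Pewter City's Center, not the next town
```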
Despite these shortcomings, there are ways in which the AI outperforms human players. In the case of Gemini 2.5 Pro, the AI can solve certain puzzles with impressive accuracy.
With some help from humans, the AI created agentic tools – instances of Gemini 2.5 Pro prompted to carry out a specific task – that solved the game’s boulder puzzles and found efficient routes to a destination.
“Just by explaining the physics of boulders and how to verify valid paths, Gemini 2.5 Pro can one-shot some of these complex boulder puzzles needed to progress toward victory,” the report says.
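The report does not publish the tooling itself, so the following is only a sketch of the kind of “path verifier” it describes, under assumed rules: a small grid, Strength-style boulder pushes, and a check that a proposed move sequence legally reaches the exit. The grid symbols, the toy layout, and the verify_path helper are all hypothetical.

```python
# A minimal sketch (not Google's actual tooling) of a boulder-puzzle path
# verifier: simulate Strength-style pushes on a grid and report whether a
# proposed sequence of moves legally brings the player to the exit.

GRID = [
    "#######",
    "#P.B..#",
    "#..#..#",
    "#.B...#",
    "#...#E#",
    "#######",
]  # '#' wall, '.' floor, 'B' boulder, 'P' player start, 'E' exit

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def parse(grid):
    player, exit_pos, boulders, walls = None, None, set(), set()
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == "#":
                walls.add((r, c))
            elif ch == "B":
                boulders.add((r, c))
            elif ch == "P":
                player = (r, c)
            elif ch == "E":
                exit_pos = (r, c)
    return player, exit_pos, boulders, walls


def verify_path(grid, moves):
    """Return True if the move sequence legally brings the player to the exit."""
    player, exit_pos, boulders, walls = parse(grid)
    for move in moves:
        dr, dc = MOVES[move]
        nxt = (player[0] + dr, player[1] + dc)
        if nxt in walls:
            return False  # walked into a wall: invalid plan
        if nxt in boulders:
            pushed = (nxt[0] + dr, nxt[1] + dc)
            if pushed in walls or pushed in boulders:
                return False  # boulder is blocked and cannot be pushed
            boulders.remove(nxt)
            boulders.add(pushed)
        player = nxt
    return player == exit_pos


if __name__ == "__main__":
    # A hand-checked plan for the toy layout: push the lower boulder down,
    # then walk the cleared row to the exit.
    plan = ["right", "down", "down", "right", "right", "right", "down"]
    print(verify_path(GRID, plan))  # True
```

Given a checker like this, a model (or an agent it spawns) can propose candidate move sequences and keep only the ones the verifier accepts, which is the general shape of the setup the report describes.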
Gemini 2.5 Pro did much of the work of creating these tools on its own, so Google theorizes that such models could build these tools without human intervention. Perhaps Gemini will treat itself to a “don’t panic” module.