AI companies are fighting to dominate the industry, but sometimes they are also fighting at Pokémon gyms.
As both Google and Anthropic study how modern AI models navigate early Pokémon games, the results can be as entertaining as they are enlightening. Most recently, Google DeepMind noted in a report that Gemini 2.5 Pro resorts to panic when its Pokémon are near death. In that state, the report says, the model can show “qualitatively observable degradation in the model’s reasoning ability.”
AI benchmarking – the practice of comparing the performance of different AI models – is a dubious art that often says little about how a particular model actually behaves. Some researchers, however, believe that studying how AI models play video games may be useful (or at least kind of funny).
Over the past few months, two developers unaffiliated with Google and Anthropic have set up their own Twitch streams, “Gemini Plays Pokémon” and “Claude Plays Pokémon.”
Each stream displays the AI’s “reasoning” process, a natural-language account of how the model evaluates a problem and arrives at a response, which gives some insight into how these models work.

These AI models’ progress is impressive, but they are still not very good at playing Pokémon: it takes them hundreds of hours to get through a game that a human child could complete in a fraction of the time.
What’s interesting about watching AI navigate a Pokémon game isn’t the completion time, though, but how the model behaves along the way.
“Over the course of the playthrough, Gemini 2.5 Pro gets into a variety of situations that cause the model to simulate ‘panic,’” the report states.
This “panic” state can degrade the model’s performance: the AI may abruptly stop using certain tools it has at its disposal for playing the game. These AIs don’t think or experience emotions, but their behavior can mimic the way humans make poor, hurried decisions under stress.
“This behavior occurred in enough separate instances that members of the Twitch chat actively noticed when it was happening,” the report states.
Claude also showed some strange behavior on its journey across Kanto. In one example, the AI picked up on the mechanic whereby, once all of a player’s Pokémon have run out of health, the character “whites out” and is returned to a Pokémon Center.
When Claude got stuck in Mt. Moon, it mistakenly hypothesized that if it intentionally let all of its Pokémon faint, it would be transported past the cave to the Pokémon Center in the next town.
But that’s not how the game works. When all of your Pokémon faint, you return to the Pokémon Center you most recently used, not the one that is geographically closest. Viewers watched in horror as the AI essentially tried to kill itself in-game.
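A toy sketch makes the failure obvious. Everything below is illustrative only (a made-up Player class and stand-in place names, not the game’s actual code): a whiteout simply warps the player back to the last Pokémon Center it visited, with no regard for whatever town lies ahead.

```python
# Toy model of the "white out" rule, for illustration only.
# The key point: a party wipe sends the player back to the Pokémon Center
# they last visited, not the nearest one further along the route.
from dataclasses import dataclass


@dataclass
class Player:
    location: str
    last_center: str  # updated every time the player heals at a Center

    def heal_at_center(self, center: str) -> None:
        self.location = center
        self.last_center = center

    def white_out(self) -> None:
        # No warp to the closest or next town, just back to the last Center.
        self.location = self.last_center


trainer = Player(location="Pewter City", last_center="Pewter City Pokémon Center")
trainer.heal_at_center("Pewter City Pokémon Center")
trainer.location = "Mt. Moon"   # wander into the cave...
trainer.white_out()             # ...and faint the whole party on purpose
print(trainer.location)         # back at Pewter City's Center, not the next town
```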
Despite these shortcomings, there are ways in which the AI outperforms human players. In the case of Gemini 2.5 Pro, the AI can solve certain puzzles with impressive accuracy.
With some help from humans, the AI created agentic tools – instances of Gemini 2.5 Pro prompted to carry out a specific task – that solved the game’s boulder puzzles and found efficient routes to a destination.
“Just by explaining the physics of boulders and how to verify valid paths, Gemini 2.5 Pro can one-shot some of these complex boulder puzzles needed to progress toward victory,” the report says.
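The report does not publish the tooling itself, so the following is only a sketch of the kind of “path verifier” it describes, under assumed rules: a small grid, Strength-style boulder pushes, and a check that a proposed move sequence legally reaches the exit. The grid symbols, the toy layout, and the verify_path helper are all hypothetical.

```python
# A minimal sketch (not Google's actual tooling) of a boulder-puzzle path
# verifier: simulate Strength-style pushes on a grid and report whether a
# proposed sequence of moves legally brings the player to the exit.

GRID = [
    "#######",
    "#P.B..#",
    "#..#..#",
    "#.B...#",
    "#...#E#",
    "#######",
]  # '#' wall, '.' floor, 'B' boulder, 'P' player start, 'E' exit

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def parse(grid):
    player, exit_pos, boulders, walls = None, None, set(), set()
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == "#":
                walls.add((r, c))
            elif ch == "B":
                boulders.add((r, c))
            elif ch == "P":
                player = (r, c)
            elif ch == "E":
                exit_pos = (r, c)
    return player, exit_pos, boulders, walls


def verify_path(grid, moves):
    """Return True if the move sequence legally brings the player to the exit."""
    player, exit_pos, boulders, walls = parse(grid)
    for move in moves:
        dr, dc = MOVES[move]
        nxt = (player[0] + dr, player[1] + dc)
        if nxt in walls:
            return False  # walked into a wall: invalid plan
        if nxt in boulders:
            pushed = (nxt[0] + dr, nxt[1] + dc)
            if pushed in walls or pushed in boulders:
                return False  # boulder is blocked and cannot be pushed
            boulders.remove(nxt)
            boulders.add(pushed)
        player = nxt
    return player == exit_pos


if __name__ == "__main__":
    # A hand-checked plan for the toy layout: push the lower boulder down,
    # then walk the cleared row to the exit.
    plan = ["right", "down", "down", "right", "right", "right", "down"]
    print(verify_path(GRID, plan))  # True
```

Given a checker like this, a model (or an agent it spawns) can propose candidate move sequences and keep only the ones the verifier accepts, which is the general shape of the setup the report describes.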
Gemini 2.5 Pro did much of the work of creating these tools on its own, so Google theorizes that such models could build these tools without human intervention. Perhaps Gemini will treat itself to a “don’t panic” module.