Anthropic used Pokémon to benchmark its latest AI model

Anthropic used Pokémon to benchmark its latest AI model. Yes, really.

In a blog post published Monday, Anthropic said it had tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.
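The article does not show Anthropic's harness, so the following is only a rough sketch of how a loop built from those three pieces (pixel input, button-press function calls, and a small memory) might be wired together. Every name here (`ToyEmulator`, `Agent`, `scripted_policy`, `run`) is hypothetical: a tiny grid stands in for the Game Boy screen, and a hard-coded rule stands in for the model.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List

Screen = List[List[int]]  # rows of pixel values; 1 marks the player

BUTTONS = ("up", "down", "left", "right", "a", "b", "start", "select")


class ToyEmulator:
    """Hypothetical stand-in for a Game Boy emulator: one lit pixel on a grid."""

    def __init__(self, width: int = 4, height: int = 4) -> None:
        self.width, self.height = width, height
        self.x, self.y = 0, 0

    def screen(self) -> Screen:
        """Return the current frame as a grid of pixels."""
        pixels = [[0] * self.width for _ in range(self.height)]
        pixels[self.y][self.x] = 1
        return pixels

    def press(self, button: str) -> None:
        """Apply one button press; non-directional buttons do nothing here."""
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        if button == "left":
            self.x = max(self.x - 1, 0)
        elif button == "right":
            self.x = min(self.x + 1, self.width - 1)
        elif button == "up":
            self.y = max(self.y - 1, 0)
        elif button == "down":
            self.y = min(self.y + 1, self.height - 1)


@dataclass
class Agent:
    """Pairs a decision function (the 'model') with a short action memory."""

    choose_button: Callable[[Screen, List[str]], str]
    memory: Deque[str] = field(default_factory=lambda: deque(maxlen=8))

    def step(self, emulator: ToyEmulator) -> str:
        # Read pixels, let the policy pick a button, press it, remember it.
        button = self.choose_button(emulator.screen(), list(self.memory))
        emulator.press(button)
        self.memory.append(button)
        return button


def scripted_policy(screen: Screen, memory: List[str]) -> str:
    """Placeholder for the model: walk right to the edge, then walk down."""
    row = next(r for r in screen if 1 in r)
    return "right" if row.index(1) < len(row) - 1 else "down"


def run(agent: Agent, emulator: ToyEmulator, max_actions: int) -> List[str]:
    """Play for a fixed number of actions and return the buttons pressed."""
    return [agent.step(emulator) for _ in range(max_actions)]
```

Calling `run(Agent(scripted_policy), ToyEmulator(), 5)` steps the toy loop five times. In a real setup, the policy would be a call to the model and the emulator would be an actual Game Boy emulator; both are assumptions outside what the article describes.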

A unique feature of Claude 3.7 Sonnet is its ability to engage in “extended thinking.” Like OpenAI’s o3-mini and DeepSeek’s R1, Claude 3.7 Sonnet can “reason” through challenging problems by applying more computing and taking more time.

That came in handy in Pokémon Red, apparently.

Compared to a previous version, Claude 3.0 Sonnet, which failed to leave the house in Pallet Town where the story begins, Claude 3.7 Sonnet successfully battled three Pokémon gym leaders and won their badges.

Anthropic Pokémon Red. Image credits: Anthropic

It is not clear how much computing Claude 3.7 Sonnet needed to reach those milestones, or how long each took. Anthropic said only that the model performed 35,000 actions to reach the last gym leader, Lt. Surge.

Surely it won’t be long before some enterprising developer finds out.

Pokémon Red is more of a toy benchmark than anything. However, there is a long history of games being used for AI benchmarking purposes. In the past few months alone, a number of new apps and platforms have emerged to test models’ game-playing abilities on titles ranging from Street Fighter to Pictionary.
