People are benchmarking AI by having it bounce a ball in a rotating shape

The list of informal, weird AI benchmarks keeps growing.

Over the past few days, some in the AI community on X have been absorbed in testing how various AI models, especially so-called reasoning models, handle a prompt that asks for a ball bouncing inside a shape and ends: “Rotate the shape slowly and make sure that the ball remains in the shape.”

Some models do better on this “rotating ball” benchmark than others. According to one X user, R1, the freely available model from Chinese AI lab DeepSeek, swept the floor with OpenAI’s o1 pro mode, which costs $200 per month as part of OpenAI’s ChatGPT Pro plan.

👀 DeepSeek R1 (right) crushed o1-pro (left) 👀

Prompt: “Write a Python script for a bouncing yellow ball within a square, and make sure to handle collision detection properly. Make the square slowly rotate. Implement it in Python. Make sure the ball stays within the square.” Pic.twitter.com/3sad9efpez

- Ivan Fioravanti ᯅ (@ivanfioravanti) January 22, 2025

According to another X poster, Anthropic’s Claude 3.5 Sonnet and Google’s Gemini 1.5 Pro models misjudged the physics, and the ball escaped the shape. Other users reported that Google’s Gemini 2.0 Flash Thinking Experimental and even OpenAI’s older GPT-4o passed the evaluation in one go.

Tested 9 AI models on a physics simulation task: rotating triangle + bouncing ball. Results:

🥇 DeepSeek-R1
🥈 Sonar Huge
🥉 GPT-4o

Worst? OpenAI o1: completely misunderstood the task

First row in the video below = reasoning models, rest = base models. pic.twitter.com/eoyrhvnazr

- Aadhithya D (@aadhithya_d2003) January 22, 2025

But what does it prove that an AI can or cannot code a rotating shape with a ball bouncing inside it?

Well, simulating a bouncing ball is a classic programming challenge. An accurate simulation incorporates a collision detection algorithm, which tries to identify when two objects (for example, the ball and a side of the shape) collide. A poorly written algorithm can hurt the simulation’s performance or lead to obvious physics mistakes.
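
To make the collision detection step concrete, here is a minimal sketch of one common approach: test the ball against each wall by finding the closest point on that wall segment, then reflect the velocity about the wall normal. The function names and the tuple-based vectors are assumptions chosen for illustration; none of the models or posters quoted here necessarily used this method.

```python
import math

def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to point p."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    length_sq = abx * abx + aby * aby
    if length_sq == 0.0:  # degenerate wall: a and b coincide
        return a
    # Project p onto the wall's line, then clamp to the segment ends.
    t = ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / length_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * abx, a[1] + t * aby)

def collide_ball_with_wall(pos, vel, radius, a, b):
    """Check one ball against one static wall.

    Returns (new_pos, new_vel, hit). On a hit, the velocity is
    reflected about the wall normal and the ball is pushed back
    out so that it just touches the wall.
    """
    cx, cy = closest_point_on_segment(pos, a, b)
    dx, dy = pos[0] - cx, pos[1] - cy
    dist = math.hypot(dx, dy)
    if dist >= radius or dist == 0.0:
        return pos, vel, False
    nx, ny = dx / dist, dy / dist      # unit normal, wall -> ball center
    vn = vel[0] * nx + vel[1] * ny     # velocity component along the normal
    if vn < 0.0:                       # only reflect if moving into the wall
        vel = (vel[0] - 2.0 * vn * nx, vel[1] - 2.0 * vn * ny)
    pos = (cx + nx * radius, cy + ny * radius)
    return pos, vel, True

# Ball of radius 1 overlapping a horizontal wall while moving down-right.
print(collide_ball_with_wall((0.0, 0.5), (1.0, -2.0), 1.0, (-5.0, 0.0), (5.0, 0.0)))
```

A check like this runs once per wall per time step. If the ball moves farther than its radius in a single step, its center can jump past a wall before any overlap is detected, which is one plausible way a ball ends up escaping the shape.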

X user N8 Programs, a researcher in residence at AI startup Nous Research, says it took about two hours to program a bouncing ball in a rotating shape from scratch. “One has to track multiple coordinate systems and how the collisions are handled in each of them, and design the code to be robust from the beginning.”
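
The point about multiple coordinate systems can be illustrated with a short sketch. It assumes a shape that rotates about a fixed center at a constant angular velocity `omega` (radians per second, counterclockwise in standard math axes) and a perfectly elastic bounce; all names here are hypothetical and this is not N8 Programs' code.

```python
import math

def rotate(v, angle):
    """Rotate 2D vector v counterclockwise by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def world_to_shape(point, center, angle):
    """Express a world-frame point in the shape's own rotating frame."""
    return rotate((point[0] - center[0], point[1] - center[1]), -angle)

def shape_to_world(point, center, angle):
    """Map a point from the shape's frame back to the world frame."""
    r = rotate(point, angle)
    return (r[0] + center[0], r[1] + center[1])

def wall_velocity(contact, center, omega):
    """World-frame velocity of a point on the rotating shape."""
    rx, ry = contact[0] - center[0], contact[1] - center[1]
    return (-omega * ry, omega * rx)

def bounce_off_moving_wall(vel, normal, wall_vel):
    """Reflect the ball's velocity relative to a moving wall.

    `normal` is the unit normal pointing from the wall toward the ball.
    The reflection is done in the wall's frame and converted back.
    """
    rel = (vel[0] - wall_vel[0], vel[1] - wall_vel[1])
    vn = rel[0] * normal[0] + rel[1] * normal[1]
    if vn >= 0.0:  # already separating from the wall
        return vel
    rel = (rel[0] - 2.0 * vn * normal[0], rel[1] - 2.0 * vn * normal[1])
    return (rel[0] + wall_vel[0], rel[1] + wall_vel[1])

# A wall point at (1, 0) on a shape spinning at 2 rad/s moves straight up.
w = wall_velocity((1.0, 0.0), (0.0, 0.0), 2.0)
print(w, bounce_off_moving_wall((0.0, -1.0), (0.0, 1.0), w))
```

Doing the overlap test in the shape's frame keeps the walls stationary there, while the bounce itself has to account for the wall's own motion. Mixing up the two frames is an easy way to produce a ball that gains or loses energy for no visible reason.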

But while bouncing balls and rotating shapes are a reasonable test of programming skills, they are not a very empirical AI benchmark. Even slight variations in the prompt can yield different results. That is why some users on X report having more luck with o1, while others say that R1 falls short.

If anything, viral tests like these point to how hard it is to create useful systems of measurement for AI models. It is often difficult to tell what distinguishes one model from another, outside of esoteric benchmarks that are not relevant to most people.

Many efforts are underway to build better tests, such as the ARC-AGI benchmark and Humanity’s Last Exam. We will see how those fare. In the meantime, watch GIFs of balls bouncing in rotating shapes.
