Google’s AI R&D lab DeepMind says it has developed a new AI system to tackle problems with “machine-gradable” solutions.
In experiments, the system, called AlphaEvolve, helped optimize some of the infrastructure Google uses to train its AI models, DeepMind said. The company says it is building a user interface for interacting with AlphaEvolve and plans to launch an early access program for selected academics ahead of a broader rollout.
Most AI models hallucinate. Owing to their probabilistic architectures, they sometimes confidently make things up. In fact, newer AI models such as OpenAI’s o3 hallucinate more than their predecessors, illustrating how challenging the problem remains.
AlphaEvolve introduces a clever mechanism for reducing hallucinations: an automated evaluation system. The system uses models to generate and critique a pool of candidate answers to a question, then automatically evaluates and scores those answers for accuracy.
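DeepMind has not published AlphaEvolve’s internals in full, but the generate-then-score pattern it describes can be sketched in a few lines. In this minimal, illustrative sketch, a random numeric perturbation stands in for an LLM proposing candidate edits, and a toy objective stands in for the user-supplied evaluator; all names here are hypothetical.

```python
import random

random.seed(0)  # for reproducibility of this toy run

def evaluate(candidate):
    # Toy automated evaluator: higher is better. In AlphaEvolve's
    # setup, this role is played by a user-supplied scoring mechanism.
    target = 42
    return -abs(candidate - target)

def evolve(initial_pool, generations=50, pool_size=8):
    """Generate-critique-score loop: score every candidate, keep the
    best half, derive new candidates from the survivors, repeat."""
    pool = list(initial_pool)
    for _ in range(generations):
        # Rank the pool with the automated evaluator.
        scored = sorted(pool, key=evaluate, reverse=True)
        survivors = scored[: pool_size // 2]
        # Derive new candidates from survivors (an LLM would propose
        # edits here; a random perturbation stands in for it).
        children = [c + random.choice([-3, -1, 1, 3]) for c in survivors]
        pool = survivors + children
    return max(pool, key=evaluate)

best = evolve([0, 100], generations=200)
```

Keeping the survivors in the pool (elitism) guarantees the best score never regresses between generations, which is what lets the automated evaluator steadily filter out wrong answers.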

AlphaEvolve is not the first system to take this tack. Researchers, including a DeepMind team several years ago, have applied similar techniques in various mathematical domains. What sets AlphaEvolve apart, DeepMind claims, is its use of “state-of-the-art” models, specifically Gemini models.
To use AlphaEvolve, users prompt the system with a problem, optionally including details such as instructions, equations, code snippets, and relevant literature. They must also supply a mechanism for automatically evaluating the system’s answers, in the form of a formula.
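That evaluation mechanism amounts to a scoring function the user writes themselves. As a hedged illustration, assuming a toy objective of filling a square container’s area without overflowing it (the function name and problem are invented for this example, and real geometric feasibility is ignored):

```python
def score_candidate(side_lengths, container=10.0):
    """Hypothetical user-supplied evaluator: higher is better.
    Scores a proposed set of square side lengths by how much of the
    container's area they would use without exceeding it."""
    used = sum(s * s for s in side_lengths)
    capacity = container * container
    if used > capacity:
        return float("-inf")  # invalid candidates score worst
    return used / capacity    # fraction of area covered, in [0, 1]

print(score_candidate([6.0, 8.0]))  # 36 + 64 exactly fills a 10x10 area -> 1.0
```

Because the formula runs without human intervention, it is what lets the system grade thousands of candidate answers on its own, which is exactly the property that restricts AlphaEvolve to machine-checkable problems.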
Because AlphaEvolve can only tackle problems whose solutions can be evaluated automatically, the system works only on certain kinds of problems, chiefly in fields such as computer science and systems optimization. Another major limitation is that AlphaEvolve can only describe solutions as algorithms, making it a poor fit for problems without numerical answers.
To benchmark AlphaEvolve, DeepMind tested the system on a curated set of roughly 50 mathematical problems spanning geometry to combinatorics. AlphaEvolve managed to “rediscover” the best-known answers to the problems, and uncovered improved solutions in 20% of cases.
DeepMind also evaluated AlphaEvolve on practical problems, such as improving the efficiency of Google’s data centers and speeding up model training runs. According to the lab, AlphaEvolve devised an algorithm that continuously recovers, on average, 0.7% of Google’s worldwide compute resources. The system also proposed an optimization that reduced the overall time it takes Google to train a Gemini model by 1%.
To be clear, AlphaEvolve has not made any groundbreaking discoveries. In one experiment, the best the system could do was find improvements to Google’s TPU AI accelerator chip design that other tools had already flagged.
However, DeepMind makes the same case for AlphaEvolve that many AI labs make for their systems: it is a time-saver that frees experts to focus on more important work.