According to the company’s internal benchmarking, a recently released Google AI model scores worse on certain safety tests than its predecessor.
In a technical report released this week, Google revealed that its Gemini 2.5 Flash model is more likely to generate text that violates its safety guidelines than Gemini 2.0 Flash. On two metrics, “text-to-text safety” and “image-to-text safety,” Gemini 2.5 Flash regresses by 4.1% and 9.6%, respectively.
Text-to-text safety measures how often a model violates Google’s guidelines given a prompt, while image-to-text safety evaluates how closely the model adheres to those boundaries when prompted with an image. Both tests are automated, not human-supervised.
In an emailed statement, a Google spokesperson confirmed that Gemini 2.5 Flash performs worse on text-to-text and image-to-text safety.
These surprising benchmark results come as AI companies move to make their models more permissive, in other words, less likely to refuse to respond to controversial or sensitive subjects. For its latest crop of Llama models, Meta said it tuned the models not to endorse “some views over others” and to reply to more “debated” political prompts. OpenAI said earlier this year that it would tweak future models to avoid taking an editorial stance and to offer multiple perspectives on controversial topics.
Sometimes, those permissiveness efforts have backfired. TechCrunch reported Monday that the default model powering OpenAI’s ChatGPT allowed minors to generate erotic conversations. OpenAI blamed the behavior on a “bug.”
According to Google’s technical report, Gemini 2.5 Flash, which is still in preview, follows instructions more faithfully than Gemini 2.0 Flash, including instructions that cross problematic lines. The company argues that the regressions can be attributed partly to false positives, but it also acknowledges that Gemini 2.5 Flash sometimes generates “violative content” when explicitly asked.
“Naturally, there is tension between instruction-following on sensitive topics and safety policy violations, which is reflected across our evaluations,” the report reads.
Scores from SpeechMap, a benchmark that probes how models respond to sensitive and controversial prompts, also suggest that Gemini 2.5 Flash is far less likely to refuse to answer contentious questions than Gemini 2.0 Flash. TechCrunch’s testing of the model via the AI platform OpenRouter found that it will readily write essays in support of replacing human judges with AI, weakening due process protections in the US, and implementing widespread warrantless government surveillance programs.
Thomas Woodside, co-founder of the Secure AI Project, said the limited details Google provided in its technical report demonstrate the need for more transparency in model testing.
“There’s a trade-off between instruction-following and policy-following, because some users may ask for content that would violate policies,” Woodside told TechCrunch. “In this case, Google’s latest Flash model complies with instructions more while also violating policies more. Google doesn’t provide much detail on the specific cases in which policies were violated.”
Google has come under fire for its model safety reporting practices before. When the company finally published the technical report for its most capable model, Gemini 2.5 Pro, the report initially omitted key safety testing details.
On Monday, Google released a more detailed report with additional safety information.