DeepSeek may have used Google's Gemini to train its latest model

Last week, Chinese lab DeepSeek released an updated version of its R1 reasoning AI model that performs well on a number of math and coding benchmarks. The company didn't reveal the source of the data it used to train the model, but some AI researchers speculate that at least a portion of it came from Google's Gemini family of AI.

Sam Paech, a Melbourne-based developer who creates "emotional intelligence" evaluations for AI, has published what he claims is evidence that DeepSeek's latest model was trained on outputs from Gemini. DeepSeek's model, called R1-0528, prefers words and expressions similar to those favored by Google's Gemini 2.5 Pro, Paech said in a post on X.

If you're wondering why the new DeepSeek R1 sounds a bit different, I think they probably switched from training on synthetic OpenAI outputs to synthetic Gemini outputs. pic.twitter.com/oex9roapnv

– Sam Paech (@sam_paech) May 29, 2025

It's not a smoking gun. But another developer, the pseudonymous creator of a "free speech eval" for AI called SpeechMap, noted that the DeepSeek model's traces, the "thoughts" the model generates as it works toward a conclusion, "read like Gemini traces."

DeepSeek has been accused of training on data from rival AI models before. In December, developers observed that DeepSeek's V3 model often identified itself as ChatGPT, OpenAI's AI-powered chatbot platform, suggesting that it may have been trained on ChatGPT chat logs.

Earlier this year, OpenAI told the Financial Times that it had found evidence linking DeepSeek to the use of distillation, a technique for training AI models on the outputs of bigger, more capable ones. According to Bloomberg, Microsoft, a close OpenAI collaborator and investor, detected large amounts of data being exfiltrated through OpenAI developer accounts in late 2024. OpenAI believes those accounts are affiliated with DeepSeek.

Distillation isn't an uncommon practice, but OpenAI's terms of service prohibit customers from using the company's model outputs to build competing AI.

To be clear, many models misidentify themselves and converge on the same words and turns of phrase. That's because the open web, where AI companies source the bulk of their training data, is increasingly littered with AI slop. Content farms are using AI to create clickbait, and bots are flooding Reddit and X.

This "contamination," if you will, has made it quite difficult to thoroughly filter AI outputs out of training datasets.

Still, AI experts like Nathan Lambert, a researcher at the nonprofit AI research institute AI2, don't think it's out of the question that DeepSeek trained on data from Google's Gemini.

"If I were DeepSeek, I would definitely create a ton of synthetic data from the best API model out there," Lambert wrote in a post on X.

If I were DeepSeek, I would definitely create a ton of synthetic data from the best API model out there. They're short on GPUs and flush with cash. It's literally effectively more compute for them. Yes on the Gemini distill question.

– Nathan Lambert (@Natolambert) June 3, 2025

In some cases, AI companies are ramping up security measures to prevent distillation.

In April, OpenAI began requiring organizations to complete an ID verification process to access certain advanced models. The process requires a government-issued ID from one of the countries supported by OpenAI's API, and China isn't on the list.

Elsewhere, Google recently began "summarizing" the traces generated by models available through its AI Studio developer platform. In May, Anthropic said it would start summarizing its own model's traces, citing the need to protect its "competitive advantages."

I've reached out to Google for comment and will update this piece if I hear back.
