Google is rolling out a new feature in the Gemini API that the company claims will make its latest AI models cheaper for third-party developers to use.
Google calls the feature “implicit caching” and says it can deliver 75% savings on “repetitive context” passed to models via the Gemini API. It supports Google’s Gemini 2.5 Pro and 2.5 Flash models.
With the cost of using frontier models continuing to climb, that could be welcome news for developers.
Caching, a practice widely adopted in the AI industry, reduces computing requirements and costs by reusing frequently accessed or pre-computed data from models. For example, a cache can store answers to questions that users often ask a model, eliminating the need for the model to regenerate answers to the same request.
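At its simplest, the idea looks like the sketch below: a lookup table keyed by the request, so a repeated request skips the expensive model call entirely. The run_model function here is a hypothetical stand-in, not part of any Google API.

```python
# Minimal sketch of response caching: identical prompts are answered
# from a dictionary instead of re-running the (expensive) model.

def run_model(prompt: str) -> str:
    # Hypothetical stand-in for an expensive model call.
    return f"answer to: {prompt}"

cache: dict[str, str] = {}

def answer_query(prompt: str) -> str:
    if prompt in cache:          # cache hit: the model is skipped
        return cache[prompt]
    answer = run_model(prompt)   # cache miss: compute once
    cache[prompt] = answer       # store for future identical requests
    return answer
```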
Google previously offered model prompt caching, but only explicit prompt caching, meaning developers had to define their highest-frequency prompts themselves. Cost savings were supposed to be guaranteed, but explicit prompt caching typically involved a lot of manual work.
Some developers were not happy with how Google’s explicit caching implementation worked on Gemini 2.5 Pro. Complaints reached a fever pitch over the past week, prompting the Gemini team to apologise and pledge to make changes.
In contrast to explicit caching, implicit caching is automatic. It is enabled by default for the Gemini 2.5 models, so when a Gemini API request results in a cache hit, the cost savings are passed on.
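Because the feature is on by default, no caching-specific configuration is needed in the request itself. A minimal sketch using Google’s google-genai Python SDK follows; the API key, prompt, and exact model name are placeholders, and the cached-token field on the usage metadata is where the API reports cache hits:

```python
# Sketch of an ordinary Gemini API call with the google-genai Python SDK.
# Implicit caching requires no extra configuration on Gemini 2.5 models.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder; check currently available model names
    contents="Summarize the following report: ...",  # placeholder prompt
)

print(response.text)
# Usage metadata reports how many input tokens were served from cache
# (None or 0 when the request did not hit the cache).
print(response.usage_metadata.cached_content_token_count)
```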
“[W]hen you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit,” Google explained in a blog post. “We will dynamically pass cost savings back to you.”
According to Google’s developer documentation, the minimum prompt token count for implicit caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro. Tokens are the raw bits of data that models work with; 1,000 tokens is equivalent to about 750 words.
Given that Google’s last claims of cost savings from caching fell short, there are some buyer-beware areas in this new feature. For one, Google recommends that developers keep repetitive context at the beginning of requests to increase the chances of implicit cache hits; context that might change from request to request should be appended at the end.
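In practice, that guidance amounts to structuring prompts so the stable part forms a shared prefix across requests. A hedged sketch, where the system preamble and the long reference document are hypothetical placeholders for whatever context a developer reuses:

```python
# Sketch of Google's recommended prompt structure for implicit caching:
# put the repetitive context first so successive requests share a common
# prefix, and append whatever varies per request at the end.
LONG_REFERENCE_DOCUMENT = "..."  # hypothetical multi-thousand-token document

STABLE_PREFIX = (
    "You are a support assistant for ExampleCo.\n"  # hypothetical preamble
    + LONG_REFERENCE_DOCUMENT
)

def build_prompt(user_question: str) -> str:
    # Only the tail changes between requests. Per Google's documentation,
    # the prompt must meet the minimum token counts (1,024 on 2.5 Flash,
    # 2,048 on 2.5 Pro) to qualify for an implicit cache hit.
    return STABLE_PREFIX + "\n\nUser question: " + user_question
```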
In another buyer-beware area, Google has not offered third-party verification that the new implicit caching system will deliver the promised automatic savings, so we’ll have to see what early adopters say.