OpenAI believes AI benchmarking is broken. The company is launching a program to change how AI models are evaluated.
The new OpenAI Pioneers program will focus on creating evaluations of AI models that “set the bar for what good looks like,” as OpenAI put it in a blog post.
“As the pace of AI adoption accelerates across industries, we need to understand and improve its impact around the world,” the company continued in its post. “Creating domain-specific evals is one way to better reflect real-world use cases and help teams assess model performance in practical, high-stakes environments.”
As the recent controversy involving the crowdsourced benchmark LM Arena and Meta’s Maverick model showed, it is difficult these days to know exactly what distinguishes one model from another. Many widely used AI benchmarks measure performance on esoteric tasks, such as solving doctorate-level math problems. Others can be gamed, or align poorly with most people’s preferences.
Through the Pioneers program, OpenAI wants to create benchmarks for specific domains such as legal, finance, insurance, healthcare, and accounting. The lab says it will work with “multiple companies” in the coming months to design tailored benchmarks, and will eventually publish these benchmarks along with “industry-specific” evals.
“The initial cohort will focus on startups that will help lay the groundwork for the OpenAI Pioneers program,” OpenAI wrote in its blog post. “We’re selecting a small number of startups in this first cohort, each working on high-value applied use cases where AI can drive real impact.”
Companies in the program will also have the opportunity to work with OpenAI’s team to improve models through reinforcement fine-tuning, a technique that optimizes models for narrow sets of tasks.
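To make the idea concrete, here is a minimal toy sketch of the loop at the heart of reinforcement fine-tuning; it is not OpenAI’s implementation or API. In a real run, a language model’s weights would be updated; here a tiny softmax “policy” over canned answers stands in for the model, and a rule-based grader (the hypothetical `grader` function below) stands in for a domain-specific eval.

```python
# Toy illustration of the reinforcement fine-tuning loop (hypothetical names).
# Sample an output, score it with a task-specific grader, and nudge the policy
# toward higher-scoring outputs via a REINFORCE-style update.
import math
import random

ANSWERS = ["The claim is covered.", "The claim is denied.", "Escalate to a human adjuster."]

def grader(answer: str) -> float:
    """Hypothetical domain-specific grader: rewards the preferred behavior."""
    return 1.0 if answer == "Escalate to a human adjuster." else 0.0

logits = [0.0, 0.0, 0.0]  # toy policy parameters, one per canned answer
LR = 0.5                  # learning rate

def sample() -> int:
    # Draw an answer index from the softmax distribution over logits.
    weights = [math.exp(l) for l in logits]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(weights) - 1

for step in range(200):
    i = sample()
    reward = grader(ANSWERS[i])
    # Raise the log-probability of the sampled answer in proportion to its
    # reward: grad of log-softmax is (indicator - probability).
    probs = [math.exp(l) for l in logits]
    z = sum(probs)
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j] / z
        logits[j] += LR * reward * grad

print("Learned preference:", ANSWERS[max(range(len(logits)), key=lambda j: logits[j])])
```

In an actual deployment the policy would be the model itself and the grader a domain-specific eval of the kind the Pioneers program aims to build; the shape of the loop is the same.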
The big question is whether the AI community will embrace the benchmarks OpenAI creates. OpenAI has financially supported benchmarking efforts before and designed its own evaluations. But partnering with customers to release AI tests could be seen as a bridge too far, ethically speaking.