OpenAI believes AI benchmarking is broken. The company is launching a program to change how AI models are evaluated.
The new OpenAI Pioneers program will focus on creating evaluations of AI models that “set the bar for what good looks like,” as OpenAI put it in a blog post.
“As the pace of AI adoption accelerates across industries, we need to understand and improve its impact around the world,” the company continued in its post. “Creating domain-specific evals is one way to better reflect real-world use cases and help teams assess model performance in practical, high-stakes environments.”
As the recent controversy involving the crowdsourced benchmark LM Arena and Meta’s Maverick model showed, it is difficult these days to know exactly what distinguishes one model from another. Many widely used AI benchmarks measure performance on esoteric tasks, such as solving doctorate-level math problems. Others can be gamed, or align poorly with most people’s preferences.
Through the Pioneers program, OpenAI wants to create benchmarks for specific domains such as legal, finance, insurance, healthcare, and accounting. The lab says it will work with “multiple companies” in the coming months to design tailored benchmarks, and will eventually publish these benchmarks along with “industry-specific” evals.
“The initial cohort will focus on startups that will help lay the groundwork for the OpenAI Pioneers program,” OpenAI wrote in its blog post. “We’re selecting a small number of startups in this first cohort, each working on high-value applied use cases where AI can drive real impact.”
Companies in the program will also have the opportunity to work with OpenAI’s team to improve models through reinforcement fine-tuning, a technique that optimizes models for narrow sets of tasks.
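To make the idea concrete, here is a minimal toy sketch of the loop at the heart of reinforcement fine-tuning; it is not OpenAI’s implementation or API. In a real run, a language model’s weights would be updated; here a tiny softmax “policy” over canned answers stands in for the model, and a rule-based grader (the hypothetical `grader` function below) stands in for a domain-specific eval.

```python
# Toy illustration of the reinforcement fine-tuning loop (hypothetical names).
# Sample an output, score it with a task-specific grader, and nudge the policy
# toward higher-scoring outputs via a REINFORCE-style update.
import math
import random

ANSWERS = ["The claim is covered.", "The claim is denied.", "Escalate to a human adjuster."]

def grader(answer: str) -> float:
    """Hypothetical domain-specific grader: rewards the preferred behavior."""
    return 1.0 if answer == "Escalate to a human adjuster." else 0.0

logits = [0.0, 0.0, 0.0]  # toy policy parameters, one per canned answer
LR = 0.5                  # learning rate

def sample() -> int:
    # Draw an answer index from the softmax distribution over logits.
    weights = [math.exp(l) for l in logits]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(weights) - 1

for step in range(200):
    i = sample()
    reward = grader(ANSWERS[i])
    # Raise the log-probability of the sampled answer in proportion to its
    # reward: grad of log-softmax is (indicator - probability).
    probs = [math.exp(l) for l in logits]
    z = sum(probs)
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j] / z
        logits[j] += LR * reward * grad

print("Learned preference:", ANSWERS[max(range(len(logits)), key=lambda j: logits[j])])
```

In an actual deployment the policy would be the model itself and the grader a domain-specific eval of the kind the Pioneers program aims to build; the shape of the loop is the same.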
The big question is whether the AI community will embrace the benchmarks OpenAI creates. OpenAI has financially supported benchmarking efforts before and designed its own evaluations. But partnering with customers to release AI tests could be seen as a bridge too far, ethically speaking.