OpenAI, like many AI labs, thinks AI benchmarks are broken. It says it wants to fix them through a new program.
Called the OpenAI Pioneers Program, the program will focus on creating evaluations for AI models that “set the bar for what good looks like,” the company said in a blog post.
“As the pace of AI adoption accelerates across industries, there is a need to understand and improve its impact in the world,” the company continued in its post. “Creating domain-specific evals are one way to better reflect real-world use cases, helping teams assess model performance in practical, high-stakes environments.”
As the recent controversy with the crowdsourced benchmark LM Arena and Meta’s Maverick model illustrates, it’s tough to know these days precisely what differentiates one model from another. Many widely used AI benchmarks measure performance on esoteric tasks, like solving doctorate-level math problems. Others can be gamed, or don’t align well with most people’s preferences.
Through the Pioneers Program, OpenAI hopes to create benchmarks for specific domains like legal, finance, insurance, healthcare, and accounting. The lab says that, in the coming months, it’ll work with “multiple companies” to design tailored benchmarks, and eventually share these benchmarks publicly, along with “industry-specific” evaluations.
“The first cohort will focus on startups who will help lay the foundations of the OpenAI Pioneers Program,” OpenAI wrote in the blog post. “We’re selecting a handful of startups for this initial cohort, each working on high-value, applied use cases where AI can drive real-world impact.”
Companies in the program will also have the option to work with OpenAI’s team to create custom models via reinforcement fine-tuning, a technique for optimizing models for narrow sets of tasks, OpenAI says.
The big question is whether the AI community will embrace benchmarks whose creation was funded by OpenAI. OpenAI has supported benchmarking efforts financially before, and designed its own evaluations. But partnering with customers to release AI tests may be perceived as an ethical bridge too far.