The developer has written tests to see how AI chatbots react to controversial topics

The pseudonym developer has created what is called “free speech assessments” for chatbot-powered AI models such as Openai’s ChatGpt and X’s Grok. The goal is to compare how different models deal with sensitive and controversial subjects, the developer told TechCrunch, which includes questions about political criticism and civil rights and protest.

AI companies have focused on tweaking how models handle certain topics as some White House allies accused popular chatbots of being overly “wake up.” Many close entities to President Donald Trump, including Elon Musk, Crypto and David Sachs, the AI ”Emperor” have argued for the censorship and conservative views of the chatbot.

While none of these AI companies have responded directly to the allegations, some have committed to adjusting their models and refusing to frequently answer controversial questions. For example, for the latest work of the Lama Model, Meta stated that he would coordinate the model to not support “some views on others” and respond to more “discussed” political prompts.

The developer of SpeechMap that uses the username “XLR8Harder” in X, said it was motivated to inform the discussion of what models should and should not.

“I think these are the kind of discussions that should happen not only within the headquarters, but also in public places,” XLR8Harder told TechCrunch via email. “So I build a site so that everyone can explore the data themselves.”

SpeechMap uses AI models to determine whether other models conform to a specific set of test prompts. This prompt touches on a variety of subjects, from politics to historical stories and national symbols. SpeechMap records whether the model “completely” fulfills a request (i.e. answers it without hedging), gives an answer of “avoidance” or responds entirely.

The XLR8Harder admits that the tests are flawed, including “noise” due to errors from model providers. The “judge” model may also contain biases that may affect outcomes.

However, assuming that the project is created in good faith and the data is accurate, SpeechMap reveals some interesting trends.

For example, according to SpeechMap, Openai’s model has increasingly refused to answer politically-related prompts over time. The company’s latest model, the GPT-4.1 family, is a bit generous, but still takes a step back from one of last year’s Openai releases.

Openai said in February that it would coordinate future models to avoid taking an editorial stance, providing multiple perspectives on controversial subjects.

SpeechMap OpenAI Results — Performance of OpenAI models on SpeechMap over time.Image credit: Openai

According to SpeechMap’s benchmarks, the most acceptable model of the bundle, developed by Elon Musk’s AI startup Xai, is the Grok 3. The Grok 3 powers many features on the X, including the Chatbot Grok.

Grok 3 responds to 96.2% of SpeechMap test prompts compared to the average model’s “compliance rate” of 71.3%.

“Openai’s recent model has become less tolerant over time, especially with politically sensitive prompts,” Xai is moving in the opposite direction,” says XLR8Harder.

When Musk unveiled Grok about two years ago, he pitched the AI model as edgy, filtering, anti-‘awakening’. He told me some of those promises. For example, being said to be vulgar, Grok and Grok 2 is willing to obligate, spitting out colorful languages that you wouldn’t see from anything like ChatGpt.

However, the Grok model before Grok 3 enveloped political subjects and did not cross any particular boundaries. In fact, one study found that Grok remains on the left in topics such as transgender rights, diversity programs, and inequality.

Musk has condemned the actions of Grok’s training data (public web pages) and pledged to “get Grok closer to political neutrality.” Besides the high-profile mistakes that temporarily censored references to President Donald Trump and President Musk, he may have achieved that goal.

Source link

What's Hot

The White House replaces covid.gov website with “laborek” theory

Frank Holmes of Hive says Bitcoin mining is a victory for Paraguay and the US

TechCrunch Mobility: Lyft buys Europe, Kodiak Spacs and how new ADAS rules in China will affect Tesla

The developer has written tests to see how AI chatbots react to controversial topics

The White House replaces covid.gov website with “laborek” theory

TechCrunch Mobility: Lyft buys Europe, Kodiak Spacs and how new ADAS rules in China will affect Tesla

Bluesky may add a blue check validation soon

Everything you need to know about the AI chatbot

The last day you apply to speak at every stage

The Nintendo Switch 2 still costs $450 in the US despite tariffs

The White House replaces covid.gov website with “laborek” theory

Frank Holmes of Hive says Bitcoin mining is a victory for Paraguay and the US

TechCrunch Mobility: Lyft buys Europe, Kodiak Spacs and how new ADAS rules in China will affect Tesla

The new kids show will come with a crypto wallet when it debuts this fall

Cancelling the Joy Reed Show is “mistakes”

Black melodrama has a possibility

The “Facts of Life” star died in 83

Cara Sophia Gascon joins Oscar despite social media controversy

Our Picks

The White House replaces covid.gov website with “laborek” theory

Frank Holmes of Hive says Bitcoin mining is a victory for Paraguay and the US

TechCrunch Mobility: Lyft buys Europe, Kodiak Spacs and how new ADAS rules in China will affect Tesla

Most Popular

TikTok announces it will go dark on Sunday without ‘definitive’ guarantees

President Trump mints $31 billion in new official $TRUMP crypto meme coin

El Salvador’s secret weapon? Stacey Herbert talks about the company’s extensive Bitcoin education program

Subscribe to Updates

What's Hot

The developer has written tests to see how AI chatbots react to controversial topics

Related Posts