The pseudonym developer has created what is called “free speech assessments” for chatbot-powered AI models such as Openai’s ChatGpt and X’s Grok. The goal is to compare how different models deal with sensitive and controversial subjects, the developer told TechCrunch, which includes questions about political criticism and civil rights and protest.
AI companies have focused on tweaking how models handle certain topics as some White House allies accused popular chatbots of being overly “wake up.” Many close entities to President Donald Trump, including Elon Musk, Crypto and David Sachs, the AI ”Emperor” have argued for the censorship and conservative views of the chatbot.
While none of these AI companies have responded directly to the allegations, some have committed to adjusting their models and refusing to frequently answer controversial questions. For example, for the latest work of the Lama Model, Meta stated that he would coordinate the model to not support “some views on others” and respond to more “discussed” political prompts.
The developer of SpeechMap that uses the username “XLR8Harder” in X, said it was motivated to inform the discussion of what models should and should not.
“I think these are the kind of discussions that should happen not only within the headquarters, but also in public places,” XLR8Harder told TechCrunch via email. “So I build a site so that everyone can explore the data themselves.”
SpeechMap uses AI models to determine whether other models conform to a specific set of test prompts. This prompt touches on a variety of subjects, from politics to historical stories and national symbols. SpeechMap records whether the model “completely” fulfills a request (i.e. answers it without hedging), gives an answer of “avoidance” or responds entirely.
The XLR8Harder admits that the tests are flawed, including “noise” due to errors from model providers. The “judge” model may also contain biases that may affect outcomes.
However, assuming that the project is created in good faith and the data is accurate, SpeechMap reveals some interesting trends.
For example, according to SpeechMap, Openai’s model has increasingly refused to answer politically-related prompts over time. The company’s latest model, the GPT-4.1 family, is a bit generous, but still takes a step back from one of last year’s Openai releases.
Openai said in February that it would coordinate future models to avoid taking an editorial stance, providing multiple perspectives on controversial subjects.

According to SpeechMap’s benchmarks, the most acceptable model of the bundle, developed by Elon Musk’s AI startup Xai, is the Grok 3. The Grok 3 powers many features on the X, including the Chatbot Grok.
Grok 3 responds to 96.2% of SpeechMap test prompts compared to the average model’s “compliance rate” of 71.3%.
“Openai’s recent model has become less tolerant over time, especially with politically sensitive prompts,” Xai is moving in the opposite direction,” says XLR8Harder.
When Musk unveiled Grok about two years ago, he pitched the AI model as edgy, filtering, anti-‘awakening’. He told me some of those promises. For example, being said to be vulgar, Grok and Grok 2 is willing to obligate, spitting out colorful languages that you wouldn’t see from anything like ChatGpt.
However, the Grok model before Grok 3 enveloped political subjects and did not cross any particular boundaries. In fact, one study found that Grok remains on the left in topics such as transgender rights, diversity programs, and inequality.
Musk has condemned the actions of Grok’s training data (public web pages) and pledged to “get Grok closer to political neutrality.” Besides the high-profile mistakes that temporarily censored references to President Donald Trump and President Musk, he may have achieved that goal.