A dev built a test to see how AI chatbots respond to controversial topics

A pseudonymous developer has created what they call a "free speech eval," SpeechMap, for the AI models powering chatbots like OpenAI's ChatGPT and X's Grok. The goal is to compare how different models treat sensitive and controversial subjects, the developer told TechCrunch, including political criticism and questions about civil rights and protest.

AI companies have been focusing on fine-tuning how their models handle certain topics as some White House allies accuse popular chatbots of being overly "woke." Many of President Donald Trump's close confidants, such as Elon Musk and crypto and AI "czar" David Sacks, have alleged that chatbots censor conservative views.

Although none of these companies have responded to the allegations directly, several have pledged to adjust their models so that they refuse to answer contentious questions less often. For example, for its latest crop of Llama models, Meta said it tuned the models not to endorse "some views over others" and to reply to more "debated" political prompts.

SpeechMap's developer, who goes by the username "xlr8harder" on X, said they were motivated to help inform the public about what models should, and shouldn't, do.

"I think these are the kinds of discussions that should happen in public, not just inside corporate headquarters," xlr8harder told TechCrunch via email. "That's why I built the site to let anyone explore the data themselves." The developer said they had spent over $1,400 to test the models on SpeechMap (a portion of that money came from an undisclosed donor).

SpeechMap uses AI models to judge whether other models comply with a given set of test prompts. The prompts touch on a range of subjects, from politics to historical narratives and national symbols. SpeechMap records whether models "completely" satisfy a request (i.e., answer it without hedging), give "evasive" answers, or outright decline to respond.
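A judge-and-tally pipeline like the one described above can be sketched roughly as follows. This is a hypothetical illustration, not SpeechMap's actual code: the real site uses an LLM as the judge, while here a simple keyword stub stands in so the logic is self-contained.

```python
# Sketch of a SpeechMap-style eval: a "judge" labels each model response as
# complete / evasive / denial, then we compute an overall compliance rate.
# The marker lists and function names are illustrative assumptions.
from collections import Counter

DENIAL_MARKERS = ("i can't", "i cannot", "i won't")
EVASIVE_MARKERS = ("there are many views", "as an ai")

def judge(response: str) -> str:
    """Stub judge; SpeechMap would ask an LLM to make this call instead."""
    text = response.lower()
    if any(m in text for m in DENIAL_MARKERS):
        return "denial"
    if any(m in text for m in EVASIVE_MARKERS):
        return "evasive"
    return "complete"

def compliance_rate(responses: list[str]) -> float:
    """Fraction of responses the judge labels 'complete'."""
    verdicts = Counter(judge(r) for r in responses)
    return verdicts["complete"] / len(responses)

if __name__ == "__main__":
    sample = [
        "Here is a direct argument for the position you asked about...",
        "I can't help with that request.",
        "As an AI, there are many views on this topic.",
        "Sure, a satirical take on that policy would go like this...",
    ]
    print(compliance_rate(sample))  # 2 of 4 judged 'complete' -> 0.5
```

Swapping the stub for a real LLM judge is also where the bias caveat below comes in: whatever model does the labeling shapes the reported rates.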

xlr8harder acknowledges that the test has flaws, like "noise" due to model provider errors. It's also possible the "judge" models contain biases that could influence the results.

But, assuming the project was created in good faith and the data is accurate, SpeechMap surfaces some interesting trends.

For instance, SpeechMap shows that OpenAI's models have, over time, increasingly refused to answer prompts related to politics. The company's latest models, the GPT-4.1 family, are slightly more permissive, but they're still a step down from one of OpenAI's earlier releases.

OpenAI said in February it would tune future models to not take an editorial stance and to offer multiple views.

OpenAI model performance on SpeechMap over time. Image Credits: OpenAI

By far the most permissive model of the bunch is Grok 3, developed by Elon Musk's AI startup xAI, according to SpeechMap's benchmarking. Grok 3 powers a number of features on X, including the chatbot Grok.

Grok 3 responds to 96.2% of SpeechMap's test prompts, compared with the average model's "compliance rate" of 71.3%.

"While OpenAI's recent models have become less permissive over time, especially on politically sensitive prompts, xAI is moving in the opposite direction," said xlr8harder.

When Musk announced Grok roughly two years ago, he pitched the AI model as edgy, unfiltered, and anti-"woke," portraying it in general as willing to answer controversial questions other AI systems wouldn't. He delivered on some of that promise. Told to be vulgar, for example, Grok and Grok 2 would happily oblige, using colorful language you wouldn't hear from ChatGPT.

But Grok models prior to Grok 3 waffled on political subjects and wouldn't cross certain boundaries. In fact, one study found that Grok leaned to the political left on topics like transgender rights, diversity programs, and inequality.

Musk has blamed that behavior on Grok's training data (public web pages) and pledged to "shift Grok closer to politically neutral." Short of high-profile mistakes like briefly censoring unflattering mentions of President Donald Trump and Musk, it seems he might've achieved that goal.
