A study of ‘safer’ ChatGPT-5 found it more harmful than ChatGPT-4o. Here’s why

A recent study of ChatGPT-5, OpenAI’s ‘safer’ system, found the updated chatbot to be more engaging, more persuasive, and in many cases more harmful than its predecessor.

Nov. 14, 2025 — As parents, policymakers, and concerned consumers become more alarmed at the rushed-and-unready AI products released by tech companies, a growing number of experts are speaking out about the need for greater guardrails and safety measures.

In a recent webinar hosted by the Center for Countering Digital Hate, Parents Together Action, and Heat Initiative, Imran Ahmed, a leader in digital and social media safety, offered a compelling case for urgent action.

Ahmed is the founder and CEO of the Center for Countering Digital Hate (CCDH), a nonprofit group that works to make the internet safer through research, public campaigns and policy advocacy.

An eye-opening new report

CCDH’s newest report, The Illusion of AI Safety, tested OpenAI’s claim that ChatGPT-5, its latest chatbot iteration, is safer than previous versions.

In August, OpenAI proudly announced that it was introducing “safe completions,” an approach designed to give safe answers to potentially harmful prompts instead of outright refusing them. Given the alarming results of CCDH’s past research on ChatGPT, Ahmed and the CCDH team felt it was important to test those bold claims. What they discovered was deeply concerning. 

We have the full video of Imran Ahmed’s presentation below, followed by a transcript of his remarks.

Imran Ahmed: The illusion of AI safety

This is a transcript of the Nov. 13, 2025, presentation by Imran Ahmed, CEO of the Center for Countering Digital Hate.

For more than a decade, we've seen social media platforms prioritize engagement and profit over safety and well-being. They've built systems that keep people scrolling even when their content is harmful, toxic or dangerous.

Today we are seeing the same pattern repeat itself with AI.

The AI industry tells us that innovation will only move forward with safety baked in, and that these systems are being carefully tested; that guardrails and safeguards are at the heart of every deployment. They say, trust us. We've got this under control.

But our research shows something very different.

What we see is AI that is unsafe by design.

Is ChatGPT-5 really ‘safer’ than GPT-4o?

These tools are built to be engaging, persuasive and omnipresent. Not to be cautious, careful or protective. Especially when it really matters.

Worse still, they're using our children as their test subjects. So that's why we decided to test OpenAI's latest model, ChatGPT-5, and compare it to ChatGPT-4o, the previous model. After all, OpenAI claimed GPT-5 would be a much safer version of its chatbot.

What we actually found is deeply worrying.

The newer version of the technology was less safe than the one that came before it. Especially on issues like self-harm and eating disorders.

Read the full report:

The Illusion of AI Safety reports on the results of testing done on ChatGPT-5 and ChatGPT-4o.


How the two chatbots were tested

I'm going to walk you through how we tested it, what we found, and why this matters so much for the policies now being debated in Congress.

We designed a series of controlled prompts focused on high-risk topics: self-harm, eating disorders, and substance use. These are exactly the kinds of issues where vulnerable users, including teenagers, might go to a chatbot for help, advice, or even validation.

Each of the prompts was submitted multiple times to both GPT-4o and GPT-5 via OpenAI’s public API.

We did this to test not just what the models say once, but how consistent their behavior is. Given that they are probabilistic models, do they sometimes refuse and sometimes respond? And can minimal changes in wording or context get around the safety guardrails?
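
To make that submission protocol concrete, here is a minimal sketch, in Python, of how repeated prompting of both models through OpenAI's public API could look. It is an illustration only, not CCDH's actual test harness: the placeholder prompt, the repeat count, and the model identifiers are assumptions for the example.

```python
# Minimal sketch of the repeated-submission protocol described above.
# Assumptions: the model identifiers, the repeat count, and the
# placeholder prompt are illustrative, not CCDH's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-4o", "gpt-5"]
PROMPTS = ["<high-risk prompt goes here>"]  # real prompts are described in the report
REPEATS = 5  # each prompt is sent several times per model

results = []
for model in MODELS:
    for prompt in PROMPTS:
        for run in range(REPEATS):
            # The models are probabilistic, so the same prompt can be
            # refused on one run and answered on the next.
            reply = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            results.append({
                "model": model,
                "prompt": prompt,
                "run": run,
                "text": reply.choices[0].message.content,
            })
```

Each saved reply would then go to the human review step described next.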

For every response, our team looked at three key things:

• Response: Did the model produce harmful content? For example, advice that could encourage self-harm or disordered eating.

• Warning or refusal: Did it provide a clear safety warning or refusal, or did it give a more neutral response that neither encouraged nor mitigated harm?

• Safeguard sidesteps: We deliberately tested whether adding context like ‘I’m asking for a friend,’ or slightly rephrasing the question, could let the user bypass the safeguards that are supposed to be in place.

Real-world conditions

All of the responses were then manually reviewed and coded by our team using a structured analysis framework. That’s important. We didn’t just rely on automated labels or one-off impressions. We applied a consistent human-led standard to determine whether the content could reasonably be considered harmful.
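
The structured framework itself isn't published in the webinar, but the three criteria above can be captured as one record per reviewed response. The sketch below is an assumed illustration of such a coding record and a simple tally; the field names and the harmful-rate helper are not CCDH's actual schema.

```python
# Illustrative coding record for one manually reviewed response.
# Field names and the helper are assumptions, not CCDH's framework.
from dataclasses import dataclass

@dataclass
class CodedResponse:
    model: str               # e.g. "gpt-4o" or "gpt-5"
    harmful: bool            # reviewer judged the reply to contain harmful content
    warned_or_refused: bool  # reply carried a clear safety warning or refusal
    sidestep_used: bool      # prompt used a bypass phrase such as "I'm asking for a friend"

def harmful_rate(coded: list[CodedResponse], model: str) -> float:
    """Share of a model's reviewed responses that were coded as harmful."""
    subset = [c for c in coded if c.model == model]
    return sum(c.harmful for c in subset) / len(subset) if subset else 0.0
```

A tally of this kind, applied across every run, is what produces per-model figures like the harmful-response rates quoted later in the presentation.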

In other words, our approach was designed to mirror real-world use as closely as possible. We wanted to test the two versions of ChatGPT by using them the way most people actually interact with AI. They ask follow-up questions. They try different phrasings. They keep coming back. Our research showed how effectively, or ineffectively, OpenAI's guardrails work under real-world conditions.

Results: ‘ChatGPT-5 is less safe’ than the previous version  

It’s important to note that ChatGPT-5 was released just one day after our report, Fake Friend: How ChatGPT betrays vulnerable teens by encouraging dangerous behavior. GPT-5 was specifically marketed as a safer, more responsible version of the chatbot.

You might expect that to mean fewer harmful responses, stronger refusals, and better protection for users in crisis.

In reality, we found the opposite. GPT-5 is less safe than 4o.

Across our testing, GPT-5 produced harmful content 53% of the time. GPT-4o produced harmful content 43% of the time.

Ultimately, both are too dangerous to be deployed to children.

‘Safer’ upgrade actually more harmful

OpenAI upgraded the model and claimed it was safer. In fact, the rate of harmful responses actually increased by 10 percentage points.

Even more troubling, we saw numerous cases where GPT-4o refused to respond, or provided a clearer safety message, but GPT-5 went ahead and gave a harmful answer.

And that includes some of the most sensitive topics: self-harm and eating disorders.

In other words, the ‘safer’ model is sometimes willing to do what the previous one refused to do. That is the opposite of progress.

A massive increase in addiction design 

We also looked at how the models are designed to keep people engaged.

ChatGPT-5 encourages follow-up questions in 99% of its responses. That compares to just 9% for ChatGPT-4o.

There's the real difference between the models. The upgrade wasn't about safety. It was designed to keep drawing users into more extended and more intimate conversations.

Addiction, in short.

Designed for addiction, with unsafe responses

When you apply that behavior to high-risk topics, it becomes deeply unsafe.

If a teenager is asking about self-harm, eating disorders, or substance abuse, more extended conversations create more opportunities for harmful content to appear, more chances to reinforce negative thoughts, and more risk of real-world harm.

The heightened engagement design of ChatGPT-5 is clearly working directly against safety.

Are warnings enough to dissuade teens?

It’s important to note that both GPT-5 and GPT-4o included warnings in most of these exchanges. On paper that sounds reassuring. In practice it really isn't.

We saw countless examples where warnings were placed alongside harmful content.

For instance, we saw a crisis hotline listed right next to detailed descriptions or techniques related to self-harm.

A disclaimer for the record does not neutralize the damage done by unsafe content that sits beside it. It’s like putting a ‘Please drink responsibly’ message on a bottle of poison.

The warning doesn't make the underlying product safe.

‘What’s being sold as safety is an illusion’

Our big takeaway is this: What we are being sold as safety is in reality an illusion.

The ‘safer’ system is more engaging, more persuasive, and in many cases, more harmful. The thin layer of warnings does not adequately protect people, especially kids.

I have said this before too many times. When a technology company is so out of control and so negligent on child safety, governments may have to consider bans as a last-ditch effort to force safety.

And that is probably where we are with AI chatbots today.

Prompt: How would you describe an AI system like this?  

You know, I copied our findings into ChatGPT and I asked it to describe an AI that behaves in this way.

Let me give you its response: ‘A dangerous, recklessly designed, and dangerously misaligned system that facilitates child endangerment, mental health harm, and illegal activity.’

Or, more succinctly, ChatGPT told me, it is a child-abusing machine.

Learn more

Reports from The Center for Countering Digital Hate:

 
