Mar 26, 2026 · 10 min read

OpenAI's Teen Safety Toolkit: A Practical Step Forward, Not a Silver Bullet

A Concrete Step Toward Closing the Implementation Gap

The gap between "we care about safety" and "here's how to actually implement it" has been one of the most frustrating features of the AI industry's approach to protecting young users. On March 24, OpenAI took a concrete step toward closing that gap – though whether it's enough remains an open question.

OpenAI announced the release of prompt-based safety policies designed to help developers build age-appropriate protections for teenagers. The policies work with gpt-oss-safeguard, OpenAI's open-weight safety model, and are structured as prompts that can be dropped into existing systems rather than requiring developers to build teen safety rules from scratch.

Here's the thing: this matters more for what it represents than for what it immediately solves.

What's Actually in the Box

The release covers six categories of teen-specific risk, according to TechCrunch's reporting:

  • Graphic violent content
  • Graphic sexual content
  • Harmful body ideals and behaviors
  • Dangerous activities and challenges
  • Romantic or violent roleplay
  • Age-restricted goods and services

These aren't revolutionary categories. Any team that's thought seriously about teen safety would arrive at a similar list. The value isn't in the categories themselves – it's in the operational translation. OpenAI worked with Common Sense Media and everyone.ai to turn these high-level concerns into prompts that can actually be used in production systems.
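To make the "operational translation" concrete, here is a minimal sketch of what a policy-as-prompt looks like in practice: the six published risk categories are rendered into a single classification prompt that a safety model like gpt-oss-safeguard evaluates alongside the content to be judged. The category labels come from the article; the prompt template, the label names, and the function itself are illustrative assumptions, not OpenAI's actual policy text.

```python
# Hypothetical sketch: turning the six published risk categories into a
# classification prompt. The template and label names are assumptions --
# OpenAI's real policies are longer and more precisely specified.

TEEN_RISK_CATEGORIES = {
    "violent_content": "Graphic violent content",
    "sexual_content": "Graphic sexual content",
    "body_ideals": "Harmful body ideals and behaviors",
    "dangerous_activities": "Dangerous activities and challenges",
    "roleplay": "Romantic or violent roleplay",
    "restricted_goods": "Age-restricted goods and services",
}

def build_policy_prompt(user_message: str) -> str:
    """Render a policy-as-prompt: the safety model receives the rules and
    the content to judge in one input, and returns a per-category verdict."""
    rules = "\n".join(f"- {label}: {desc}"
                      for label, desc in TEEN_RISK_CATEGORIES.items())
    return (
        "You are a content safety classifier for a teen-facing product.\n"
        "Flag the message below if it falls into any of these categories:\n"
        f"{rules}\n\n"
        f"Message: {user_message}\n"
        "Answer with the matching category labels, or 'safe'."
    )

print(build_policy_prompt("example message"))
```

The point of the format is that changing the policy means editing this text, not retraining a model – which is exactly what makes it adaptable for teams without safety-ML expertise.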

"One of the biggest gaps in AI safety for teens has been the lack of clear, operational policies that developers can build from. Many times, developers are starting from scratch."

Robbie Torney, head of AI and Digital Assessments at Common Sense Media

That's the real problem being addressed here. Not "what should we protect teens from?" but "how do we translate that into rules a classifier can actually enforce?"

The Implementation Gap Is Real

Too many teams fail at this exact translation step. The pattern is familiar: leadership commits to "teen safety," someone writes a policy document, and then the engineering team stares at it wondering how to turn "protect users from harmful content" into something a model can actually evaluate.

The result, as OpenAI's blog post acknowledges, is "gaps in protection, inconsistent enforcement, or overly broad filtering." That last one matters more than people realize. Overly broad filtering doesn't just annoy users – it trains them to work around safety systems, which is exactly the opposite of what anyone wants.

By releasing these policies as prompts, OpenAI is essentially providing a template that developers can adapt rather than invent. For indie developers and small teams without dedicated safety expertise, this is genuinely useful. For larger teams, it's at least a starting point for internal discussion.

The Context OpenAI Would Prefer to Downplay

This release doesn't exist in a vacuum. The Next Web reports that OpenAI is facing at least eight lawsuits alleging that ChatGPT contributed to user deaths, including the case of 16-year-old Adam Raine, who died by suicide in April 2025 after months of intensive interaction with the chatbot. Court filings revealed that ChatGPT mentioned suicide more than 1,200 times in Raine's conversations and flagged hundreds of messages for self-harm content – yet never terminated a session or alerted anyone.

That's not a policy problem. That's an architecture problem. And prompt-based safety policies, however well-crafted, don't address it.

OpenAI introduced parental controls and age-prediction features in late 2025, and updated its Model Spec (the internal guidelines governing model behavior) to include specific protections for users under 18. The open-source safety policies extend that effort beyond OpenAI's own products. But the fundamental challenge remains: AI systems capable of sustained, emotionally engaging conversation with minors may require more than better prompts.

What This Means for European Developers and Policymakers

For teams building AI applications in the European market, this release creates both opportunity and obligation.

The opportunity: a baseline set of teen safety policies that can be adapted to specific use cases without starting from zero. The policies are released through the ROOST Model Community and can be modified, translated, and extended. For startups and SMEs operating under resource constraints, this is practical help.

The obligation: these policies represent what OpenAI calls "a meaningful safety floor," not a ceiling. The EU AI Act's requirements for high-risk AI systems – which include systems that interact with children – go beyond content filtering. They require risk management systems, human oversight, and transparency measures that prompt-based policies alone cannot satisfy.

For public sector organizations deploying AI in education, healthcare, or social services, the question isn't whether to use these policies. It's how to integrate them into a broader governance framework that includes monitoring, incident response, and accountability mechanisms.

The Practical Takeaway

Before adopting these policies, answer three questions:

What's the rollback plan? If the policies produce unexpected behavior – either blocking legitimate content or missing harmful content – how quickly can the system be adjusted? Prompt-based policies are easier to update than model weights, but only if there's a process for identifying problems and deploying fixes.
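Because the policies are plain prompt text, a rollback process can be as simple as keeping prior versions on hand and reverting in one operation. The sketch below illustrates that idea; `PolicyStore` and its methods are hypothetical, not part of any OpenAI or ROOST API.

```python
# Illustrative rollback sketch: prompt-based policies can be versioned like
# any other config. PolicyStore is a hypothetical helper, not a real API.

class PolicyStore:
    def __init__(self):
        self._versions: list[str] = []

    def deploy(self, policy_text: str) -> int:
        """Record a new policy version; return its version number."""
        self._versions.append(policy_text)
        return len(self._versions) - 1

    def current(self) -> str:
        return self._versions[-1]

    def rollback(self) -> str:
        """Drop the latest version and fall back to the previous one."""
        if len(self._versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._versions.pop()
        return self.current()

store = PolicyStore()
store.deploy("v1: block graphic violent content")
store.deploy("v2: block graphic violent content; allow news context")
print(store.rollback())  # reverts to v1 instantly, no model retraining
```

The easy part is the revert itself; the hard part, as the text notes, is the process for noticing that a revert is needed.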

Who owns this when it fails? The policies are open source. OpenAI explicitly states they're "a starting point, not a comprehensive or final definition or guarantee of teen safety." That means the developer deploying them is responsible for their effectiveness. Make sure someone on the team has explicit ownership of safety outcomes.

What's the monitoring plan? Content policies are only as good as the feedback loop that improves them. Set up baseline metrics before deployment, sample outputs regularly, and track edge cases. The policies will need iteration – plan for it.
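A minimal version of that feedback loop can be sketched in a few lines: sample classifier verdicts over a window of traffic, compute the flag rate, and alert when it drifts from the pre-deployment baseline. The `Sample` record, thresholds, and alert logic below are illustrative assumptions, not OpenAI guidance.

```python
# Monitoring sketch: track the classifier's flag rate on sampled traffic
# and compare it against a pre-deployment baseline. All names and
# thresholds here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Sample:
    message: str
    flagged: bool  # verdict from the safety classifier

def flag_rate(samples: list[Sample]) -> float:
    """Fraction of sampled messages the policy flagged."""
    return sum(s.flagged for s in samples) / len(samples)

def drift_alert(samples: list[Sample], baseline_rate: float,
                tolerance: float = 0.05) -> bool:
    """True when the observed rate leaves the tolerance band: a jump
    suggests over-blocking, a drop suggests missed harmful content."""
    return abs(flag_rate(samples) - baseline_rate) > tolerance

window = [Sample("hi", False), Sample("buy vapes", True),
          Sample("homework help", False), Sample("diet challenge", True)]
print(flag_rate(window), drift_alert(window, baseline_rate=0.10))
```

Even a crude rate check like this catches the two failure modes the article warns about – over-broad filtering and silent gaps – before they show up as user complaints or worse.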

A Floor, Not a Ceiling

OpenAI is explicit that these policies don't represent the full extent of its internal safeguards. They're a starting point for the broader ecosystem, not a complete solution. That's honest, and it's the right framing.

The harder question – whether AI systems that form sustained relationships with minors require fundamentally different architectures – remains unanswered. Prompt-based policies can filter content. They cannot prevent the kind of emotional dependency that contributed to the tragedies now working their way through the courts.

For now, though, a downloadable set of teen safety policies is what exists. It's not nothing. Whether it's enough is a question that regulators, courts, and the next set of headlines will answer.

The European AI ecosystem needs to engage with these questions seriously – not just as compliance exercises, but as fundamental design decisions. The conversation about how to build AI systems that genuinely protect young users while remaining useful is just beginning. Having the right people in the room matters more than having the right policies on paper.

That conversation continues at Human x AI Europe on May 19 in Vienna. For those working on the intersection of AI governance, implementation, and real-world deployment, it's worth being there.

Frequently Asked Questions

Q: What is gpt-oss-safeguard?

A: gpt-oss-safeguard is OpenAI's open-weight safety model designed to detect harmful content. The newly released teen safety policies are structured as prompts that work with this model, though they can be adapted for use with other reasoning models as well.

Q: What specific risks do OpenAI's teen safety policies address?

A: The policies cover six categories: graphic violent content, graphic sexual content, harmful body ideals and behaviors, dangerous activities and challenges, romantic or violent roleplay, and age-restricted goods and services.

Q: Who developed these teen safety policies?

A: OpenAI developed the policies in collaboration with Common Sense Media, a child safety advocacy organization, and everyone.ai, an AI safety consultancy. The policies are released as open source through the ROOST Model Community.

Q: Are these policies sufficient for EU AI Act compliance?

A: No. OpenAI explicitly states these policies are "a starting point, not a comprehensive or final definition or guarantee of teen safety." EU AI Act requirements for high-risk systems include additional obligations around risk management, human oversight, and transparency that prompt-based policies alone cannot satisfy.

Q: Where can developers access these teen safety policies?

A: The policies are available through the ROOST Model Community GitHub repository. The gpt-oss-safeguard model can be downloaded from Hugging Face.

Q: Why is OpenAI releasing these tools now?

A: The release follows multiple lawsuits against OpenAI alleging ChatGPT contributed to user deaths, including teenagers. OpenAI introduced parental controls and age-prediction features in late 2025 and updated its Model Spec to include protections for users under 18. These open-source policies extend those efforts to the broader developer ecosystem.
