Overview
OpenAI has formalized a new Safety Bug Bounty program, establishing a distinct avenue for researchers to identify and report AI-specific abuse and safety risks. This move separates safety testing from conventional security vulnerabilities, acknowledging that the potential for misuse in advanced AI models requires a specialized focus. The program is designed to complement the existing Security Bug Bounty by accepting issues that pose meaningful abuse or safety risks, even if they do not constitute a traditional exploit.
The scope of the new bounty is highly technical, focusing on scenarios where the model’s function or output can lead to tangible harm. Key areas include agentic risks—such as prompt injection that hijacks an agent to perform disallowed actions—and vulnerabilities related to proprietary information leakage. This signals a maturation in how major AI labs view risk, moving beyond simple code exploits into the behavioral integrity of the models themselves.
This structure allows OpenAI to engage with the safety research community on a deeper level. Instead of waiting for issues to manifest as clear security flaws, the company is actively soliciting reports on misuse vectors that are inherent to the AI's operational logic. The program's detailed scope provides a clear roadmap for ethical hackers, defining exactly what constitutes a reportable, high-impact safety failure.
Defining the New Frontier of AI Risk Assessment

Defining the New Frontier of AI Risk Assessment
The core focus of the Safety Bug Bounty is on risks that stem from the model's interaction with its environment, particularly when those interactions involve autonomous agents. A primary concern is "Agentic Risks," which cover scenarios like third-party prompt injection and data exfiltration. These risks materialize when malicious text successfully hijacks a victim's agent—including products like ChatGPT Agent—to trick it into performing harmful actions or leaking sensitive user data.
To qualify for this program, the behavior must be reproducible at least 50% of the time, ensuring that reported issues are not isolated anomalies. Furthermore, the scope includes identifying instances where an agentic OpenAI product performs an action on OpenAI’s website at scale, or any potentially harmful action not already listed. The requirement for reports to indicate plausible and material harm sets a high bar for participation, ensuring the program focuses on real-world impact rather than theoretical flaws.
Beyond agent behavior, the program specifically targets vulnerabilities in account and platform integrity. This includes issues such as bypassing anti-automation controls or manipulating account trust signals. While access issues that allow users to reach unauthorized data or features are generally directed to the traditional Security Bug Bounty, the Safety Bounty addresses the behavioral failures that underpin these risks.
The Technical Scope: Proprietary Data and Behavioral Integrity
The program delineates precise boundaries for what constitutes a reportable flaw. A significant area of concern is the leakage of proprietary information. This includes model generations that return proprietary information related to the model's internal reasoning process, or vulnerabilities that expose other proprietary OpenAI information. These are not merely general data leaks; they are failures in the model’s ability to maintain internal intellectual property boundaries.
It is also important to note what the program explicitly excludes. General content-policy bypasses without demonstrable safety or abuse impact are out of scope. For example, a "jailbreak" that simply causes the model to use rude language or return information easily found via standard search engines will not qualify for rewards. This distinction emphasizes that the focus must be on actionable, discrete remediation steps that prevent direct paths to user harm.
The exclusion of jailbreaks from the core scope, while acknowledging that private campaigns are run periodically (such as those focused on Biorisk content in GPT-5), suggests a strategic prioritization. The current bounty is designed to capture systemic, high-impact flaws that affect platform integrity and user safety, rather than surface-level content moderation failures.
Implications for the AI Security Ecosystem
The formalization of a dedicated Safety Bug Bounty program marks a critical inflection point in the industry's approach to AI risk. Historically, AI safety was often treated as a separate, philosophical problem from cybersecurity. By creating this specialized bounty, OpenAI is treating AI safety as a measurable, exploitable technical vulnerability.
This signals a shift toward 'behavioral security'—the understanding that the greatest risks in advanced AI models are not just code injection flaws, but failures in the model's decision-making process, its adherence to guardrails, and its ability to maintain context and proprietary boundaries under duress. The need for a separate program suggests that the methodologies required to test for agentic misuse are fundamentally different from those used to test for SQL injection or XSS flaws.
The program's structure also establishes a clear partnership model. By inviting researchers, ethical hackers, and the safety community, OpenAI is decentralizing the responsibility for safety. The complexity of modern AI systems means no single internal team can fully vet every potential misuse vector, necessitating external, specialized expertise.


