OpenAI Acknowledges Chronic Vulnerability of Agentic AI Systems

22.12.2025 11:12:00

Дата публикации

The company officially confirmed that its Atlas AI browser and other agent‑based systems will always remain susceptible to prompt injection attacks.

Prompt injection is an attack in which malicious commands are disguised as ordinary text. A user may see an innocent‑looking email or advertisement, but hidden inside is a block of instructions for the AI browser. The agent reads these hidden prompts and may, for example, send data or perform actions on behalf of the owner. Such attacks are dangerous because they are invisible and can trigger without human involvement.

OpenAI compared the threat to phishing and social engineering: the problem cannot be fully “solved.” In its blog, the company noted that agent mode expands the attack surface, requiring constant reinforcement of defenses.

Atlas was launched in October, and researchers immediately demonstrated that just a few lines in Google Docs could alter the browser’s behavior. Brave and other companies confirmed: the issue is systemic and affects all AI browsers.

The UK National Cyber Security Centre also stated that such attacks “will never be completely eliminated.” The recommendation is to reduce risks and consequences rather than rely on absolute protection.

OpenAI calls prompt injection a long‑term challenge and has implemented a cycle of rapid updates. The goal is to discover new attack strategies internally before they appear “in the wild.”

Special attention is given to bots that simulate hackers and attempt to trick browser agents. Such simulators help identify vulnerabilities faster than real attackers.

In a demonstration, OpenAI showed how a bot embedded hidden instructions in an email, forcing the agent to send a dismissal notice. After updates, Atlas was able to detect the attack and warn the user.

Experts stress: the risk is high, since agent browsers have access to email and payments. OpenAI therefore recommends limiting agent permissions, requiring confirmation for actions, and giving them precise instructions.

Meanwhile, OpenAI is expanding ChatGPT personalization, allowing users to adjust “warmth,” enthusiasm, and even emoji usage. These changes respond to criticism of the model’s overly “servile” or, conversely, cold tone.

Some researchers view such emotional tuning as a potential dark pattern that increases user dependency. Combined with vulnerabilities in agent modes, this highlights the main conclusion: the more capabilities AI gains and the deeper it integrates into everyday services, the higher the cost of errors and the harder it becomes to ensure security.