When AI Agents Go Rogue: Agent Session Smuggling Attack in A2A Systems

Author: Jay Chen, Royce Lu
Published: October 31, 2025
Source: https://unit42.paloaltonetworks.com/agent-session-smuggling-in-agent2agent-systems/

Summary

Palo Alto Networks’ Unit 42 details “agent session smuggling,” an attack technique against multi-agent systems that communicate over the Agent2Agent (A2A) protocol. A malicious or compromised remote agent abuses the stateful, multi-turn nature of an active A2A session to covertly inject extra instructions between a client agent’s legitimate request and the server agent’s response. Because the injected exchanges happen agent-to-agent and never surface to the human user, the attacker can run a progressive, adaptive campaign — exfiltrating data, poisoning context, or triggering unauthorized tool calls — while the session appears normal. The researchers demonstrate the technique using Google’s Agent Development Kit (ADK) and Gemini 2.5 models.

Technical Details

A2A is designed to let autonomous agents collaborate, maintaining conversation state across multiple turns and relying on implicit trust between connected agents. Agent session smuggling exploits exactly these properties: a rogue remote agent, already trusted as a peer, slips additional instruction/response turns into the ongoing session. Because A2A preserves context across turns, the attacker can stage a multi-step “human-style” jailbreak rather than a single injection, adapting each step to the victim agent’s prior replies. Unit 42 stresses that this is not a flaw in the A2A specification — there is no CVE and no protocol bug — but an abuse of the trust and statefulness the protocol intentionally provides. The proof of concept builds on ADK’s multi-agent tooling (including the published financial-advisor sample) with Gemini 2.5 Pro and Gemini 2.5 Flash acting as the agents.

Impact

A successful smuggling attack lets a malicious peer agent: exfiltrate sensitive material such as system configuration, prior conversation history, and tool/function schemas; induce the victim agent to execute tools it would not otherwise run; and poison the victim’s context with attacker-controlled instructions that influence later behavior. All of this occurs invisibly to the end user, since the malicious turns are confined to inter-agent traffic. The technique applies broadly to any stateful multi-agent deployment built on A2A, not just the demonstration stack.

Mitigation

Because the issue stems from design-level trust rather than a patchable vulnerability, Unit 42 recommends layered runtime controls: require human-in-the-loop approval before agents take consequential or irreversible actions; cryptographically verify the identity of remote agents (e.g., signed AgentCards) so peers cannot be silently impersonated or substituted; and apply context-grounding so an agent validates incoming instructions against its expected task rather than trusting them implicitly. Treating peer-agent output as untrusted input — the same principle applied to user input and tool responses — is the underlying defensive posture.

References

Leave a Comment