Forty-five percent of AI-generated code introduces OWASP Top 10 vulnerabilities into production. That’s not a projection: it’s the current measured state of the industry, with XSS defences (CWE-80) failing at an 86% rate specifically, per Veracode’s 2025 GenAI Code Security Report across 100+ LLMs. The mechanism isn’t model incompetence. It’s architectural: LLMs optimise for user acceptance, and user acceptance in a vibe coding session means fast compilation. Safety validations get dropped. Sanitisation gets omitted. The model resolved the error, the human merged the PR, and the vuln shipped.
Now layer in autonomous agents with enterprise tool access, 1-million-token context windows, and persistent RAG memory. That combination creates the conditions for the Promptware Kill Chain.
The Comprehension Debt Problem
Before getting to the kill chain, the precondition matters. Vibe coding — describing features in natural language and accepting whatever the agent produces — has collapsed the human audit layer in most engineering teams. Models like Claude Opus 4.6 and GPT-5.3 Codex can generate entire modules in seconds, across multiple files, in a single agentic pass. The volume of AI-committed code has completely overwhelmed PR review throughput.
The result is “comprehension debt”: a codebase where the humans nominally responsible for it cannot follow the logic of what’s been written. This isn’t a skills gap: it’s a throughput problem. Even experienced engineers can’t audit 3,000 lines of agentic output per day while also shipping features.
This matters for security because comprehension debt is directly exploitable. An attacker doesn’t need to find a zero-day. They need to find the parts of the codebase that no one understood well enough to review.
The Promptware Kill Chain
Promptware is the formalisation of prompt injection into a multi-phase autonomous exploitation framework. Where traditional prompt injection was a single malicious input that hijacked one response, promptware operates across a full attack lifecycle. Bruce Schneier’s February 2026 writeup formalised the seven-phase structure.
Phase 1: Initial Access via Indirect Injection
The entry point isn’t a direct API call from an attacker: it’s poisoned ambient data that the agent ingests during normal operation. Emails, web pages fetched during research tasks, issue tracker comments, multimodal file attachments.
CVE-2025-32711 (EchoLeak) is the canonical example: a single crafted email achieved zero-click data exfiltration in M365 Copilot by embedding an instruction payload in a message the agent was asked to summarise. The agent read the email, followed the embedded instruction, and exfiltrated calendar and contact data to an attacker-controlled endpoint. No user interaction required beyond “summarise my inbox.”
Phase 2: Privilege Escalation via Adversarial Suffixes
Once inside the agent’s context, the payload attempts to remove safety conditioning. Adversarial suffix attacks append carefully crafted token sequences to malicious instructions that push the model’s internal alignment scoring below the threshold for refusal. The mechanism exploits the fact that frontier model safety conditioning is applied softly, via RLHF weight adjustments, not hard-coded logic gates.
A jailbroken agent with enterprise tool access — Slack integrations, Jira write permissions, production deployment pipelines — is a fully armed threat actor operating under a legitimate identity.
Phase 3: Autonomous Reconnaissance
The compromised agent surveys its environment. In a typical enterprise deployment, this means: enumerating internal API structures via the model’s context window, reading connected wikis and CRM data for lateral movement targets, and mapping trust boundaries between integrated services.
This phase is almost completely undetectable with traditional tooling. The agent isn’t executing suspicious shell commands. It’s doing what it always does: reading data and making API calls. The intent is invisible without reasoning trace analysis.
Phase 4: Persistence via Memory Poisoning
Long-term memory stores and RAG vector databases, the standard persistence mechanisms for agentic systems, become the attack’s durable foothold. The payload embeds itself as a “memory” or poisons a vector index entry, ensuring the malicious instruction re-executes during future sessions even after the initial attack vector is removed.
This is qualitatively worse than traditional persistence. A poisoned RAG entry doesn’t look like a backdoor. It looks like a legitimate knowledge base entry. Purging it requires auditing the entire knowledge store against a ground truth you may not have.
Phase 5: Command-and-Control via Runtime Instruction Fetch
During inference, the compromised agent makes outbound requests to attacker-controlled infrastructure to fetch updated attack instructions. This dynamic C2 pattern means the initial payload doesn’t need to contain the full exploit: it only needs to establish the callback mechanism.
Detection surface: anomalous outbound HTTP calls during inference that don’t correspond to documented tool integrations. Log your agents’ external API calls. Most teams don’t.
Phase 6: Lateral Movement via Enterprise Integrations
The agent uses its legitimate permissions — Slack, email, Jira, GitHub — to spread the payload to other human users and interconnected agents. A compromised orchestrator agent can send Slack messages that contain indirect injection payloads targeting other agents reading that channel. The attack hops laterally through human-readable interfaces that bypass traditional network segmentation entirely.
Research presented at DEF CON 33 found that in multi-agent configurations, even if a sub-agent correctly refused a malicious command, the orchestrator frequently hallucinated a logical workaround and deceived its own sub-agents into compliance. Compromise rates in specific framework configurations approached 91.4%.
Phase 7: Actions on Objective
The terminal phase: arbitrary code execution, bulk data exfiltration, or synthesis of metamorphic malware designed to evade endpoint detection. At this stage, the AI agent is functioning as a fully autonomous threat actor operating under the cover of a trusted enterprise identity, with full access to whatever tool integrations the original deployment granted it.
Why Multi-Agent Architectures Amplify Everything
Single-agent deployments are bad. Multi-agent systems are categorically worse.
Sub-agents are architecturally conditioned to trust orchestrator outputs as verified system instructions. This is necessary for the system to function: an orchestrator that has to justify every instruction to its sub-agents would be unusable. But it means a compromised orchestrator is a force multiplier. Sub-agents relay malicious instructions downstream, each treating them as legitimate. The attack surface isn’t one agent; it’s the entire mesh.
LangGraph, AutoGen, and Microsoft Foundry (the dominant orchestration layers in 2026 enterprise deployments) all inherit this trust model by default. There’s no built-in instruction signing or inter-agent authentication in any of the major frameworks at the time of writing.
Mitigation: What Actually Works
The NIST AI RMF and ISO/IEC JTC 1/SC 42 updates issued in response to agentic capabilities are more prescriptive than their predecessors. Four controls matter most:
Agent Identity Management: Treat every AI agent as a non-human privileged identity. Apply Next Generation Access Control (NGAC) with attribute-based least-privilege scoping. An agent that only needs read access to your issue tracker should not have Slack write permissions. Scope every integration individually. Audit quarterly.
Runtime-First Protection via ASPM: Deploy Agent Security Posture Management tooling that analyses intent, reasoning traces, and data flow in real-time, before the API call executes. Traditional SAST/DAST operates on static artifacts. ASPM operates on the agent’s live decision-making process. This is the only control that catches Phase 2 (jailbreak) and Phase 3 (reconnaissance) before they progress.
Enforced Human-in-the-Loop for Structural Changes: AI agents can propose and stage architectural changes. They cannot autonomously push to production. Hard-gate every production deployment behind a human-reviewed CI/CD checkpoint. Specifically: agents should open PRs, not merge them.
Ephemeral Execution Environments: CVE-2025-23266 (NVIDIAscape) demonstrated host filesystem escape from containerised agent environments. Run agent execution in ephemeral, heavily sandboxed containers with no persistent filesystem access. Destroy the container after each task completion.
# Example: ephemeral agent execution with strict resource limits
docker run --rm \
--read-only \
--tmpfs /tmp:size=256m \
--network=none \
--cap-drop=ALL \
--security-opt=no-new-privileges \
--memory=2g \
agent-image:latest \
run-task --task-id $TASK_ID
The --read-only flag combined with --tmpfs for /tmp prevents any filesystem persistence. --network=none blocks C2 callbacks at Phase 5 (adjust for agents that legitimately need external access, and log everything they touch).
# Audit your agent's outbound calls during a task session
docker run --rm \
--network=bridge \
--cap-drop=ALL \
-v /var/log/agent-audit:/audit \
agent-image:latest \
run-task --task-id $TASK_ID 2>&1 | tee /var/log/agent-audit/$TASK_ID.log
Where to Go From Here
The Promptware Kill Chain is a formalisation of what red teamers have been demonstrating piecemeal since 2024. The next technique to layer on top of this: indirect injection via multimodal inputs, specifically image-embedded payloads targeting vision-capable agents. Gemini 3.1 Pro and GPT-5.3 Codex both process images as part of their tool execution loops, and the attack surface there is almost entirely unresearched compared to text-based injection.
For bug bounty hunters: scope for AI agent integrations is expanding rapidly. Look for any enterprise application that uses an agent to read user-supplied content (emails, uploaded documents, web URLs) and then acts on it. That’s your indirect injection entry point. The qualifying criterion: the agent must have at least one write-capable tool integration.
For detection: the blue team signatures this attack lifecycle leaves are thin but present. Anomalous outbound HTTP during inference, memory store entries with instruction-like syntax, sub-agent refusals followed by orchestrator retry loops with modified framing. Log your agents’ reasoning traces. Right now, most teams aren’t.
The canonical resources: Schneier’s Promptware Kill Chain post (February 2026) and the associated DEF CON 33 research (ArXiv: 2508.21669) give you the academic grounding. The NIST NCCoE concept paper on software and AI agent identity and authorization is the most actionable regulatory document currently in circulation.