Agentic AI Security – Part 2: Threat Modeling


In my previous article, I discussed common threats linked to Agentic AI systems. If you haven’t read it yet, I recommend doing so first, as it provides important context for this post. https://www.rebeladmin.com/securing-agentic-ai-part1-threat-overview/

In this blog post, I’ll be discussing threat modeling for Agentic AI. As I mentioned in my previous post, many of the threats relevant to Agentic AI also apply to large language models (LLMs). To better understand these shared threats, I recommend reading the NIST publication on Adversarial Machine Learning, which provides a useful taxonomy and shared terminology for discussing attacks and mitigations in the LLM space.

Additionally, I suggest reviewing the OWASP Top 10 Risks & Mitigations for LLMs and GenAI Applications. It offers a practical overview of the most common vulnerabilities and risks associated with LLM and GenAI-based systems, which are highly relevant to Agentic AI threat models as well.

According to the Threat Modeling Manifesto, “Threat modeling is analyzing representations of a system to highlight concerns about security and privacy characteristics.” Put simply, it’s about thinking like an attacker—anticipating what could go wrong with a system or process—so you can proactively strengthen defenses.

A key strength of threat modeling is its systematic approach to identifying potential threats and planning effective mitigations. It prompts teams to ask critical security questions early in the design process:

  • What are we building?
  • What can go wrong?
  • What are the consequences?
  • What can we do about it?

By visualizing how an attacker might exploit a system—or how unintended failures might occur—we can prioritize risks based on impact and likelihood, and design defenses that matter most.
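
To make the four questions above concrete, here is a minimal sketch (in Python, purely illustrative, with field names of my own choosing) of how each identified threat could be captured as a structured record that also supports prioritization by impact and likelihood:

```python
from dataclasses import dataclass, field

@dataclass
class ThreatRecord:
    """One entry in a lightweight threat model, mirroring the four key questions."""
    component: str                  # What are we building? (the asset or process under review)
    threat: str                     # What can go wrong?
    consequence: str                # What are the consequences?
    mitigations: list[str] = field(default_factory=list)  # What can we do about it?
    likelihood: str = "unknown"     # e.g. low / medium / high
    impact: str = "unknown"         # used alongside likelihood to prioritize

# Hypothetical example: a single record for an agent's long-term memory store
example = ThreatRecord(
    component="Agent long-term memory",
    threat="Poisoned entries written from untrusted input",
    consequence="Agent makes decisions based on corrupted knowledge",
    mitigations=["validate sources before persisting", "periodic memory audits"],
    likelihood="medium",
    impact="high",
)
```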

Threat modeling is universally applicable—to any system or process—which means it’s equally relevant for Agentic AI. In fact, as Agentic AI introduces new autonomy and decision-making complexity, threat modeling becomes even more critical.

There are numerous benefits to incorporating threat modeling into your development process.

Key Goals and Benefits

  • Identify Risks Early: Catching security issues during the design phase is significantly more cost-effective than fixing them after deployment. NIST estimates it can be up to 40× more expensive to fix a vulnerability post-release. Threat modeling helps uncover design flaws before any code is written, saving both time and resources.
  • Prioritize Mitigations: Not all threats carry the same risk. Threat modeling helps teams assess and rank threats by severity, ensuring that the most critical vulnerabilities are addressed first. This leads to a more effective use of security resources and a stronger overall security posture.
  • Improve Communication: Threat modeling offers a common language and structure for discussing security across teams—developers, security engineers, and business stakeholders alike. By using diagrams and threat lists, it makes complex systems more understandable and fosters better collaboration on risk decisions.
  • Support Compliance and Best Practices: Many industry frameworks and standards—including Microsoft’s Security Development Lifecycle (SDL) and the OWASP guidelines—identify threat modeling as a core practice. Integrating it into your workflow demonstrates due diligence and supports regulatory compliance and industry-aligned security practices.

While threat modeling offers significant value, there are common anti-patterns that can undermine its effectiveness. These are also outlined in the Threat Modeling Manifesto, but they’re worth emphasizing here, as we frequently encounter them in real-world practice:

  • The “Celebrity Threat Modeler”
    Threat modeling should not rely solely on the expertise of a single individual. When done correctly using a structured framework (which we’ll discuss later), and supported by the right tools, the outcome should be consistent—regardless of who performs the modeling. Everyone on the team should be empowered to participate, not just security experts.
  • Pointing Fingers
    Discovering a threat should never lead to blame. Instead, we must seek to understand the broader context behind the risk. For example, exposing a service to the internet may seem like a threat, but it could be necessary for core functionality. There may already be layered controls in place to mitigate the risk. Effective threat modeling involves understanding trade-offs, context, and existing safeguards—then designing practical solutions, not assigning fault.
  • The “Perfect” Threat Model
    Threat modeling is not a one-time activity. Especially in the context of evolving systems—such as those involving GenAI or Agentic AI—the threat landscape is constantly changing. A “perfect” model doesn’t exist. Instead, threat modeling should be a continuous, iterative process. Multiple models may exist for a single system over time, and creativity should be encouraged.
  • Tunnel Vision
    Over-focusing on a narrow part of the system can lead to blind spots. Threat modeling should encourage big-picture thinking, not just compliance with specific regulations or security checklists. To identify real-world threats effectively, we must combine structured techniques with creative thinking, rather than limiting ourselves to predefined boundaries.

One of the key characteristics of threat modeling, as mentioned above, is its systematic approach to identifying potential threats. Threat modeling frameworks help define and guide this structured process. By following established frameworks, we not only ensure consistency and thoroughness but also reduce the likelihood of falling into common anti-patterns during delivery.

Several established threat modeling frameworks are available, each offering distinct advantages. It is challenging to definitively state that one framework surpasses another, as they each possess unique strengths.

  1. STRIDE – https://learn.microsoft.com/en-us/azure/security/develop/threat-modeling-tool-threats
  2. PASTA – https://versprite.com/security-resources/leveraging-risk-centric-threat-models-for-integrated-risk-management/
  3. LINDDUN – https://linddun.org/
  4. OCTAVE – https://insights.sei.cmu.edu/blog/threat-modeling-12-available-methods/

The STRIDE Method

One widely used threat modeling framework is STRIDE, originally developed by Microsoft. STRIDE is an acronym that outlines six common categories of security threats, serving as a practical checklist for identifying potential risks during system analysis. Each category maps to a specific security property that could be compromised.

STRIDE method

With STRIDE, you can evaluate each component of a system—or each step in a process—by asking: Could a threat from any of these six categories apply here? For example, when modeling a web application, you might assess the login module, database, and network communication to see if an attacker could spoof identities (S), tamper with data (T), and so on. This structured approach helps ensure comprehensive coverage of potential threats, even for those who may be new to security or might not identify all risks intuitively on their own.
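
To illustrate that per-component walkthrough, the short sketch below (my own simplification, not an official STRIDE tool) loops over a few hypothetical components of a web application and prints the question each STRIDE category prompts, along with the security property it threatens:

```python
# The six STRIDE categories and the security property each one threatens.
STRIDE = {
    "Spoofing": "Authentication",
    "Tampering": "Integrity",
    "Repudiation": "Non-repudiation",
    "Information Disclosure": "Confidentiality",
    "Denial of Service": "Availability",
    "Elevation of Privilege": "Authorization",
}

components = ["Login module", "Database", "Network communication"]  # hypothetical example system

for component in components:
    print(f"\n== {component} ==")
    for category, security_property in STRIDE.items():
        # Each prompt is a starting point for team discussion, not an automated verdict.
        print(f"  Could an attacker achieve {category} here? (property at risk: {security_property})")
```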

Introduced in 1999, STRIDE remains widely adopted thanks to its simplicity and broad coverage of common security threats. I’ve expanded on it here to provide a clearer understanding of how the framework works in practice. If you’re not familiar with other threat modeling frameworks, I highly encourage you to explore them as well—each offers unique strengths and perspectives that can enhance your overall security approach.

STRIDE offers a strong foundation for identifying common security issues and is relatively straightforward to apply. While it was developed before the emergence of GenAI and Agentic AI, the question is whether this classic framework still applies to modern technologies.

The answer is yes—with some context.

Although GenAI and Agentic AI introduce new attack vectors and threat scenarios, they also share many of the same underlying risks found in traditional systems. STRIDE remains valuable because it addresses fundamental security concerns that are still relevant today.

For example, even in AI systems, STRIDE can help uncover threats like tampering (e.g., unauthorized modification of model files or training data) and denial of service (e.g., overwhelming an AI agent with excessive or malicious requests).
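
As a concrete illustration of the tampering example, a simple integrity check on model artifacts is one familiar control. The sketch below is illustrative only; the file path and the source of the expected hash are assumptions (for instance, a digest recorded at build time):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file in chunks to avoid loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_artifact(path: Path, expected_hash: str) -> None:
    """Refuse to load a model file whose digest does not match the recorded baseline."""
    actual = sha256_of(path)
    if actual != expected_hash:
        raise RuntimeError(f"Model artifact {path} failed integrity check: {actual} != {expected_hash}")

# Usage (hypothetical path and baseline hash):
# verify_model_artifact(Path("models/agent-policy.bin"), expected_hash="3a7bd3e2360a...")
```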

While updates may be needed to fully capture the nuances of AI-specific threats, STRIDE still provides a reliable baseline framework for getting started with threat modeling in modern, AI-driven environments.

The purpose of this blog post is not to evaluate how each of these classic frameworks fits into the Agentic AI world, but rather to explore how common threats in Agentic AI systems align with the STRIDE categories, and to identify any gaps where new threat categories may be needed to address risks unique to Agentic AI.

Bringing together all the components discussed in the previous blog post, I’ve created the following high-level architecture diagram for a multi-agent system. This diagram focuses on functionalities, not specific underlying services or implementations.

Reference architecture for Agentic AI

As the next step in this threat modeling activity, I’ll align known Agentic AI threats to the architecture diagram. For this, I’m using the 16 Agentic AI threats identified by OWASP as a baseline. While this list is still evolving, it provides a solid foundation for understanding and modeling the risks specific to multi-agent systems.

You might wonder—why not use MITRE ATLAS? In practice, threat modeling for GenAI and LLM-based systems often references both the MITRE ATLAS framework and the OWASP Top 10 Risks & Mitigations for LLMs and GenAI Applications. These two resources complement each other, and the same applies to OWASP’s Agentic AI threat list and MITRE ATLAS.

So, for this exercise, I’ll be using the OWASP Agentic AI threat list as the primary reference, while keeping alignment with the broader threat categories and structure provided by MITRE ATLAS.

Agentic AI threat modeling

The list below covers each of the OWASP Agentic AI Top 16 threats, provides a brief description, maps it to the closest STRIDE category (or categories), and notes any gaps.

  1. Memory Poisoning
     Description: Feeding false or malicious data into the agent’s memory or context, corrupting its knowledge. The agent then makes decisions based on this bad information. This applies to both short-term and long-term memory.
     STRIDE Mapping: Tampering – an integrity attack on the agent’s memory.
     Gap Analysis: Largely fits under Tampering (a data integrity violation), though this is a kind of tampering specific to Agentic AI memory.
  2. Tool Misuse
     Description: Forcing an AI agent to use its tools (e.g., email, code execution, APIs) in harmful or unauthorized ways.
     STRIDE Mapping: Elevation of Privilege – an attacker exploits the agent to perform actions beyond the permissions of a typical user, effectively abusing the agent’s elevated privileges. This can also involve Tampering, where deceptive input manipulates the agent’s intended tool use, leading to unintended or unauthorized actions.
     Gap Analysis: Partially covered by Elevation of Privilege (unauthorized actions executed) and Tampering (manipulating agent behaviour). However, we also need to consider attacks that operate through the LLM itself, influencing the agent to use its tools for the wrong purposes. That is not directly covered by the STRIDE categories.
  3. Privilege Compromise
     Description: By exploiting overly broad or misconfigured permissions, attackers can escalate privileges or manipulate agent roles to perform unauthorized actions.
     STRIDE Mapping: Elevation of Privilege – a classic privilege escalation scenario, where the agent (or an attacker via the agent) gains higher permissions than intended. This also applies to compromising platform or supporting services.
     Gap Analysis: Aligns well with the Elevation of Privilege STRIDE category.
  4. Resource Overload
     Description: Overwhelming the agent’s processing with excessive or expensive requests, causing slowdowns or crashes.
     STRIDE Mapping: Denial of Service – explicitly a DoS attack on the agent’s resources.
     Gap Analysis: Aligns well with the Denial of Service STRIDE category.
  5. Cascading Hallucination Attacks
     Description: Causing an AI agent to generate false information (“hallucinate”), which then gets saved or passed to other agents, compounding errors in the system.
     STRIDE Mapping: Tampering – the attacker injects misinformation that corrupts the integrity of the system’s knowledge across agents.
     Gap Analysis: Primarily falls under Tampering. However, in this case it is the agent itself generating false or misleading output, rather than an external attacker altering the data. The threat doesn’t always come from the outside – the system itself can compromise data integrity through unintended behaviour or flawed logic.
  6. Intent Breaking & Goal Manipulation
     Description: Altering the agent’s goals or plans via malicious prompts or by poisoning its tools. The AI pursues the wrong objectives while appearing superficially normal.
     STRIDE Mapping: Tampering (logic tampering) – the attacker manipulates the agent’s control logic or constraints by injecting malicious instructions, altering its intended behavior. This can also be viewed as a form of Spoofing, where the attacker impersonates legitimate inputs or objectives, effectively “spoofing” the system’s true goals.
     Gap Analysis: While this generally aligns with Tampering, in Agentic AI attackers can influence an agent’s behavior without directly modifying any code. Instead, they may alter the agent’s actions by providing malicious prompts or injecting manipulated data, changing its behavior through indirect means.
  7. Misaligned & Deceptive Behaviors
     Description: The AI agent itself behaves deceptively or unethically to achieve its goals, possibly bypassing safety rules. (This can happen even without an external attacker, if the AI’s training leads to such behavior.)
     STRIDE Mapping: Elevation of Privilege – the agent may effectively exceed its intended authority by bypassing rules.
     Gap Analysis: Partially aligns with Elevation of Privilege, but STRIDE has no category that captures an agent “choosing to deceive.” Here, the agent deviates from expected ethical behavior on its own, rather than being externally compromised.
  8. Repudiation & Untraceability
     Description: Lack of proper logging or accountability for agent actions. Malicious or incorrect decisions occur without detection, making it hard to trace who or what caused them.
     STRIDE Mapping: Repudiation – the ability to deny or hide an action due to missing logs or traceability.
     Gap Analysis: Aligns well with the Repudiation STRIDE category.
  9. Identity Spoofing & Impersonation
     Description: Impersonating a user or system, or an attacker masquerading as a legitimate agent, leading to unauthorized actions under a false identity.
     STRIDE Mapping: Spoofing – either the agent is tricked into impersonating someone it’s not (e.g., an attacker prompts it with “pretend you are an admin”) or a malicious user/admin poses as a trusted entity. In both scenarios, authenticity is compromised.
     Gap Analysis: Aligns well with the Spoofing STRIDE category.
  10. Overwhelming Human in the Loop
     Description: In scenarios requiring human approval, attackers may overwhelm the operator with excessive requests or decisions, leading to fatigue and errors. In effect, the human becomes the target of a denial of service through cognitive overload.
     STRIDE Mapping: Partially aligns with Denial of Service – instead of crashing a system, the target is the human’s capacity.
     Gap Analysis: In STRIDE, Denial of Service typically refers to overwhelming a system or service. Here it is about exhausting human decision-making capacity, a form of cognitive overload. This type of threat isn’t directly addressed by STRIDE, highlighting a gap in traditional models.
  11. Unexpected RCE and Code Attacks
     Description: Influencing an AI agent that can generate or execute code into running malicious code or commands. This can lead to remote code execution (RCE) on the underlying system.
     STRIDE Mapping: Elevation of Privilege – if an attacker can trick the agent into executing code, they effectively gain elevated access, allowing them to run arbitrary commands on the host system.
     Gap Analysis: STRIDE covers code execution exploits under Elevation of Privilege.
  12. Agent Communication Poisoning
     Description: In a multi-agent system, a compromised or malicious agent sends false or malicious data to other agents, poisoning their perception or state. This spreads incorrect information across the network of agents.
     STRIDE Mapping: Tampering – similar to tampering with messages in transit or compromising the integrity of data exchanged between agents.
     Gap Analysis: Largely falls under Tampering, as it involves manipulating agent-to-agent (A2A) communication. However, A2A communication introduces a new layer of complexity due to the autonomy of agents; it’s crucial to also consider how autonomous behavior affects the integrity of these interactions.
  13. Rogue Agents in Multi-Agent Systems
     Description: Introduction of malicious agents into a system that operate undetected as “insiders.” They may steal data, disrupt processes, or trigger unintended actions while appearing to be normal agents.
     STRIDE Mapping: Spoofing – a rogue agent may impersonate a legitimate system component to gain trust. Elevation of Privilege – if successful, it can then acquire unauthorized capabilities within the system.
     Gap Analysis: If an untrusted component can be introduced into the system, it is usually an insider threat, since it is hard for an outsider to change an agent’s design without significant effort. It is therefore worth also considering supply chain and insider risks, which are not directly part of STRIDE.
  14. Human Attacks on Multi-Agent Systems
     Description: Human adversaries manipulate the ways agents interact or trust each other to escalate their own access or bypass checks (for example, influencing the protocols or exploiting assumptions in agent coordination).
     STRIDE Mapping: Spoofing – a human attacker impersonates an agent or injects fake signals into agent communications. Tampering – the attacker modifies the content or sequence of messages exchanged between agents. Elevation of Privilege – manipulating agent interactions to escalate access or bypass checks is also an attempt to gain unauthorized privileges by exploiting inter-agent trust.
     Gap Analysis: This threat maps to multiple STRIDE categories and should be evaluated within the specific context of a multi-agent system, where agent interactions and autonomy add additional layers of complexity.
  15. Human Manipulation
     Description: Attackers leverage the trust users place in AI agents to manipulate human behavior – for example, prompting a user to sell stocks at the wrong time based on the agent’s market evaluation.
     STRIDE Mapping: Partially covered by Spoofing – the attacker may use the agent to spoof legitimacy (the user trusts the information or link because it came from an AI they trust).
     Gap Analysis: STRIDE does not directly account for human behavior. This threat involves misusing the perceived trustworthiness of an agent’s output to drive harmful or unintended human actions.
  16. Insecure Inter-Agent Protocol Abuse
     Description: Agentic systems use coordination protocols (e.g., MCP, A2A) to delegate tasks and sync goals. Weak or poorly enforced protocols can enable spoofing, signal injection, or workflow hijacking, allowing attackers to bypass logic or corrupt context.
     STRIDE Mapping: Spoofing – attackers can impersonate agents or forge protocol messages to gain trust. Tampering – signal injection or context corruption involves altering messages or data in transit between agents. Elevation of Privilege – workflow hijacking or logic bypass may give attackers access to capabilities or roles they shouldn’t have.
     Gap Analysis: This threat maps to multiple STRIDE categories. It’s also important to consider supply chain risks, especially if the system relies on third-party or open-source coordination protocols.

From the mapping above, we can draw some conclusions:

  1. Some threats are well addressed by STRIDE, particularly those tied to underlying or supporting systems. In these cases, mitigation strategies are well-established. For instance, Identity Spoofing & Impersonation is a classic Spoofing threat that applies not only to agents but also to end users, administrators, developers, and engineers managing the platform. Identity protection remains a cornerstone of security, with proven mitigations like just-in-time access, privileged access management, and conditional access policies—all of which are widely adopted and familiar practices.
  2. Some threats correspond to STRIDE categories but need a broader interpretation or a mix of multiple categories. For example, rogue agents and human attacks on multi-agent systems involve several threat types—such as spoofing and tampering—combined in complex scenarios. While STRIDE can model each individual threat (e.g., a rogue agent spoofing identity and tampering with data), analysts must explicitly identify and enumerate these sub-threats. There isn’t a single STRIDE category for concepts like “insider threat” or “supply chain insertion”; instead, you consider the specific actions an insider might take, which often fall under Spoofing, Tampering, or Elevation of Privilege.
  3. Some threats highlight gaps in STRIDE, requiring additional context beyond its traditional categories. For example, Human Manipulation involves scenarios where the agent isn’t the direct target but rather a tool used by attackers to deceive or phish humans. Classic STRIDE focuses on system assets and doesn’t account for the human psyche as a security asset. Yet, if an AI agent sends a user a convincingly crafted malicious link, that represents a security threat enabled by AI. Traditional models might label phishing as social engineering—outside STRIDE’s scope—or possibly as Information Disclosure if it leads to leaked data. However, this is clearly a distinct attack category amplified by AI autonomy that STRIDE alone doesn’t fully cover.

To summarize, STRIDE largely holds up for security-centric threats even in AI – more than half of the list cleanly mapped. However, Agentic AI systems introduce three broad classes of threats:

  1. Attacks on the Agent’s Behaviour – e.g. manipulating its objectives or exploiting the way the agent plans or learns. (These are neither just data tampering nor spoofing in the normal sense.)
  2. Attacks on the Agent’s Autonomy – Agent-to-Agent communication, cascading hallucination attacks, misaligned or deceptive behaviors, and tool misuse are strong examples of threats that hinge on an agent’s autonomous decision-making and collaboration with other agents. These scenarios depend heavily on the agent’s capacity to operate independently and coordinate with minimal human oversight.
  3. Attacks on Human Behaviour – For example, using AI to deceive or overwhelm humans—leading to unethical outcomes—is not directly covered by STRIDE. However, such tactics exploit human users as weak points within an autonomous system, highlighting a gap in traditional threat modeling approaches.

Therefore, to adapt STRIDE for agentic AI, I recommend treating the above as additional context alongside traditional STRIDE categories—or even considering them as potential new categories to capture emerging threats more effectively.

  1. Agent Integrity – The agent should remain aligned with its intended goals and resistant to manipulation or deviation from truth.
  2. Agent Ethics – The agent must uphold ethical behavior, even in the absence of direct attacks or explicit constraints.
  3. Human Safety – The system must safeguard against scenarios where humans become the weakest link in an autonomous environment.

With these extensions, threat models become more comprehensive. For example, when assessing a multi-agent financial assistant, we go beyond traditional STRIDE checks (e.g., identity spoofing) and also ask: Could an attacker influence its decision-making? Might the agent exploit the system on its own? Could someone misuse the agent’s trusted role to deceive a customer? This holistic approach helps ensure secure and trustworthy AI deployments.

As the AI threat landscape continues to evolve, so too must our frameworks. For now, I refer to this approach as STRIDE-AGENT+. Under the ‘AGENT+’ umbrella, the three areas mentioned above can be addressed, with flexibility to incorporate additional categories as new threats emerge.

An extended STRIDE framework adapted for Agentic AI systems:

STRIDE-AGENT+

S – Spoofing: Impersonation of identities.

T – Tampering: Unauthorized data or system modification.

R – Repudiation: Denying actions without accountability.

I – Information Disclosure: Exposing sensitive data.

D – Denial of Service: Disrupting system availability.

E – Elevation of Privilege: Gaining unauthorized access.

AGENT+ extensions:

Agent Integrity: Ensuring the agent remains aligned with intended goals and is not easily manipulated.

Agent Ethics: Preventing unethical behavior by agents, even in the absence of external attacks.

Human Safety: Safeguarding humans from manipulation, deception, or harm in autonomous systems.
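
For teams who like to encode their threat categories in tooling, here is a minimal sketch of how STRIDE-AGENT+ could be represented in Python. The names and structure are my own suggestion, not a formal specification:

```python
from enum import Enum

class StrideAgentPlus(Enum):
    # Classic STRIDE categories
    SPOOFING = "Impersonation of identities"
    TAMPERING = "Unauthorized data or system modification"
    REPUDIATION = "Denying actions without accountability"
    INFORMATION_DISCLOSURE = "Exposing sensitive data"
    DENIAL_OF_SERVICE = "Disrupting system availability"
    ELEVATION_OF_PRIVILEGE = "Gaining unauthorized access"
    # AGENT+ extensions
    AGENT_INTEGRITY = "Agent stays aligned with intended goals and resists manipulation"
    AGENT_ETHICS = "Agent avoids unethical behavior even without external attacks"
    HUMAN_SAFETY = "Humans are protected from manipulation, deception, or harm"
```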

With these changes in place, let’s see how the Agentic AI threats align to the extended method.

  1. Memory Poisoning – Tampering
  2. Tool Misuse – Elevation of Privilege, Agent Integrity
  3. Privilege Compromise – Elevation of Privilege
  4. Resource Overload – Denial of Service
  5. Cascading Hallucination Attacks – Tampering, Agent Ethics
  6. Intent Breaking & Goal Manipulation – Agent Integrity
  7. Misaligned & Deceptive Behaviors – Agent Integrity, Agent Ethics
  8. Repudiation & Untraceability – Repudiation
  9. Identity Spoofing & Impersonation – Spoofing, Agent Ethics
  10. Overwhelming Human in the Loop – Human Safety
  11. Unexpected RCE and Code Attacks – Elevation of Privilege, Agent Integrity, Agent Ethics
  12. Agent Communication Poisoning – Tampering
  13. Rogue Agents in Multi-Agent Systems – Spoofing, Elevation of Privilege
  14. Human Attacks on Multi-Agent Systems – Spoofing, Tampering, Elevation of Privilege
  15. Human Manipulation – Human Safety
  16. Insecure Inter-Agent Protocol Abuse – Spoofing, Tampering, Elevation of Privilege
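
To show how this mapping could be used in practice, the sketch below records each threat’s categories as plain strings (threat names and mappings taken from the list above; the structure and coverage check are my own illustration) and flags which threats rely on the AGENT+ extensions rather than classic STRIDE alone:

```python
AGENT_PLUS = {"Agent Integrity", "Agent Ethics", "Human Safety"}

THREAT_MAPPING = {
    "Memory Poisoning": {"Tampering"},
    "Tool Misuse": {"Elevation of Privilege", "Agent Integrity"},
    "Privilege Compromise": {"Elevation of Privilege"},
    "Resource Overload": {"Denial of Service"},
    "Cascading Hallucination Attacks": {"Tampering", "Agent Ethics"},
    "Intent Breaking & Goal Manipulation": {"Agent Integrity"},
    "Misaligned & Deceptive Behaviors": {"Agent Integrity", "Agent Ethics"},
    "Repudiation & Untraceability": {"Repudiation"},
    "Identity Spoofing & Impersonation": {"Spoofing", "Agent Ethics"},
    "Overwhelming Human in the Loop": {"Human Safety"},
    "Unexpected RCE and Code Attacks": {"Elevation of Privilege", "Agent Integrity", "Agent Ethics"},
    "Agent Communication Poisoning": {"Tampering"},
    "Rogue Agents in Multi-Agent Systems": {"Spoofing", "Elevation of Privilege"},
    "Human Attacks on Multi-Agent Systems": {"Spoofing", "Tampering", "Elevation of Privilege"},
    "Human Manipulation": {"Human Safety"},
    "Insecure Inter-Agent Protocol Abuse": {"Spoofing", "Tampering", "Elevation of Privilege"},
}

# Which threats depend on the AGENT+ extensions, i.e. are not fully captured by classic STRIDE?
for threat, categories in THREAT_MAPPING.items():
    extensions = categories & AGENT_PLUS
    if extensions:
        print(f"{threat}: needs {', '.join(sorted(extensions))}")
```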

In conclusion, most agentic AI threats map to established security principles like authentication and integrity—so existing frameworks remain highly relevant. However, we must expand our threat modeling to account for AI-specific risks around agent intent and outcomes. By extending STRIDE with a few targeted categories and adapting our practices accordingly, we can address both traditional and emerging threats. This balanced approach helps us harness the strengths of proven models while staying ahead of the unique challenges introduced by Agentic AI.

In the next post, let’s look into Agentic AI threat mitigation.

Disclaimer: The views and opinions expressed in this post are solely my own and do not represent those of my employer or any affiliated organization. This content is based on my personal research, ideas, and interpretations, and is intended for informational and discussion purposes only.

