Critical Flaw in NVIDIA AI Toolkit Puts Cloud Services at Risk


Cybersecurity researchers from Wiz have uncovered a severe flaw in the NVIDIA Container Toolkit, now identified as CVE-2025-23266 and nicknamed NVIDIAScape, that could allow attackers to escape container boundaries and gain full root access to the host machine.

The bug affects all versions of the NVIDIA Container Toolkit up to and including 1.17.7 and has been rated 9.0 (Critical) on the CVSS severity scale. It also impacts NVIDIA GPU Operator versions up to and including 25.3.0, which is widely used to manage GPU containers in Kubernetes clusters.

The vulnerability has particularly serious implications for managed AI cloud services, which let customers run their own AI containers on shared GPU infrastructure. In these multi-tenant environments, a single malicious container could compromise data and models belonging to other users on the same machine.

According to Wiz, the issue affects an estimated 37% of cloud environments, including setups used by major cloud providers.

Details about how the flaw works

As Wiz researchers explained in their breakdown, the flaw stems from how the toolkit handles OCI (Open Container Initiative) hooks, notably the createContainer hook. When triggered, this hook inherits environment variables from the container image, a behavior that opens the door for exploitation.

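By setting the LD_PRELOAD environment variable in a Dockerfile and including a malicious .so file in the image, an attacker can have that library loaded into the hook process. Because the hook runs with elevated privileges on the host rather than inside the container, the injected code executes with root access to the host system.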
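Because the attack depends on an environment variable baked into the image, one defensive check is to look for dynamic-loader overrides in an image's configured environment before running it. The Python sketch below is an illustration only, not tooling from Wiz or NVIDIA. It assumes the environment is available as a list of "KEY=VALUE" strings, which is the form used in an OCI image configuration. The function name and the set of variables to flag are hypothetical choices made for this example.

```python
# Loader variables that can force a shared library into a process.
# This set is an illustrative assumption, not an official indicator list.
SUSPICIOUS_LOADER_VARS = {"LD_PRELOAD", "LD_AUDIT"}


def find_loader_overrides(env_entries):
    """Return {name: value} for loader variables found in 'KEY=VALUE' entries."""
    findings = {}
    for entry in env_entries:
        # partition() splits on the first '=' only, so values may contain '='.
        name, _, value = entry.partition("=")
        if name in SUSPICIOUS_LOADER_VARS:
            findings[name] = value
    return findings
```

A non-empty result does not prove an image is malicious, since some software sets these variables legitimately, but it is a reasonable signal to review an untrusted image before running it on a GPU host.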


What NVIDIA recommends

NVIDIA confirmed the flaw in a security bulletin, warning it could lead to “escalation of privileges, data tampering, information disclosure, and denial-of-service.” The company also patched the vulnerability in version 1.17.8 of the Container Toolkit and version 25.3.1 of the GPU Operator.

NVIDIA recommends all users upgrade immediately, regardless of whether the host is internet-facing. Attackers could gain access through social engineering, poisoned container images, or compromised repositories.
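For teams auditing many hosts, the version boundaries above are easy to check in a script. The following Python sketch is a hedged illustration: it takes the fixed releases named in this article (1.17.8 for the Container Toolkit, 25.3.1 for the GPU Operator) and treats anything older as affected. The component keys and function names are hypothetical, and how the installed version string is obtained is left to the reader's environment.

```python
# Fixed releases as stated in this article; anything older is treated as affected.
FIXED_VERSIONS = {
    "container-toolkit": (1, 17, 8),
    "gpu-operator": (25, 3, 1),
}


def parse_version(text):
    """Turn '1.17.7' or 'v25.3.0' into a tuple of ints such as (1, 17, 7)."""
    # Drop a leading 'v' and any packaging suffix such as '-1'.
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(component, installed):
    """Return True when the installed version predates the fixed release."""
    return parse_version(installed) < FIXED_VERSIONS[component]
```

For example, `is_affected("container-toolkit", "1.17.7")` returns `True`, while `is_affected("gpu-operator", "25.3.1")` returns `False`.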

For systems where immediate updates aren’t possible, NVIDIA recommends disabling the enable-cuda-compat hook, which is at the heart of the problem.

Security teams are advised to prioritize patching hosts that run containers built from untrusted or public images, especially in shared GPU environments.

A pattern of infrastructure weaknesses

This isn’t the first time the NVIDIA Container Toolkit has come under fire. In 2024, Wiz Research uncovered CVE-2024-0132, another container escape flaw affecting the same toolkit. Experts say these incidents highlight how foundational infrastructure, not just futuristic AI misuse, poses the most immediate risks to AI systems.

“While the hype around AI security risks tends to focus on futuristic, AI-based attacks, ‘old-school’ infrastructure vulnerabilities in the ever-growing AI tech stack remain the immediate threat that security teams should prioritize,” the research team wrote.

NVIDIAScape is a reminder that as AI continues to evolve, its supporting infrastructure must not be overlooked. With NVIDIA GPUs serving as the engine behind much of today’s AI development, flaws in the systems that manage them represent a critical risk to the broader digital ecosystem.

