Python targets phantom dependencies threat with SBOM proposal

A whitepaper from Seth Larson, the Python Software Foundation’s (PSF) Security Developer-in-Residence, sounds the alarm on “phantom dependencies” and proposes a fix: PEP 770, which embeds a Software Bill-of-Materials (SBOM) in Python packages.

This work, sponsored by the Alpha-Omega initiative, addresses an issue first brought to mainstream attention by Endor Labs in September 2023. Endor Labs coined the term “phantom dependency” for software components that are included in packages but are not listed anywhere in the official metadata, manifests, or lock files.

Because they are undocumented, these phantom dependencies are effectively invisible to the standard Software Composition Analysis (SCA) tools that developers rely on for spotting vulnerabilities and ensuring compliance.

Why Python is haunted by phantom dependencies

So, why is Python’s house so particularly haunted by these phantoms? The Endor Labs report named Python as one of the most affected ecosystems, and there are a few key reasons why.

First, Python is brilliant at talking to other programming languages. It can “wrap” around code written in C, C++, Rust, and more, presenting it with a simple Python interface. This is a huge reason why it has become the language of choice for demanding fields like AI and scientific computing, which need the performance of these other languages.

Second, Python’s “wheel” distribution format is designed for convenience. It’s essentially a zip file that gets unpacked, meaning any compiled code has to be included in a pre-compiled state, ready to go. This combination of wrapping other languages and using pre-compiled binaries creates the perfect conditions for phantom dependencies to thrive.
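Because a wheel is an ordinary zip archive, the compiled files it carries can be listed with nothing but the standard library. The sketch below is illustrative only: the function name list_bundled_binaries and the suffix list are assumptions made for this example, not part of any packaging tool.

```python
import zipfile
from pathlib import PurePosixPath

# Suffixes that usually indicate compiled code (illustrative, not exhaustive).
BINARY_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def list_bundled_binaries(wheel_file):
    """Return sorted archive paths in a wheel that look like compiled binaries.

    `wheel_file` may be a filesystem path or a binary file-like object.
    """
    found = []
    with zipfile.ZipFile(wheel_file) as archive:
        for name in archive.namelist():
            filename = PurePosixPath(name).name
            # Versioned shared objects such as "libjpeg.so.62" carry ".so" mid-name.
            if filename.endswith(BINARY_SUFFIXES) or ".so." in filename:
                found.append(name)
    return sorted(found)
```

Pointing this at a downloaded wheel file shows which shared libraries ship inside it, but it says nothing about their versions or origins, which is exactly the gap that package metadata leaves open.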

This has real-world consequences. Take the popular image-processing library, ‘Pillow’. When you install it, you also get a host of other well-known libraries like libjpeg, libpng, and xz-utils (which appears as liblzma) copied to your system.

If you run a standard security scanner like Syft or OSV-Scanner on your project, they won’t see any of them; they will only report the top-level Python packages. This blind spot can be dangerous.

The Pillow package, for example, bundled a version of libwebp that contained an actively exploited vulnerability known as CVE-2023-4863. Since libwebp was a phantom dependency, users had no idea it was even there, let alone that it was vulnerable and required an urgent update to a newer version of Pillow.

An issue of immense scale

Just how widespread is this problem? The whitepaper reveals some rather concerning numbers from an analysis of the top 5,000 packages on the Python Package Index (PyPI).

  • System libraries: Using a common tool called auditwheel to bundle libraries is a major source of phantom dependencies. This method was used in 212 of the top 5,000 packages. Just one of those bundled libraries, libgcc_s, is found in 112 projects that are downloaded 2.75 billion times every month.
  • Other languages: The inclusion of code from other languages is rampant. Within the top 5,000 projects, C and C++ code appears in 567 packages, which together are downloaded 10 billion times a month. Rust is present in 95 of those top packages, accounting for 1.7 billion monthly downloads.
  • Bundled Python: Even other Python libraries get bundled. A technique called “vendoring” is popular in core tools like pip and setuptools. Although only used by 11 of the top 5,000 projects, those 11 projects are so fundamental that they are installed over 50 billion times per month.
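Vendored Python code can often be spotted from a distribution’s recorded file list. The following is a rough sketch, assuming the vendored copies live in a directory named _vendor (the layout pip and setuptools use; other projects may choose different names). The helper names are hypothetical.

```python
from importlib import metadata


def vendored_packages(file_paths):
    """Group vendored top-level names by the `_vendor` directory that holds them."""
    found = {}
    for path in file_paths:
        parts = str(path).replace("\\", "/").split("/")
        if "_vendor" not in parts[:-1]:
            continue  # not inside a `_vendor` directory
        idx = parts.index("_vendor")
        entry = parts[idx + 1]
        if idx + 2 == len(parts):
            # A file directly inside `_vendor`: keep single-file modules only,
            # skipping `__init__.py` and data files.
            if not entry.endswith(".py") or entry == "__init__.py":
                continue
            entry = entry[: -len(".py")]
        elif entry == "__pycache__":
            continue
        found.setdefault("/".join(parts[: idx + 1]), set()).add(entry)
    return found


def vendored_in_distribution(dist_name):
    """Run the check against the recorded files of an installed distribution."""
    return vendored_packages(metadata.files(dist_name) or [])
```

Calling vendored_in_distribution("pip") in an environment where pip is installed should list the libraries pip carries internally. The result gives names only; versions still have to come from somewhere else.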

Banishing phantom dependencies with an SBOM

The answer to this problem is a new proposal authored by Larson called PEP 770. It introduces a way to embed a Software Bill-of-Materials, or SBOM, directly into Python packages.

An SBOM is essentially a detailed ingredients list for software; it lists every single component, its version, origin, and other critical data in a format that any tool can understand. By making SBOMs a standard part of a package, PEP 770 makes the invisible visible.

When the Pillow example was tried again, this time using a package rebuilt to include PEP 770 metadata, the security scanner was suddenly able to see the complete list of bundled libraries, from libbrotli1 to libzstd1, because their details were in the SBOM. Now, scanners can fetch the correct vulnerability data and properly alert users when their software needs an upgrade.
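To give a feel for what a scanner does with this data, here is a minimal sketch that reads SBOM documents from an installed package. PEP 770 places them under the package’s .dist-info/sboms/ directory. The sketch assumes the documents are CycloneDX JSON and simply skips anything it cannot parse; the function names are hypothetical and do not reflect how Syft or OSV-Scanner are implemented.

```python
import json
from importlib import metadata


def sbom_paths(dist):
    """Return a distribution's recorded files that sit in `.dist-info/sboms/`."""
    return [
        f
        for f in (dist.files or [])
        if len(f.parts) >= 3
        and f.parts[0].endswith(".dist-info")
        and f.parts[1] == "sboms"
    ]


def cyclonedx_components(document):
    """Extract (name, version, purl) tuples from a parsed CycloneDX JSON document."""
    return [
        (c.get("name"), c.get("version"), c.get("purl"))
        for c in document.get("components", [])
    ]


def bundled_components(dist_name):
    """Collect components from every readable SBOM of an installed distribution."""
    dist = metadata.distribution(dist_name)
    components = []
    for path in sbom_paths(dist):
        try:
            document = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # unreadable, or not JSON (assumption: only CycloneDX JSON is handled)
        components.extend(cyclonedx_components(document))
    return components
```

For a package built without PEP 770 metadata, bundled_components returns an empty list, which is the blind spot described earlier. With the metadata present, each tuple gives a scanner a name and version to look up in a vulnerability database.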

PEP 770 has been designed for simple and quick adoption. It is backwards compatible, so it won’t break older tools, and it can be enabled by default so maintainers won’t need to opt in manually. Patches have already been sent to key tools like auditwheel to get them generating SBOMs automatically.

This isn’t just a win for Python. The team behind the whitepaper recognises that other open-source ecosystems have similar struggles with phantom dependencies. They are inviting others to adopt or adapt Python’s approach, offering guidance in the hope of making the entire software supply chain a little safer.

(Photo by Febe Vanermen)
