License Scanning and Compliance

Every open source component in your software carries a license that dictates how you can use, modify, and distribute it. Violating those terms can expose your organization to litigation, injunctions against distribution, and demands to release proprietary source code. License scanning identifies every license in your dependency tree so you can enforce policy before a legal obligation becomes a legal liability.

What Is License Scanning?

Understanding why automated license detection is a business requirement.

License scanning is the automated process of identifying the software licenses attached to every component in a codebase, container image, or binary artifact. Unlike vulnerability scanning, which looks for known security flaws, license scanning answers a different question: are you legally allowed to use this software the way you are using it?

The average enterprise application contains between 500 and 1,500 open source dependencies. Each one carries a license. Some of those licenses are permissive and require nothing more than an attribution notice. Others are copyleft; the strong copyleft licenses (such as the GPL) require you to release your own source code under the same terms if you distribute the combined work. A single strong copyleft dependency buried in a transitive dependency chain can create a legal obligation that applies to your entire product.
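To act on a buried dependency, you first need to see the chain that pulls it in. The sketch below is a minimal illustration, assuming the dependency graph and each package's SPDX license have already been extracted into plain dictionaries. The function names and sample packages are hypothetical and are not part of ScanRook.

```python
from collections import deque
from typing import Dict, List, Optional, Set

def path_to(graph: Dict[str, List[str]], root: str, target: str) -> Optional[List[str]]:
    """Shortest dependency chain from root to target (breadth-first search)."""
    parents: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the root, then reverse the list.
            chain: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                chain.append(cur)
                cur = parents[cur]
            return chain[::-1]
        for dep in graph.get(node, []):
            if dep not in parents:  # visit each package once, even if the graph has cycles
                parents[dep] = node
                queue.append(dep)
    return None

def copyleft_chains(graph: Dict[str, List[str]], licenses: Dict[str, str],
                    root: str, copyleft: Set[str]) -> Dict[str, List[str]]:
    """Map each reachable copyleft package to the chain that pulls it in."""
    chains: Dict[str, List[str]] = {}
    for pkg, lic in licenses.items():
        if lic in copyleft:
            chain = path_to(graph, root, pkg)
            if chain is not None:
                chains[pkg] = chain
    return chains

# Hypothetical example graph: the app never lists gpl-helper directly.
graph = {
    "my-app": ["web-fw", "logger"],
    "web-fw": ["template-lib"],
    "template-lib": ["gpl-helper"],
}
licenses = {"web-fw": "MIT", "logger": "Apache-2.0",
            "template-lib": "BSD-3-Clause", "gpl-helper": "GPL-3.0-only"}
print(copyleft_chains(graph, licenses, "my-app", {"GPL-3.0-only", "AGPL-3.0-only"}))
# {'gpl-helper': ['my-app', 'web-fw', 'template-lib', 'gpl-helper']}
```

Breadth-first search returns the shortest chain. That chain is usually the most useful starting point when deciding which direct dependency to replace.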

Manual license review does not scale. Developers add dependencies constantly, transitive dependencies pull in dozens of packages the developer never explicitly chose, and license terms change between versions. Elasticsearch famously moved from Apache-2.0 to a dual license under the Server Side Public License (SSPL) and the Elastic License in 2021. Organizations that upgraded without checking the new license suddenly faced a completely different set of obligations.
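Catching a change like this comes down to comparing the licenses recorded in two scans. Here is a minimal sketch, assuming each scan has been reduced to a hypothetical {package: SPDX expression} mapping. The function name and data are illustrative only.

```python
from typing import Dict, List, Tuple

def license_changes(before: Dict[str, str],
                    after: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """(package, old, new) for every package present in both scans whose license changed."""
    return sorted(
        (pkg, before[pkg], after[pkg])
        for pkg in before.keys() & after.keys()
        if before[pkg] != after[pkg]
    )

# Illustrative data only.
previous = {"elasticsearch": "Apache-2.0", "lodash": "MIT"}
current = {"elasticsearch": "Elastic-2.0 OR SSPL-1.0", "lodash": "MIT", "express": "MIT"}
for pkg, old, new in license_changes(previous, current):
    print(f"{pkg}: {old} -> {new}")
# elasticsearch: Apache-2.0 -> Elastic-2.0 OR SSPL-1.0
```

This compares raw strings. A production check would normalize expressions first so that "MIT OR Apache-2.0" and "Apache-2.0 OR MIT" are not reported as a change.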

License scanning matters in three concrete scenarios. The first is M&A due diligence, since acquirers will audit your open source usage. The second is compliance with internal policies, since most enterprises maintain approved license lists. The third is regulatory requirements: the EU Cyber Resilience Act and US Executive Order 14028 both push for software supply chain transparency through SBOMs, and SBOM formats such as SPDX and CycloneDX carry per-component license data.

How ScanRook Detects Licenses

Extraction from package metadata across every major ecosystem.

ScanRook extracts license information from the authoritative metadata source for each package ecosystem. Rather than guessing licenses from file contents or relying on heuristic text matching, ScanRook reads the structured fields that package managers themselves use to record license data.

RPM-based distributions (RHEL, CentOS, Fedora, Rocky, Alma)

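ScanRook reads the RPM database at /var/lib/rpm/Packages and extracts the License header tag from each installed package. Many RPM packages use Fedora's legacy license short names (e.g., GPLv2+, MIT, ASL 2.0) rather than SPDX identifiers, and ScanRook normalizes these to SPDX identifiers. Fedora itself has since moved to SPDX expressions, but older packages and releases still carry the legacy names.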
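A normalization step like this can be sketched as a lookup table plus a small tokenizer for Fedora's lowercase and/or operators. The mapping below is a hypothetical partial table for illustration; it is not ScanRook's. Ambiguous legacy names such as BSD are deliberately omitted. Unknown names are kept as LicenseRef- identifiers so they remain visible for review.

```python
import re

# Hypothetical, partial mapping for illustration. Ambiguous legacy names
# (e.g. "BSD", "LGPLv2+") are deliberately left out.
FEDORA_TO_SPDX = {
    "GPLv2": "GPL-2.0-only",
    "GPLv2+": "GPL-2.0-or-later",
    "GPLv3": "GPL-3.0-only",
    "GPLv3+": "GPL-3.0-or-later",
    "LGPLv3+": "LGPL-3.0-or-later",
    "AGPLv3": "AGPL-3.0-only",
    "AGPLv3+": "AGPL-3.0-or-later",
    "ASL 2.0": "Apache-2.0",
    "MPLv2.0": "MPL-2.0",
    "MIT": "MIT",
    "zlib": "Zlib",
}
KNOWN_SPDX = set(FEDORA_TO_SPDX.values())

def normalize_fedora_license(tag: str) -> str:
    """Rewrite an RPM License tag that uses legacy Fedora names as an SPDX expression."""
    # Split but keep the separators: Fedora wrote lowercase "and"/"or" plus parentheses.
    parts = re.split(r"(\s+and\s+|\s+or\s+|[()])", tag.strip(), flags=re.IGNORECASE)
    out = []
    for part in parts:
        token = part.strip()
        if not token:
            continue
        if token.lower() in ("and", "or"):
            out.append(token.upper())
        elif token in ("(", ")"):
            out.append(token)
        elif token in FEDORA_TO_SPDX:
            out.append(FEDORA_TO_SPDX[token])
        elif token in KNOWN_SPDX:
            out.append(token)  # already an SPDX ID
        else:
            # Unknown name: keep it visible as a LicenseRef for manual review.
            out.append("LicenseRef-" + re.sub(r"[^A-Za-z0-9.-]", "-", token))
    return " ".join(out).replace("( ", "(").replace(" )", ")")

print(normalize_fedora_license("(GPLv2+ or MIT) and zlib"))
# (GPL-2.0-or-later OR MIT) AND Zlib
```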

APK-based distributions (Alpine Linux)

Alpine packages store metadata in /lib/apk/db/installed. ScanRook parses the L: field from each package entry, which contains the SPDX license expression directly. Alpine adopted SPDX identifiers early, making this the most straightforward ecosystem to parse.
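Because the file is a simple key-value format, the extraction fits in a few lines. This sketch assumes the record layout described above: records are separated by blank lines, P: holds the package name, V: the version, and L: the license. The function name is hypothetical.

```python
from typing import Dict, List

def parse_apk_installed(text: str) -> List[Dict[str, str]]:
    """Return name, version, and license for each record in an APK installed database."""
    packages: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines() + [""]:  # trailing "" flushes the last record
        if not line.strip():
            if "P" in current:
                packages.append({
                    "name": current["P"],
                    "version": current.get("V", ""),
                    "license": current.get("L", "NOASSERTION"),
                })
            current = {}
            continue
        # Each line is a one-letter key, a colon, then the value.
        key, sep, value = line.partition(":")
        if sep and len(key) == 1 and key in "PVL":
            current[key] = value
    return packages

# Illustrative sample; on a real system read /lib/apk/db/installed instead.
sample = "C:Q1abc=\nP:musl\nV:1.2.4-r2\nA:x86_64\nL:MIT\n\nP:busybox\nV:1.36.1-r5\nL:GPL-2.0-only\n"
for pkg in parse_apk_installed(sample):
    print(pkg["name"], pkg["license"])
```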

dpkg-based distributions (Debian, Ubuntu)

Debian packages store license information in /usr/share/doc/<package>/copyright files. ScanRook reads each copyright file and, when the file uses the machine-readable DEP-5 format, extracts the license short names from its License: fields. DEP-5 short names (e.g., GPL-2+, Expat) resemble SPDX identifiers but are not always identical to them. When a copyright file uses the older free-text format, ScanRook falls back to pattern matching against known license texts.
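The machine-readable path can be sketched as follows. It assumes the DEP-5 layout: a header paragraph with a Format: field, then paragraphs whose License: field begins with a short name, with indented continuation lines holding the license text. The function returns None to signal the free-text fallback. It is a hypothetical illustration.

```python
from typing import List, Optional

def dep5_license_names(text: str) -> Optional[List[str]]:
    """Short names from the License: fields of a DEP-5 copyright file.

    Returns None if the header paragraph has no Format: field, meaning the
    file is free text and a pattern-matching fallback is needed instead.
    """
    header = text.split("\n\n", 1)[0]
    if not any(line.lower().startswith("format:") for line in header.splitlines()):
        return None
    names: List[str] = []
    for line in text.splitlines():
        # Indented lines continue the previous field (e.g. full license text).
        if not line or line[0].isspace():
            continue
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "license":
            name = value.strip()
            if name and name not in names:  # de-duplicate, keep first-seen order
                names.append(name)
    return names

sample = (
    "Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/\n"
    "Upstream-Name: zlib\n\n"
    "Files: *\nCopyright: 1995-2022 Jean-loup Gailly and Mark Adler\nLicense: Zlib\n\n"
    "Files: debian/*\nCopyright: 2023 Example Maintainer\nLicense: GPL-2+\n"
    " This program is free software; you can redistribute it ...\n"
)
print(dep5_license_names(sample))  # ['Zlib', 'GPL-2+']
```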

npm (Node.js)

ScanRook reads the license field from each package's package.json. npm's documentation asks packages to declare an SPDX license expression in this field, but the registry does not enforce it, so values vary in quality. Older packages may use the deprecated licenses array format, which ScanRook also handles.
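Here is a sketch of reading the field, assuming the raw package.json text is available. It accepts the current string form as well as the deprecated object and array forms. Special values that npm's documentation allows, such as UNLICENSED or SEE LICENSE IN <file>, are returned verbatim for a later step to interpret. The function name is hypothetical.

```python
import json
from typing import List

def npm_licenses(package_json: str) -> List[str]:
    """Declared licenses from package.json text; an empty list means none declared."""
    manifest = json.loads(package_json)
    declared = manifest.get("license")
    if isinstance(declared, str):          # current form: "license": "MIT"
        return [declared]
    if isinstance(declared, dict) and "type" in declared:
        return [declared["type"]]          # deprecated: "license": {"type": ..., "url": ...}
    found: List[str] = []
    for entry in manifest.get("licenses") or []:  # deprecated array form
        if isinstance(entry, dict) and "type" in entry:
            found.append(entry["type"])
        elif isinstance(entry, str):
            found.append(entry)
    return found

print(npm_licenses('{"licenses": [{"type": "MIT"}, {"type": "Apache-2.0"}]}'))
# ['MIT', 'Apache-2.0']
```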

pip (Python)

Python packages installed via pip store metadata in METADATA files within site-packages/<name>-<version>.dist-info/. ScanRook reads the License header and the Classifier entries that begin with License :: OSI Approved ::. When both are present, the classifier takes precedence because it is more structured.
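The precedence rule can be sketched with the standard library's email header parser, since METADATA uses RFC 822-style headers. The function name is hypothetical. Classifier names such as "MIT License" still need mapping to SPDX identifiers afterwards. The UNKNOWN check assumes the placeholder that older packaging tools wrote when no license was given.

```python
from email.parser import HeaderParser
from typing import List

PREFIX = "License :: OSI Approved :: "

def pip_licenses(metadata_text: str) -> List[str]:
    """License names from a .dist-info/METADATA file; classifiers win over the License header."""
    msg = HeaderParser().parsestr(metadata_text)
    classifiers = [
        c[len(PREFIX):].strip()
        for c in msg.get_all("Classifier", [])
        if c.startswith(PREFIX)
    ]
    if classifiers:
        return classifiers
    header = (msg.get("License") or "").strip()
    return [header] if header and header != "UNKNOWN" else []

meta = (
    "Metadata-Version: 2.1\nName: example\nVersion: 1.0\nLicense: Apache 2.0\n"
    "Classifier: Programming Language :: Python :: 3\n"
    "Classifier: License :: OSI Approved :: Apache Software License\n\n"
    "Long description here.\n"
)
print(pip_licenses(meta))  # ['Apache Software License']
```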

Cargo (Rust)

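Rust crates declare their license in Cargo.toml using the license field, which accepts SPDX expressions (e.g., MIT OR Apache-2.0). Cargo.lock records each resolved dependency's name, version, and source, but not its license. ScanRook therefore parses Cargo.lock to enumerate the dependency tree and reads compiled crate metadata to resolve the license for each dependency.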
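Evaluating an expression such as MIT OR Apache-2.0 against an approved list is a small parsing problem. AND binds more tightly than OR, and a dual-licensed crate is acceptable if any one alternative is fully approved. The sketch below is a simplified evaluator that assumes plain license IDs combined with AND, OR, and parentheses only (no WITH exceptions). It also rewrites Cargo's deprecated slash separator (MIT/Apache-2.0) as OR. The function names are hypothetical.

```python
import re
from typing import List, Set

def tokenize(expr: str) -> List[str]:
    # Cargo's deprecated "MIT/Apache-2.0" form means OR; rewrite it first.
    expr = expr.replace("/", " OR ")
    return re.findall(r"\(|\)|[^\s()]+", expr)

def satisfied(expr: str, allowed: Set[str]) -> bool:
    """True if some choice of licenses offered by expr lies entirely within allowed.

    Simplified grammar:
      or_expr  := and_expr ("OR" and_expr)*
      and_expr := atom ("AND" atom)*
      atom     := license-id | "(" or_expr ")"
    """
    tokens = tokenize(expr)
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def take() -> str:
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom() -> bool:
        tok = take()
        if tok == "(":
            value = or_expr()
            if take() != ")":
                raise ValueError("unbalanced parentheses in " + expr)
            return value
        if not tok or tok in ("AND", "OR", ")"):
            raise ValueError("malformed expression: " + expr)
        return tok in allowed

    def and_expr() -> bool:
        value = atom()
        while peek() == "AND":
            take()
            value = atom() and value  # call atom() first so every token is consumed
        return value

    def or_expr() -> bool:
        value = and_expr()
        while peek() == "OR":
            take()
            value = and_expr() or value
        return value

    result = or_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in " + expr)
    return result

print(satisfied("MIT OR Apache-2.0", {"Apache-2.0"}))  # True
```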

License Risk Levels

How ScanRook classifies licenses by commercial risk.

ScanRook assigns every detected license to one of five risk tiers. These tiers reflect the degree of obligation the license imposes on organizations distributing or deploying software that contains the licensed component.

Critical Risk -- AGPL-3.0, SSPL-1.0

Network copyleft and source-available licenses whose disclosure obligations are triggered when you offer the software as a service or over a network, and which can extend to your own application code. Most commercial organizations treat these as hard blockers. Under AGPL-3.0, if you modify the software, you must offer its complete corresponding source to every user who interacts with it over a network, which covers virtually all SaaS deployments. The SSPL goes further: offering the program as a service requires releasing the source of your entire service stack, including management, monitoring, and deployment tooling.

High Risk -- GPL-2.0, GPL-3.0, BUSL-1.1

Strong copyleft and source-available licenses that impose significant obligations. GPL-2.0 and GPL-3.0 require that any distributed derivative work be released under the same license, including your proprietary code if it is linked with the GPL component. BUSL-1.1 is the Business Source License; do not confuse it with BSL-1.0, which is the SPDX identifier for the Boost Software License. BUSL-1.1 restricts production use, beyond whatever the licensor's Additional Use Grant permits, until a specified change date, after which the code converts to an open source license. These licenses require legal review before adoption.

Medium Risk -- LGPL-2.1, LGPL-3.0, MPL-2.0, EPL-2.0

Weak copyleft licenses that limit the copyleft obligation to the licensed component itself, not your entire application. The LGPL allows you to use the library without disclosing your own source code, provided users can replace the LGPL library with a modified version; dynamic linking is the usual way to meet this condition. MPL-2.0 applies copyleft at the file level -- you must share modifications to MPL-licensed files, but your own files remain under your chosen license. EPL-2.0 works similarly at the module level: changes to EPL-licensed code must be shared, while separate modules that merely use it are not covered. These licenses are generally acceptable for commercial use with modest compliance effort.

Low Risk -- Apache-2.0

Permissive with a patent grant. Apache-2.0 allows unrestricted commercial use, modification, and distribution. The main obligations are including the license text, reproducing the NOTICE file if the project ships one, and marking files you have modified. The license includes an explicit patent grant from contributors, which provides additional legal protection. Its patent retaliation clause terminates that grant for any licensee who files a patent lawsuit alleging that the work or a contribution to it infringes a patent.

None Risk -- MIT, BSD-2-Clause, BSD-3-Clause, ISC, Unlicense, CC0-1.0

Maximally permissive licenses with minimal obligations. MIT and BSD require only that you include the original copyright notice and license text; BSD-3-Clause additionally forbids using the contributors' names to endorse your product. ISC is functionally equivalent to MIT with simpler language. The Unlicense and CC0-1.0 are public domain dedications that waive copyright to the extent the law allows and impose no obligations, although CC0-1.0 expressly does not license any patent rights. These licenses are safe for essentially any commercial use.
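The tier assignment above amounts to a lookup table. In this minimal sketch the table is keyed by SPDX identifiers. It includes the -only/-or-later variants, the deprecated short forms such as GPL-3.0, and BUSL-1.1 for the Business Source License. The table and function names are hypothetical. Treating unrecognized licenses as High risk is an assumption, not documented ScanRook behavior.

```python
from enum import IntEnum
from typing import Dict, Iterable

class Risk(IntEnum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

# Hypothetical table mirroring the tiers above. Both current SPDX IDs
# (-only / -or-later) and deprecated short forms (e.g. "GPL-3.0") are listed.
_TIER_LISTS = {
    Risk.CRITICAL: ["AGPL-3.0", "AGPL-3.0-only", "AGPL-3.0-or-later", "SSPL-1.0"],
    Risk.HIGH: ["GPL-2.0", "GPL-2.0-only", "GPL-2.0-or-later",
                "GPL-3.0", "GPL-3.0-only", "GPL-3.0-or-later", "BUSL-1.1"],
    Risk.MEDIUM: ["LGPL-2.1", "LGPL-2.1-only", "LGPL-2.1-or-later",
                  "LGPL-3.0", "LGPL-3.0-only", "LGPL-3.0-or-later", "MPL-2.0", "EPL-2.0"],
    Risk.LOW: ["Apache-2.0"],
    Risk.NONE: ["MIT", "BSD-2-Clause", "BSD-3-Clause", "ISC", "Unlicense", "CC0-1.0"],
}
TIERS: Dict[str, Risk] = {spdx: tier for tier, ids in _TIER_LISTS.items() for spdx in ids}

def classify(spdx_id: str, unknown: Risk = Risk.HIGH) -> Risk:
    """Tier for one SPDX ID; unrecognized IDs get `unknown` (an assumption)."""
    return TIERS.get(spdx_id, unknown)

def highest_risk(spdx_ids: Iterable[str]) -> Risk:
    """Worst tier among the detected licenses; NONE for an empty scan."""
    return max((classify(i) for i in spdx_ids), default=Risk.NONE)

print(highest_risk(["MIT", "LGPL-2.1-only", "Apache-2.0"]).name)  # MEDIUM
```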

Copyleft vs Permissive Licenses

The fundamental distinction that determines your compliance obligations.

The single most important distinction in open source licensing is between permissive and copyleft licenses. Understanding this distinction is the foundation of any license compliance program.

Permissive licenses

Permissive licenses (MIT, BSD, Apache-2.0, ISC) grant broad freedoms with minimal conditions. You can use the code in proprietary software, modify it without sharing your changes, and distribute it commercially. The typical obligation is limited to preserving the original copyright notice and license text somewhere in your distribution -- usually in a NOTICES file or an "About" dialog. Approximately 70% of packages on npm and over 60% of packages on PyPI use permissive licenses.

Copyleft licenses

Copyleft licenses (strong: GPL, AGPL; weak: LGPL, MPL) require that derivative works be distributed under the same or a compatible license. The practical effect of strong copyleft on commercial software is that if you include a GPL-licensed component and distribute the combined work, you must make your entire application's source code available under the GPL. This is sometimes called the "viral" effect of copyleft, though the Free Software Foundation considers it a feature, not a bug.

Why copyleft matters for proprietary software

If your business model depends on keeping your source code proprietary, copyleft licenses create a direct conflict. A single GPL dependency statically linked into your application can require you to release the source code for the entire application. This is not theoretical -- the gpl-violations.org project has documented hundreds of enforcement actions, and the Software Freedom Conservancy has pursued multiple high-profile cases against companies using Linux kernel code in proprietary products without complying with GPL-2.0 requirements.

License Compatibility

Which licenses can coexist in the same project.

License compatibility determines whether code under two different licenses can be combined into a single work. Incompatible licenses create legal conflicts that cannot be resolved by technical means. If two components of a work you distribute have incompatible licenses, you must remove or replace one of them, or obtain different terms from one of the copyright holders.

Common compatibility rules

  • Permissive + Permissive -- Always compatible. MIT code and Apache-2.0 code can be combined freely.
  • Permissive + Copyleft -- Generally compatible, but the combined work must be distributed under the copyleft license. MIT code can be included in a GPL project, but the result is GPL.
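  • GPL-2.0 + Apache-2.0 -- The FSF considers Apache-2.0 incompatible with GPL-2.0 because of its patent termination and indemnification provisions, but compatible with GPL-3.0 (Apache-2.0 code can be incorporated into a GPL-3.0 work, not the reverse). This is one of the most common compatibility pitfalls.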
  • GPL-2.0 + GPL-3.0 -- Incompatible unless the GPL-2.0 code carries the "or any later version" clause (SPDX: GPL-2.0-or-later). Code licensed as "GPL-2.0-only" cannot be combined with GPL-3.0 code.
  • AGPL-3.0 + Proprietary -- Incompatible. You cannot combine AGPL code with proprietary code and distribute the result without releasing everything under AGPL-3.0.

ScanRook detects these conflicts by analyzing the full license set across your dependency tree and flagging combinations that create compatibility issues. For a complete reference of license types and their terms, see the License Types guide.
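A simplified version of that conflict check can be written as a pairwise lookup over the normalized SPDX IDs in a distributed work. This sketch encodes only the rules listed above and assumes the inputs are already normalized to SPDX. It ignores how components are linked or deployed, which real legal analysis must take into account. The table and function names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

# Hypothetical rule table encoding only the incompatibilities listed above.
INCOMPATIBLE = {
    frozenset({"Apache-2.0", "GPL-2.0-only"}):
        "Apache-2.0 patent termination/indemnification terms conflict with GPL-2.0",
    frozenset({"GPL-2.0-only", "GPL-3.0-only"}):
        "GPL-2.0-only code cannot be relicensed under GPL-3.0",
    frozenset({"GPL-2.0-only", "GPL-3.0-or-later"}):
        "GPL-2.0-only code cannot be relicensed under GPL-3.0",
}
STRONG_COPYLEFT = {
    "GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only", "GPL-3.0-or-later",
    "AGPL-3.0-only", "AGPL-3.0-or-later",
}

def find_conflicts(licenses: Iterable[str],
                   proprietary: bool = False) -> List[Tuple[str, str, str]]:
    """Return (license_a, license_b, reason) for each conflicting pair."""
    ids = sorted(set(licenses))
    conflicts: List[Tuple[str, str, str]] = []
    for a, b in combinations(ids, 2):
        reason = INCOMPATIBLE.get(frozenset({a, b}))
        if reason:
            conflicts.append((a, b, reason))
    if proprietary:
        # The combined work is distributed under a proprietary license.
        for lic in ids:
            if lic in STRONG_COPYLEFT:
                conflicts.append((lic, "LicenseRef-Proprietary",
                                  "strong copyleft requires releasing the combined work under its terms"))
    return conflicts

for a, b, why in find_conflicts(["MIT", "Apache-2.0", "GPL-2.0-only"]):
    print(f"{a} + {b}: {why}")
# Apache-2.0 + GPL-2.0-only: Apache-2.0 patent termination/indemnification terms conflict with GPL-2.0
```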

Compliance Obligations by License Type

What you actually have to do for each license category.

Understanding your obligations is the difference between being compliant and being exposed. Here is what each major license category requires when you distribute software containing the licensed component:

MIT / BSD / ISC -- Attribution

Include the original copyright notice and license text in your distribution. This typically means adding a NOTICES or THIRD-PARTY-LICENSES file to your release artifacts. No source code disclosure is required. No restrictions on commercial use.
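Generating such a file is mostly a formatting task. Here is a minimal sketch, assuming each component record already carries its full license text (including the copyright lines) and, for Apache-2.0 components, the contents of any NOTICE file. The Component type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Component:
    name: str
    version: str
    spdx_id: str
    license_text: str             # full license text, including copyright lines
    notice: Optional[str] = None  # Apache-2.0 NOTICE contents, if the project has one

def render_third_party_notices(components: Iterable[Component]) -> str:
    """Build THIRD-PARTY-LICENSES text, one section per component, sorted by name."""
    sections = []
    for comp in sorted(components, key=lambda c: (c.name.lower(), c.version)):
        parts = [f"{comp.name} {comp.version} ({comp.spdx_id})", "-" * 60,
                 comp.license_text.strip()]
        if comp.notice:
            parts += ["", "NOTICE:", comp.notice.strip()]
        sections.append("\n".join(parts))
    return "\n\n".join(sections) + "\n"

# Hypothetical components with placeholder license texts.
comps = [
    Component("zeta-lib", "1.3", "Zlib", "Copyright (c) Zeta Authors\nzlib license text..."),
    Component("alpha-lib", "2.0.1", "Apache-2.0", "Apache License text...",
              notice="Alpha Lib\nCopyright Alpha Authors"),
]
print(render_third_party_notices(comps))
```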

Apache-2.0 -- Attribution + NOTICE + Patent Grant

Include the license text and any NOTICE file from the original project. If you modify Apache-2.0 files, you must state that you changed them. The patent grant gives you a license to any contributor patents that are necessarily infringed by their contributions. That grant terminates if you file a patent lawsuit alleging that the work or a contribution to it infringes a patent.

LGPL -- Dynamic Linking + Replacement

You must allow users to replace the LGPL library with a modified version. In practice, this usually means dynamically linking against the LGPL component. Static linking is also permitted if you supply the object files users need to relink your application against a modified library. You must also provide the LGPL library's source code (or a written offer to provide it) if you distribute the library itself.

MPL-2.0 -- File-Level Copyleft

If you modify files that are under MPL-2.0, you must make the modified source code of those specific files available under MPL-2.0. Your own new files can remain under any license. This makes MPL-2.0 one of the most commercially friendly copyleft licenses.

GPL -- Full Source Disclosure

If you distribute a combined work that includes GPL-licensed code, you must make the complete corresponding source code of the entire combined work available under the GPL. This clearly applies to statically linked components, and the FSF's position is that dynamic linking also creates a combined work. GPL-3.0 additionally requires "Installation Information" (signing keys, etc.) needed to install modified versions on consumer devices.

AGPL-3.0 -- Network Disclosure

All GPL obligations apply. In addition, if you modify the software and users interact with it over a network (including via a web browser or API), you must offer those users the complete corresponding source code. For SaaS providers this obligation can extend to their own application code when it forms a combined work with the AGPL component, which is why many organizations block AGPL dependencies outright.

Using License Policies in ScanRook

Enforce approved license lists and block risky licenses before deployment.

ScanRook allows organizations to define license policies that automatically flag or block packages based on their license. This turns license compliance from a manual review process into an automated gate.

  • Blocklist mode -- Specify licenses that are never allowed (e.g., AGPL-3.0, SSPL-1.0, GPL-3.0). Any scan that detects a blocked license will be flagged as a policy violation. This is the most common approach for commercial software organizations.
  • Allowlist mode -- Specify the only licenses that are permitted (e.g., MIT, BSD-2-Clause, Apache-2.0). Any license not on the list is flagged. This is stricter but provides the strongest guarantee of compliance.
  • Risk threshold mode -- Flag any license at or above a specified risk level (e.g., flag all Medium and above). This uses ScanRook's built-in risk tiers to automatically catch licenses that need review.
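The three modes differ only in the test applied to each detected license. Here is a minimal sketch, assuming findings arrive as a hypothetical {package: SPDX ID} mapping. It uses a small illustrative subset of the risk tiers on a scale from 0 (None) to 4 (Critical). Treating unknown licenses as Critical in threshold mode is an assumption. The Policy type and the function names are hypothetical, not ScanRook's configuration schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Mapping

class Mode(Enum):
    BLOCKLIST = "blocklist"
    ALLOWLIST = "allowlist"
    RISK_THRESHOLD = "risk_threshold"

# Illustrative subset of the risk tiers: 0 = None ... 4 = Critical.
RISK = {"MIT": 0, "ISC": 0, "BSD-3-Clause": 0, "Apache-2.0": 1,
        "LGPL-2.1-only": 2, "MPL-2.0": 2, "GPL-3.0-only": 3,
        "AGPL-3.0-only": 4, "SSPL-1.0": 4}

@dataclass(frozen=True)
class Policy:
    mode: Mode
    licenses: FrozenSet[str] = frozenset()  # blocked or allowed SPDX IDs
    threshold: int = 2                       # flag at or above this tier (2 = Medium)

def is_violation(policy: Policy, spdx_id: str) -> bool:
    if policy.mode is Mode.BLOCKLIST:
        return spdx_id in policy.licenses
    if policy.mode is Mode.ALLOWLIST:
        return spdx_id not in policy.licenses
    # Risk threshold: unknown licenses are treated as Critical (an assumption).
    return RISK.get(spdx_id, 4) >= policy.threshold

def violations(policy: Policy, findings: Mapping[str, str]) -> List[str]:
    """Packages whose license breaks the policy, sorted by name."""
    return sorted(pkg for pkg, lic in findings.items() if is_violation(policy, lic))

findings = {"alpha": "MIT", "beta": "Apache-2.0", "gamma": "AGPL-3.0-only", "delta": "LGPL-2.1-only"}
print(violations(Policy(Mode.ALLOWLIST, frozenset({"MIT", "Apache-2.0"})), findings))
# ['delta', 'gamma']
```

An allowlist flags anything it does not recognize, which is why it gives the strongest guarantee. A blocklist silently accepts licenses nobody thought to list.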

License policies can be configured per-organization in the ScanRook dashboard and enforced in CI/CD pipelines via the GitHub Actions or GitLab CI integrations.
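If you need a gate outside the provided integrations, the usual pattern is to fail the pipeline step with a non-zero exit code. This sketch assumes a hypothetical JSON report containing a packages list with name, version, and license fields. That is not ScanRook's documented report format, so adapt the field names to the real output. The function name is also hypothetical.

```python
import json
import sys
from typing import Iterable

def gate(report_json: str, blocked: Iterable[str]) -> int:
    """Exit code for a CI step: 1 if any package uses a blocked license, else 0."""
    blocked_set = set(blocked)
    report = json.loads(report_json)
    hits = [
        f"{p['name']}@{p.get('version', '?')}: {p['license']}"
        for p in report.get("packages", [])
        if p.get("license") in blocked_set
    ]
    for hit in hits:
        print("license policy violation:", hit, file=sys.stderr)
    return 1 if hits else 0

# Hypothetical CI usage:
#   with open("scan-report.json", encoding="utf-8") as fh:
#       sys.exit(gate(fh.read(), {"AGPL-3.0-only", "SSPL-1.0"}))
```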

Further reading

Related guides and documentation.