Installed-State Scanning vs. Advisory Matching: Reducing False Positives

Name: ScanRook
Author: ScanRook

Not all vulnerability findings are equal. The method a scanner uses to determine what software is present in an artifact directly affects the accuracy of its results. This article explains the difference between installed-state scanning and advisory matching, and why it matters for security teams.

The Problem: Too Many Findings, Not Enough Trust

A common complaint about vulnerability scanners is noise. A scan of a standard container image might produce hundreds of findings, many of which are irrelevant or inaccurate. When security teams cannot trust their scan results, they either spend excessive time on manual triage or start ignoring findings altogether. Both outcomes are harmful.

The root cause of this noise is often how the scanner determines what software is actually present before it matches that inventory against advisories. There are two fundamentally different approaches: advisory matching, which infers packages from file paths, manifests, and other heuristics, and installed-state scanning, which reads the package manager databases directly.

Advisory Matching: The Heuristic Approach

Many scanners detect packages by scanning the filesystem for known file patterns. They look for lock files (package-lock.json, Gemfile.lock, go.sum), manifest files (pom.xml, requirements.txt), and file paths that suggest the presence of specific software. This approach is fast and works across many ecosystems, but it has significant limitations:

Intermediate layers -- A container image is a stack of layers, and a file deleted in a later layer still exists inside the earlier layer's tarball. Scanning each layer's contents without applying those deletions can detect packages that were removed before the final image was assembled.
Build-time dependencies -- A requirements.txt file might list build-time dependencies that are not installed in the running container. Flagging these creates findings for software that is not actually present.
Version ambiguity -- File path heuristics do not always correctly determine the installed version, especially when multiple versions coexist or when distribution patching changes the effective version.
Name mismatches -- A file on disk might not correspond to the package the scanner thinks it does, especially for common library names that appear in multiple ecosystems.
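
The file-based approach can be sketched as a naive requirements.txt reader. This is a hypothetical illustration, not ScanRook's parser: the names `parse_requirements` and `HeuristicPackage` are invented here. It shows two of the limitations above: every listed name is reported whether or not it is installed, and a version is known only when the line pins one exactly with `==`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches "name==1.2.3", "name>=1.0", "name[extra]==2", or a bare "name".
_REQ_LINE = re.compile(
    r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(==|>=|<=|~=|!=|>|<)?\s*([^\s;#]+)?"
)


@dataclass
class HeuristicPackage:
    name: str
    version: Optional[str]  # None when the manifest does not pin an exact version
    source: str             # the manifest file the guess came from


def parse_requirements(text: str, source: str = "requirements.txt") -> list[HeuristicPackage]:
    """Report every requirement line as a package, installed or not (hypothetical sketch)."""
    packages = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r or --index-url
        match = _REQ_LINE.match(line)
        if not match:
            continue
        name, op, spec = match.groups()
        # Only an exact "==" pin identifies a version; ranges and bare names stay ambiguous.
        version = spec if op == "==" else None
        packages.append(HeuristicPackage(name.lower(), version, source))
    return packages
```

For input such as `pytest  # test-only`, the sketch still emits a `pytest` entry with no version, which is exactly the kind of finding that cannot be confirmed from the manifest alone.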

Installed-State Scanning: Reading the Source of Truth

The alternative is to read the actual package manager databases that exist inside the container. Every Linux distribution maintains a database of installed packages:

RPM-based (RHEL, CentOS, Fedora, Rocky, Alma) -- /var/lib/rpm/Packages (Berkeley DB, older releases) or /var/lib/rpm/rpmdb.sqlite (SQLite, RHEL 9 and Fedora 33 and later)
APK-based (Alpine) -- /lib/apk/db/installed
dpkg-based (Debian, Ubuntu) -- /var/lib/dpkg/status
pacman-based (Arch) -- /var/lib/pacman/local/
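
The list above can be turned into a lookup table that a scanner checks against an extracted image root. This is a minimal sketch; `PACKAGE_DB_CANDIDATES` and `locate_package_dbs` are hypothetical names, and the candidate paths are an illustrative, non-exhaustive assumption.

```python
from pathlib import Path

# Candidate database locations, relative to the image root (illustrative, not exhaustive).
PACKAGE_DB_CANDIDATES = {
    "rpm": ["var/lib/rpm/Packages", "var/lib/rpm/rpmdb.sqlite"],  # Berkeley DB or SQLite backend
    "apk": ["lib/apk/db/installed"],
    "dpkg": ["var/lib/dpkg/status"],
    "pacman": ["var/lib/pacman/local"],  # a directory with one entry per package
}


def locate_package_dbs(root: Path) -> dict[str, Path]:
    """Return the first existing database path for each package manager under root."""
    found = {}
    for manager, candidates in PACKAGE_DB_CANDIDATES.items():
        for rel in candidates:
            path = root / rel
            if path.exists():
                found[manager] = path
                break  # first match wins for this package manager
    return found
```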

These databases are the authoritative record of what the package manager believes is installed. They include exact package names, versions, architectures, and dependency relationships. Reading them gives the scanner a ground-truth view of the software inventory.
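
To illustrate what these records contain, here is a minimal parser for the dpkg status file and the Alpine installed database. It is a sketch, not ScanRook's implementation, and the function names are hypothetical. It reads only the name, version, and architecture fields (`Package`/`Version`/`Architecture` for dpkg, `P:`/`V:`/`A:` for APK) and skips dpkg records whose status is not `installed`, such as removed packages whose configuration files remain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstalledPackage:
    name: str
    version: str
    arch: str


def parse_dpkg_status(text: str) -> list[InstalledPackage]:
    """Parse /var/lib/dpkg/status, keeping only fully installed packages."""
    packages = []
    for stanza in text.split("\n\n"):  # records are separated by blank lines
        fields = {}
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t"):
                continue  # continuation line (e.g. a long Description); not needed here
            key, sep, value = line.partition(":")  # first colon only, so "1:2.3-4" epochs survive
            if sep:
                fields[key.strip()] = value.strip()
        # dpkg keeps records for removed-but-not-purged packages; their Status does not end in "installed".
        if "Package" in fields and fields.get("Status", "").split()[-1:] == ["installed"]:
            packages.append(InstalledPackage(
                fields["Package"], fields.get("Version", ""), fields.get("Architecture", "")))
    return packages


def parse_apk_installed(text: str) -> list[InstalledPackage]:
    """Parse /lib/apk/db/installed, where each line is a one-letter key, a colon, and a value."""
    packages = []
    for stanza in text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if len(line) > 2 and line[1] == ":":
                fields.setdefault(line[0], line[2:])  # keep the first value; file keys repeat
        if "P" in fields:
            packages.append(InstalledPackage(fields["P"], fields.get("V", ""), fields.get("A", "")))
    return packages
```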

Confidence Tiers: Classifying Finding Quality

ScanRook takes the installed-state approach further by classifying every finding into one of two confidence tiers:

ConfirmedInstalled

The package was detected by reading an actual package manager database (RPM, APK, dpkg, etc.). The scanner has high confidence that this package is truly installed in the final artifact state.

HeuristicUnverified

The package was detected via file path heuristics, manifest files, or other indirect methods. The finding may be valid but the scanner cannot confirm that the package is actually installed in the running state.

This classification lets security teams make informed decisions. ConfirmedInstalled findings can be acted on with high confidence. HeuristicUnverified findings may warrant manual verification before remediation effort is invested.

Learn more about confidence tiers in the confidence tiers documentation.
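
The tier logic can be sketched as a classification keyed on how a package was detected, followed by a triage split. The tier names come from this article; the `Finding` fields, the detection-method labels in `DATABASE_METHODS`, and the `triage` helper are hypothetical and not ScanRook's data model.

```python
from dataclasses import dataclass
from enum import Enum


class ConfidenceTier(Enum):
    CONFIRMED_INSTALLED = "ConfirmedInstalled"
    HEURISTIC_UNVERIFIED = "HeuristicUnverified"


# Hypothetical detection-method labels; only reads of a package manager database count as confirmed.
DATABASE_METHODS = {"rpm-db", "apk-db", "dpkg-db", "pacman-db"}


@dataclass
class Finding:
    package: str
    version: str
    vuln_id: str
    detection_method: str


def classify(finding: Finding) -> ConfidenceTier:
    if finding.detection_method in DATABASE_METHODS:
        return ConfidenceTier.CONFIRMED_INSTALLED
    return ConfidenceTier.HEURISTIC_UNVERIFIED  # lock files, manifests, path heuristics


def triage(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Split findings into an act-now queue and a verify-first queue."""
    act, verify = [], []
    for f in findings:
        queue = act if classify(f) is ConfidenceTier.CONFIRMED_INSTALLED else verify
        queue.append(f)
    return act, verify
```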

Why This Matters in Practice

Consider an image whose Dockerfile installs compilers and development tools with the package manager, builds the application, and then uninstalls those tools in a later layer. The tool files still exist inside the earlier layer's tarball. A scanner that applies file-path heuristics to every layer, rather than to the final merged filesystem, might flag vulnerabilities in those build tools, generating findings for software that does not exist in the deployed artifact.

An installed-state scanner reads the package manager database from the final merged filesystem and finds only the packages recorded as installed there. Because the build tools were removed through the package manager, the final database no longer lists them. The result is a cleaner, more accurate finding set that teams can trust and act on.
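
The difference between a per-layer view and the final filesystem can be sketched with an in-memory model. OCI image layers record a deletion with a whiteout entry named `.wh.<name>`, and a `.wh..wh..opq` entry marks a directory as opaque, hiding everything lower layers put there. Layers here are plain lists of relative paths, and `merge_layers` and `naive_union` are hypothetical helpers, not ScanRook code.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def merge_layers(layers: list[list[str]]) -> set[str]:
    """Apply layers bottom to top, honoring whiteouts, and return the final file set."""
    files: set[str] = set()
    for layer in layers:
        # Whiteouts hide content from lower layers only, so apply them before adding this layer's files.
        for path in layer:
            directory, base = posixpath.split(path)
            if base == OPAQUE_MARKER:
                prefix = directory + "/"
                files = {f for f in files if not f.startswith(prefix)}
            elif base.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(directory, base[len(WHITEOUT_PREFIX):])
                files = {f for f in files if f != target and not f.startswith(target + "/")}
        files.update(p for p in layer if not posixpath.basename(p).startswith(WHITEOUT_PREFIX))
    return files


def naive_union(layers: list[list[str]]) -> set[str]:
    """What a scan of each layer in isolation sees: every regular file from every layer."""
    return {p for layer in layers for p in layer
            if not posixpath.basename(p).startswith(WHITEOUT_PREFIX)}
```

With a base layer, a layer that adds `usr/bin/gcc`, and a third layer containing `usr/bin/.wh.gcc`, `naive_union` still reports the compiler while `merge_layers` does not.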

How ScanRook Implements This

ScanRook's container scanning pipeline works as follows:

Extract all layers from the container tar, respecting layer ordering and whiteout files.
Locate package manager databases in the extracted filesystem.
Parse the databases to build the package inventory with exact names and versions.
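Match the inventory against vulnerability data from OSV and NVD, then enrich each finding with EPSS exploit-probability scores and CISA KEV (Known Exploited Vulnerabilities) status.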
Classify each finding as ConfirmedInstalled or HeuristicUnverified based on the detection method.
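
The steps above can be sketched end to end for a single package manager. This is a simplified illustration under several assumptions: step 1 has already produced a merged root directory, only dpkg is handled, and the `advisories`, `epss`, and `kev` arguments are in-memory stand-ins for the OSV, NVD, EPSS, and KEV feeds. Matching on an exact `(name, version)` pair is a deliberate simplification; real matching compares versions against affected ranges using the ecosystem's own version ordering. All names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Finding:
    package: str
    version: str
    vuln_id: str
    tier: str
    epss: Optional[float] = None
    in_kev: bool = False


def read_dpkg_inventory(root: Path) -> list[tuple[str, str]]:
    """Steps 2-3: locate and parse the dpkg database in an already-merged root."""
    status = root / "var/lib/dpkg/status"
    if not status.is_file():
        return []
    inventory = []
    for stanza in status.read_text().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") or ":" not in line:
                continue  # skip continuation lines
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
        if "Package" in fields and fields.get("Status", "").endswith(" installed"):
            inventory.append((fields["Package"], fields.get("Version", "")))
    return inventory


def scan(root: Path, advisories: dict[tuple[str, str], list[str]],
         epss: dict[str, float], kev: set[str]) -> list[Finding]:
    findings = []
    for name, version in read_dpkg_inventory(root):
        for vuln_id in advisories.get((name, version), []):  # step 4: match (simplified)
            findings.append(Finding(
                name, version, vuln_id,
                tier="ConfirmedInstalled",  # step 5: detected from a package manager database
                epss=epss.get(vuln_id),     # step 4: enrich
                in_kev=vuln_id in kev,
            ))
    return findings
```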

This pipeline supports RPM, APK, dpkg, npm, pip, Go modules, and several other ecosystems. For language-level packages where no system-level package manager database exists, ScanRook reads lock files and classifies those findings as HeuristicUnverified while still providing full enrichment.
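
For the lock-file path, a sketch for npm might read the `packages` map used by package-lock.json lockfile versions 2 and 3 and tag every entry as HeuristicUnverified. The older version 1 `dependencies` layout is not handled, `parse_package_lock` is a hypothetical name, and the `dev` flag is carried through because development-only packages are a common source of findings for software absent at runtime.

```python
import json


def parse_package_lock(text: str) -> list[dict]:
    """Read a v2/v3 package-lock.json and tag each entry as unverified."""
    lock = json.loads(text)
    results = []
    for path, meta in lock.get("packages", {}).items():
        if not path:
            continue  # the "" key describes the root project itself
        # Derive the package name from the install path, e.g. "node_modules/@scope/pkg" -> "@scope/pkg".
        name = path.rsplit("node_modules/", 1)[-1]
        results.append({
            "ecosystem": "npm",
            "name": name,
            "version": meta.get("version"),
            "dev": bool(meta.get("dev", False)),  # True for devDependencies
            "tier": "HeuristicUnverified",        # a lock file is not proof of installation
        })
    return results
```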