Enrichment

Name: ScanRook
Author: ScanRook

Enrichment is the process of taking a raw package inventory (names and versions found in a container, binary, or SBOM) and matching it against vulnerability databases to produce actionable findings. ScanRook queries multiple sources in a defined pipeline order, merging results and deduplicating across providers.

What enrichment means

Turning a list of packages into a list of vulnerabilities.

When ScanRook scans an artifact, it first extracts a package inventory: the list of installed software with ecosystem, name, and version. This inventory by itself has no security information. Enrichment is the step that queries external vulnerability databases to determine which of those packages have known CVEs.

Each enrichment source contributes different data. OSV provides broad ecosystem coverage and affected version ranges. NVD adds authoritative CVSS scores and CPE-based matching. Distro feeds provide fix status specific to the Linux distribution. The scanner merges all of this into a single unified finding per CVE-package pair.

Enrichment pipeline

The order in which ScanRook queries vulnerability data sources.

Package Inventory (container.rs / sbom.rs)
OSV Batch Query (vuln.rs)
NVD CPE Match (vuln.rs)
Distro Feed (vuln.rs)
Red Hat CSAF (redhat.rs)
EPSS Enrich (vuln.rs)
CISA KEV (vuln.rs)
Deduplicate + Merge (vuln.rs)
Report (report.rs)

Each enrichment step (OSV Batch Query through CISA KEV) queries an external API. Results are cached locally, and in PostgreSQL/Redis when configured.
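The stage order above can be written down as data. The following is a minimal sketch, not ScanRook's actual code: the `Stage` enum, the `PIPELINE` constant, and both functions are hypothetical names, and the choice of which stages count as external is an assumption drawn from the source descriptions on this page.

```rust
/// Pipeline stages in the order listed above (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    PackageInventory,
    OsvBatchQuery,
    NvdCpeMatch,
    DistroFeed,
    RedHatCsaf,
    EpssEnrich,
    CisaKev,
    DeduplicateMerge,
    Report,
}

/// Stages in execution order.
const PIPELINE: [Stage; 9] = [
    Stage::PackageInventory,
    Stage::OsvBatchQuery,
    Stage::NvdCpeMatch,
    Stage::DistroFeed,
    Stage::RedHatCsaf,
    Stage::EpssEnrich,
    Stage::CisaKev,
    Stage::DeduplicateMerge,
    Stage::Report,
];

impl Stage {
    /// True for stages assumed to call an external API.
    /// Inventory, merge, and report are assumed to run locally.
    fn queries_external_api(self) -> bool {
        matches!(
            self,
            Stage::OsvBatchQuery
                | Stage::NvdCpeMatch
                | Stage::DistroFeed
                | Stage::RedHatCsaf
                | Stage::EpssEnrich
                | Stage::CisaKev
        )
    }
}

/// Collect the external stages, preserving pipeline order.
fn external_stages() -> Vec<Stage> {
    PIPELINE
        .iter()
        .copied()
        .filter(|s| s.queries_external_api())
        .collect()
}
```

Keeping the order in a single constant means a caller can iterate `PIPELINE` directly and decide per stage whether a cache lookup is worthwhile.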

Enrichment sources

Detailed description of each vulnerability data source in the pipeline.

Open Source Vulnerabilities (OSV)

osv

Google's open-source vulnerability database. Covers the broadest set of ecosystems via a single batch API.

What it provides

Batch queries packages by ecosystem/name/version. Returns matched advisories with affected ranges, severity, and fix versions.

When it activates

Always queried first. Primary source for npm, PyPI, Go, Rust, Ruby, Maven, NuGet, DPKG, APK, and RPM packages.
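As an illustration of the batch query, the sketch below builds a request body in the shape accepted by OSV's public `POST https://api.osv.dev/v1/querybatch` endpoint. The `Package` struct and both function names are hypothetical and not taken from ScanRook. OSV expects its own ecosystem identifiers (for example `npm` or `PyPI`), so any mapping from ScanRook's ecosystem names is assumed to have happened already.

```rust
/// One inventory entry. Field names are illustrative, not ScanRook's types.
struct Package {
    ecosystem: String,
    name: String,
    version: String,
}

/// Escape characters that may not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the JSON body for a single OSV batch request:
/// one query object per package, in inventory order.
fn osv_batch_body(packages: &[Package]) -> String {
    let queries: Vec<String> = packages
        .iter()
        .map(|p| {
            format!(
                r#"{{"package":{{"name":"{}","ecosystem":"{}"}},"version":"{}"}}"#,
                json_escape(&p.name),
                json_escape(&p.ecosystem),
                json_escape(&p.version)
            )
        })
        .collect();
    format!(r#"{{"queries":[{}]}}"#, queries.join(","))
}
```

The JSON is assembled by hand only to keep the sketch free of dependencies; a real client would more likely use a JSON library.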

National Vulnerability Database (NVD)

nvd

NIST's authoritative CVE dictionary. Provides CVSS scores, CPE matching, and detailed advisory metadata.

What it provides

Per-CVE lookup by ID, plus CPE-based product/version matching. Returns CVSS v3.1 base scores, vector strings, references, and CWE classifications.

When it activates

Used as a second-pass enrichment after OSV. Adds CVSS scores to OSV findings and discovers additional CVEs via CPE matching. Set NVD_API_KEY to get higher rate limits.
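A rough sketch of the two NVD lookups and of request pacing follows. The URLs use the public NVD CVE API 2.0 endpoint, and the limits (5 requests per 30 seconds without a key, 50 with one) are NVD's published values. All function names are hypothetical, and how ScanRook actually throttles or builds CPE names is not described here, so treat this as one possible approach.

```rust
use std::time::Duration;

/// Format a CPE 2.3 name for an application ("a") with the remaining
/// attributes wildcarded. Assumes the inputs need no CPE escaping.
fn cpe23(vendor: &str, product: &str, version: &str) -> String {
    format!("cpe:2.3:a:{}:{}:{}:*:*:*:*:*:*:*", vendor, product, version)
}

/// URL for a per-CVE lookup by ID.
fn nvd_cve_url(cve_id: &str) -> String {
    format!(
        "https://services.nvd.nist.gov/rest/json/cves/2.0?cveId={}",
        cve_id
    )
}

/// URL for a CPE-based product/version match.
fn nvd_cpe_url(cpe_name: &str) -> String {
    format!(
        "https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName={}",
        cpe_name
    )
}

/// Minimum spacing between requests that stays inside NVD's rolling
/// 30-second window: 5 requests without an API key, 50 with one.
fn nvd_min_interval(has_api_key: bool) -> Duration {
    let per_window: u32 = if has_api_key { 50 } else { 5 };
    Duration::from_secs(30) / per_window
}
```

A caller would typically compute `has_api_key` from `std::env::var("NVD_API_KEY").is_ok()` and send the key in the `apiKey` request header.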

Red Hat CSAF / Security Data API

redhat

Red Hat's security data API provides fix status, errata, and CSAF advisories for RHEL packages.

What it provides

Queries per-CVE fix status for RPM packages. Returns fix state (affected, fixed, not affected), errata IDs, and fixed-in versions.

When it activates

Automatically activated for RPM packages detected in RHEL-based container images. Also invoked when --oval-redhat is provided.
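The activation rule can be sketched as a small predicate. Everything here is an assumption made for illustration: the function names are hypothetical, and the list of os-release IDs treated as RHEL-based is a guess, since this page does not say which distributions ScanRook counts. The URL follows the public Red Hat Security Data API pattern for per-CVE JSON.

```rust
/// os-release IDs treated as RHEL-based. This list is an assumption.
const RHEL_LIKE_IDS: [&str; 4] = ["rhel", "centos", "rocky", "almalinux"];

/// Decide whether the Red Hat source should run: either the image is
/// RHEL-based and contains RPM packages, or --oval-redhat was provided.
fn should_query_redhat(has_rpm_packages: bool, os_id: &str, oval_redhat: bool) -> bool {
    let rhel_based = RHEL_LIKE_IDS.iter().any(|id| *id == os_id);
    oval_redhat || (has_rpm_packages && rhel_based)
}

/// Per-CVE endpoint of the Red Hat Security Data API.
fn redhat_cve_url(cve_id: &str) -> String {
    format!(
        "https://access.redhat.com/hydra/rest/securitydata/cve/{}.json",
        cve_id
    )
}
```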

Distro security feeds

ubuntu, debian, alpine, amazon, oracle, wolfi, chainguard

Distribution-specific security trackers that provide precise fix status for packages in their repositories.

What it provides

Maps CVEs to distro package versions with fix status (fixed, not-affected, needs-triage). Provides distro-specific severity and urgency ratings.

When it activates

Activated based on the OS detected in a container scan: Ubuntu CVE Tracker for Ubuntu/DPKG, Debian Security Tracker for Debian/DPKG, Alpine SecDB for Alpine/APK, the Amazon Linux feed for AL2/AL2023, the Oracle Linux feed, Wolfi SecDB, and Chainguard advisories.
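One way to picture feed selection is a lookup keyed on the `ID=` field of `/etc/os-release`. This is a hedged sketch: the `DistroFeed` enum and both functions are hypothetical, and the ID strings (`amzn` for Amazon Linux, `ol` for Oracle Linux, and so on) are the values those distributions commonly ship, not something this page confirms ScanRook uses.

```rust
/// Distro feeds named in this section (illustrative enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DistroFeed {
    Ubuntu,
    Debian,
    Alpine,
    Amazon,
    Oracle,
    Wolfi,
    Chainguard,
}

/// Extract the ID= value from the contents of an os-release file.
/// ID_LIKE= lines do not match because the prefix must be exactly "ID=".
fn os_release_id(contents: &str) -> Option<String> {
    contents.lines().find_map(|line| {
        let value = line.trim().strip_prefix("ID=")?;
        Some(value.trim_matches(|c| c == '"' || c == '\'').to_string())
    })
}

/// Map an os-release ID to a feed. Unknown IDs get no distro feed.
fn feed_for_os(id: &str) -> Option<DistroFeed> {
    match id.trim().to_ascii_lowercase().as_str() {
        "ubuntu" => Some(DistroFeed::Ubuntu),
        "debian" => Some(DistroFeed::Debian),
        "alpine" => Some(DistroFeed::Alpine),
        "amzn" => Some(DistroFeed::Amazon),
        "ol" => Some(DistroFeed::Oracle),
        "wolfi" => Some(DistroFeed::Wolfi),
        "chainguard" => Some(DistroFeed::Chainguard),
        _ => None,
    }
}
```

Returning `Option` keeps the "no matching feed" case explicit, so an unrecognized OS simply skips this stage rather than failing the scan.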

EPSS (Exploit Prediction Scoring System)

epss

FIRST's model that predicts the probability a CVE will be exploited in the next 30 days.

What it provides

Returns a probability score (0.0-1.0) and percentile for each CVE, indicating real-world exploit likelihood. Results are cached for 24 hours.

When it activates

Always active. Applied to all findings after vulnerability matching. Batch queries api.first.org for all CVE IDs in the report.
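The batch query and the 24-hour cache can be sketched as follows. The URL shape matches FIRST's public EPSS API, which accepts a comma-separated `cve` parameter. The function names and the batch size parameter are assumptions, and timestamps are plain Unix seconds to keep the example small.

```rust
/// Cache lifetime for EPSS results: 24 hours, as stated above.
const EPSS_TTL_SECS: u64 = 24 * 60 * 60;

/// Split CVE IDs into batches and build one request URL per batch.
/// A batch size of 0 is treated as 1 to avoid an empty chunk size.
fn epss_batch_urls(cve_ids: &[String], batch_size: usize) -> Vec<String> {
    cve_ids
        .chunks(batch_size.max(1))
        .map(|chunk| {
            format!(
                "https://api.first.org/data/v1/epss?cve={}",
                chunk.join(",")
            )
        })
        .collect()
}

/// True if a cached entry is younger than the TTL.
/// If the clock moved backwards, the age saturates to 0 and the
/// entry is treated as fresh.
fn is_fresh(fetched_at_secs: u64, now_secs: u64) -> bool {
    now_secs.saturating_sub(fetched_at_secs) < EPSS_TTL_SECS
}
```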

CISA KEV (Known Exploited Vulnerabilities)

kev

CISA's catalog of vulnerabilities known to be actively exploited in the wild.

What it provides

Boolean flag: is this CVE in the KEV catalog? Also provides the date added, required remediation date, and ransomware campaign association.

When it activates

Always active. Downloads the full KEV catalog (cached as a HashSet), then flags any finding whose CVE ID appears in the catalog.
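The HashSet lookup described above might look like the sketch below. The `Finding` struct, its `in_kev` field, and both functions are hypothetical. Only the CVE IDs are kept here; the date added, remediation date, and ransomware fields would need a map rather than a set.

```rust
use std::collections::HashSet;

/// Minimal finding for this example (not ScanRook's real type).
struct Finding {
    cve_id: String,
    in_kev: bool,
}

/// Build the lookup set from the CVE IDs listed in the KEV catalog.
fn kev_set(catalog_cve_ids: &[&str]) -> HashSet<String> {
    catalog_cve_ids.iter().map(|id| id.to_string()).collect()
}

/// Flag every finding whose CVE ID appears in the catalog.
/// Each lookup is O(1) on average, so cost scales with the number
/// of findings rather than the size of the catalog.
fn flag_kev(findings: &mut [Finding], kev: &HashSet<String>) {
    for finding in findings.iter_mut() {
        finding.in_kev = kev.contains(&finding.cve_id);
    }
}
```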

Deduplication and merging

How ScanRook handles overlapping results from multiple sources.

When the same CVE is reported by multiple sources (for example, both OSV and NVD report CVE-2024-12345 for the same package), ScanRook merges them into a single finding. The merge logic:

Uses the highest CVSS score from any source
Combines evidence items from all sources
Prefers distro-specific fix status over generic fix versions
Retains all references and advisory URLs
Sets the confidence tier based on the strongest evidence available

This approach keeps findings comprehensive while avoiding duplicate alerts for the same vulnerability.
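The merge rules listed above can be sketched as a fold over per-source records keyed by CVE-package pair. This is an illustrative approximation, not ScanRook's implementation: `SourceRecord`, `Finding`, and `merge` are hypothetical names, fix status is reduced to a string, and the confidence tier is omitted because its levels are not defined on this page.

```rust
use std::collections::BTreeMap;

/// What one source reported for one CVE-package pair (illustrative).
#[derive(Debug, Clone)]
struct SourceRecord {
    source: String, // e.g. "osv", "nvd", "debian"
    cve_id: String,
    package: String,
    cvss: Option<f64>,
    fix_status: Option<String>,
    distro_specific: bool,
    references: Vec<String>,
}

/// The merged finding for one CVE-package pair.
#[derive(Debug, Default)]
struct Finding {
    cvss: Option<f64>,
    fix_status: Option<String>,
    fix_is_distro: bool,
    evidence: Vec<String>,
    references: Vec<String>,
}

fn merge(records: &[SourceRecord]) -> BTreeMap<(String, String), Finding> {
    let mut out: BTreeMap<(String, String), Finding> = BTreeMap::new();
    for r in records {
        let f = out
            .entry((r.cve_id.clone(), r.package.clone()))
            .or_default();

        // Use the highest CVSS score from any source.
        f.cvss = match (f.cvss, r.cvss) {
            (Some(a), Some(b)) => Some(a.max(b)),
            (a, b) => a.or(b),
        };

        // Combine evidence: record which source contributed.
        f.evidence.push(r.source.clone());

        // Prefer distro-specific fix status over generic fix data;
        // otherwise keep the first status seen.
        let upgrade = r.distro_specific && !f.fix_is_distro;
        if r.fix_status.is_some() && (f.fix_status.is_none() || upgrade) {
            f.fix_status = r.fix_status.clone();
            f.fix_is_distro = r.distro_specific;
        }

        // Retain all references, skipping exact duplicates.
        for url in &r.references {
            if !f.references.contains(url) {
                f.references.push(url.clone());
            }
        }
    }
    out
}
```

Because the map key is the (CVE ID, package) pair, the same CVE affecting two different packages still yields two findings, which matches the "single unified finding per CVE-package pair" rule.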