Caching

Name: ScanRook
Author: ScanRook

ScanRook makes live API calls to OSV, NVD, Red Hat, EPSS, and CISA to enrich findings with up-to-date vulnerability data. To avoid hitting rate limits and to make repeated scans of the same artifact fast, every API response is cached. This page explains how the three caching layers work and how to configure them.

Three-layer cache hierarchy

Responses are checked in order: in-memory (Redis, when configured) → file → PostgreSQL → live API.

1. File cache

~/.scanrook/cache/

• Always active by default
• Keyed by SHA256 of request params
• Stored as raw JSON bytes per entry
• Disable with SCANNER_SKIP_CACHE=1

2. PostgreSQL

DATABASE_URL

• Opt-in via DATABASE_URL env var
• Shared across multiple worker pods
• Stores OSV advisories and Red Hat CVE data
• Schema auto-initialized on first use

3. Redis

REDIS_URL

• Opt-in via REDIS_URL env var
• Fastest layer for multi-worker setups
• Used for NVD CPE lookups and rate-limit coordination
• Not required for single-machine use

When a vulnerability lookup is needed, ScanRook checks each layer in order. A cache hit in any layer skips all subsequent layers, including the live API call. Responses fetched from the API are written back to the configured layers that store that source (see the Layers column of the cache key table), so subsequent requests are served from cache.

The file cache is always active unless SCANNER_SKIP_CACHE=1 is set. The PostgreSQL and Redis layers are additive. Configuring them speeds up multi-worker deployments in which several scanner pods share the same artifact queue but not a local filesystem.
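The lookup order described above behaves like a read-through cache. The following is a minimal sketch, not ScanRook's implementation. `CacheLayer`, `DictLayer` and `lookup` are hypothetical names, and each layer is modeled as an in-process dictionary rather than a real Redis, file or PostgreSQL backend. Layers are passed in the order given above (in-memory, file, PostgreSQL).

```python
from typing import Callable, Optional, Protocol


class CacheLayer(Protocol):
    """Hypothetical minimal interface shared by every cache layer."""

    def get(self, key: str) -> Optional[bytes]: ...

    def put(self, key: str, value: bytes) -> None: ...


class DictLayer:
    """In-process stand-in for a Redis, file, or PostgreSQL layer."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.store.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.store[key] = value


def lookup(key: str, layers: list[CacheLayer], fetch: Callable[[], bytes]) -> bytes:
    # Check layers in priority order; the first hit skips the rest and the API.
    for layer in layers:
        cached = layer.get(key)
        if cached is not None:
            return cached
    # Miss in every layer: call the live API once, then write back to each layer.
    value = fetch()
    for layer in layers:
        layer.put(key, value)
    return value
```

To mirror the table's Layers column, pass only the layers that hold a given source (for example, just the file layer for EPSS scores). On a hit in a later layer, the sketch does not backfill the earlier layers, because this page does not say whether ScanRook does.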

Cache key format

Every cache entry is keyed by SHA256 of its request parameters.

Cache keys are computed by hashing the concatenation of a list of string parts. For example, the Red Hat CVE API response for CVE-2024-1234 is stored under sha256("redhat_cve" + "CVE-2024-1234"). The file cache stores each entry as a single file, named by the hex-encoded hash, inside the cache directory. A lookup is therefore a direct open by file name, which is O(1) regardless of how many entries are stored.
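Here is a minimal sketch of this key and file layout. It assumes the parts are UTF-8 encoded and concatenated with no separator, as the `sha256("redhat_cve" + "CVE-2024-1234")` notation suggests. `cache_key`, `entry_path`, `read_entry` and `write_entry` are hypothetical helper names. ScanRook's actual on-disk layout may differ in details such as subdirectories or file extensions.

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_key(*parts: str) -> str:
    """Hex SHA256 over the parts, assumed to be joined with no separator."""
    h = hashlib.sha256()
    for part in parts:
        # Feeding parts one at a time hashes exactly their concatenation.
        h.update(part.encode("utf-8"))
    return h.hexdigest()


def entry_path(cache_dir: Path, *parts: str) -> Path:
    # One file per entry, named by the hex digest (hypothetical layout).
    return cache_dir / cache_key(*parts)


def read_entry(cache_dir: Path, *parts: str) -> Optional[bytes]:
    # Returns the raw JSON bytes, or None on a cache miss.
    path = entry_path(cache_dir, *parts)
    return path.read_bytes() if path.is_file() else None


def write_entry(cache_dir: Path, value: bytes, *parts: str) -> None:
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry_path(cache_dir, *parts).write_bytes(value)
```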

Source	Key components	TTL	Layers
OSV batch query	sha256(ecosystems + package names)	7 days	File + PG
OSV advisory JSON	sha256('osv_advisory' + advisory_id)	7 days	File + PG
NVD CVE JSON	sha256('nvd_cve' + CVE-ID)	30 days	File + PG
Red Hat CVE JSON	sha256('redhat_cve' + CVE-ID)	dynamic (30 days default)	File + PG
Red Hat per-package CVE list	sha256('redhat_pkg_cves' + package_name)	30 days	File only
EPSS batch scores	sha256('epss_v1' + sorted CVE IDs)	1 day	File only
CISA KEV catalog	sha256('kev_catalog')	1 day	File only
OVAL XML auto-download	sha256('oval_auto' + distro_key)	7 days	File only

EPSS chunk keys include all sorted CVE IDs in the batch to ensure stable cache hits across repeated scans of the same artifact.
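Sorting is what makes these keys stable: two scans that discover the same CVEs in a different order produce the same key. The sketch below assumes the sorted IDs are concatenated after the `epss_v1` prefix with no separator. `epss_batch_key` is a hypothetical name.

```python
import hashlib


def epss_batch_key(cve_ids: list[str]) -> str:
    """Key for one EPSS batch: 'epss_v1' followed by the CVE IDs in sorted order.

    Plain concatenation with no separator is an assumption of this sketch.
    """
    h = hashlib.sha256(b"epss_v1")
    # Sorting removes any dependence on the order the CVEs were discovered in.
    for cve in sorted(cve_ids):
        h.update(cve.encode("utf-8"))
    return h.hexdigest()
```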

Dynamic TTL for Red Hat data

Recently modified CVEs get shorter TTLs so that fixes surface quickly.

Red Hat CVE entries include a lastModified timestamp. The scanner uses it to compute a shorter cache TTL for recently changed advisories. If a CVE was modified in the last 7 days, it is re-fetched after 1 day regardless of the base TTL. CVEs that haven't changed in more than 90 days get a longer TTL of up to 90 days. This balances freshness against API load.

TTL logic

last_modified < 7 days ago → TTL = 1 day
last_modified 7–30 days ago → TTL = base_ttl (default 30d)
last_modified > 90 days ago → TTL = 90 days
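Here is a sketch of these TTL rules, together with the freshness check they feed into. `redhat_ttl` and `is_fresh` are hypothetical names. The rules above do not cover entries last modified 30 to 90 days ago, so this sketch assumes those fall back to `base_ttl`. `base_ttl` presumably corresponds to `SCANNER_REDHAT_TTL_DAYS`.

```python
from datetime import datetime, timedelta


def redhat_ttl(
    last_modified: datetime,
    now: datetime,
    base_ttl: timedelta = timedelta(days=30),
) -> timedelta:
    """TTL for a Red Hat CVE entry, based on how recently it changed."""
    age = now - last_modified
    if age < timedelta(days=7):
        # Recently changed: re-fetch after 1 day regardless of base_ttl.
        return timedelta(days=1)
    if age > timedelta(days=90):
        # Long unchanged: keep it for 90 days.
        return timedelta(days=90)
    # Assumption: everything in between (including 30-90 days) uses base_ttl.
    return base_ttl


def is_fresh(fetched_at: datetime, ttl: timedelta, now: datetime) -> bool:
    # A cached entry is served until its TTL has elapsed since it was fetched.
    return now - fetched_at < ttl
```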

Managing the cache

Use the db subcommand to inspect and refresh cached vulnerability data.

scanrook db sources

List all configured cache sources (file cache path, PostgreSQL URL if set, Redis URL if set).

scanrook db check

Show the number of cached entries per source, disk usage, and oldest/newest entries.

scanrook db update

Pre-warm the cache by downloading the latest KEV catalog, EPSS scores, and any pending OVAL files.

To clear the entire file cache, delete the cache directory: rm -rf ~/.scanrook/cache/ (or the directory set in SCANNER_CACHE). The next scan repopulates it from the live APIs.
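A small script can walk the cache directory to give a rough view of what the file cache holds without running the CLI, for example when inspecting a restored CI cache. This is a hypothetical helper, not the implementation of `scanrook db check`. File names are hashes, so it cannot break counts down per source. It also uses file modification times as a proxy for when entries were written.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class FileCacheStats:
    entries: int
    total_bytes: int
    oldest: Optional[datetime]
    newest: Optional[datetime]


def file_cache_stats(cache_dir: Path) -> FileCacheStats:
    """Count entries, total size, and oldest/newest mtimes under cache_dir."""
    mtimes: list[float] = []
    total = 0
    if cache_dir.is_dir():
        for entry in cache_dir.rglob("*"):
            if entry.is_file():
                st = entry.stat()
                total += st.st_size
                mtimes.append(st.st_mtime)

    def to_utc(t: float) -> datetime:
        return datetime.fromtimestamp(t, tz=timezone.utc)

    return FileCacheStats(
        entries=len(mtimes),
        total_bytes=total,
        oldest=to_utc(min(mtimes)) if mtimes else None,
        newest=to_utc(max(mtimes)) if mtimes else None,
    )
```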

Environment variables

All caching behaviour can be tuned without changing any config file.

Variable	Default	Description
SCANNER_CACHE	~/.scanrook/cache/	Override the file cache directory.
SCANNER_SKIP_CACHE	0	Set to 1 to bypass all file cache reads and writes. Forces fresh API calls on every scan.
DATABASE_URL	(unset)	PostgreSQL connection string. Enables the database cache layer for OSV advisories and Red Hat CVE data.
REDIS_URL	(unset)	Redis connection string (redis://host:port). Enables the in-memory cache layer for multi-worker deployments.
SCANNER_REDHAT_TTL_DAYS	30	How many days to treat Red Hat CVE API responses as fresh before re-fetching.
SCANNER_OSV_TTL_DAYS	7	How many days to treat OSV advisory responses as fresh.
SCANNER_EPSS_TTL_DAYS	1	How many days to treat EPSS scores as fresh (re-fetched daily since scores change).
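Here is a sketch of how these variables and their defaults might be resolved in one place. `CacheConfig` and `load_cache_config` are hypothetical. The parsing rules are assumptions, not documented behavior: only `1` enables `SCANNER_SKIP_CACHE`, empty URLs count as unset, and TTLs are whole days.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass
class CacheConfig:
    cache_dir: Path
    skip_file_cache: bool
    database_url: Optional[str]
    redis_url: Optional[str]
    redhat_ttl_days: int
    osv_ttl_days: int
    epss_ttl_days: int


def load_cache_config(env: Mapping[str, str] = os.environ) -> CacheConfig:
    """Read the caching variables from the table above, applying its defaults."""
    return CacheConfig(
        cache_dir=Path(env.get("SCANNER_CACHE", "~/.scanrook/cache")).expanduser(),
        # Assumption: only the exact value "1" disables the file cache.
        skip_file_cache=env.get("SCANNER_SKIP_CACHE", "0") == "1",
        # Unset (or empty) URLs leave the optional layer disabled.
        database_url=env.get("DATABASE_URL") or None,
        redis_url=env.get("REDIS_URL") or None,
        redhat_ttl_days=int(env.get("SCANNER_REDHAT_TTL_DAYS", "30")),
        osv_ttl_days=int(env.get("SCANNER_OSV_TTL_DAYS", "7")),
        epss_ttl_days=int(env.get("SCANNER_EPSS_TTL_DAYS", "1")),
    )
```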

Cache in CI/CD pipelines

Mount the cache directory as a persistent volume or artifact to speed up pipeline scans.

In GitHub Actions or GitLab CI, mount ~/.scanrook/cache as a cache artifact between runs. On the first pipeline run all API calls are made live. Subsequent runs for the same set of packages hit the file cache and complete much faster — typically in under 3 seconds for a fully warm cache versus 30–60 seconds for a cold scan.

GitHub Actions example

```yaml
# Persist the ScanRook file cache between workflow runs
- uses: actions/cache@v4
  with:
    path: ~/.scanrook/cache
    key: scanrook-cache-${{ hashFiles('**/package-lock.json') }}
    restore-keys: scanrook-cache-
```

When DATABASE_URL is set, the PostgreSQL cache (OSV advisories and Red Hat CVE data) is shared across all worker pods automatically, with no volume mounting needed. This is the recommended approach for self-hosted DeltaGuard deployments where multiple worker replicas scan different jobs concurrently.