Self-Hosted Deployment
ScanRook can be deployed entirely on your own infrastructure. Self-hosting gives you full control over your data, allows operation in air-gapped environments, and helps meet compliance requirements that prohibit sending artifacts or vulnerability data to third-party services. This guide covers the architecture, prerequisites, Kubernetes deployment, configuration, scaling, and offline operation.
Architecture overview
The ScanRook platform consists of five components that communicate via PostgreSQL and S3-compatible object storage.
- Web application (Next.js) -- Dashboard, API routes, scan job management, user authentication, and SSE progress streaming.
- Worker service (Go) -- Polls PostgreSQL for queued jobs, downloads artifacts from S3, executes the scanner binary, tails NDJSON progress, and uploads reports.
- Scanner binary (Rust) -- Core scanning engine. Auto-detects file types, extracts package inventories, and enriches findings from OSV, NVD, and distro feeds. Bundled inside the worker container image.
- PostgreSQL -- Job queue, scan events, user data, and optional CVE cache.
- S3-compatible storage -- MinIO, AWS S3, or any S3-compatible service. Stores uploaded artifacts and scan report JSON files.
```
Browser                                Infrastructure
   |
   |-- presigned POST --------------> [ S3 (uploads bucket) ]
   |
   |-- POST /api/jobs --------------> [ Web (Next.js) ] --> [ PostgreSQL ]
   |                                                             |
   |                                        polls scan_jobs (status=queued)
   |                                                             |
   |                                                      [ Worker (Go) ]
   |                                                       |           |
   |                                        downloads from S3    executes scanner
   |                                                       |           |
   |                                             tails NDJSON    [ Scanner (Rust) ]
   |                                                       |
   |                                     inserts scan_events --> pg_notify
   |                                                       |
   |                                     uploads report --> [ S3 (reports bucket) ]
   |
   |<-- SSE /api/jobs/[id]/events ---- [ Web ] <-- polls scan_events
   |<-- GET /api/jobs/[id]/report ---- [ Web ] <-- fetches from S3
```
Prerequisites
What you need before deploying ScanRook.
- Kubernetes cluster (1.25+) or Docker Compose for single-node deployments
- PostgreSQL 15+ -- managed service or self-hosted (e.g. CloudNativePG, Amazon RDS)
- S3-compatible object storage -- MinIO (recommended for self-hosted), AWS S3, Google Cloud Storage with S3 compatibility, or DigitalOcean Spaces
- 4 GB RAM minimum (8 GB recommended for worker nodes running concurrent scans)
- Domain name with TLS certificate -- for the web dashboard. Use cert-manager with Let's Encrypt or provide your own certificate.
- Container registry access -- to pull ScanRook container images (ghcr.io/devinshawntripp/scanrook-web and ghcr.io/devinshawntripp/scanrook-worker)
Kubernetes deployment
Step-by-step instructions for deploying ScanRook on Kubernetes.
1. Create the namespace
```shell
kubectl create namespace scanrook
```
2. Create secrets
Store database credentials, S3 keys, and auth secrets. Replace the placeholder values with your actual credentials.
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: scanrook-secrets
  namespace: scanrook
type: Opaque
stringData:
  DATABASE_URL: "postgres://user:pass@db-host:5432/scanrook?sslmode=require"
  S3_ACCESS_KEY: "your-access-key"
  S3_SECRET_KEY: "your-secret-key"
  NEXTAUTH_SECRET: "generate-with-openssl-rand-base64-32"
  NVD_API_KEY: "your-nvd-api-key"  # optional but recommended
```
3. Create ConfigMap
Non-sensitive configuration shared by the web and worker deployments.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: scanrook-config
  namespace: scanrook
data:
  S3_ENDPOINT: "minio.scanrook.svc:9000"
  S3_USE_SSL: "false"
  S3_REGION: "us-east-1"
  UPLOADS_BUCKET: "uploads"
  REPORTS_BUCKET: "reports"
  NEXTAUTH_URL: "https://scanrook.example.com"
  SCANNER_PATH: "/usr/local/bin/scanrook"
  SCRATCH_DIR: "/scratch"
  WORKER_CONCURRENCY: "2"
  WORKER_STALE_JOB_TIMEOUT_SECONDS: "1800"
  HTTP_ADDR: ":8080"
```
4. Web deployment
Three replicas are recommended for high availability. The web application serves the dashboard and API routes.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scanrook-web
  namespace: scanrook
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scanrook-web
  template:
    metadata:
      labels:
        app: scanrook-web
    spec:
      containers:
      - name: web
        image: ghcr.io/devinshawntripp/scanrook-web:latest
        ports:
        - containerPort: 3000
        envFrom:
        - configMapRef:
            name: scanrook-config
        - secretRef:
            name: scanrook-secrets
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        readinessProbe:
          httpGet:
            path: /api/health
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: scanrook-web
  namespace: scanrook
spec:
  selector:
    app: scanrook-web
  ports:
  - port: 3000
    targetPort: 3000
```
5. Worker deployment
Workers execute scans. Scale the replica count based on your expected scan volume. Each worker runs WORKER_CONCURRENCY parallel jobs.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scanrook-worker
  namespace: scanrook
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scanrook-worker
  template:
    metadata:
      labels:
        app: scanrook-worker
    spec:
      containers:
      - name: worker
        image: ghcr.io/devinshawntripp/scanrook-worker:latest
        envFrom:
        - configMapRef:
            name: scanrook-config
        - secretRef:
            name: scanrook-secrets
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        volumeMounts:
        - name: scratch
          mountPath: /scratch
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 30
      volumes:
      - name: scratch
        emptyDir:
          sizeLimit: 10Gi
```
6. Ingress
Expose the web application with TLS. This example uses nginx-ingress with cert-manager.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scanrook-ingress
  namespace: scanrook
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - scanrook.example.com
    secretName: scanrook-tls
  rules:
  - host: scanrook.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: scanrook-web
            port:
              number: 3000
```
Environment variable reference
All environment variables used by the web and worker components.
| Variable | Purpose | Component | Required |
|---|---|---|---|
| DATABASE_URL | PostgreSQL connection string | Web, Worker | Yes |
| S3_ENDPOINT | S3-compatible object storage endpoint | Web, Worker | Yes |
| S3_ACCESS_KEY | S3 access key ID | Web, Worker | Yes |
| S3_SECRET_KEY | S3 secret access key | Web, Worker | Yes |
| S3_USE_SSL | Enable TLS for S3 connections | Web, Worker | No |
| S3_REGION | S3 region (e.g. us-east-1) | Web, Worker | No |
| UPLOADS_BUCKET | Bucket for uploaded artifacts | Web, Worker | Yes |
| REPORTS_BUCKET | Bucket for scan report JSON files | Web, Worker | Yes |
| SCANNER_PATH | Path to the scanrook binary inside the worker container | Worker | No |
| SCRATCH_DIR | Temporary directory for downloaded artifacts during scans | Worker | No |
| WORKER_CONCURRENCY | Number of parallel scan jobs per worker pod | Worker | No |
| WORKER_STALE_JOB_TIMEOUT_SECONDS | Seconds before a running job with no heartbeat is marked failed | Worker | No |
| NEXTAUTH_URL | Canonical URL of the web application (e.g. https://scanrook.example.com) | Web | Yes |
| NEXTAUTH_SECRET | Secret used to encrypt session tokens (generate with openssl rand -base64 32) | Web | Yes |
| NVD_API_KEY | NVD API key for higher rate limits during enrichment | Worker | No |
| HTTP_ADDR | Listen address for the worker health endpoint | Worker | No |
Scaling
Tuning ScanRook for high-volume scan workloads.
ScanRook scales horizontally at the worker layer. Each worker pod polls PostgreSQL for queued jobs using SELECT ... FOR UPDATE SKIP LOCKED, so multiple workers can safely process jobs in parallel without conflicts.
Worker concurrency
The WORKER_CONCURRENCY environment variable controls how many scans a single worker pod runs in parallel. The default is 2. For worker pods with 2 GB+ memory, you can safely increase this to 3-4. Total cluster scan throughput is replicas x WORKER_CONCURRENCY.
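The throughput arithmetic above, as a worked example (a trivial sketch, not part of ScanRook's API):

```go
package main

import "fmt"

// totalConcurrency computes the cluster-wide number of scans that can run
// at once: worker replicas times WORKER_CONCURRENCY per pod.
func totalConcurrency(replicas, workerConcurrency int) int {
	return replicas * workerConcurrency
}

func main() {
	// 3 worker replicas at the default WORKER_CONCURRENCY=2
	fmt.Println(totalConcurrency(3, 2)) // 6 parallel scans
}
```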
Horizontal pod autoscaling
For dynamic scaling, use a Kubernetes HPA based on CPU utilization or a custom metric derived from the scan_jobs queue depth.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scanrook-worker-hpa
  namespace: scanrook
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scanrook-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
Recommendations for high-volume environments
- Run 3+ worker replicas with WORKER_CONCURRENCY=1 each for better fault isolation
- Use dedicated worker nodes with node selectors or taints to prevent scan workloads from competing with the web application
- Enable PostgreSQL CVE caching via DATABASE_URL on the scanner to avoid redundant API calls across workers
- Add Redis as a distributed cache layer for even faster lookups across multiple worker pods
- Monitor the scan_jobs table for queue depth (jobs with status = 'queued') to detect backpressure
Air-gapped operation
Running ScanRook without internet access to external vulnerability databases.
ScanRook can operate in fully air-gapped environments by pre-seeding its vulnerability cache before deploying to the isolated network. The scanner checks its local file cache, then PostgreSQL, then Redis before making any external API calls. If the cache contains the needed data, no outbound requests are made.
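The tiered lookup described above can be sketched as a first-hit search over the cache layers. This is an illustrative model of the documented behavior, not the scanner's real internals; the type and function names are invented for the example.

```go
package main

import "fmt"

// lookupFn models one cache tier: it returns the cached record for a key
// and whether the tier had it.
type lookupFn func(key string) (string, bool)

// tieredLookup tries each tier in order and stops at the first hit, so a
// pre-seeded local cache prevents any fallthrough to external sources.
func tieredLookup(key string, tiers []lookupFn) (string, bool) {
	for _, tier := range tiers {
		if val, ok := tier(key); ok {
			return val, ok
		}
	}
	return "", false
}

func main() {
	fileCache := map[string]string{"CVE-2024-0001": "cached advisory"}

	// Tier order mirrors the docs: local file cache, then PostgreSQL,
	// then Redis. The last two are stubbed as misses here.
	tiers := []lookupFn{
		func(k string) (string, bool) { v, ok := fileCache[k]; return v, ok },
		func(k string) (string, bool) { return "", false }, // PostgreSQL
		func(k string) (string, bool) { return "", false }, // Redis
	}

	v, ok := tieredLookup("CVE-2024-0001", tiers)
	fmt.Println(v, ok) // cached advisory true
}
```

In an air-gapped deployment the goal is for the first tier to answer every lookup, so the later tiers (and any external API) are never consulted.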
Pre-seeding the cache
On a machine with internet access, use the scanrook db commands to warm the cache with vulnerability data for your target artifacts.
```shell
# On a machine with internet access:

# Warm cache for a specific artifact
scanrook db download --file ./myapp.tar

# Or update all sources broadly
scanrook db update --source all --file ./myapp.tar

# Check cache status
scanrook db check

# Package the cache directory for transfer
tar -czf scanrook-cache.tar.gz ~/.scanrook/cache/
```
Deploying the cache
Transfer the cache archive to your air-gapped environment and mount it into the worker pods. Set the SCANNER_CACHE environment variable to point to the mounted path.
```shell
# Extract the cache on the air-gapped host
tar -xzf scanrook-cache.tar.gz -C /opt/scanrook/
```

```yaml
# Mount as a volume in the worker deployment
volumes:
- name: vuln-cache
  hostPath:
    path: /opt/scanrook/cache
    type: Directory

# Reference in the container spec
volumeMounts:
- name: vuln-cache
  mountPath: /cache
  readOnly: true

# Set the environment variable
env:
- name: SCANNER_CACHE
  value: "/cache"
```
Disabling external enrichment
To prevent the scanner from attempting outbound connections (which would fail and slow down scans), explicitly disable enrichment sources that require internet access.
```yaml
env:
- name: SCANNER_NVD_ENRICH
  value: "0"
- name: SCANNER_OSV_ENRICH
  value: "0"
- name: SCANNER_SKIP_CACHE
  value: "0"  # ensure the cache is used
```
With enrichment disabled and a pre-warmed cache, scans will use only cached vulnerability data. Periodically refresh the cache on an internet-connected machine and transfer updated archives to the air-gapped environment.
Further reading
Related documentation for getting started and using the CLI.
- Quickstart -- Install ScanRook and run your first scan in under two minutes.
- CLI Reference -- Complete reference for all subcommands, flags, and environment variables.
- Enrichment -- How ScanRook queries vulnerability databases and merges findings.
- Data Sources -- Full provider table with ecosystem coverage and integration status.