Module 12: Malware Artifacts in IR

Type 9 · Tool-Build: triage a suspicious PE with capa and author a YARA rule that matches a characteristic of the Latrodectus loader, shipping a reusable detection artifact plus an IR-context read of what the capability profile does and doesn't prove. (Secondary: Reconstruct, fitting the capabilities into the incident's behavioral story.)

Last reviewed: 2026-06

Digital Forensics & IR — triage a suspicious binary without running it — understand what it can do, and hand it off with something worth handing off.

Difficulty: Intermediate · Estimated time: ~3.5–5.5 hrs (study + lab) · Prerequisites: Foundations

In 60 seconds

When IR produces a binary, the team's job isn't to reverse it from scratch — it's to triage it fast enough to drive containment, without running it. CAPA statically maps the binary's code to named capabilities ("communicate over HTTP," "achieve persistence"); YARA turns one sample into a fleet-wide hunt and a shareable IOC. The two are complementary: CAPA scopes containment, YARA operationalizes it. Both tell you what a binary can do — never that it did; that requires the host and network execution evidence from earlier modules.

Why this matters

At some point in every significant IR engagement, someone produces a binary. It might be the svchost32.exe dropped by the attacker in the anchor case (the Latrodectus loader): found in a temp directory during live response, extracted from the HTTP session in network forensics, or recovered from unallocated space on the disk image. The IR team's job at that point is not to reverse-engineer it from scratch (that goes to the malware analysis team or Track 04); it's to triage it quickly enough to inform containment decisions. Can it achieve persistence? Does it have network capability? What host artifacts does it create? CAPA and YARA answer those questions in minutes, without running the binary.

A real example of why this triage drives containment: in The DFIR Report's IcedID-to-ransomware case, the initial dropper was an IcedID payload loaded via rundll32 from a batch file hidden inside a mounted ISO. A capability pass over that loader ("executes via rundll32, resolves APIs at runtime, communicates over HTTP, achieves persistence") is exactly what tells a responder to block outbound to the C2 and hunt for the persistence entry before the attacker pivots to Cobalt Strike and ransomware. IcedID is also a real, abundantly sampled malware family (search the file hashes from that report on VirusTotal), which makes it the kind of binary CAPA's capability rules and a hunting YARA rule are actually written against.

Objective

Run capa over sample PE binaries to identify their capability profile, write a YARA rule that matches a specific characteristic of the Latrodectus loader, and interpret the combined output in an IR context — what does the capability profile tell the triage analyst, and what does it not tell them?

The core idea

CAPA maps a binary's behaviour to a vocabulary of named capabilities — "execute via scheduled task," "create process," "resolve API at runtime," "communicate over HTTP." It does this statically: it reads the binary, identifies code patterns, and matches them against a library of hundreds of rules that map patterns to capabilities. No sandbox, no execution, no infection risk. The output is not "this is malware" — it's "this binary has the code to do these things." That distinction matters enormously: a benign installer may have the capability to modify the registry without being malicious. CAPA tells you what doors are open; it doesn't tell you whether the attacker walked through them.
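
To make the "can do X" reading concrete, here is a minimal sketch that groups capa matches by category for triage. It assumes capa was run with its JSON output option and that the result holds a `rules` mapping whose entries carry `meta.namespace` and `meta.attack`. The embedded document is a hypothetical, heavily trimmed stand-in, not real output from the anchor sample.

```python
import json

# Hypothetical, trimmed stand-in for capa's JSON result document.
# Real results carry many more fields; only the ones read below are shown.
SAMPLE_CAPA_JSON = """
{
  "rules": {
    "communicate over HTTP": {
      "meta": {"namespace": "communication/http",
               "attack": [{"technique": "Application Layer Protocol", "id": "T1071"}]}
    },
    "persist via Run registry key": {
      "meta": {"namespace": "persistence/registry/run",
               "attack": [{"technique": "Boot or Logon Autostart Execution", "id": "T1547.001"}]}
    },
    "resolve function by hash": {
      "meta": {"namespace": "linking/runtime-linking", "attack": []}
    }
  }
}
"""

# Top-level namespaces a triage analyst checks first (an assumption for this sketch).
TRIAGE_PREFIXES = ("communication", "persistence", "anti-analysis")


def triage_capabilities(capa_doc):
    """Group matched rule names by top-level namespace, keeping triage-relevant ones."""
    grouped = {}
    for rule_name, rule in capa_doc.get("rules", {}).items():
        meta = rule.get("meta", {})
        top = meta.get("namespace", "").split("/")[0]
        if top in TRIAGE_PREFIXES:
            attack_ids = [a.get("id", "") for a in meta.get("attack", [])]
            grouped.setdefault(top, []).append((rule_name, attack_ids))
    return grouped


profile = triage_capabilities(json.loads(SAMPLE_CAPA_JSON))
for category, hits in sorted(profile.items()):
    for name, attack_ids in hits:
        # "CAN", never "DID": this is a static capability, not observed behaviour.
        print(f"CAN {category}: {name} {attack_ids}")
```

The output is deliberately worded as "CAN": it scopes what to look for on the host and network, and nothing more.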

The mental model

CAPA answers "what doors does this binary have?" — statically, no execution, no infection risk. YARA answers "where else does this exact thing live?" — a signature that turns one sample into a fleet-wide retroactive hunt and a shareable IOC. Triage with CAPA to scope containment; ship a YARA rule to operationalize it. Different questions, complementary tools.

YARA takes a different approach: it matches files against signatures you write. A YARA rule can match on a string ("the exact C2 URL"), a byte pattern ("the packer's entry point stub"), or a combination of conditions ("PE file AND imports CryptEncrypt AND contains the string 'update-cdn82.net'"). YARA is the tool that lets you turn a single sample into a hunt: after the IR team characterises the dropper, the YARA rule becomes the query that looks for it — or for related samples — across every binary on every managed endpoint and in every sample submitted to the threat intelligence feed. It is also the format in which threat intelligence is shared; a published YARA rule is an operationalisable IOC that any team can deploy.
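
The combined condition described above ("PE file AND imports CryptEncrypt AND contains the string") can be sketched as rule text generated from Python. This is an illustrative template, not a tuned detection. The rule name is hypothetical, `advapi32.dll` is used because that is where `CryptEncrypt` is exported, and the C2 string is the example from the paragraph above.

```python
def build_yara_rule(rule_name, c2_string, dll, function):
    """Return YARA rule text: PE file AND imports dll!function AND contains c2_string."""
    # Escape backslashes and double quotes so the value is a valid YARA text string.
    escaped = c2_string.replace("\\", "\\\\").replace('"', '\\"')
    return f'''import "pe"

rule {rule_name}
{{
    meta:
        description = "Triage sketch: PE that imports {function} and embeds a C2 string"
    strings:
        $c2 = "{escaped}" ascii wide
    condition:
        uint16(0) == 0x5A4D and
        pe.imports("{dll}", "{function}") and
        $c2
}}
'''


rule_text = build_yara_rule(
    rule_name="Loader_C2_Sketch",   # hypothetical rule name
    c2_string="update-cdn82.net",   # example string from the text above
    dll="advapi32.dll",             # CryptEncrypt is exported by advapi32.dll
    function="CryptEncrypt",
)
print(rule_text)
```

The `uint16(0) == 0x5A4D` check restricts matches to files starting with the `MZ` header, and `ascii wide` covers both single-byte and UTF-16 encodings of the string. Saved to a file, the text should be usable with the `yara` command-line scanner or compiled through the `yara-python` bindings, neither of which this sketch invokes.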

The investigative workflow combines both. CAPA runs first as a broad triage: "does this binary have network capability? persistence capability? anti-analysis tricks?" If the answer is "yes and yes," you have a strong candidate for a malicious dropper (capabilities alone do not prove intent, as the benign-installer case shows), and the capability list tells you the scope of containment (block outbound, isolate, check for persistence entries). Then YARA produces the specific signature that operationalises the IOC for the threat intel team to distribute and for the EDR to search retroactively. The two tools address different questions and are complementary, not redundant.

The gotcha

Neither tool tells you whether a capability was exercised. "Can phone home" is not "did phone home" — a binary with network code may have made zero connections. Treat the capability profile as scope, not proof: the finding only becomes "this binary phoned home at 14:21" when you join CAPA's "can" to a network session in conn.log and an execution record (prefetch, 4688). Report "can," not "did," until the artifacts close the gap.

One thing neither tool tells you: whether the capability was actually exercised. A binary that can make network connections may have made none. Connecting the capability profile to actual execution requires the host and network artifacts from earlier modules — the process execution record (prefetch, event log 4688), the network session in conn.log, the file written to disk. The full picture is always the synthesis: CAPA says "this can phone home"; network forensics says "this IP at this time did phone home"; the merged finding says "this binary phoned home at 14:21."

flowchart LR
    C["CAPA: binary <i>can</i><br/>communicate over HTTP"] --> J{join}
    X["Execution evidence<br/>(prefetch / 4688)"] --> J
    N["Network session<br/>(conn.log)"] --> J
    J --> F["Finding: it <i>did</i><br/>— phoned home at 14:21"]
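
The join in the diagram can be sketched in a few lines. Every record below is a hypothetical stand-in for parsed artifacts (a 4688 or prefetch execution record and a conn.log session). The host names, the date, and the documentation-range IP address are invented for illustration, and the five-minute window is an arbitrary assumption.

```python
from datetime import datetime, timedelta

# Hypothetical parsed artifacts; none of these values come from a real case.
capability = {"file": "svchost32.exe", "can": "communicate over HTTP"}
executions = [  # e.g. parsed from Security event 4688 or prefetch
    {"host": "WS-07", "image": "svchost32.exe", "ts": datetime(2024, 1, 1, 14, 20, 48)},
]
sessions = [  # e.g. parsed from conn.log
    {"host": "WS-07", "dst": "203.0.113.10", "dport": 80, "ts": datetime(2024, 1, 1, 14, 21, 5)},
    {"host": "WS-09", "dst": "203.0.113.10", "dport": 80, "ts": datetime(2024, 1, 1, 9, 0, 0)},
]


def join_findings(capability, executions, sessions, window=timedelta(minutes=5)):
    """Upgrade 'can' to 'did' only where execution and a session line up on host and time."""
    findings = []
    for ex in executions:
        if ex["image"].lower() != capability["file"].lower():
            continue
        for s in sessions:
            same_host = s["host"] == ex["host"]
            after_exec = timedelta(0) <= s["ts"] - ex["ts"] <= window
            if same_host and after_exec:
                findings.append(
                    f'{capability["file"]} on {ex["host"]} DID contact '
                    f'{s["dst"]}:{s["dport"]} at {s["ts"]:%H:%M}'
                )
    # Without joined evidence, the report stays at "can".
    return findings or [f'{capability["file"]} CAN {capability["can"]} (no execution evidence joined)']


for line in join_findings(capability, executions, sessions):
    print(line)
```

A host-and-time join like this is still correlation: the session log does not name the process, so process-level network telemetry, where available, would tighten the finding.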

Go deeper: where YARA rules go wrong

A YARA rule lives or dies on its conditions. Too specific (a string lifted verbatim from one sample) and it misses the next variant that shifted two bytes; too broad (one common string) and it fires on thousands of benign files. The discipline is anchoring patterns to specific imports or sections, testing against benign samples first, and tuning to the false-positive rate your environment tolerates — a rule that fires on every PDF viewer is a noise machine, not an IOC.
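
The "test against benign samples first" discipline can be illustrated without a scanner. The sketch below uses plain byte-substring checks as a stand-in for a YARA condition, and the byte blobs are hypothetical placeholders for real files. It shows the measurement loop, not real matching semantics.

```python
def matches(data, required, any_of=()):
    """Stand-in for a YARA condition: all `required` patterns AND at least one of `any_of`."""
    if not all(p in data for p in required):
        return False
    return not any_of or any(p in data for p in any_of)


def evaluate(rule, target, benign_set):
    """Return (hits_target, false_positive_rate, names_of_false_positives)."""
    hit_target = matches(target, **rule)
    false_positives = [name for name, data in benign_set.items() if matches(data, **rule)]
    return hit_target, len(false_positives) / len(benign_set), false_positives


# Hypothetical byte blobs standing in for real files.
target = b"MZ...UPX0...CryptEncrypt...update-cdn82.net..."
benign_set = {
    "installer.exe": b"MZ...UPX0...CreateFileW...",
    "pdf_viewer.exe": b"MZ...CryptEncrypt...",
    "notepad.exe": b"MZ...CreateFileW...",
}

too_broad = {"required": [b"UPX0"]}                              # one common string
tightened = {"required": [b"CryptEncrypt", b"update-cdn82.net"]}  # two conditions joined

for label, rule in (("too_broad", too_broad), ("tightened", tightened)):
    hit, fp_rate, fps = evaluate(rule, target, benign_set)
    print(f"{label}: hits target={hit}, FP rate={fp_rate:.0%}, FPs={fps}")
```

Both rules hit the target, so matching the sample alone says nothing about quality; only the benign run separates them. The tightened rule still leans on one exact string, so it remains brittle against a variant that changes the domain.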

AI caveat

YARA generation is a strong model use case — describe the sample's characteristics and get a draft rule — but the output is usually overfit (strings too specific) or too broad (one string matching thousands of benign files). Review every condition: test against the benign set, confirm it matches the target, and tune the false-positive rate yourself.

Learn (~2.5 hrs)

CAPA (~1 hr)
- CAPA GitHub (README and documentation): start with the "Getting Started" section and run the provided examples. Understand the rule format so you can read what matches a finding (each CAPA result cites the rule that triggered it).
- CAPA Rules repository: browse ten or fifteen rules to understand how capabilities are defined. Look at the "persistence" and "communication" categories; these are the first things to check in triage.

YARA (~1 hr)
- YARA Documentation (Writing Rules): the authoritative reference for strings, conditions, and modules. The "PE module" section is essential for writing rules that match specific PE characteristics (imports, sections, headers).
- YARA Best Practices (Detecting specific strings and patterns): a guide to avoiding false positives. Use case-insensitive matching carefully, anchor patterns to specific imports or sections, and test rules against benign samples first.

IR integration (~0.5 hrs)
- MITRE ATT&CK, T1059 (Command and Scripting Interpreter) and T1547 (Boot or Logon Autostart Execution): the ATT&CK pages for the techniques the loader implements; understanding the technique helps you interpret what CAPA is telling you.
- The DFIR Report, "Malicious ISO File Leads to Domain Wide Ransomware": a real intrusion that opens with an IcedID dropper (rundll32-loaded from a hidden batch file). Read the "Execution" and "Initial Access" sections (~15 min) to see the capability profile of a real dropper and the IOCs (hashes) that a YARA rule operationalises.

Key concepts

CAPA: static capability profiling — maps binary code patterns to named capabilities without execution
YARA: signature matching — strings, byte patterns, and conditions against binary files
CAPA output: capability name + mapped ATT&CK technique + code location; interpret as "can do X," not "did do X"
YARA rules: turn a single sample's characteristic into a retroactive hunt across the fleet
IR workflow: CAPA triage → YARA rule → threat intel share → EDR retroactive search
The full finding requires combining capability profile with host and network execution evidence
Real droppers (e.g. IcedID in the DFIR Report case) are abundantly sampled — the same capability/IOC triage maps onto genuine families, not just lab binaries

AI acceleration

YARA rule generation is an excellent model use case: describe the sample's characteristics ("PE file that imports CreateRemoteThread, contains the string 'update-cdn82.net', and has a section named .upx0") and ask the model to draft the rule. The model's output is usually structurally correct but often overfit to the exact sample (using strings that are too specific) or too broad (using a single string that would match thousands of benign files). Review every condition the model includes: test it against the benign sample set in the lab, confirm it matches the target, and tune for the false-positive rate your environment can tolerate. A YARA rule that fires on every PDF viewer in the enterprise is not an IOC — it's a noise machine.
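
Part of that review can be mechanised as a first pass. The sketch below is a hypothetical lint over drafted rule text using regular expressions. The three checks are assumptions about common failure modes, not an official YARA tool, and a clean result does not replace testing against the benign set.

```python
import re


def lint_yara_draft(rule_text):
    """Return review warnings for a drafted rule; an empty list is not a pass."""
    warnings = []
    # String identifiers defined as `$name = ...` at the start of a line.
    strings = re.findall(r"^\s*(\$\w*)\s*=", rule_text, flags=re.MULTILINE)
    condition = rule_text.split("condition:", 1)[1] if "condition:" in rule_text else ""

    if len(strings) <= 1:
        warnings.append("single string: likely too broad or too brittle")
    if "uint16(0) == 0x5A4D" not in condition and "pe." not in condition:
        warnings.append("no PE anchor in condition: may fire on non-PE files")
    if len(strings) > 1 and re.search(r"\bany of them\b", condition):
        warnings.append("'any of them': a single common string is enough to match")
    return warnings


# Hypothetical model-drafted rule built from one of the characteristics above.
draft = '''
rule Draft
{
    strings:
        $a = ".upx0"
    condition:
        $a
}
'''

for warning in lint_yara_draft(draft):
    print("REVIEW:", warning)
```

A lint like this only catches structural smells; whether the rule matches the target and stays quiet on benign files still has to be measured in the lab.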

Check yourself

CAPA reports a binary "can communicate over HTTP." Why is that not yet a finding that it phoned home, and what would make it one?
When do you reach for CAPA versus YARA in a triage workflow, and what does each produce?
A teammate's YARA rule matches the dropper but also fires on every UPX-packed installer. What went wrong, and how do you tighten it?
