Lab 13: Detection Authoring & Reporting

Setup

git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/13-detection-reporting
make up && make demo

make demo runs the worked-rule validation and then the eval harness. The harness scores a YARA rule against a held-out labelled corpus (eval/heldout/: 5 known-malicious fixtures the rule must match and 8 benign files it must not, several of them deliberate near-misses) and applies a regression gate. You watch the gate go GREEN on the precise rule (recall 100%, precision 100%) and RED on a deliberately over-broad rule (eval/rules/loader-broad.yar: recall 100% but precision 55.6%, because it false-positives on the benign look-alikes). make eval RULE=<your-rule.yar> scores your own rule and applies the gate; make gate proves the gate catches the over-broad regression by exiting non-zero. Everything runs offline and deterministically: the fixtures are tiny generated files with magic headers and representative strings, not real malware. Honor system: the gate guards you against a regression; it is not a grader.
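The scorecard numbers quoted above follow from simple confusion-matrix arithmetic. The sketch below is illustrative only and is not the shipped eval/eval.py; the `Scorecard` class is a hypothetical name, and the count of 4 false positives for the over-broad rule is an assumption inferred from the quoted 55.6% precision (5 / 9).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    tp: int  # malicious fixtures the rule matched
    fn: int  # malicious fixtures the rule missed
    fp: int  # benign fixtures the rule matched
    tn: int  # benign fixtures the rule left alone

    @property
    def recall(self) -> float:
        # Malicious caught / all malicious.
        total = self.tp + self.fn
        return self.tp / total if total else 0.0

    @property
    def precision(self) -> float:
        # Of everything that fired, the share that was truly malicious.
        fired = self.tp + self.fp
        return self.tp / fired if fired else 0.0

    @property
    def gate_green(self) -> bool:
        # The gate passes only with zero misses and zero false positives.
        return self.fn == 0 and self.fp == 0


precise = Scorecard(tp=5, fn=0, fp=0, tn=8)
# Assumption: 4 false positives, inferred from the quoted 55.6% (5 / 9).
broad = Scorecard(tp=5, fn=0, fp=4, tn=4)

for name, card in (("precise", precise), ("broad", broad)):
    status = "GREEN" if card.gate_green else "RED"
    print(f"{name}: recall={card.recall:.1%} precision={card.precision:.1%} gate={status}")
```

Under these assumed counts the broad rule reports recall 100.0% and precision 55.6%, which is why recall alone cannot tell the two rules apart.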

Why synthetic here, when modules 02–12 use real samples? A regression gate must be reproducible and fully labelled, and committable so a reviewer can re-run it and get the same score. Real MalwareBazaar samples are neither: you can't commit live malware, and a tag fetch pulls whatever is recent, so the corpus drifts from run to run. The discipline this lab teaches (precision vs. recall on a held-out set, with a label set you own) depends on a controlled corpus; that is the point, not a shortcut. Want to score against real malware? Bring the real samples you fetched in modules 03–11 (Agent Tesla, GuLoader, the UPX-packed PE) as an extra malicious set, hand-label them, and add them to your own eval/, but keep the shipped synthetic corpus as the committed, reproducible gate.

Scenario

You have completed the full malware analysis: static triage, dynamic behaviour, decompilation, unpacking, and IOC extraction. Now you close the loop. Your deliverables are a validated YARA rule, a validated Sigma rule, and a completed analysis report — all ready to hand to the detection team and the IR lead.

The lab ships:

- data/samples/synth-loader.bin: the benign PE sample from Module 09 (the FNV-hash utility, compiled from source; not malware)
- data/samples/clean-binaries/: five benign Linux utilities for false-positive testing
- data/examples/synth-loader.yar: a worked YARA rule you will study and extend
- data/examples/synth-beacon.yml: a worked Sigma rule for the beacon behaviour
- data/report-template.md: the analysis report template to fill in

Do

[ ] Study the worked YARA rule. Open data/examples/synth-loader.yar. Identify: the string identifiers, the condition logic, and the meta section. What does the uint16(0) == 0x5A4D condition test? What would you add to reduce false positives?

Hint: 0x5A4D is the MZ header — it restricts the rule to PE files. A second condition on file size limits scan time.
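To make the hint concrete, here is a minimal Python sketch of what the `uint16(0) == 0x5A4D` condition tests: a little-endian 16-bit read at offset 0, so the bytes on disk are `4D 5A` ("MZ"). The function name `looks_like_pe` and the sample byte strings are hypothetical and are not part of the lab.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Mirror the YARA check: little-endian 16-bit integer at offset 0 equals 0x5A4D."""
    if len(data) < 2:
        # Too short to read two bytes, so the check cannot succeed.
        return False
    (magic,) = struct.unpack_from("<H", data, 0)
    return magic == 0x5A4D


# Hypothetical inputs for illustration only.
print(looks_like_pe(b"MZ\x90\x00"))        # starts with the MZ magic
print(looks_like_pe(b"\x7fELF\x02\x01"))   # starts with the ELF magic
print(looks_like_pe(b"SynthLoader_Mutex quoted in a text note"))
```

Only the first input passes. A header gate like this rejects files that merely quote an indicator string, which is one way it helps with false positives.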

[ ] Validate the worked rule against the corpus. Run make demo to see the full validation flow. Then interactively:
```
yara /lab/data/examples/synth-loader.yar /lab/data/samples/synth-loader.bin
```
Should match. Then:
```
yara /lab/data/examples/synth-loader.yar /lab/data/samples/clean-binaries/
```
Should produce no matches. Record the result.
[ ] Write your own YARA rule. Create my-detection.yar. Your rule must:
- Target the mutex string SynthLoader_Mutex as a primary string.
- Include the XOR key byte sequence from the loader (use {41 ?? 41} as a hex pattern approximation).
- Have a condition of 2 of them so both indicators must be present.
- Include meta: author, date, description, ATT&CK technique IDs.

Test it: match on synth-loader.bin, no match on clean-binaries/.
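The matching logic this step asks for can be emulated in a few lines of Python, which helps when reasoning about what the rule will and will not fire on. This is a sketch of the semantics only, not a replacement for running yara; `both_indicators_present` and the test inputs are hypothetical.

```python
import re

# Primary string indicator (YARA text strings are case-sensitive by default).
MUTEX = b"SynthLoader_Mutex"
# Equivalent of the hex pattern { 41 ?? 41 }: byte 0x41, any one byte, byte 0x41.
XOR_KEY = re.compile(rb"\x41.\x41", re.DOTALL)


def both_indicators_present(data: bytes) -> bool:
    """Emulate `2 of them` for a rule with exactly two strings."""
    hits = [MUTEX in data, XOR_KEY.search(data) is not None]
    return sum(hits) >= 2


# Hypothetical inputs for illustration only.
print(both_indicators_present(b"\x00SynthLoader_Mutex\x00\x41\x13\x41"))  # both present
print(both_indicators_present(b"SynthLoader_Mutex"))                      # mutex only
print(both_indicators_present(b"BANANA"))                                 # "ANA" hits the hex pattern only
```

Note how loose the hex pattern is on its own: it matches the "ANA" inside ordinary ASCII text. It only carries weight in combination with the mutex, which is the reason for requiring both.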

[ ] Grade your rule on the held-out corpus — coverage is not precision. The clean-binaries/ test in step 3 is your tuning set: you wrote the rule against the sample you can see, so of course it passes. That is a demo, not a measurement. Score it instead against the held-out set in eval/heldout/ — fixtures the rule was never tuned on: 5 known-malicious (variants that drop the mutex, or drop the config header — the rule must still catch them) and 8 benign, several of which are near-misses that share one superficial feature with the family (a legit utility that uses the public FNV-1a basis constant; a config parser that reads CFG0-tagged files; an app whose mutex merely contains LoaderMtx; a plain-text incident note that quotes every IOC string but has no PE/ELF header).
```
make eval RULE=my-detection.yar
```
Read the scorecard: recall (malicious caught / all malicious), precision (of what fired, how much was truly malicious), and the confusion matrix. A rule that catches all 5 malicious but also fires on the near-misses has great recall and bad precision — and a high-recall, low-precision rule is worse than no rule: it floods the SOC with false positives until an analyst disables it, taking its real detections with it. Now run make gate and watch the shipped over-broad rule (eval/rules/loader-broad.yar) go RED on exactly that failure — the contrast is the lesson. Tune your rule (gate on the magic header; require a combination of durable IOCs, not a single shared feature) and re-run make eval until the gate is GREEN: 0 missed malicious, 0 false positives. A gate you have only ever seen pass is not a gate.
[ ] Study the worked Sigma rule. Open data/examples/synth-beacon.yml. Identify the detection.keywords or detection.selection block. What field does it match on? What condition logic does it use?
[ ] Validate the Sigma rule with sigma-cli. Run:
```
sigma check /lab/data/examples/synth-beacon.yml
```
Then convert it to Splunk SPL:
```
sigma convert -t splunk /lab/data/examples/synth-beacon.yml
```
Record the SPL output. Would this query fire on the sandbox report's registry event?
[ ] Write your own Sigma rule. Create my-persistence.yml — a Sigma rule that detects the Run key persistence observed in the sandbox report:
- title: Synth Loader Persistence via Run Key
- logsource: category registry_set, product windows
- detection.selection: TargetObject containing \CurrentVersion\Run\SynthUpdater
- Include ATT&CK tags: attack.t1547.001

Validate with sigma check and convert with sigma convert -t splunk.
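Before converting, it can help to reason about what the selection will match. The sketch below emulates a `contains` match on TargetObject in plain Python, assuming Sigma's default case-insensitive string matching. The function name and both events are hypothetical; neither event is taken from the sandbox report.

```python
# Substring the selection looks for, as given in the step above.
NEEDLE = r"\CurrentVersion\Run\SynthUpdater"


def selection_matches(event: dict) -> bool:
    """Emulate a case-insensitive `contains` match on the TargetObject field."""
    target = str(event.get("TargetObject", ""))
    return NEEDLE.lower() in target.lower()


# Hypothetical registry events for illustration only.
run_key = {"TargetObject": r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\SynthUpdater"}
other = {"TargetObject": r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\OneDrive"}

print(selection_matches(run_key))
print(selection_matches(other))
```

The second event shares the Run key path but not the value name, so it does not match. Anchoring on the value name rather than the whole Run key is what keeps the rule from firing on every autostart entry.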

[ ] Complete the analysis report. Fill in data/report-template.md using your findings from all modules (08–13) and save the result as analysis-report.md. The executive summary is one paragraph; the technical findings table should include all IOCs from Module 12; the detection section should reference both rules you wrote.

Success criteria: you're done when

[ ] synth-loader.yar (worked example) matches synth-loader.bin and produces no hits on clean-binaries/.
[ ] Your my-detection.yar rule passes the same two-part validation.
[ ] You scored my-detection.yar on the held-out corpus (make eval RULE=my-detection.yar) and have a scorecard (recall + precision), not just a demo anecdote.
[ ] Your regression gate is GREEN on your rule (0 missed malicious, 0 false positives) and you have seen it go RED (make gate) on the over-broad rule — a gate you've only watched pass isn't a gate.
[ ] sigma check returns no errors for synth-beacon.yml and your my-persistence.yml.
[ ] sigma convert -t splunk produces valid SPL for your Sigma rule.
[ ] analysis-report.md is filled in and ready to submit.

Deliverables

Commit to your portfolio repo:

- my-detection.yar: your YARA rule.
- my-persistence.yml: your Sigma rule.
- analysis-report.md: the completed report.
- Your held-out scorecard + the regression gate (the eval.py + make eval target), committed so the rule can't silently regress.
- Do not commit the benign sample binaries (they are in .gitignore), the Sigma backend output, or the ATT&CK Navigator JSON from Module 12.

Automate & own it

Required. Don't stop at a one-shot validation script — turn your rule into a regression gate so it can't silently rot into a false-positive machine. Wrap the held-out scoring in an eval.py that scores any .yar against your held-out corpus and exits non-zero when it misses a malicious fixture (a false negative) or fires on a benign one (a false positive) — exactly as a unit test fails on a broken function. Prove it both ways: GREEN on your precise rule, and RED on the deliberately over-broad rule (the lab ships eval/rules/loader-broad.yar, eval/eval.py, and a make eval / make gate you can copy and extend). AI drafts the confusion-matrix arithmetic and the scorecard table; you own the metric choice (precision and recall — coverage is not precision), the held-out wall (grade on data you never tuned on), and the gate's fail-closed direction (a missing rule or a broken eval must fail, not silently pass). Commit eval.py + your held-out corpus + the gate alongside your rule, and run it before shipping any rule. (Honor system — this gate guards you against regressions; there's no grader.)
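One possible shape for the fail-closed direction described above is sketched here. It is not the shipped eval/eval.py: `run_gate`, the `Fixture` tuple, and the `matcher` callable (a stand-in for however you invoke YARA) are all hypothetical, and the exit codes 1 and 2 are arbitrary choices for this example.

```python
from pathlib import Path
from typing import Callable, Iterable, Tuple

Fixture = Tuple[str, bool, bytes]  # (name, is_malicious, file contents)


def run_gate(rule_path: Path, matcher: Callable[[bytes], bool],
             fixtures: Iterable[Fixture]) -> int:
    """Return an exit code: 0 only when the rule exists and scores cleanly."""
    fixtures = list(fixtures)
    # Fail closed: a missing rule or an empty corpus must never pass.
    if not rule_path.is_file():
        print(f"FAIL: rule not found: {rule_path}")
        return 2
    if not fixtures:
        print("FAIL: held-out corpus is empty")
        return 2

    missed, false_pos = [], []
    try:
        for name, is_malicious, data in fixtures:
            fired = matcher(data)
            if is_malicious and not fired:
                missed.append(name)       # false negative
            elif fired and not is_malicious:
                false_pos.append(name)    # false positive
    except Exception as exc:
        # Fail closed: a broken evaluation is a failure, not a silent pass.
        print(f"FAIL: evaluation error: {exc}")
        return 2

    if missed or false_pos:
        print(f"RED: missed={missed} false_positives={false_pos}")
        return 1
    print("GREEN: 0 missed malicious, 0 false positives")
    return 0
```

A wrapper script would typically end with `sys.exit(run_gate(...))` so that make and CI see the non-zero code. The property to verify by hand is that every early-return path is a failure; there is no branch in which an error leads to 0.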

AI acceleration

Draft the executive summary section of your report by feeding the IOC table and ATT&CK layer to a model: "Write a one-paragraph executive summary for a malware analysis report. The sample is a loader that establishes persistence, injects into svchost.exe, and beacons to a C2 server. Write it for a non-technical executive, with no jargon." Review every sentence, because models tend to understate severity. Adjust the framing to accurately represent the risk level your technical findings support.

For the eval, ask a model to generate adversarial near-miss benign fixtures — legitimate files crafted to look like the family (a utility that shares the public algorithm constant, a parser that reads the same config tag) — then label each one yourself and verify it really is benign. A model labelling its own test set is the contamination this whole discipline guards against: you own the labels, the metric, and the gate's fail-closed direction.

Connects forward

This module is the capstone of Track 04. The detection pipeline you've built — YARA for file-based hunting, Sigma for log-based alerting, and a structured report — feeds directly into Track 02 (Defensive) Module 08 (Detection-as-Code) and Track 02 Module 13, where these rules become part of a pipeline that tests them continuously against new data.

Marketable proof

"I can take a malware analysis from raw artifact to validated YARA and Sigma detections and a structured report — including false-positive testing and ATT&CK Navigator output — producing a detection package that a SOC engineer can deploy the same day they receive it."

Stretch

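Extend my-detection.yar with a second rule that uses the pe module to check the import hash (imphash) of the target binary. Get the correct imphash value for synth-loader.bin first. Note that yara -m only prints the meta section of matching rules, not PE metadata; yara -D (--print-module-data) dumps what the pe module parsed, and a PE tool such as pefile (get_imphash()) can compute the hash itself. This makes the rule more specific and demonstrates use of the YARA PE module.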
