Lab 13: Detection Authoring & Reporting
Setup
git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/13-detection-reporting
make up && make demo
make demo runs the worked-rule validation and then the eval harness: it scores a
YARA rule against a held-out labelled corpus (eval/heldout/ — 5 known-malicious
fixtures the rule must match, 8 benign files it must not, several of them deliberate
near-misses) and applies a regression gate. You watch the gate go GREEN on the precise
rule (recall 100%, precision 100%) and RED on a deliberately over-broad rule
(eval/rules/loader-broad.yar — recall 100% but precision 55.6%: it false-positives on the
benign look-alikes). make eval RULE=<your-rule.yar> scores your own rule + gate; make gate
proves the gate catches the over-broad regression (exits non-zero). All offline and
deterministic — the fixtures are tiny generated files with magic headers and representative
strings, not real malware. Honor system: the gate guards you against a regression, it is
not a grader.
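The 55.6% precision figure follows from the corpus sizes if the over-broad rule fires on all 5 malicious fixtures and on 4 of the 8 benign ones. The count of 4 false positives is inferred from the stated percentage, not read from the lab's output, and the function name is hypothetical. A minimal sketch of the arithmetic:

```python
def score(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (recall, precision) as fractions in [0, 1]."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Precise rule: all 5 malicious fixtures matched, no benign file matched.
precise = score(tp=5, fp=0, fn=0)

# Over-broad rule: all 5 malicious matched, plus an ASSUMED 4 of the 8 benign files.
broad = score(tp=5, fp=4, fn=0)

for name, (recall, precision) in {"precise": precise, "broad": broad}.items():
    print(f"{name}: recall {recall:.1%}, precision {precision:.1%}")
# precise: recall 100.0%, precision 100.0%
# broad: recall 100.0%, precision 55.6%
```

Recall is identical for both rules, which is why recall alone cannot tell them apart; only precision exposes the false positives.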
Why synthetic here, when modules 02–12 use real samples? A regression gate must be reproducible and fully labelled, and committable, so a reviewer can re-run it and get the same score. Real MalwareBazaar samples are neither (you can't commit live malware, and a tag fetch pulls whatever is recent, so the corpus drifts from run to run). The discipline this lab teaches (precision vs. recall on a held-out set, and a label set you own) depends on a controlled corpus; that is the point, not a shortcut. Want to score against real malware? Bring the real samples you fetched in modules 03–11 (Agent Tesla, GuLoader, the UPX-packed PE) as an extra malicious set, hand-label them, and add them to your own eval/, but keep the shipped synthetic corpus as the committed, reproducible gate.
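To make "tiny generated files with magic headers and representative strings" concrete, here is a minimal sketch of how such fixtures could be generated. The file names, directory layout, padding, and helper function are hypothetical, and the lab's own generator may differ. Only the strings SynthLoader_Mutex and CFG0 come from this lab's text.

```python
import tempfile
from pathlib import Path

MUTEX = b"SynthLoader_Mutex"   # IOC string named in this lab
CONFIG_TAG = b"CFG0"           # config header tag named in this lab

def write_fixtures(root: Path) -> dict[str, str]:
    """Write two illustrative fixtures and return a path -> label map."""
    malicious = root / "malicious" / "variant-a.bin"   # hypothetical name
    benign = root / "benign" / "incident-note.txt"     # hypothetical name
    for path in (malicious, benign):
        path.parent.mkdir(parents=True, exist_ok=True)

    # PE-like fixture: MZ magic, padding, then the representative strings.
    malicious.write_bytes(b"MZ" + b"\x00" * 62 + MUTEX + b"\x00" + CONFIG_TAG)

    # Near-miss: plain text that quotes the IOC strings but has no MZ header.
    benign.write_bytes(b"Incident note: observed " + MUTEX + b" and " + CONFIG_TAG + b"\n")

    # The labels are assigned by the author of the fixtures, not inferred.
    return {str(malicious): "malicious", str(benign): "benign"}

with tempfile.TemporaryDirectory() as tmp:
    labels = write_fixtures(Path(tmp))
    for path, label in labels.items():
        print(label, Path(path).read_bytes()[:2])
```

Because the bytes are generated from source, the corpus is identical on every run and safe to commit, which is the property the paragraph above asks of a regression gate.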
Scenario
You have completed the full malware analysis: static triage, dynamic behaviour, decompilation, unpacking, and IOC extraction. Now you close the loop. Your deliverables are a validated YARA rule, a validated Sigma rule, and a completed analysis report — all ready to hand to the detection team and the IR lead.
The lab ships:
- data/samples/synth-loader.bin — the benign PE sample from Module 09 (the FNV-hash utility, compiled from source — not malware)
- data/samples/clean-binaries/ — five benign Linux utilities for false-positive testing
- data/examples/synth-loader.yar — a worked YARA rule you will study and extend
- data/examples/synth-beacon.yml — a worked Sigma rule for the beacon behaviour
- data/report-template.md — the analysis report template to fill in
Do
- [ ] Study the worked YARA rule.
  Open data/examples/synth-loader.yar. Identify the string identifiers, the condition logic, and the meta section. What does the uint16(0) == 0x5A4D condition test? What would you add to reduce false positives?
  Hint: 0x5A4D is the MZ header (the bytes 4D 5A read as a little-endian 16-bit integer), so the check restricts the rule to PE files. A second condition on file size limits scan time.
- [ ] Validate the worked rule against the corpus.
  Run make demo to see the full validation flow. Then, interactively, scan data/samples/synth-loader.bin with the worked rule: it should match. Then scan data/samples/clean-binaries/: it should produce no matches. Record the result.
- [ ] Write your own YARA rule.
  Create my-detection.yar. Your rule must:
  - Target the mutex string SynthLoader_Mutex as a primary string.
  - Include the XOR key byte sequence from the loader (use {41 ?? 41} as a hex pattern approximation).
  - Have a condition of 2 of them so both indicators must be present.
  - Include meta: author, date, description, ATT&CK technique IDs.
  Test it: match on synth-loader.bin, no match on clean-binaries/.
- [ ] Grade your rule on the held-out corpus: coverage is not precision.
  The clean-binaries/ test in step 3 is your tuning set: you wrote the rule against the sample you can see, so of course it passes. That is a demo, not a measurement. Score it instead against the held-out set in eval/heldout/, fixtures the rule was never tuned on:
  - 5 known-malicious fixtures (variants that drop the mutex, or drop the config header; the rule must still catch them).
  - 8 benign fixtures, several of which are near-misses that share one superficial feature with the family: a legitimate utility that uses the public FNV-1a basis constant; a config parser that reads CFG0-tagged files; an app whose mutex merely contains LoaderMtx; a plain-text incident note that quotes every IOC string but has no PE/ELF header.
  Run make eval RULE=my-detection.yar and read the scorecard: recall (malicious caught / all malicious), precision (of what fired, how much was truly malicious), and the confusion matrix. A rule that catches all 5 malicious fixtures but also fires on the near-misses has great recall and bad precision, and a high-recall, low-precision rule is worse than no rule: it floods the SOC with false positives until an analyst disables it, taking its real detections with it.
  Now run make gate and watch the shipped over-broad rule (eval/rules/loader-broad.yar) go RED on exactly that failure. The contrast is the lesson. Tune your rule (gate on the magic header; require a combination of durable IOCs, not a single shared feature) and re-run make eval until the gate is GREEN: 0 missed malicious, 0 false positives. A gate you have only ever seen pass is not a gate.
- [ ] Study the worked Sigma rule.
  Open data/examples/synth-beacon.yml. Identify the detection.keywords or detection.selection block. What field does it match on? What condition logic does it use?
- [ ] Validate the Sigma rule with sigma-cli.
  Run sigma check on data/examples/synth-beacon.yml. Then convert it to Splunk SPL with sigma convert -t splunk. Record the SPL output. Would this query fire on the sandbox report's registry event?
- [ ] Write your own Sigma rule.
  Create my-persistence.yml, a Sigma rule that detects the Run key persistence observed in the sandbox report:
  - title: Synth Loader Persistence via Run Key
  - logsource: category registry_set, product windows
  - detection.selection: TargetObject containing \CurrentVersion\Run\SynthUpdater
  - Include ATT&CK tags: attack.t1547.001
  Validate with sigma check and convert with sigma convert -t splunk.
- [ ] Complete the analysis report.
  Fill in data/report-template.md using your findings from all modules (08–13). The executive summary is one paragraph; the technical findings table should include all IOCs from Module 12; the detection section should reference both rules you wrote.
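For the YARA steps above, it can help to see the condition logic spelled out in plain Python. The sketch below emulates three pieces: the uint16(0) == 0x5A4D header check, the {41 ?? 41} hex pattern with its one-byte wildcard, and a "2 of them" condition over two strings. It illustrates the semantics only and is not a replacement for running yara; the function names and the test bytes are hypothetical.

```python
import re
import struct

MUTEX = b"SynthLoader_Mutex"
# {41 ?? 41}: byte 0x41, any single byte, byte 0x41. DOTALL lets "." match b"\n" too.
XOR_PATTERN = re.compile(rb"\x41.\x41", re.DOTALL)

def has_mz_header(data: bytes) -> bool:
    """Equivalent of uint16(0) == 0x5A4D: little-endian read of the first two bytes."""
    return len(data) >= 2 and struct.unpack_from("<H", data, 0)[0] == 0x5A4D

def rule_matches(data: bytes) -> bool:
    """MZ gate plus '2 of them' over the mutex string and the hex pattern."""
    hits = [MUTEX in data, XOR_PATTERN.search(data) is not None]
    return has_mz_header(data) and sum(hits) >= 2

# Made-up inputs for illustration, not the lab's fixtures.
sample = b"MZ" + b"\x00" * 16 + MUTEX + b"\x41\x7f\x41"
note = b"note quoting " + MUTEX + b" and A.A"   # same indicators, no MZ header

print(rule_matches(sample))  # True
print(rule_matches(note))    # False
```

The second input carries both indicators and is rejected only by the header gate, which is the behaviour the tuning advice in step 4 asks for on the plain-text incident note.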
Success criteria: you're done when
- [ ] synth-loader.yar (worked example) matches synth-loader.bin and produces no hits on clean-binaries/.
- [ ] Your my-detection.yar rule passes the same two-part validation.
- [ ] You scored my-detection.yar on the held-out corpus (make eval RULE=my-detection.yar) and have a scorecard (recall + precision), not just a demo anecdote.
- [ ] Your regression gate is GREEN on your rule (0 missed malicious, 0 false positives) and you have seen it go RED (make gate) on the over-broad rule. A gate you've only watched pass isn't a gate.
- [ ] sigma check returns no errors for synth-beacon.yml and your my-persistence.yml.
- [ ] sigma convert -t splunk produces valid SPL for your Sigma rule.
- [ ] analysis-report.md is filled in and ready to submit.
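The Sigma selection specified for my-persistence.yml can be read as a predicate over a single log event. The sketch below assumes the event is a flat dictionary with a TargetObject field and that the contains match is case-insensitive (Sigma's default for string values). The event values and the function name are made up for illustration.

```python
NEEDLE = r"\CurrentVersion\Run\SynthUpdater"

def selection_matches(event: dict) -> bool:
    """True when TargetObject contains the Run-key path (case-insensitive)."""
    target = str(event.get("TargetObject", ""))
    return NEEDLE.lower() in target.lower()

# Hypothetical registry_set events, for illustration only.
persistence = {
    "EventType": "SetValue",
    "TargetObject": r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\SynthUpdater",
}
unrelated = {
    "EventType": "SetValue",
    "TargetObject": r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\OneDrive",
}

print(selection_matches(persistence))  # True
print(selection_matches(unrelated))    # False
```

Working through a predicate like this before running sigma convert makes it easier to judge whether the generated SPL would fire on the sandbox report's registry event.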
Deliverables
Commit to your portfolio repo:
- my-detection.yar — your YARA rule.
- my-persistence.yml — your Sigma rule.
- analysis-report.md — the completed report.
- Your held-out scorecard + the regression gate (the eval.py + make eval target) — committed so the rule can't silently regress.
- Do not commit the benign sample binaries (they are in .gitignore), the Sigma backend output, or the ATT&CK Navigator JSON from Module 12.
Automate & own it
Required. Don't stop at a one-shot validation script — turn your rule into a regression
gate so it can't silently rot into a false-positive machine. Wrap the held-out scoring in an
eval.py that scores any .yar against your held-out corpus and exits non-zero when it
misses a malicious fixture (a false negative) or fires on a benign one (a false positive) —
exactly as a unit test fails on a broken function. Prove it both ways: GREEN on your precise rule,
and RED on the deliberately over-broad rule (the lab ships eval/rules/loader-broad.yar,
eval/eval.py, and a make eval / make gate you can copy and extend). AI drafts the
confusion-matrix arithmetic and the scorecard table; you own the metric choice (precision and
recall — coverage is not precision), the held-out wall (grade on data you never tuned on), and the
gate's fail-closed direction (a missing rule or a broken eval must fail, not silently pass).
Commit eval.py + your held-out corpus + the gate alongside your rule, and run it before shipping
any rule. (Honor system — this gate guards you against regressions; there's no grader.)
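A minimal sketch of the fail-closed gate described above. It is not the lab's eval/eval.py: the corpus layout (malicious/ and benign/ subdirectories), the function names, and the stand-in matchers are assumptions. In a real gate the matcher would run YARA on each file; here it is a plain callable so the sketch runs without extra dependencies.

```python
import tempfile
from pathlib import Path
from typing import Callable

def _files(directory: Path) -> list[Path]:
    """Sorted regular files in a directory; empty if the directory is missing."""
    return sorted(p for p in directory.glob("*") if p.is_file()) if directory.is_dir() else []

def gate(corpus: Path, matches: Callable[[bytes], bool]) -> int:
    """Return 0 (GREEN) only on zero misses and zero false positives; else 1 (RED)."""
    malicious, benign = _files(corpus / "malicious"), _files(corpus / "benign")
    if not malicious or not benign:
        # Fail closed: an empty or missing corpus proves nothing, so it must not pass.
        print("RED: corpus missing or empty")
        return 1
    try:
        tp = sum(bool(matches(p.read_bytes())) for p in malicious)
        fp = sum(bool(matches(p.read_bytes())) for p in benign)
    except Exception as exc:  # a broken evaluation must fail the gate, not skip it
        print(f"RED: evaluation error: {exc}")
        return 1
    fn, tn = len(malicious) - tp, len(benign) - fp
    recall = tp / len(malicious)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    print(f"TP={tp} FP={fp} FN={fn} TN={tn} recall={recall:.1%} precision={precision:.1%}")
    return 0 if fn == 0 and fp == 0 else 1

# Stand-in matchers (assumptions, not the lab's rules).
def precise_rule(data: bytes) -> bool:
    return data[:2] == b"MZ" and b"SynthLoader_Mutex" in data

def broad_rule(data: bytes) -> bool:
    return b"SynthLoader_Mutex" in data

with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp)
    (corpus / "malicious").mkdir()
    (corpus / "benign").mkdir()
    (corpus / "malicious" / "m1.bin").write_bytes(b"MZ\x00\x00SynthLoader_Mutex")
    (corpus / "benign" / "note.txt").write_bytes(b"quotes SynthLoader_Mutex")
    green = gate(corpus, precise_rule)
    red = gate(corpus, broad_rule)
    empty = gate(corpus / "does-not-exist", precise_rule)

print(green, red, empty)  # 0 1 1
```

In a script, the return value would be passed to sys.exit so that a build or pre-commit hook fails on RED. Returning 1 for a missing corpus or a raised exception is the fail-closed direction: the gate passes only when it has positively measured zero misses and zero false positives.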
AI acceleration
Draft the executive summary section of your report by feeding the IOC table and ATT&CK layer to a model: "Write a one-paragraph executive summary for a malware analysis report. The sample is a loader that establishes persistence, injects into svchost.exe, and beacons to a C2 server. Write it for a non-technical executive, with no jargon." Review every sentence: models tend to understate severity. Adjust the framing so it accurately represents the risk level your technical findings support.
For the eval, ask a model to generate adversarial near-miss benign fixtures — legitimate files crafted to look like the family (a utility that shares the public algorithm constant, a parser that reads the same config tag) — then label each one yourself and verify it really is benign. A model labelling its own test set is the contamination this whole discipline guards against: you own the labels, the metric, and the gate's fail-closed direction.
Connects forward
This module is the capstone of Track 04. The detection pipeline you've built — YARA for file-based hunting, Sigma for log-based alerting, and a structured report — feeds directly into Track 02 (Defensive) Module 08 (Detection-as-Code) and Track 02 Module 13, where these rules become part of a pipeline that tests them continuously against new data.
Marketable proof
"I can take a malware analysis from raw artifact to validated YARA and Sigma detections and a structured report — including false-positive testing and ATT&CK Navigator output — producing a detection package that a SOC engineer can deploy the same day they receive it."
Stretch
Extend my-detection.yar with a second rule that uses the pe module to check the import hash (imphash) of the target binary. First compute the imphash of synth-loader.bin with a PE parsing tool (for example, pefile's get_imphash() in Python); note that yara -m only prints the meta fields of matching rules, not PE metadata. Matching on the imphash makes the rule more specific and demonstrates use of the YARA PE module.