Lab 12: IOC Extraction & ATT&CK Mapping
Hands-on lab
Setup
git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/12-ioc-extraction-attck
make up && make demo
Scenario
Your IR team has received a CAPE sandbox report for the artefact from Module 11. The report is a JSON file containing process events, network connections, file operations, and registry modifications. Your task is to extract the IOCs, structure them in MISP format, map the behaviours to ATT&CK, and produce an ATT&CK Navigator layer ready to hand to the detection team.
The lab ships:
- data/sandbox-report.json — a realistic synthetic sandbox report (CAPE-format JSON; all IOCs fabricated)
- data/attck-mapping.json — a mapping from behaviour categories to ATT&CK technique IDs
- extract_iocs.py — a Python extractor script (you will complete it as part of the lab)
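To make the role of the mapping file concrete, here is a minimal sketch of how behaviour categories could be turned into Navigator layer entries. The shape of the mapping (category name to a list of technique IDs), the category names, the colour, and the `build_layer` helper are all assumptions for illustration; the bundled data/attck-mapping.json and extract_iocs.py may be organised differently. A layer written by a real tool usually also carries a `versions` block, which is left out here.

```python
import json

# Hypothetical mapping shape: behaviour category -> list of ATT&CK technique IDs.
SAMPLE_MAPPING = {
    "process_injection": ["T1055.012"],
    "registry_persistence": ["T1547.001"],
    "c2_http": ["T1071.001"],
}


def build_layer(observed_categories, mapping, name="Lab 12 sandbox report"):
    """Return a Navigator-style layer dict for the categories seen in a report."""
    techniques = {}
    # dict.fromkeys() drops repeated categories while keeping their order.
    for category in dict.fromkeys(observed_categories):
        for technique_id in mapping.get(category, []):
            entry = techniques.setdefault(
                technique_id,
                {"techniqueID": technique_id, "color": "#e60d0d", "comment": ""},
            )
            # Record which behaviour category justified the mapping.
            entry["comment"] = ", ".join(filter(None, [entry["comment"], category]))
    return {
        "name": name,
        "domain": "enterprise-attack",
        "techniques": sorted(techniques.values(), key=lambda t: t["techniqueID"]),
    }


layer = build_layer(["c2_http", "process_injection", "unknown_category"], SAMPLE_MAPPING)
print(json.dumps(layer, indent=2))
```

Categories with no entry in the mapping are skipped silently in this sketch; in practice you may prefer to log them so unmapped behaviour is not lost.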
Opt-in real report. make fetch-report (needs a free abuse.ch Auth-Key in MB_AUTH_KEY) pulls a real CAPE/Hatching-Triage sandbox report for a live Agent Tesla sample (MITRE S0331) straight from MalwareBazaar's vendor_intel into a gitignored reports/ directory. Only the report JSON is fetched; the binary is never downloaded. To practise on genuine sandbox output, run the extractor against it (python3 /lab/extract_iocs.py /lab/reports/<sha>.cape.json). The lab works fully without this step, using the bundled synthetic report.
Do
- [ ] Explore the sandbox report structure. Run `make shell` and: `python3 -c "import json; r=json.load(open('/lab/data/sandbox-report.json')); print(list(r.keys()))"`. Identify the top-level keys. Which keys contain network data? Which contain process events?
  Hint: look for network, processes, behavior, signatures.
- [ ] Run the extractor and review the output. Run `python3 /lab/extract_iocs.py /lab/data/sandbox-report.json`. It produces two files: `iocs.json` (MISP-compatible attributes) and `attck-layer.json` (ATT&CK Navigator layer). Read the first 30 lines of each.
- [ ] Inspect the MISP attribute output. Open `iocs.json`. For each attribute, check: is the type correct (domain, ip-dst, sha256, regkey)? Does `to_ids` match the IOC's reliability? Should any IPs be set to `to_ids: false`? Make notes.
- [ ] Inspect the ATT&CK layer. Open `attck-layer.json`. Count the number of techniques mapped. Upload the file to the ATT&CK Navigator at https://mitre-attack.github.io/attack-navigator/ (File → Open Existing Layer from URL/file). Take a screenshot or note which techniques are highlighted.
  If Navigator is not accessible from your network, verify the JSON structure manually: it should have techniques[] with techniqueID and color fields.
- [ ] Assess IOC quality using the Pyramid of Pain. From your `iocs.json`, categorise each IOC by Pyramid level: hash (bottom), network artefacts (middle), TTPs (top). How many of each level do you have? Which IOCs will expire fastest if the attacker rotates infrastructure?
- [ ] Extend the extractor with one new IOC type. The report contains mutex names in `behavior.mutexes`. Add extraction for mutexes to `extract_iocs.py` (MISP type: `mutex`). Re-run and confirm they appear in the output.
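As a starting point for the mutex step, the sketch below shows one possible shape for the extraction. It assumes `behavior.mutexes` is a list of strings (and tolerates dict entries with a `name` key, which is a guess); the `extract_mutexes` helper, the `Artifacts dropped` category, the default `to_ids` value, and the comment text are all illustrative choices, so align them with whatever the existing extractor already emits.

```python
def extract_mutexes(report):
    """Return MISP-style mutex attributes from a parsed sandbox report dict."""
    behavior = report.get("behavior") or {}
    raw_entries = behavior.get("mutexes") or []

    seen = set()
    attributes = []
    for entry in raw_entries:
        # Assumption: entries are plain strings; accept {"name": ...} dicts too.
        name = entry.get("name") if isinstance(entry, dict) else entry
        if not isinstance(name, str):
            continue
        name = name.strip()
        if not name or name in seen:
            continue  # skip blanks and duplicates
        seen.add(name)
        attributes.append(
            {
                "type": "mutex",
                "category": "Artifacts dropped",
                "value": name,
                "to_ids": True,
                "comment": "Mutex observed in sandbox run",
            }
        )
    return attributes


sample_report = {"behavior": {"mutexes": ["Global\\demo", "Global\\demo", " "]}}
print(extract_mutexes(sample_report))
```

Deduplicating before emitting attributes keeps the output stable when the same mutex is opened by several processes in the report.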
Success criteria: you're done when
- [ ] `iocs.json` contains at least one attribute of each type: domain, ip-dst, sha256, url, regkey.
- [ ] `attck-layer.json` is valid ATT&CK Navigator JSON with at least 5 technique entries.
- [ ] You have assessed `to_ids` for each IP and noted any that should be false.
- [ ] Mutex extraction is added and tested.
- [ ] Pyramid of Pain categorisation is in your notes.
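The first three criteria can be checked mechanically. The following is a rough sketch, assuming iocs.json loads as a flat list of attribute dicts with `type`, `value`, and `to_ids` keys (the real file may nest them differently); the helper names are made up for this example. The IP check only flags candidates for review using Python's `ipaddress` module; deciding whether `to_ids` should really be false remains your call.

```python
import ipaddress

REQUIRED_TYPES = {"domain", "ip-dst", "sha256", "url", "regkey"}


def missing_types(attributes):
    """Return the required attribute types that never appear in the list."""
    present = {a.get("type") for a in attributes}
    return sorted(REQUIRED_TYPES - present)


def layer_ok(layer, minimum=5):
    """Check that a layer has enough technique entries, each with a techniqueID."""
    techniques = layer.get("techniques", [])
    return len(techniques) >= minimum and all("techniqueID" in t for t in techniques)


def ips_to_review(attributes):
    """Return ip-dst values that have to_ids set but are not globally routable."""
    flagged = []
    for attribute in attributes:
        if attribute.get("type") != "ip-dst" or not attribute.get("to_ids"):
            continue
        try:
            address = ipaddress.ip_address(attribute["value"])
        except ValueError:
            flagged.append(attribute["value"])  # not a parseable IP at all
            continue
        if not address.is_global:
            flagged.append(attribute["value"])  # private, loopback, reserved, ...
    return flagged
```

Load both files with `json.load` and pass the results in. Because the bundled IOCs are fabricated, they may sit in reserved documentation ranges, in which case this check will flag them even though a real report would not behave that way.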
Deliverables
Commit to your portfolio repo:
- extract_iocs.py — your extended extractor with mutex support.
- iocs-summary.md — the IOC table (attribute type, value, to_ids, ATT&CK ID) and Pyramid of Pain categorisation.
- Do not commit attck-layer.json (it is large) or the sandbox report JSON.
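The summary table and the Pyramid of Pain tally can be generated from the attribute list rather than typed by hand. This is a minimal sketch under a few assumptions: attributes are a flat list of dicts, the ATT&CK ID lives in a hypothetical `attck_id` key, and registry keys and mutexes are counted as host artefacts on the same middle tier as network artefacts. TTPs are not derived here because they come from the technique mappings, not from individual attributes.

```python
from collections import Counter

# Assumed mapping from MISP attribute type to a simplified Pyramid of Pain level.
PYRAMID_LEVEL = {
    "sha256": "hash",
    "md5": "hash",
    "ip-dst": "network artefact",
    "domain": "network artefact",
    "url": "network artefact",
    "regkey": "host artefact",
    "mutex": "host artefact",
}


def level_counts(attributes):
    """Count attributes per pyramid level; unknown types become 'unclassified'."""
    return Counter(PYRAMID_LEVEL.get(a.get("type"), "unclassified") for a in attributes)


def summary_markdown(attributes):
    """Render the IOC table for iocs-summary.md as a Markdown string."""
    lines = [
        "| Type | Value | to_ids | ATT&CK ID | Pyramid level |",
        "| --- | --- | --- | --- | --- |",
    ]
    for a in attributes:
        value = str(a.get("value", "")).replace("|", "\\|")  # keep table cells intact
        to_ids = str(bool(a.get("to_ids"))).lower()
        level = PYRAMID_LEVEL.get(a.get("type"), "unclassified")
        # "attck_id" is a hypothetical key; use whatever your extractor emits.
        lines.append(
            f"| {a.get('type', '')} | {value} | {to_ids} | {a.get('attck_id', '')} | {level} |"
        )
    return "\n".join(lines)
```

Writing the returned string to iocs-summary.md and appending the counts gives a reproducible deliverable that can be regenerated whenever the extractor changes.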
Automate & own it
Required. Extend extract_iocs.py with a --format misp-csv option that outputs the IOCs as a CSV ready for bulk import into MISP (columns: type, value, to_ids, comment, tags). Draft the argument parsing with AI assistance, then manually verify the CSV output is correctly delimited and that none of the IOC values contain unescaped commas. Commit the updated script.
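One way to approach the CSV option is to let Python's `csv` module handle quoting, so that values containing commas are escaped automatically instead of by hand. The sketch below is only a draft under assumptions: the parser layout, the `output_format` destination name, the 1/0 encoding of `to_ids`, and the semicolon used to join tags are illustrative choices, not a confirmed MISP import requirement.

```python
import argparse
import csv
import io

CSV_COLUMNS = ["type", "value", "to_ids", "comment", "tags"]


def build_parser():
    """Argument parser with the new --format option (names are illustrative)."""
    parser = argparse.ArgumentParser(description="Extract IOCs from a sandbox report")
    parser.add_argument("report", help="path to the sandbox report JSON")
    parser.add_argument(
        "--format",
        dest="output_format",
        choices=["json", "misp-csv"],
        default="json",
        help="output format (default: json)",
    )
    return parser


def to_misp_csv(attributes):
    """Serialise attribute dicts to CSV text; csv quotes embedded commas."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for a in attributes:
        writer.writerow(
            {
                "type": a.get("type", ""),
                "value": a.get("value", ""),
                "to_ids": 1 if a.get("to_ids") else 0,
                "comment": a.get("comment", ""),
                # Assumption: tags are a list of strings joined with ';'.
                "tags": ";".join(a.get("tags", [])),
            }
        )
    return buffer.getvalue()
```

To verify the output manually, read it back with `csv.reader` and confirm every row has exactly five fields; a value such as `http://c2.example.test/a,b` should come back as a single field.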
AI acceleration
Paste the first 100 lines of sandbox-report.json into a model. Prompt: "Map the process creation events in this sandbox report to ATT&CK sub-techniques. For each mapping, cite the specific process name and parent-child relationship as evidence." Cross-check each suggested technique ID against attack.mitre.org. Correct any parent-only mappings by navigating to the sub-technique list.
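Before cross-checking against attack.mitre.org, it can help to triage the model's answer automatically. The sketch below flags IDs that are malformed or parent-only, based purely on the ID format (`T` plus four digits, optionally followed by a dot and three digits). The function names and the `(technique_id, evidence)` tuple shape are assumptions for this example, and the check says nothing about whether an ID actually exists in ATT&CK.

```python
import re

# Txxxx for a technique, Txxxx.yyy for a sub-technique.
TECHNIQUE_ID = re.compile(r"T\d{4}(?:\.\d{3})?")


def classify_technique_id(technique_id):
    """Return 'sub-technique', 'parent-only', or 'malformed' for an ID string."""
    if not TECHNIQUE_ID.fullmatch(technique_id):
        return "malformed"
    return "sub-technique" if "." in technique_id else "parent-only"


def mappings_to_review(mappings):
    """Keep the (technique_id, evidence) pairs that are not sub-techniques."""
    return [
        (technique_id, evidence)
        for technique_id, evidence in mappings
        if classify_technique_id(technique_id) != "sub-technique"
    ]


suggested = [
    ("T1059.001", "powershell.exe spawned by winword.exe"),
    ("T1055", "remote thread created in explorer.exe"),
]
print(mappings_to_review(suggested))
```

A parent-only result is a prompt to look, not proof of an error: some techniques have no sub-techniques at all, so the parent ID is the most specific mapping available.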
Connects forward
Module 13 uses the IOCs and ATT&CK mappings from this module as the input for YARA and Sigma rule authoring. The iocs.json file you produce here becomes the seed for the YARA hash metadata and the Sigma field values.
Marketable proof
"I can parse a sandbox report, extract structured IOCs in MISP format, map behaviours to ATT&CK sub-techniques, and produce an ATT&CK Navigator layer — delivering analyst-ready intelligence in a format that integrates directly into a threat intel programme."
Stretch
Write a second script pivot_iocs.py that takes iocs.json as input and, for each domain IOC, performs a DNS lookup and adds the resolved IPs as new ip-dst attributes with to_ids=false (since resolved IPs may be shared CDN infrastructure). Use dnspython for the lookups. Test against the fake domains in the report to confirm it handles NXDOMAIN gracefully.
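A possible skeleton for the pivot is sketched below. It assumes iocs.json is a flat list of attribute dicts; the `resolve_a` and `pivot` names, the timeout, and the comment text are illustrative. The dnspython imports sit inside `resolve_a` so that the pivot logic can be exercised with a stand-in resolver on a machine where dnspython is not installed.

```python
def resolve_a(domain, timeout=3.0):
    """Resolve A records with dnspython; return [] when nothing resolves."""
    # Third-party dependency (pip install dnspython), imported lazily on purpose.
    import dns.exception
    import dns.resolver

    resolver = dns.resolver.Resolver()
    resolver.lifetime = timeout  # total time allowed for the query
    try:
        answer = resolver.resolve(domain, "A")
    except (
        dns.resolver.NXDOMAIN,       # domain does not exist (expected for fake IOCs)
        dns.resolver.NoAnswer,       # domain exists but has no A record
        dns.resolver.NoNameservers,  # no nameserver gave a usable reply
        dns.exception.Timeout,
    ):
        return []
    return sorted({rdata.address for rdata in answer})


def pivot(attributes, resolve=resolve_a):
    """Return attributes plus ip-dst entries resolved from each domain IOC."""
    known_ips = {a["value"] for a in attributes if a.get("type") == "ip-dst"}
    added = []
    for a in attributes:
        if a.get("type") != "domain":
            continue
        for ip in resolve(a["value"]):
            if ip in known_ips:
                continue  # already present, do not duplicate
            known_ips.add(ip)
            added.append(
                {
                    "type": "ip-dst",
                    "value": ip,
                    "to_ids": False,  # resolved IPs may be shared infrastructure
                    "comment": f"Resolved from {a['value']}",
                }
            )
    return attributes + added
```

Passing the resolver in as a parameter also makes the NXDOMAIN path easy to test: supply a function that returns an empty list for the fabricated domains and confirm no attributes are added.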