Module 12: IOC Extraction & ATT&CK Mapping
Type 6 · Reconstruct: parse a Cuckoo/CAPE-shaped sandbox report to extract network IOCs, hashes, and behavioural indicators, structure them as MISP attributes, and map each behaviour to an ATT&CK technique. Deliverable: a MISP-compatible attribute set plus a Navigator layer (graded against the Pyramid of Pain). (Secondary: Tool-Build, since the extractor is a reusable script.)
Last reviewed: 2026-06
Malware Analysis — turn a pile of analysis artefacts into structured intelligence that defenders and other teams can actually use.
In 60 seconds
An analysis that stays in one analyst's head helps no one — the moment you extract an IOC into a structured, shareable format it reaches SIEMs, firewalls, feeds, and other analysts. Extraction has two levels: mechanical (regex pulls domains/hashes/keys in thirty lines) and analytical (deciding which strings are real IOCs vs. benign system noise — where automation falls short). And IOCs and ATT&CK answer different questions: IOCs say what the malware is, techniques say what it does. The QakBot takedown (CISA AA23-242A) shipped exactly this pair: STIX IOCs plus an ATT&CK map.
Why this matters
An analysis that stays inside one analyst's head helps no one. The moment you extract an IOC — a domain, a hash, a registry key — into a structured, shareable format, it becomes available to SIEMs, firewalls, threat feeds, and other analysts. MISP is the most widely deployed open threat intelligence platform; understanding its data model is a prerequisite for contributing to any threat intel programme, and for consuming the feeds that come back. The QakBot takedown is a concrete model of where this work ends up: when the FBI disrupted the botnet (Operation Duck Hunt, August 2023), CISA and the FBI published joint advisory AA23-242A, releasing the QakBot IOCs as downloadable STIX files plus an ATT&CK technique mapping — exactly the Event/Object/Attribute-shaped, machine-consumable intelligence this module's pipeline produces. The structured artefact is what let defenders worldwide block QakBot infrastructure the same day. (CISA AA23-242A — Identification and Disruption of QakBot Infrastructure.)
Objective
Parse a realistic sandbox report (JSON-shaped like a Cuckoo/CAPE output), extract network IOCs, file hashes, and behavioural indicators using a Python script, structure them as MISP-compatible attributes, and map each behaviour to a MITRE ATT&CK technique ID using a bundled mapping table.
The core idea
The mental model
IOCs and ATT&CK answer two different questions, and conflating them is the classic beginner error.
IOCs describe what the malware is — a SHA-256, a C2 domain, a registry key. ATT&CK
techniques describe what it does — T1055.001 — Process Injection: DLL Injection. You extract
IOCs from the report's strings; you map behaviours (process events, API sequences, network
activity) to the most specific ATT&CK sub-technique the evidence supports. Both feed the report,
but they are distinct artefacts with distinct shelf lives.
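To make the split concrete, here is a minimal sketch that pulls IOCs and techniques out of the same report as two separate outputs. The mapping table, its signature names, and the sample report are hypothetical stand-ins for the module's bundled table and lab data; only the technique IDs and the general Cuckoo-style field layout (`target.file.sha256`, `network.domains`, `signatures`) are assumed to match real reports.

```python
# Hypothetical mapping table: sandbox signature name -> (technique ID, technique name).
# The signature names are made up for illustration; the IDs are real ATT&CK entries.
BEHAVIOUR_TO_ATTACK = {
    "dll_injection_loadlibrary": (
        "T1055.001", "Process Injection: Dynamic-link Library Injection"),
    "persistence_run_key": (
        "T1547.001",
        "Boot or Logon Autostart Execution: Registry Run Keys / Startup Folder"),
}


def split_report(report):
    """Return (iocs, techniques, unmapped) from a Cuckoo/CAPE-shaped report dict."""
    # IOCs: what the sample is.
    iocs = [("sha256", report["target"]["file"]["sha256"])]
    iocs += [("domain", d["domain"])
             for d in report.get("network", {}).get("domains", [])]

    # Techniques: what the sample does, looked up from observed behaviour.
    techniques, unmapped = [], []
    for sig in report.get("signatures", []):
        hit = BEHAVIOUR_TO_ATTACK.get(sig["name"])
        if hit:
            techniques.append({"id": hit[0], "name": hit[1], "evidence": sig["name"]})
        else:
            unmapped.append(sig["name"])  # left for analyst review, never guessed
    return iocs, techniques, unmapped


sample_report = {
    "target": {"file": {"sha256": "ab12" * 16}},  # placeholder hash, not a real sample
    "network": {"domains": [{"domain": "bad-c2.example"}]},
    "signatures": [{"name": "dll_injection_loadlibrary"}, {"name": "antivm_generic"}],
}
iocs, techniques, unmapped = split_report(sample_report)
```

Behaviours with no table entry land in `unmapped` rather than being forced onto a parent technique, which keeps the guesswork visible to the analyst.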
Extraction itself has two levels. Mechanical extraction is regex work — pull every string that looks like a domain, IP, URL, MD5, SHA-256, or registry key. Python's re module and a handful of patterns get you 90% there in thirty lines. The remaining 10% is analytical: deciding which extracted strings are real IOCs versus benign system artefacts (Windows update domains, localhost, known-clean hashes). That filtering is where analyst judgment matters and automation falls short.
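The two levels can be sketched in a few lines: a regex pass that over-collects, followed by a triage pass that sets aside obvious noise. The patterns, the allowlist, and the sample text below are illustrative assumptions, not the module's actual script, and a real allowlist would be maintained by the analyst.

```python
import ipaddress
import re

# Illustrative patterns only; a production extractor needs more cases
# (IPv6, full URLs, defanged indicators such as hxxp:// or [.]).
PATTERNS = {
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "domain": re.compile(
        r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,24}\b", re.I),
    "regkey": re.compile(
        r"\b(?:HKLM|HKCU|HKEY_LOCAL_MACHINE|HKEY_CURRENT_USER)\\\S+"),
}

# Assumed allowlist of benign parent domains; deliberately tiny.
BENIGN_DOMAIN_SUFFIXES = ("microsoft.com", "windowsupdate.com")


def extract(text):
    """Mechanical level: collect every string that looks like an indicator."""
    return {kind: sorted(set(p.findall(text))) for kind, p in PATTERNS.items()}


def is_benign(kind, value):
    """Analytical level, automated only for the obvious cases."""
    if kind == "ipv4":
        ip = ipaddress.ip_address(value)
        return ip.is_loopback or ip.is_link_local
    if kind == "domain":
        v = value.lower()
        return any(v == s or v.endswith("." + s) for s in BENIGN_DOMAIN_SUFFIXES)
    return False


def triage(found):
    """Split extracted values into candidates to keep and noise to drop."""
    keep = {k: [v for v in vals if not is_benign(k, v)] for k, vals in found.items()}
    dropped = {k: [v for v in vals if is_benign(k, v)] for k, vals in found.items()}
    return keep, dropped


sample = (
    "Sample " + "ab12" * 16 + " resolved update.microsoft.com and bad-c2.example, "
    "connected to 203.0.113.45 and 127.0.0.1, and set "
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\Updater for persistence"
)
keep, dropped = triage(extract(sample))
```

The domain pattern will also match strings such as file names with a letter-only extension, which is one reason the triage step cannot be fully automated.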
The gotcha
Not every extracted indicator should fire a block. MISP's to_ids flag controls whether an
attribute is exported to blocking lists — and an IP in a VPN exit range or a domain that may be
shared infrastructure warrants to_ids=False with a note explaining why. Likewise, map to the
most specific sub-technique the evidence supports, not a convenient parent technique. Blocking
shared infrastructure causes collateral damage; mapping vaguely produces unactionable intel.
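One way to enforce the "note explaining why" rule is to refuse to build a non-blocking attribute without a comment. The helper below is a hypothetical sketch using plain dicts with MISP-style field names (`type`, `category`, `value`, `to_ids`, `comment`); it does not call any MISP library, and the values are placeholders.

```python
def make_attribute(attr_type, category, value, to_ids=True, comment=""):
    """Build one MISP-style attribute dict; a non-blocking attribute must say why."""
    if not to_ids and not comment:
        raise ValueError("to_ids=False needs a comment explaining why")
    return {
        "type": attr_type,
        "category": category,
        "value": value,
        "to_ids": to_ids,
        "comment": comment,
    }


attributes = [
    make_attribute("sha256", "Payload delivery", "ab12" * 16),  # placeholder hash
    make_attribute("domain", "Network activity", "bad-c2.example"),
    make_attribute(
        "ip-dst", "Network activity", "203.0.113.45",
        to_ids=False,
        comment="VPN exit range; shared by unrelated users",
    ),
]

# Only attributes flagged to_ids would reach a blocking list.
blocklist = [a["value"] for a in attributes if a["to_ids"]]
```

The shared IP stays in the event as context for other analysts but never reaches the blocklist.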
Go deeper: MISP's model and the two-artefact output
MISP is Events (the incident record) → Objects (related-attribute groups, e.g. a file object with
filename + hash + path) → Attributes (individual data points, each with a type, value, and
to_ids flag).
mermaid
flowchart TD
E["Event<br/>(the incident)"] --> O1["Object: file"]
E --> O2["Object: network"]
O1 --> A1["sha256 — to_ids=true"]
O1 --> A2["filename — to_ids=false"]
O2 --> A3["c2-domain — to_ids=true"]
O2 --> A4["shared-IP — to_ids=false"]
This module outputs a MISP-compatible JSON file and an ATT&CK Navigator layer, a JSON file that
the free browser tool renders as a colour-coded matrix. Hand that to a detection engineer and it
becomes a prioritised detection backlog. It's the exact shape of the AA23-242A QakBot release:
machine-consumable IOCs (STIX) plus a behaviour-to-ATT&CK map, the two artefacts a defender deploys.
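The two artefacts described above can be sketched as two small JSON builders. Both are simplified skeletons under stated assumptions: the event omits fields a real MISP import would expect, the object name `network` follows the diagram rather than a specific MISP object template, and the layer format version is assumed and should be checked against the Navigator release in use.

```python
import json


def build_misp_event(info, file_attrs, network_attrs):
    """Nest attributes under objects inside one event (Event -> Object -> Attribute)."""
    return {
        "Event": {
            "info": info,
            "Object": [
                {"name": "file", "Attribute": file_attrs},
                {"name": "network", "Attribute": network_attrs},  # illustrative name
            ],
        }
    }


def build_navigator_layer(name, techniques):
    """Build a minimal ATT&CK Navigator layer: one scored entry per technique."""
    return {
        "name": name,
        "domain": "enterprise-attack",
        "versions": {"layer": "4.5"},  # assumed layer format version
        "techniques": [
            {"techniqueID": t["id"], "score": t["score"], "comment": t["evidence"]}
            for t in techniques
        ],
    }


event = build_misp_event(
    "Sandbox analysis: sample ab12...",  # hypothetical event title
    file_attrs=[{"type": "sha256", "value": "ab12" * 16, "to_ids": True}],
    network_attrs=[{"type": "domain", "value": "bad-c2.example", "to_ids": True}],
)
layer = build_navigator_layer(
    "Module 12 sample",
    [{"id": "T1055.001", "score": 1, "evidence": "DLL injected into a remote process"}],
)

event_json = json.dumps(event, indent=2)
layer_json = json.dumps(layer, indent=2)
```

Writing `layer_json` to a file and opening it in the Navigator should colour the listed technique, provided the assumed format version is accepted.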
Learn (~2.5 hrs)
MISP data model
- MISP Project — "MISP Objects and Attributes" documentation — read the "Core Concepts" and "Attributes" sections; the taxonomy section is optional (~30 min).
- MISP GitHub Repository — Documentation — comprehensive guide to MISP project structure, the event/object/attribute model, and the purpose of the to_ids flag (~20 min).
A real published IOC + ATT&CK release (~20 min)
- CISA AA23-242A, "Identification and Disruption of QakBot Infrastructure": the joint CISA/FBI advisory from the 2023 QakBot takedown. Read how it packages IOCs as downloadable STIX files alongside an ATT&CK technique table, the exact two-artefact output (machine-consumable indicators + behaviour mapping) this module's pipeline produces.
ATT&CK for malware analysis
- MITRE ATT&CK, "Using ATT&CK for Cyber Threat Intelligence": the official CTI training; complete Unit 1 (mapping to ATT&CK), which is specifically about analysing malware reports (~45 min).
- MITRE ATT&CK Navigator, "Getting Started": understand how to use the interactive matrix to visualise and share ATT&CK mappings from analysis (~15 min).
IOC types and their shelf life
- David Bianco, "The Pyramid of Pain": the original framework for choosing which indicators matter most, spanning the spectrum from hash values (lowest value, trivial for an attacker to change) up through network/host artefacts and tools to TTPs (highest value, costliest to evade). This is the model your extractor's output is graded against (~20 min).
Key concepts
- IOC extraction is mechanical (regex) + analytical (filtering benign artefacts).
- MISP: Event → Objects → Attributes; the to_ids flag controls what goes to blocking lists.
- ATT&CK maps behaviours, not IOCs; use the most specific sub-technique the evidence supports.
- ATT&CK Navigator layers convert analysis output into a detection prioritisation artefact.
- High-volatility IOCs (IPs, domains) expire fast; hash-based IOCs last until the binary changes.
- The Pyramid of Pain: TTPs > tools > network/host artefacts > hashes for detection durability.
- Real worked campaign: QakBot (CISA/FBI advisory AA23-242A). The takedown release shipped IOCs as STIX plus an ATT&CK mapping, the same machine-consumable output this module's extraction pipeline produces.
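The Pyramid of Pain ordering from the concepts above can be applied to an extractor's output with a simple lookup. This is a hedged sketch: the level numbers follow Bianco's six tiers, but assigning indicator types to tiers is a judgment call, and the `tool` and `ttp` type names are hypothetical labels rather than MISP attribute types.

```python
# Pyramid of Pain tiers, 1 = trivial for the attacker to change, 6 = costliest.
PYRAMID_LEVEL = {
    "md5": 1, "sha256": 1,     # hash values
    "ip-dst": 2,               # IP addresses
    "domain": 3,               # domain names
    "url": 4, "regkey": 4,     # network/host artefacts
    "tool": 5,                 # hypothetical label
    "ttp": 6,                  # hypothetical label for a mapped technique
}


def rank_by_pain(indicators):
    """Sort indicators so the most durable detections come first."""
    return sorted(
        indicators,
        key=lambda i: PYRAMID_LEVEL.get(i["type"], 0),  # unknown types sink to the end
        reverse=True,
    )


findings = [
    {"type": "sha256", "value": "ab12" * 16},  # placeholder hash
    {"type": "ttp", "value": "T1055.001"},
    {"type": "domain", "value": "bad-c2.example"},
    {"type": "regkey", "value": r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\Updater"},
]
ranked = rank_by_pain(findings)
```

Here the technique sorts first and the hash last, mirroring the durability ordering in the list above.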
AI acceleration
Feed the sandbox report JSON to a model. Prompt: "Extract all network IOCs (IPs, domains, URLs), file hashes, registry keys, and process-creation events. For each behavioural event, suggest the most specific ATT&CK sub-technique ID." Review every suggestion: models frequently map behaviours to parent techniques instead of the correct sub-technique, and occasionally invent technique IDs. Verify each ID against attack.mitre.org before committing it to reporting.
AI caveat
A model is strong at the mechanical regex extraction but weak exactly where judgment lives — it maps behaviours to parent techniques instead of the right sub-technique, and occasionally invents technique IDs outright. Verify every ID against attack.mitre.org before it reaches the report.
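Part of that verification can be automated before the manual check. The sketch below flags malformed IDs, IDs absent from a reference set, and parent techniques that have sub-techniques available. It assumes `known_ids` has been loaded from a published ATT&CK release; the small set here is a hand-picked subset for illustration, and `T9999` is deliberately not a real technique.

```python
import re

# ATT&CK technique IDs look like T1055 or T1055.001.
TECHNIQUE_ID = re.compile(r"T\d{4}(?:\.\d{3})?")


def review_suggestions(suggested, known_ids):
    """Label each model-suggested ID as ok, parent-only, unknown, or malformed."""
    verdicts = {}
    for tid in suggested:
        if not TECHNIQUE_ID.fullmatch(tid):
            verdicts[tid] = "malformed"
        elif tid not in known_ids:
            verdicts[tid] = "unknown"       # possibly invented by the model
        elif "." not in tid and any(k.startswith(tid + ".") for k in known_ids):
            verdicts[tid] = "parent-only"   # a more specific sub-technique exists
        else:
            verdicts[tid] = "ok"
    return verdicts


# Hand-picked subset for illustration; load the full set from ATT&CK data in practice.
known_ids = {"T1055", "T1055.001", "T1547", "T1547.001", "T1082"}
verdicts = review_suggestions(["T1055.001", "T1055", "T9999", "T105", "T1082"], known_ids)
```

An `ok` verdict only means the ID exists; whether it matches the observed behaviour still needs an analyst.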
Check yourself
- A SHA-256 and T1055.001 are both in your report. Which is an IOC and which is a technique, and why does the distinction matter?
- When should an extracted indicator carry to_ids=False, and what do you attach to it?
- Why does the durable, high-value intel skew toward TTPs over hashes (the Pyramid of Pain)?