Module 12 — IOC Extraction & ATT&CK Mapping

Type 6 · Reconstruct: parse a Cuckoo/CAPE-shaped sandbox report to extract network IOCs, hashes, and behavioural indicators, structure them as MISP attributes, and map each behaviour to an ATT&CK technique. Deliverable: a MISP-compatible attribute set plus a Navigator layer (graded against the Pyramid of Pain). (Secondary: Tool-Build; the extractor is a reusable script.)

Last reviewed: 2026-06

Malware Analysis — turn a pile of analysis artefacts into structured intelligence that defenders and other teams can actually use.

Difficulty: Intermediate · Estimated time: ~3.5–5.5 hrs (study + lab) · Prerequisites: Foundations

In 60 seconds

An analysis that stays in one analyst's head helps no one — the moment you extract an IOC into a structured, shareable format it reaches SIEMs, firewalls, feeds, and other analysts. Extraction has two levels: mechanical (regex pulls domains/hashes/keys in thirty lines) and analytical (deciding which strings are real IOCs vs. benign system noise — where automation falls short). And IOCs and ATT&CK answer different questions: IOCs say what the malware is, techniques say what it does. The QakBot takedown (CISA AA23-242A) shipped exactly this pair: STIX IOCs plus an ATT&CK map.

Why this matters

An analysis that stays inside one analyst's head helps no one. The moment you extract an IOC (a domain, a hash, a registry key) into a structured, shareable format, it becomes available to SIEMs, firewalls, threat feeds, and other analysts. MISP is one of the most widely deployed open-source threat intelligence platforms; understanding its data model is a prerequisite for contributing to any threat intel programme, and for consuming the feeds that come back. The QakBot takedown is a concrete model of where this work ends up: when the FBI disrupted the botnet (Operation Duck Hunt, August 2023), CISA and the FBI published joint advisory AA23-242A, releasing the QakBot IOCs as downloadable STIX files plus an ATT&CK technique mapping. STIX is a different format from MISP's Event/Object/Attribute model, but it is the same kind of structured, machine-consumable intelligence this module's pipeline produces. The structured artefact is what lets defenders load published indicators straight into their tooling and block the infrastructure quickly. (CISA AA23-242A: Identification and Disruption of QakBot Infrastructure.)

Objective

Parse a realistic sandbox report (JSON-shaped like a Cuckoo/CAPE output), extract network IOCs, file hashes, and behavioural indicators using a Python script, structure them as MISP-compatible attributes, and map each behaviour to a MITRE ATT&CK technique ID using a bundled mapping table.

The core idea

The mental model

IOCs and ATT&CK answer two different questions, and conflating them is the classic beginner error. IOCs describe what the malware is: a SHA-256, a C2 domain, a registry key. ATT&CK techniques describe what it does, for example T1055.001 (Process Injection: Dynamic-link Library Injection). You extract IOCs from the report's strings; you map behaviours (process events, API sequences, network activity) to the most specific ATT&CK sub-technique the evidence supports. Both feed the report, but they are distinct artefacts with distinct shelf lives.
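A minimal sketch of the behaviour-to-technique lookup described above. The behaviour labels and the table contents are hypothetical stand-ins for the lab's bundled mapping table; only the technique IDs are real ATT&CK identifiers. When evidence supports both a parent technique and one of its sub-techniques, this sketch keeps only the more specific one.

```python
# Hypothetical mapping table: the behaviour labels are invented for illustration.
BEHAVIOUR_TO_TECHNIQUE = {
    "dll_injection": "T1055.001",        # Process Injection: Dynamic-link Library Injection
    "process_injection": "T1055",        # parent technique, used when the variant is unknown
    "run_key_persistence": "T1547.001",  # Registry Run Keys / Startup Folder
    "powershell_exec": "T1059.001",      # Command and Scripting Interpreter: PowerShell
}


def map_behaviours(behaviours):
    """Return (techniques, unmapped).

    techniques maps each technique ID to the behaviour labels that support it;
    unmapped lists labels with no table entry, left for the analyst to review.
    """
    mapped, unmapped = {}, []
    for label in behaviours:
        technique_id = BEHAVIOUR_TO_TECHNIQUE.get(label)
        if technique_id is None:
            unmapped.append(label)
        else:
            mapped.setdefault(technique_id, []).append(label)

    # Drop a parent technique when one of its sub-techniques is also present.
    specific = {
        tid: evidence
        for tid, evidence in mapped.items()
        if "." in tid or not any(other.startswith(tid + ".") for other in mapped)
    }
    return specific, unmapped
```

Unmapped labels are returned rather than guessed at, so a missing table entry shows up as review work instead of a wrong technique ID.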

Extraction itself has two levels. Mechanical extraction is regex work — pull every string that looks like a domain, IP, URL, MD5, SHA-256, or registry key. Python's re module and a handful of patterns get you 90% there in thirty lines. The remaining 10% is analytical: deciding which extracted strings are real IOCs versus benign system artefacts (Windows update domains, localhost, known-clean hashes). That filtering is where analyst judgment matters and automation falls short.
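The two levels can be sketched as follows: a handful of regex patterns for the mechanical pass, then a benign-list filter standing in for the analytical pass. The patterns are deliberately simple and the `BENIGN` set is an illustrative assumption, not a vetted allowlist. The dictionary keys reuse MISP attribute type names.

```python
import re

# Mechanical level: patterns for common IOC shapes (simplified on purpose).
PATTERNS = {
    "ip-dst": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "domain": re.compile(
        r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,24}\b", re.I
    ),
    "sha256": re.compile(r"\b[a-f0-9]{64}\b", re.I),
    "md5": re.compile(r"\b[a-f0-9]{32}\b", re.I),
    "regkey": re.compile(r'\b(?:HKLM|HKCU|HKEY_[A-Z_]+)\\[^\s"]+'),
}

# Analytical level, reduced to a lookup: values the analyst has judged benign.
BENIGN = {"127.0.0.1", "localhost", "windowsupdate.com"}  # illustrative only


def is_benign(value):
    """True for an exact benign match or a subdomain of a benign domain."""
    return any(value == b or value.endswith("." + b) for b in BENIGN)


def extract_iocs(text):
    """Return {ioc_type: sorted unique values} with benign values removed."""
    found = {}
    for ioc_type, pattern in PATTERNS.items():
        values = set()
        for match in pattern.finditer(text):
            value = match.group(0)
            # Registry paths keep their case; everything else is normalised.
            values.add(value if ioc_type == "regkey" else value.lower())
        kept = sorted(v for v in values if not is_benign(v))
        if kept:
            found[ioc_type] = kept
    return found
```

The domain pattern also matches file names such as `gate.php`, because a file extension looks like a TLD. Catching that kind of false positive is the analytical work a lookup table cannot fully replace.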

The gotcha

Not every extracted indicator should fire a block. MISP's to_ids flag marks whether an attribute is suitable for export to detection and blocking lists. An IP in a VPN exit range or a domain that may be shared infrastructure warrants to_ids=False with a note explaining why. Likewise, map to the most specific sub-technique the evidence supports, not a convenient parent technique. Blocking shared infrastructure causes collateral damage; vague mapping produces unactionable intel.
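One way to encode that decision, as a sketch: the shared ranges and domains below are hypothetical analyst inputs (the IP range is a documentation prefix), and the returned dictionary uses a reduced set of MISP attribute fields.

```python
import ipaddress

# Hypothetical analyst-maintained lists of infrastructure that must not be blocked.
SHARED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]  # e.g. a VPN exit range
SHARED_DOMAINS = {"cdn.example.net"}


def make_attribute(attr_type, value):
    """Build an attribute dict, setting to_ids=False (with a reason) for shared infrastructure."""
    to_ids, comment = True, ""
    if attr_type in ("ip-dst", "ip-src"):
        address = ipaddress.ip_address(value)
        if any(address in network for network in SHARED_NETWORKS):
            to_ids = False
            comment = "In a known VPN exit range; blocking risks collateral damage."
    elif attr_type == "domain" and value.lower() in SHARED_DOMAINS:
        to_ids = False
        comment = "Possibly shared infrastructure; keep for context, do not block."
    return {"type": attr_type, "value": value, "to_ids": to_ids, "comment": comment}
```

The comment travels with the attribute, so whoever consumes the feed can see why the indicator was withheld from blocking lists.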

Go deeper: MISP's model and the two-artefact output

MISP is Events (the incident record) → Objects (related-attribute groups, e.g. a file object with filename + hash + path) → Attributes (individual data points, each with a type, value, and to_ids flag).
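The nesting can be sketched with plain dictionaries. This is a simplified shape under stated assumptions: a real MISP export carries more fields (UUIDs, categories, timestamps, object template metadata), and the helper names `attribute` and `build_event` are hypothetical.

```python
import json


def attribute(attr_type, value, to_ids, relation=None, comment=""):
    """One data point: a type, a value, and the to_ids flag."""
    attr = {"type": attr_type, "value": value, "to_ids": to_ids, "comment": comment}
    if relation:
        # Inside an Object, object_relation names the attribute's role.
        attr["object_relation"] = relation
    return attr


def build_event(info, sha256, filename, c2_domain):
    """Event -> Object -> Attribute, reduced to the fields discussed here."""
    file_object = {
        "name": "file",
        "Attribute": [
            attribute("sha256", sha256, True, relation="sha256"),
            # A filename alone is weak evidence, so it is kept for context only.
            attribute("filename", filename, False, relation="filename"),
        ],
    }
    return {
        "Event": {
            "info": info,
            "Object": [file_object],
            "Attribute": [attribute("domain", c2_domain, True, comment="C2")],
        }
    }


if __name__ == "__main__":
    event = build_event("Sandbox run: sample A", "a" * 64, "invoice.exe", "update.evil-c2.example")
    print(json.dumps(event, indent=2))
```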

```mermaid
flowchart TD
    E["Event<br/>(the incident)"] --> O1["Object: file"]
    E --> O2["Object: network"]
    O1 --> A1["sha256 — to_ids=true"]
    O1 --> A2["filename — to_ids=false"]
    O2 --> A3["c2-domain — to_ids=true"]
    O2 --> A4["shared-IP — to_ids=false"]
```

This module outputs a MISP-compatible JSON and an ATT&CK Navigator layer — a JSON the free browser tool renders as a colour-coded matrix. Hand that to a detection engineer and it becomes a prioritised detection backlog. It's the exact shape of the AA23-242A QakBot release: machine-consumable IOCs (STIX) plus a behaviour-to-ATT&CK map — the two artefacts a defender deploys.
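A Navigator layer is a JSON document whose `techniques` list carries one entry per technique ID. The sketch below builds a minimal layer from a technique-to-evidence mapping. The layer format version string is an assumption to check against the Navigator release in use, and scoring by evidence count is one possible choice, not a Navigator requirement.

```python
import json


def build_layer(name, technique_evidence):
    """Build a minimal Navigator layer dict from {technique_id: [evidence strings]}."""
    techniques = [
        {
            "techniqueID": technique_id,
            "score": len(evidence),           # more supporting events, higher score
            "comment": "; ".join(evidence),   # shown when hovering over the cell
        }
        for technique_id, evidence in sorted(technique_evidence.items())
    ]
    return {
        "name": name,
        "domain": "enterprise-attack",
        "versions": {"layer": "4.5"},  # assumed layer format version
        "techniques": techniques,
    }


if __name__ == "__main__":
    layer = build_layer(
        "Sandbox sample A",
        {"T1055.001": ["CreateRemoteThread into explorer.exe"], "T1547.001": ["Run key written"]},
    )
    print(json.dumps(layer, indent=2))
```

The resulting file can be loaded in Navigator through its option to open an existing layer.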

Learn (~2.5 hrs)

MISP data model

- MISP Project — "MISP Objects and Attributes" documentation — read the "Core Concepts" and "Attributes" sections; the taxonomy section is optional (~30 min).
- MISP GitHub Repository — Documentation — comprehensive guide to MISP project structure, the event/object/attribute model, and the purpose of the to_ids flag (~20 min).

A real published IOC + ATT&CK release (~20 min)

- CISA AA23-242A — Identification and Disruption of QakBot Infrastructure — the joint CISA/FBI advisory from the 2023 QakBot takedown. Read how it packages IOCs as downloadable STIX files alongside an ATT&CK technique table — the exact two-artefact output (machine-consumable indicators + behaviour mapping) this module's pipeline produces.

ATT&CK for malware analysis

- MITRE ATT&CK — Using ATT&CK for Cyber Threat Intelligence — the official CTI training; complete Unit 1 (mapping to ATT&CK) which is specifically about analysing malware reports (~45 min).
- MITRE ATT&CK Navigator — Getting Started — understand how to use the interactive matrix to visualize and share ATT&CK mappings from analysis (~15 min).

IOC types and their shelf life

- David Bianco — The Pyramid of Pain — the original framework for choosing which indicators matter most: the spectrum from hash values (lowest value, trivial for an attacker to change) up through network/host artefacts and tools to TTPs (highest value, costliest to evade). This is the model your extractor's output is graded against (~20 min).

Key concepts

IOC extraction is mechanical (regex) + analytical (filtering benign artefacts).
MISP: Event → Objects → Attributes; to_ids flag controls what goes to blocking lists.
ATT&CK maps behaviours, not IOCs; use the most specific sub-technique the evidence supports.
ATT&CK Navigator layers convert analysis output into a detection prioritisation artefact.
High-volatility IOCs (IPs, domains) expire fast; a hash stays valid only until the binary changes, which a recompile or repack makes trivial.
The Pyramid of Pain, ranked by detection durability: TTPs > tools > network/host artefacts > domain names > IP addresses > hash values.
Real worked campaign: QakBot (CISA/FBI advisory AA23-242A). The takedown release shipped IOCs as STIX plus an ATT&CK mapping, the same machine-consumable output this module's extraction pipeline produces.
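The Pyramid of Pain ordering can be turned into a simple ranking for an extractor's output. This is a sketch: the tier names follow Bianco's six levels, while the mapping from attribute types to tiers and the `attack-technique` label are assumptions made for illustration.

```python
# Index 0 is the easiest tier for an attacker to change, the last is the costliest.
PYRAMID = ["hash", "ip", "domain", "artefact", "tool", "ttp"]

# Assumed mapping from attribute types to pyramid tiers.
TYPE_TO_TIER = {
    "md5": "hash",
    "sha256": "hash",
    "ip-dst": "ip",
    "domain": "domain",
    "regkey": "artefact",
    "mutex": "artefact",
    "attack-technique": "ttp",  # hypothetical label for a mapped ATT&CK ID
}


def pain_score(attr_type):
    """Return 1 (hash) to 6 (TTP), or 0 for a type with no assigned tier."""
    tier = TYPE_TO_TIER.get(attr_type)
    return PYRAMID.index(tier) + 1 if tier else 0


def rank_by_durability(indicators):
    """Sort (type, value) pairs so the most durable indicators come first."""
    return sorted(indicators, key=lambda item: pain_score(item[0]), reverse=True)
```

A ranking like this puts the mapped techniques at the top of a report and the hashes at the bottom, which mirrors how long each is likely to stay useful.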

AI acceleration

Feed the sandbox report JSON to a model. Prompt: "Extract all network IOCs (IPs, domains, URLs), file hashes, registry keys, and process-creation events. For each behavioural event, suggest the most specific ATT&CK sub-technique ID." Review every suggestion: models frequently map behaviours to parent techniques instead of the correct sub-technique, and occasionally invent technique IDs. Verify each ID against attack.mitre.org before committing it to reporting.

AI caveat

A model is strong at the mechanical regex extraction but weak exactly where judgment lives — it maps behaviours to parent techniques instead of the right sub-technique, and occasionally invents technique IDs outright. Verify every ID against attack.mitre.org before it reaches the report.
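Part of that verification can be scripted before the manual check. The sketch below assumes `known_ids` is a set of valid technique IDs that you have loaded yourself from the current ATT&CK data; it is not fetched here. It rejects malformed or unknown IDs and flags bare parent techniques that have sub-techniques, since those are the two failure modes described above.

```python
import re

# Shape of an ATT&CK technique ID: T plus four digits, optional .NNN sub-technique.
TECHNIQUE_ID = re.compile(r"T\d{4}(?:\.\d{3})?")


def triage_suggestions(suggested, known_ids):
    """Split model-suggested IDs into (accepted, review_parent, rejected)."""
    accepted, review_parent, rejected = [], [], []
    for technique_id in suggested:
        if not TECHNIQUE_ID.fullmatch(technique_id) or technique_id not in known_ids:
            # Malformed, or well-formed but not a real technique (possibly invented).
            rejected.append(technique_id)
        elif "." not in technique_id and any(
            known.startswith(technique_id + ".") for known in known_ids
        ):
            # A parent with sub-techniques: check whether the evidence supports one.
            review_parent.append(technique_id)
        else:
            accepted.append(technique_id)
    return accepted, review_parent, rejected
```

An accepted ID only means the identifier exists; whether it matches the observed behaviour is still the analyst's call.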

Check yourself

A SHA-256 and T1055.001 are both in your report. Which is an IOC and which is a technique, and why does the distinction matter?
When should an extracted indicator carry to_ids=False, and what do you attach to it?
Why does the durable, high-value intel skew toward TTPs over hashes (the Pyramid of Pain)?
