Lab 11: Document & Script Malware
Setup
git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/11-document-script-malware
make up
make fetch-sample # pulls a real maldoc from MalwareBazaar into the isolated container
make demo
⚠ This lab analyzes a live malicious document. Handle it accordingly.
- Static only: never open it. A maldoc is live malware. Do not open the document in Word/Excel/a viewer, do not enable macros, and do not execute the VBA or the PowerShell it drops. You only ever parse it: olevba extracts and reads the macro source; oleid/pdfid count risk indicators. Reading the macro is safe; running it is not.
- Isolation. All work stays inside the isolated container; never copy the sample to your host or to a Windows machine where a double-click could detonate it.
- Hygiene. The sample is fetched at lab time (password-protected zip, password infected) and is never committed; .gitignore covers samples/. make fetch-sample needs a free abuse.ch Auth-Key (set MB_AUTH_KEY). The default tag is maldoc; for the Emotet downloader specifically, run TAG=Emotet make fetch-sample.
- Offline fallback. No key, or MalwareBazaar unreachable? Skip make fetch-sample; make demo falls back to the bundled synthetic suspicious.doc/suspicious.pdf/obfuscated.ps1, which are legible, deterministic, and benign.
Scenario
A triage queue drops a Word attachment an email gateway flagged as a likely maldoc — most likely Emotet, the modular downloader that for years arrived as exactly this: a phishing Office document whose macro launches PowerShell to pull the next stage. Before anyone opens it (and nobody opens it), the team needs a static dissection for the case ticket: extract the embedded VBA macro without running it, deobfuscate it, and recover the PowerShell downloader and its URLs/IOCs from the source alone. Your job is to produce that dissection with olevba (Decalage's oletools), map what you find to Emotet's delivery chain — T1566.001 (spearphishing attachment) → T1059.005 (VBA) → T1059.001 (PowerShell) — and write the IOC table the intel team can action.
Throughout, the sample = the real maldoc that make fetch-sample drops in samples/ (path printed by the target). In offline mode it is the bundled synthetic suspicious.doc (plus suspicious.pdf and obfuscated.ps1): benign, but structured to exercise the same olevba/pdfid signals a real maldoc trips.
Do
- [ ] Fingerprint the document with oleid before you touch the macro. Run oleid against the sample. Record the file type (OLE2 vs OpenXML) and which indicators it raises (VBA macros present, encrypted, external relationships/links, Flash/objects). oleid is your go/no-go: it tells you whether there is a macro worth extracting before you spend time on it.
Hint: oleid /lab/samples/<sample> (real) or oleid /lab/data/suspicious.doc (offline). For a real maldoc you do not know the structure — that is the point.
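Before trusting any tool's verdict on the container type, it helps to know what that verdict rests on. The sketch below is a minimal illustration, not how oleid is implemented: it classifies a file from its leading magic bytes alone. The function names are hypothetical, and it reads only the first few bytes in binary mode, so nothing is parsed or rendered.

```python
# Leading magic bytes for the container formats this lab touches.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # OpenXML (.docx/.docm/.xlsm) is a ZIP archive
PDF_MAGIC = b"%PDF-"
RTF_MAGIC = b"{\\rt"


def sniff_container(header: bytes) -> str:
    """Classify a file from its first bytes only (hypothetical helper)."""
    if header.startswith(OLE2_MAGIC):
        return "OLE2"
    if header.startswith(ZIP_MAGIC):
        # A ZIP signature alone does not prove OpenXML; oleid looks inside.
        return "ZIP (possibly OpenXML)"
    if header.startswith(PDF_MAGIC):
        return "PDF"
    if header.startswith(RTF_MAGIC):
        return "RTF"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes in binary mode; the document is never opened in a viewer."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(8))
```

For example, sniff_file("/lab/data/suspicious.doc") gives a first guess to compare against what oleid reports. A mismatch between extension and magic bytes is itself worth noting in the ticket.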
- [ ] Extract and read the VBA macro with olevba, never run it. Run olevba on the sample. It pulls the macro source out of the OLE/OpenXML container statically, flags suspicious keywords and IOCs, and can unpack common obfuscation: --decode shows hex/base64/StrReverse-encoded strings decoded, and --deobf attempts string concatenation and Chr() sequences. Record: the auto-execute handler (AutoOpen/Document_Open/Workbook_Open), every Shell/CreateObject/WScript/powershell reference, and olevba's flagged IOCs and risk lines. What does the macro intend to do on open?
Hint: olevba /lab/samples/<sample>. Read the VBA MACRO, ANALYSIS, and IOC sections. Emotet-style macros stage and launch a PowerShell one-liner — find where the macro hands off to PowerShell. You are reading source, not executing anything.
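Once olevba has printed the macro source, recording the handler and hand-off references is a text search over that output. The following is a rough sketch under that assumption: it takes macro source as a plain string (for example, saved olevba output) and reports line numbers. The keyword lists and the name scan_macro are illustrative, not olevba's own detection logic, which is considerably broader.

```python
import re

# Auto-execute handler names commonly seen in Word/Excel macros (illustrative list).
AUTO_EXEC = re.compile(
    r"\b(AutoOpen|Auto_Open|Document_Open|Workbook_Open|AutoExec|AutoClose|Document_Close)\b",
    re.IGNORECASE,
)
# References that suggest the macro hands off to another interpreter.
HANDOFF = re.compile(r"\b(Shell|CreateObject|WScript|powershell)\b", re.IGNORECASE)


def scan_macro(source: str) -> dict:
    """Return (line number, keyword) pairs found in already-extracted VBA text."""
    findings = {"auto_exec": [], "handoff": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.strip().startswith("'"):
            continue  # skip VBA comment lines
        for match in AUTO_EXEC.finditer(line):
            findings["auto_exec"].append((lineno, match.group(1)))
        for match in HANDOFF.finditer(line):
            findings["handoff"].append((lineno, match.group(1)))
    return findings
```

This only reads text. It is a cross-check on olevba's ANALYSIS section, not a replacement for reading the macro yourself.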
- [ ] Recover the obfuscated PowerShell downloader from the macro.
The macro typically does not contain the URLs in cleartext. It builds a base64 / split-string PowerShell command and passes it to powershell -enc (or concatenates an encoded blob). Pull that command out of the olevba output, base64-decode it (the payload is UTF-16LE, the encoding -EncodedCommand expects), and recover the staging URLs / file paths / IOCs it would have fetched. Use the bundled helper as a starting point: python3 /lab/data/decode_ps1.py <file>.
Hint: a real Emotet maldoc lists several fallback download URLs. For the offline synthetic, obfuscated.ps1 carries one clearly-fake URL so you can see the exact decode pattern (FromBase64String → Unicode.GetString).
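The decode pattern named in the hint (base64, then UTF-16LE) can be sketched in a few lines. This is not the bundled decode_ps1.py, whose contents are not shown here; the function names and both regexes are assumptions made for illustration. The code only decodes text and never invokes PowerShell.

```python
import base64
import re

# Matches common spellings of the -EncodedCommand switch followed by a base64 blob.
ENC_ARG = re.compile(
    r"-(?:encodedcommand|enc|ec|e)\s+([A-Za-z0-9+/=]{16,})", re.IGNORECASE
)
# Deliberately simple URL pattern; stops at quotes and common delimiters.
URL = re.compile(r"https?://[^\s'\"<>|;,)]+", re.IGNORECASE)


def decode_encoded_command(blob: str) -> str:
    """Base64-decode a blob, then interpret the bytes as UTF-16LE text."""
    return base64.b64decode(blob).decode("utf-16-le", errors="replace")


def recover_urls(command_line: str) -> list:
    """Decode every encoded argument in a command line and collect URLs."""
    urls = []
    for match in ENC_ARG.finditer(command_line):
        try:
            script = decode_encoded_command(match.group(1))
        except ValueError:
            continue  # not valid base64 (e.g. truncated); skip it
        urls.extend(URL.findall(script))
    return urls
```

Real samples often join several fallback URLs with a separator character and split them at run time, so expect to refine the URL pattern after reading the decoded script rather than trusting the first match.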
- [ ] (If your sample is a PDF) triage it with pdfid. Some maldoc campaigns ship a PDF lure instead. Run pdfid on the sample and record counts for /JS, /JavaScript, /OpenAction, /AA, /EmbeddedFile, /Launch. Which combination auto-runs code on open?
Hint: /JS + /OpenAction together means JavaScript executes automatically when the PDF is opened. Use the bundled suspicious.pdf if your fetched sample is a Word doc.
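To see why those counts matter, here is a simplified approximation of the keyword counting pdfid performs. It scans raw bytes for name tokens and treats a script keyword plus an automatic trigger as the risky combination. It is a sketch only: it does not handle hex-escaped names (such as /J#53) or compressed object streams, and the function names are hypothetical.

```python
import re

KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/EmbeddedFile", "/Launch"]


def count_pdf_keywords(data: bytes) -> dict:
    """Count occurrences of each name token in raw PDF bytes."""
    counts = {}
    for keyword in KEYWORDS:
        # The lookahead stops /JS from matching a longer name such as /JSON.
        pattern = re.escape(keyword.encode("ascii")) + rb"(?![A-Za-z0-9#])"
        counts[keyword] = len(re.findall(pattern, data))
    return counts


def auto_runs_script(counts: dict) -> bool:
    """True when a script keyword co-occurs with an automatic trigger."""
    has_script = counts["/JS"] > 0 or counts["/JavaScript"] > 0
    has_trigger = counts["/OpenAction"] > 0 or counts["/AA"] > 0
    return has_script and has_trigger
```

Feed it bytes read with open(path, "rb").read() inside the container and compare the result against pdfid's own counts. Any difference usually points at name obfuscation that this sketch misses.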
- [ ] Build the IOC table. For each artifact, record: SHA-256 hash, auto-execution trigger, embedded script/payload type (VBA → PowerShell, PDF JS, etc.), the decoded staging URLs / paths, and the ATT&CK technique IDs. Anchor the chain to Emotet: T1566.001 (delivery) → T1059.005 (VBA macro) → T1059.001 (PowerShell) → T1547.001 (Run-key persistence, if present). This is your deliverable.
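One possible way to assemble the table, assuming you collect each artifact's fields in a dict keyed by the column names used in the Deliverables section: hash the file in chunks and render markdown rows. The helper names are hypothetical, and the field values are whatever you recorded in the steps above.

```python
import hashlib

COLUMNS = ["Artifact", "SHA-256", "Trigger", "Payload Type", "Decoded IOCs", "ATT&CK IDs"]


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in binary chunks; the file is read, never opened in a viewer."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def render_table(rows: list) -> str:
    """Render a list of dicts (keyed by COLUMNS) as a markdown table."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "|".join([" --- "] * len(COLUMNS)) + "|",
    ]
    for row in rows:
        # Escape pipes so a value cannot break the table layout.
        cells = [str(row[col]).replace("|", "\\|") for col in COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Writing render_table(rows) to ioc-table.md keeps the column order consistent across artifacts. Only the hash and decoded strings end up in the file, never the sample itself.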
Success criteria: you're done when
- [ ] oleid correctly identifies the container type and flags the VBA macro before extraction.
- [ ] olevba output shows the extracted macro source and the auto-execute handler (read, never run).
- [ ] The obfuscated PowerShell command is base64-decoded and its staging URLs/IOCs recovered.
- [ ] (PDF sample) pdfid shows /JS and /OpenAction counts greater than 0.
- [ ] ioc-table.md contains every artifact with SHA-256, trigger, payload type, decoded IOCs, and ATT&CK IDs mapped to the Emotet delivery chain.
Deliverables
Commit to your portfolio repo:
- ioc-table.md — a markdown table with columns: Artifact, SHA-256, Trigger, Payload Type, Decoded IOCs, ATT&CK IDs.
- doc_triage.sh — the triage script from Automate & own it.
- Do not commit the real maldoc (or the synthetic .doc/.pdf/.ps1), and do not commit screenshots of the macro source. The decoded IOC strings in your table are fine; the live artifact never leaves the container.
Automate & own it
Required. Write doc_triage.sh — a shell script that:
1. Accepts a file path as an argument.
2. Computes and prints the SHA-256.
3. Detects the file type (file command).
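4. If OLE/OpenXML (.doc/.docx/.docm/.xls/.xlsm), runs olevba -t (triage mode) or oleid and prints the result, never opening or executing the document. RTF is neither container format; if you want to cover .rtf, oletools' rtfobj is the static parser for it.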
5. If .pdf, runs pdfid and greps for /JS, /OpenAction, /EmbeddedFile, /Launch.
6. If .ps1 (or an extracted PowerShell blob), searches for FromBase64String / -enc and prints the line number.
7. Prints a one-line triage verdict: [HIGH], [MEDIUM], or [LOW] based on what it found.
Draft with AI assistance — then read every line: a triage script that opens the file instead of parsing it is the one mistake that detonates the sample. Test against the bundled synthetic files (and the real one if fetched). Commit doc_triage.sh.
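The verdict in step 7 and the search in step 6 are the parts most worth reasoning through before you write any shell. The sketch below expresses one possible version of that logic in Python so you can port it to doc_triage.sh. The finding names and the HIGH/MEDIUM/LOW thresholds are assumptions for illustration, not a grading rubric.

```python
import re

# Markers of an encoded PowerShell hand-off (step 6).
ENCODED_MARKER = re.compile(
    r"FromBase64String|\s-(?:encodedcommand|enc|ec|e)\s", re.IGNORECASE
)


def find_encoded_lines(text: str) -> list:
    """Return 1-based line numbers that contain an encoded-command marker."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if ENCODED_MARKER.search(line)
    ]


def triage_verdict(findings: dict) -> str:
    """Map hypothetical boolean findings from static parsing to a one-line verdict."""
    has_macro = findings.get("has_macro", False)
    auto_exec = findings.get("auto_exec", False)
    executes = findings.get("shell_or_powershell", False)
    encoded = findings.get("encoded_blob", False)
    pdf_js = findings.get("pdf_js", False)
    pdf_trigger = findings.get("pdf_open_action", False)

    # Auto-run plus something to run is the combination that matters most.
    if (auto_exec and (executes or encoded)) or (pdf_js and pdf_trigger):
        return "[HIGH]"
    if has_macro or auto_exec or executes or encoded or pdf_js or pdf_trigger:
        return "[MEDIUM]"
    return "[LOW]"
```

Keeping the verdict a pure function of findings gathered by parsing makes it easy to confirm that no branch of the script ever opens or executes the file.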
AI acceleration
Paste the extracted VBA macro source from olevba into a model. Prompt: "This is a VBA macro extracted statically from a suspicious document. List any IOCs, identify the auto-execute handler, decode any obfuscated PowerShell, and assess whether this macro would download or execute a payload — do not assume I will run it." Verify the assessment against what you found manually in steps 2–3. The model is fast at deobfuscation; the judgment that it deobfuscated correctly is yours.
Connects forward
Module 12 takes the IOCs you just extracted and structures them into a MISP-compatible format, maps them to ATT&CK, and prepares them for sharing with threat intelligence platforms.
Marketable proof
"I triage a real malicious Office document without ever opening it — extracting and deobfuscating the embedded VBA with olevba, recovering the staged PowerShell downloader and its IOCs statically, and mapping the chain to the Emotet delivery technique set (T1566.001 → T1059.005 → T1059.001)."
Stretch
- Run olevba --deobf and compare its automatic deobfuscation against your manual decode of the PowerShell blob. Where do they differ, and why?
- Pull a second sample with a different tag (TAG=Emotet vs maldoc, or an .xlsm) and diff the macro structure: do both use the same auto-exec handler and the same PowerShell hand-off pattern? Note the family fingerprint.
Further reading
- oletools / olevba — Decalage's static VBA macro extraction & deobfuscation toolkit (the right tool for this job; it never executes the document).
- MITRE ATT&CK — Emotet (S0367) — the modular downloader this lab is built around: T1566.001 (spearphishing attachment), T1059.005 (VBA), T1059.001 (PowerShell), T1547.001 (Run-key persistence).