Lab 03 — Static Analysis — Strings & PE

Setup

git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/03-static-strings-pe
make up
make fetch-sample      # pulls a real Agent Tesla sample from MalwareBazaar into the isolated container
make demo

⚠ This lab analyzes a live malware sample. Handle it accordingly.
  • Static only. Everything in this module (strings, pefile, entropy, YARA) works without executing the sample. Do not run it.
  • Isolation. All work stays inside the isolated container; never copy the sample to your host.
  • Hygiene. The sample is fetched at lab time (password-protected zip, password: infected) and is never committed; .gitignore covers samples/. make fetch-sample needs a free abuse.ch Auth-Key (set MB_AUTH_KEY).
  • Offline fallback. No key, or MalwareBazaar unreachable? Skip make fetch-sample; make demo falls back to the bundled synthetic loader.exe, and util.dll stays the benign control.

Scenario

A triage queue drops a sample an email gateway flagged as a likely Agent Tesla infostealer — the family FortiGuard dissects in the module's Learn path. Before anyone detonates it, the team wants a structured static metadata dump for the case ticket: imports, strings of interest, compile timestamp, and section entropy. Your job is to produce that dump from the real sample using strings, pefile, and analyze_pe.py, then write a detection rule from what you find — and confirm the strings/IAT line up with the keylogging + credential-theft + SMTP-exfil behaviour the FortiGuard writeup documents.

Throughout, "the sample" means the real Agent Tesla PE that make fetch-sample drops into samples/ (the target prints its path). In offline mode it is the bundled synthetic loader.exe; util.dll (or /bin/ls) remains the benign control that must not match.

Do

  1. [ ] Run strings against the sample. Capture all printable strings of at least 6 characters (strings -n 6) and categorise the output into file paths, registry keys, IP addresses or URLs, error messages, and "other interesting". Do you see credential-store paths, an SMTP server and port 587, or a mutex? What does the string set tell you about the binary's intended behaviour?

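  2. [ ] Dump the Import Address Table (IAT) of the sample, and of util.dll as a control. Use pefile in Python to list every imported DLL and every imported function. Look up each function you don't recognise on MalAPI.io. Flag the keylogging combination (SetWindowsHookEx / GetAsyncKeyState) and anything else MalAPI.io lists as commonly abused. If the sample is a .NET assembly (Agent Tesla usually is), expect the native IAT to hold little more than mscoree.dll!_CorExeMain; the Win32 APIs it calls through P/Invoke appear as names in the .NET metadata, so look for them in your strings output too.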

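  3. [ ] Parse the COFF timestamp. Use pefile to read TimeDateStamp from the COFF file header (pe.FILE_HEADER.TimeDateStamp) and convert it to a human-readable UTC date with datetime.fromtimestamp(ts, tz=timezone.utc); datetime.utcfromtimestamp() still works but is deprecated as of Python 3.12. Does the timestamp look plausible? A zero value, a date in the future, or an implausibly old date suggests the field was forged, or that the binary came from a deterministic build, which stores a hash in this field instead of a date.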

  4. [ ] Extract per-section entropy. For each PE section, calculate the Shannon entropy of its raw data (in bits per byte, from 0.0 to 8.0) and record the section name, raw size (SizeOfRawData), and entropy. Flag any section at or above 7.0 as potentially packed or encrypted.

  5. [ ] Run analyze_pe.py and review the JSON output. Run make demo, which executes the analysis script against all samples, and verify that the JSON output matches your manual findings from steps 1–4. Fix any discrepancy.

  6. [ ] Write the analysis note. In static-analysis-note.md, for the sample (loader.exe in offline mode), summarise the imports, flag any suspicious API combinations (citing MalAPI.io), note the timestamp, and give a verdict on whether the binary warrants dynamic analysis.

  7. [ ] Author a YARA rule from your findings and prove it (the build half). Reading the metadata is only half the job; now turn the highest-signal findings into a detection. Write a YARA rule, static-strings-pe.yar, that keys on what this sample actually exposes: the suspicious-API combination you flagged in step 2 (the import names as strings) plus one distinctive string from step 1 (a mutex, log path, or URL), gated on pe.is_pe. Then prove the result from both sides: yara static-strings-pe.yar run against the sample (loader.exe in offline mode) must match, and yara static-strings-pe.yar util.dll (the benign control from the same case) must not. If no benign PE is at hand, point it at a known-good binary such as /bin/ls and confirm it stays quiet there too; note that /bin/ls is an ELF file, so the pe.is_pe gate alone rejects it, which makes a benign Windows PE the stronger control. A rule that fires on the benign control is keyed on the wrong thing; narrow it until only the sample matches. Recording the verdict and authoring a rule that proves you understood it are equal halves of the job. (Hint: yara /path/to/rule /path/to/file; use the pe module for the PE check, and require all the chosen strings in the condition so that a single shared import can't trigger it.)
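For step 1 above, the categorisation stays more consistent if it is scripted. The sketch below is a rough Python stand-in for strings -n 6 (runs of printable ASCII of 6+ bytes; GNU strings also counts tab characters, so totals can differ slightly) followed by a first-match bucketing pass. The helper names and the bucket regexes are illustrative assumptions, not a complete taxonomy; tune them to what you actually see. Agent Tesla is a .NET binary, and .NET keeps its string literals as UTF-16LE, so an ASCII-only pass like this one will miss many of them.

```python
import re
import sys

# Roughly `strings -n 6`: runs of printable ASCII (0x20-0x7e) at least min_len bytes long.
def extract_ascii_strings(data: bytes, min_len: int = 6) -> list[str]:
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Illustrative buckets, checked in order; the first matching bucket wins.
# Registry keys come before file paths because both contain backslashes.
CATEGORIES = [
    ("registry key", re.compile(r"^(HKEY_|HKLM|HKCU|SOFTWARE\\)", re.IGNORECASE)),
    ("url/ip", re.compile(r"(https?://|ftp://|\b\d{1,3}(?:\.\d{1,3}){3}\b)", re.IGNORECASE)),
    ("file path", re.compile(r"([A-Za-z]:\\|%\w+%\\|\\[^\\\s]+\\)")),
    ("error message", re.compile(r"\b(error|failed|exception|invalid|cannot)\b", re.IGNORECASE)),
]

def categorise(strings: list[str]) -> dict[str, list[str]]:
    buckets = {name: [] for name, _ in CATEGORIES}
    buckets["other"] = []  # review by hand: mutexes, SMTP hosts, ports, etc.
    for s in strings:
        for name, rx in CATEGORIES:
            if rx.search(s):
                buckets[name].append(s)
                break
        else:
            buckets["other"].append(s)
    return buckets

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for bucket, items in categorise(extract_ascii_strings(fh.read())).items():
            print(f"== {bucket} ({len(items)})")
            for s in items:
                print("  ", s)
```

Keeping the "other" bucket as an explicit fallback means nothing is silently dropped; that is the bucket to read line by line for the mutex and SMTP indicators.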
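For step 2 above, a minimal pefile sketch that turns the import directory into a {dll: [functions]} map and reports hits from a small keylogging watch-list. The helper names are illustrative, and the watch-list (the A/W variants of SetWindowsHookEx plus GetAsyncKeyState) is an assumption to extend, not MalAPI.io's list. Imports by ordinal carry no name, so they are shown as ordinal#N. On a .NET sample, expect the output to be little more than mscoree.dll with _CorExeMain.

```python
import sys

# Small illustrative watch-list; extend it after reviewing MalAPI.io.
KEYLOG_APIS = {"SetWindowsHookExA", "SetWindowsHookExW", "GetAsyncKeyState"}

def dump_imports(pe) -> dict[str, list[str]]:
    """Map each imported DLL to its imported function names, given a parsed pefile.PE."""
    result: dict[str, list[str]] = {}
    # pefile only sets DIRECTORY_ENTRY_IMPORT when the binary has an import directory.
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        dll = entry.dll.decode(errors="replace")
        names = []
        for imp in entry.imports:
            # Imports by ordinal have no name, only an ordinal number.
            names.append(imp.name.decode(errors="replace") if imp.name else f"ordinal#{imp.ordinal}")
        result.setdefault(dll, []).extend(names)
    return result

def keylogging_hits(imports: dict[str, list[str]]) -> list[str]:
    return sorted({f for funcs in imports.values() for f in funcs if f in KEYLOG_APIS})

if __name__ == "__main__" and len(sys.argv) > 1:
    import pefile  # third-party: pip install pefile
    for path in sys.argv[1:]:  # e.g. the sample and util.dll
        imports = dump_imports(pefile.PE(path))
        print(path)
        for dll, funcs in imports.items():
            print(f"  {dll}: {', '.join(funcs)}")
        print("  keylogging APIs:", keylogging_hits(imports) or "none")
```

Passing the parsed PE object in, rather than a path, keeps dump_imports easy to test and easy to reuse from analyze_pe.py.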
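For step 3 above, a short sketch of the conversion plus a few plausibility checks. The function names are illustrative, and the red flags are heuristics chosen for this example (in particular, the year-2000 cut-off for "implausibly old" is an arbitrary assumption); a clean result does not prove the timestamp is genuine.

```python
import sys
from datetime import datetime, timezone
from typing import Optional

def coff_timestamp_to_utc(ts: int) -> datetime:
    # Timezone-aware replacement for datetime.utcfromtimestamp(), deprecated since Python 3.12.
    return datetime.fromtimestamp(ts, tz=timezone.utc)

def timestamp_flags(ts: int, now: Optional[datetime] = None) -> list[str]:
    """Heuristic red flags; the year-2000 cut-off is an arbitrary assumption."""
    if now is None:
        now = datetime.now(timezone.utc)
    flags = []
    if ts == 0:
        flags.append("zeroed")
    dt = coff_timestamp_to_utc(ts)
    if dt > now:
        flags.append("in the future")
    if dt.year < 2000:
        flags.append("implausibly old")
    return flags

if __name__ == "__main__" and len(sys.argv) > 1:
    import pefile  # third-party: pip install pefile
    pe = pefile.PE(sys.argv[1], fast_load=True)  # headers are enough for the timestamp
    ts = pe.FILE_HEADER.TimeDateStamp
    print(hex(ts), coff_timestamp_to_utc(ts).isoformat(), timestamp_flags(ts) or "no obvious red flags")
```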
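For step 4 above, Shannon entropy is H = -Σ p(b)·log2 p(b) over the byte values b that occur, where p(b) is each value's relative frequency, so it ranges from 0.0 (one repeated byte) to 8.0 (all 256 values equally frequent). A minimal sketch, with illustrative helper names; it flags sections at or above the threshold, the same >= 7.0 rule the verdict in analyze_pe.py uses. pefile's own section.get_entropy() is a handy cross-check.

```python
import math
import sys
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for constant data, 8.0 when all 256 byte values are equally frequent."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def section_table(pe, threshold: float = 7.0) -> list[tuple[str, int, float, bool]]:
    """(name, raw size, entropy, flagged) per section of a pefile.PE; flags entropy >= threshold."""
    rows = []
    for s in pe.sections:
        name = s.Name.rstrip(b"\x00").decode(errors="replace")  # names are NUL-padded to 8 bytes
        ent = shannon_entropy(s.get_data())
        rows.append((name, s.SizeOfRawData, round(ent, 3), ent >= threshold))
    return rows

if __name__ == "__main__" and len(sys.argv) > 1:
    import pefile  # third-party: pip install pefile
    for name, raw, ent, flagged in section_table(pefile.PE(sys.argv[1])):
        print(f"{name:<8} {raw:>10} {ent:6.3f} {'<-- possibly packed/encrypted' if flagged else ''}")
```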
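For step 5 above, comparing the script's output with your manual numbers is easier as a small check than by eye. The JSON layout of analyze_pe.py is not shown here, so the "sections", "name" and "entropy" keys below are hypothetical; rename them to whatever the script actually emits. The tolerance is an assumption too, since rounding differs between tools.

```python
import json

def entropy_discrepancies(json_text: str, manual: dict[str, float], tol: float = 0.01) -> list[str]:
    """Compare per-section entropy in analyze_pe.py output with your step-4 values.

    HYPOTHETICAL schema: a top-level "sections" list of {"name": ..., "entropy": ...}
    objects. Rename the keys to match what analyze_pe.py actually emits.
    """
    report = json.loads(json_text)  # raises ValueError if the output is not valid JSON
    auto = {s["name"]: s["entropy"] for s in report.get("sections", [])}
    problems = []
    for name, expected in manual.items():
        if name not in auto:
            problems.append(f"{name}: missing from JSON")
        elif abs(auto[name] - expected) > tol:
            problems.append(f"{name}: JSON says {auto[name]}, manual says {expected}")
    return problems
```

Feed it the JSON that make demo produces for one sample plus a dict of your manual values; an empty list means the two agree within the tolerance, and json.loads failing is itself a finding for the "valid JSON" criterion.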
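For step 7 above, one way to keep the two-sided proof repeatable is to script it. This is a sketch, not the rule you are expected to submit: build_rule and yara_matches are illustrative names, it assumes the yara CLI is on PATH, and the distinctive string is whatever mutex, log path or URL you found in step 1. It uses the pe module's is_pe check, requires all strings via all of ($s*), and marks every string ascii wide because .NET stores metadata names as single-byte text but string literals as UTF-16LE.

```python
import subprocess
import sys

def _yara_escape(s: str) -> str:
    # YARA text strings accept \\ and \" escapes.
    return s.replace("\\", "\\\\").replace('"', '\\"')

def build_rule(name: str, api_names: list[str], distinctive: str) -> str:
    """YARA rule that needs pe.is_pe AND every chosen string, so one shared import can't fire it."""
    lines = ['import "pe"', "", f"rule {name}", "{", "    strings:"]
    for i, s in enumerate([*api_names, distinctive]):
        lines.append(f'        $s{i} = "{_yara_escape(s)}" ascii wide')
    lines += ["    condition:", "        pe.is_pe and all of ($s*)", "}"]
    return "\n".join(lines) + "\n"

def yara_matches(rule_path: str, target: str) -> bool:
    # The yara CLI prints "<rule> <file>" for each match and nothing otherwise.
    result = subprocess.run(["yara", rule_path, target], capture_output=True, text=True, check=True)
    return bool(result.stdout.strip())

if __name__ == "__main__" and len(sys.argv) == 4:
    sample, control, distinctive = sys.argv[1:]  # distinctive: your mutex / log path / URL from step 1
    with open("static-strings-pe.yar", "w") as fh:
        fh.write(build_rule("static_strings_pe", ["SetWindowsHookEx", "GetAsyncKeyState"], distinctive))
    assert yara_matches("static-strings-pe.yar", sample), "rule must match the sample"
    assert not yara_matches("static-strings-pe.yar", control), "rule must stay quiet on the benign control"
    print("two-sided proof OK")
```

Because YARA strings match substrings, "SetWindowsHookEx" also covers the A and W variants; check=True makes a rule syntax error fail loudly instead of looking like a clean no-match.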

Success criteria — you're done when

  • [ ] IAT dump for both PE files is complete and every function is labelled (suspicious / benign / unknown).
  • [ ] Compile timestamp is parsed and assessed.
  • [ ] Section entropy table is complete; any high-entropy sections are flagged.
  • [ ] static-analysis-note.md exists with a verdict and reasoning.
  • [ ] analyze_pe.py runs cleanly and produces valid JSON output.
  • [ ] static-strings-pe.yar matches the sample (loader.exe in offline mode) and does not match util.dll (or /bin/ls): the build half, proven from both sides.

Deliverables

analyze_pe.py (see Automate & own it), static-analysis-note.md, static-strings-pe.yar (with the match/no-match proof recorded in the note). Commit all three.

Automate & own it

Required. analyze_pe.py is provided as a starting point in data/. Extend it to also: (1) output the list of strings that match any of the patterns in a simple patterns file (data/string-patterns.txt, one regex per line); and (2) add a "verdict" key to the JSON that is "packed" if any section entropy is >= 7.0, otherwise "suspicious" if any import is in a hardcoded list of high-risk APIs, and otherwise "benign". AI can draft the pattern-matching extension; write the high-risk API list by hand yourself after reviewing MalAPI.io.
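A minimal sketch of the two extensions as plain functions you could wire into analyze_pe.py. The function names are illustrative; it assumes blank lines in data/string-patterns.txt should be skipped, and it deliberately takes the high-risk API list as a parameter, because that list is yours to write by hand. The verdict checks entropy first, so a packed file reports "packed" even if it also has risky imports (packing often hides the real imports anyway).

```python
import re

def load_patterns(text: str) -> list[re.Pattern]:
    """One regex per line; blank lines are skipped (an assumption about the file format)."""
    return [re.compile(line) for line in text.splitlines() if line.strip()]

def matching_strings(strings: list[str], patterns: list[re.Pattern]) -> list[str]:
    # Keep a string if any pattern matches anywhere in it.
    return [s for s in strings if any(p.search(s) for p in patterns)]

def verdict(section_entropies: list[float], imported_functions: list[str],
            high_risk_apis: set[str], threshold: float = 7.0) -> str:
    # Precedence: packed, then suspicious, then benign.
    if any(e >= threshold for e in section_entropies):
        return "packed"
    if any(f in high_risk_apis for f in imported_functions):
        return "suspicious"
    return "benign"
```

Inside analyze_pe.py you might call matching_strings(all_strings, load_patterns(Path("data/string-patterns.txt").read_text())) and store the result of verdict(...) under the "verdict" key; how the script collects strings and imports is up to your existing code.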

AI acceleration

Give an AI your IAT dump and ask it to map each import to a MITRE ATT&CK technique. Cross-check five of its entries against MalAPI.io; note any that are wrong or missing and correct the AI's output in your analysis note. Attributing a technique to an API is a skill, not a lookup.

Connects forward

The analyze_pe.py output feeds directly into Module 04 (capability detection with capa) — capa's JSON output supplements the per-function analysis with higher-level behaviour labels. In Module 07 you will correlate the imports you found here with the actual disassembly to see where each function is called.

Marketable proof

"I extract and interpret PE metadata — imports, strings, entropy, timestamp — and produce a structured analysis report from a binary without executing it."

Stretch

  • Add Unicode string extraction (strings -el for UTF-16LE) and note whether any Unicode strings differ from the ASCII set.
  • Detect if any section is named with non-standard characters or has an unusual combination of flags (e.g., a writable + executable section — a common packer characteristic).
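For the stretch items, a sketch under stated assumptions. The UTF-16LE extractor only recognises ASCII-range characters, which is roughly what strings -el reports; "non-standard characters" is taken to mean bytes outside printable ASCII; and the two flag values are the PE format's IMAGE_SCN_MEM_EXECUTE (0x20000000) and IMAGE_SCN_MEM_WRITE (0x80000000). The pe argument is a pefile.PE object, and only .sections, .Name and .Characteristics are used. Printable but unusual names such as UPX0/UPX1 pass the name check and still deserve a look.

```python
import re

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def extract_utf16le_strings(data: bytes, min_len: int = 6) -> list[str]:
    # A printable ASCII byte followed by 0x00, min_len or more times: roughly `strings -el -n 6`.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]

def unicode_only(data: bytes, min_len: int = 6) -> list[str]:
    """UTF-16LE strings that an ASCII-only pass would not have reported."""
    ascii_set = {m.group().decode("ascii")
                 for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data)}
    return sorted(set(extract_utf16le_strings(data, min_len)) - ascii_set)

def wx_sections(pe) -> list[str]:
    """Names of sections that are both writable and executable."""
    wx = IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_WRITE
    return [s.Name.rstrip(b"\x00").decode(errors="replace")
            for s in pe.sections if (s.Characteristics & wx) == wx]

def odd_section_names(pe) -> list[bytes]:
    """Raw section names containing bytes outside printable ASCII."""
    names = (s.Name.rstrip(b"\x00") for s in pe.sections)
    return [n for n in names if any(b < 0x20 or b > 0x7E for b in n)]
```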
