Lab 08: Decompilation & Code Analysis
Hands-on lab
Setup
git clone https://github.com/plaintext-security/plaintext-labs
cd plaintext-labs/malware/08-decompilation-code-analysis
make up
make fetch-sample # pulls a real Agent Tesla sample from MalwareBazaar into the isolated container
make demo
⚠ This lab decompiles a live malware sample. Handle it accordingly.
- Static only. Nothing in this lab executes the sample. Analysis is decompilation (dnSpy/ILSpy on the .NET assembly, or retdec/Ghidra). Do not run it.
- Isolation. All work stays inside the isolated container; never copy the sample to your host.
- Hygiene. The sample is fetched at lab time (password-protected zip, password `infected`) and is never committed: `.gitignore` covers `samples/`. `make fetch-sample` needs a free abuse.ch Auth-Key (set `MB_AUTH_KEY`).
- Offline fallback. No key, or MalwareBazaar unreachable? Skip `make fetch-sample`; `make demo` falls back to the bundled synthetic `target.c`, which uses the same string-decode-from-a-buffer mechanism, deliberately legible.
Scenario
A triage queue drops a sample flagged as Agent Tesla, the .NET infostealer FortiGuard dissects (MITRE S0331). Agent Tesla doesn't leave its C2 host, SMTP credentials, or panel paths sitting in plaintext: per the FortiGuard analysis, "all the constant strings in the .NET program are encoded and saved within a large buffer, and every string is assigned an index". At use, an index is passed to a decode function that returns the plaintext. Running `strings` on the assembly therefore yields the encoded blob, not the config. Your job is to recover it the analyst's way: decompile the assembly, find the string-decode routine, identify the algorithm, and reconstruct the plaintext config, all without executing the binary.
The same mechanism, made legible, lives in the bundled data/target.c (read it only after the analysis): a small C program implementing the classic RC4 key-scheduling + keystream that underpins this style of string protection. The container compiles it stripped (-O2 -s) and runs the OSS retdec decompiler to reconstruct pseudo-C — your warm-up for reading a decode routine with no symbols. Start there to learn the loop's fingerprint, then turn dnSpy/ILSpy (or Ghidra/retdec) on the real Agent Tesla assembly in /lab/samples/.
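For comparison while reading the decompiled loops, here is a minimal Python sketch of textbook RC4: key scheduling followed by keystream generation. It is a reference for the loop shapes only, not a transcription of `data/target.c`, and the function names `rc4_ksa` and `rc4_crypt` are illustrative choices rather than names from the lab.

```python
def rc4_ksa(key: bytes) -> list:
    """Key scheduling: identity-fill a 256-entry state, then permute it with the key."""
    state = list(range(256))           # first loop: state[i] = i
    j = 0                              # running index
    for i in range(256):               # second loop: key-dependent swaps
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    return state


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the RC4 keystream; the same call encrypts and decrypts."""
    state = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


if __name__ == "__main__":
    # Widely published RC4 test vector: key "Key", plaintext "Plaintext".
    print(rc4_crypt(b"Key", b"Plaintext").hex())
```

In decompiled output the same structure tends to survive optimisation in recognisable form: a 256-iteration identity fill, a second 256-iteration loop with a swap, and an index into the key taken modulo the key length.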
Do
- [ ] Compile the target binary.
Run `make demo` first to see the full automated flow. Then `make shell` to work interactively. Inside the container: `gcc -O2 -s -o /tmp/target /lab/data/target.c`. The `-s` flag strips symbols, simulating a real stripped binary.
Hint: check that the output binary exists with `file /tmp/target`; confirm it reports "stripped" in the output.
- [ ] Decompile with retdec.
Run `retdec-decompiler /tmp/target -o /tmp/target.c`. Examine `/tmp/target.c`: this is the pseudo-C output. Focus on the function retdec names something like `main` or `unknown_function`.
Hint: `cat /tmp/target.c` or open it in `less`. Look for a `for` loop with a modulo (`%`) operation.
- [ ] Trace the key-scheduling loop.
Find the loop that initialises a 256-element array. It should have the pattern `arr[i] = i`, followed by a second loop that swaps elements using a key. Map each variable to its semantic role: which one is the index? Which one is the running sum? Which one is the key byte?
Hint: a 256-byte state array initialised to the identity, then permuted by a running index that mixes in a key byte selected with `i % key_length`, is the fingerprint of a well-known stream cipher's key schedule. Identify which one in step 6.
- [ ] Rename in your notes. You cannot rename in-place in the retdec output, but you can annotate. Copy the relevant function to a file `analysis-notes.txt` in `/lab/` and add C-style comments naming each variable (`/* key_index */`, `/* swap_temp */`, etc.).
- [ ] Cross-check against strings and disassembly. Run `strings /tmp/target` and `objdump -d /tmp/target | grep -A 20 '<main>'` to see the disassembly alongside the decompiled version. Because the binary is stripped, `objdump` may print no `<main>` label; if the `grep` returns nothing, page through `objdump -d /tmp/target | less` and search for the 256 bound (`0x100`, or `0xff` masks) instead. Confirm the loop boundaries match.
- [ ] Write the algorithm summary. In `analysis-notes.txt`, add a two-sentence summary: the algorithm family, the key length used in the demo, and one distinguishing characteristic visible in the decompiled output.
- [ ] Turn the same skill on the real Agent Tesla sample (opt-in; needs `make fetch-sample`). The synthetic `target.c` taught you to read a no-symbol decode routine; now find one in the wild. Open the assembly in `/lab/samples/` with a .NET decompiler (dnSpy or ILSpy, run only inside an isolated analysis environment in keeping with the isolation rule in Setup; or extract IL with `monodis` in-container) and locate the string-decode method the FortiGuard writeup describes: the one that takes an integer index and returns a plaintext string from the encoded buffer. Trace it: where is the encoded buffer? What transform (XOR/RC4/AES, base64 wrapper) does the decoder apply? Annotate one decoded call site in `analysis-notes.txt`. (Heavily obfuscated or packed samples may show only a stub; note that too, since recognising the protection layer is a valid finding. Static only; never execute.)
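The index-to-string pattern in the last step can be sketched as a toy model. Everything in it is an assumption for illustration: the single-byte XOR transform, the (offset, length) table layout, and the names `build_blob` and `decode_string` are hypothetical and are not taken from the Agent Tesla sample or the FortiGuard analysis. The point is only the shape to hunt for: one large encoded buffer, an integer index at each call site, and a decoder that slices and transforms.

```python
XOR_KEY = 0x5A  # hypothetical single-byte key; a real sample may use a different transform


def build_blob(strings):
    """Pack strings into one encoded buffer plus an (offset, length) table."""
    blob = bytearray()
    table = []
    for text in strings:
        raw = text.encode("utf-8")
        table.append((len(blob), len(raw)))
        blob.extend(b ^ XOR_KEY for b in raw)
    return bytes(blob), table


def decode_string(blob, table, index):
    """Return the plaintext for one index, the way a call site would request it."""
    offset, length = table[index]
    return bytes(b ^ XOR_KEY for b in blob[offset:offset + length]).decode("utf-8")


if __name__ == "__main__":
    # Placeholder config values, not real indicators.
    blob, table = build_blob(["smtp.example.test", "user@example.test"])
    print(b"example" in blob)             # is the plaintext visible in the buffer?
    print(decode_string(blob, table, 0))  # what a call site with index 0 receives
```

When annotating a real call site, record the same three facts this toy makes explicit: where the buffer lives, how the index maps to a slice, and which transform turns the slice into plaintext.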
Success criteria: you're done when
- [ ] `retdec-decompiler` output exists at `/tmp/target.c` and is non-empty.
- [ ] Your `analysis-notes.txt` has renamed variables for every loop variable in the KSA function.
- [ ] You can state in two sentences what algorithm the function implements and how you know.
- [ ] The disassembly (`objdump`) loop boundaries match the decompiler's loop boundaries.
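Two of these criteria can be checked mechanically. The sketch below is a hypothetical helper, not part of the lab tooling: the name `check_outputs` and the choice to treat any `/* ... */` comment as evidence of annotation are assumptions, and the remaining criteria still need a human read.

```python
import os
import re

# Any C-style block comment counts as an annotation in this sketch.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def check_outputs(decomp_path="/tmp/target.c", notes_path="/lab/analysis-notes.txt"):
    """Return {criterion: bool} for the two checks that can be automated."""
    results = {}
    results["decompiler output non-empty"] = (
        os.path.isfile(decomp_path) and os.path.getsize(decomp_path) > 0
    )
    annotated = False
    if os.path.isfile(notes_path):
        with open(notes_path, encoding="utf-8", errors="replace") as handle:
            annotated = bool(COMMENT_RE.search(handle.read()))
    results["notes contain C-style annotations"] = annotated
    return results


if __name__ == "__main__":
    for name, ok in check_outputs().items():
        print("PASS" if ok else "FAIL", name)
```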
Deliverables
Commit to your portfolio repo:
- analysis-notes.txt — annotated pseudo-C with renamed variables and a two-sentence algorithm summary.
- Do not commit the compiled binary, the retdec raw output, or any sample captures.
Automate & own it
Required. Write a Python script annotate_decomp.py that:
1. Takes a retdec pseudo-C file as input (sys.argv[1]).
2. Detects the presence of a modulo-256 loop (regex on % 256 or i < 256).
3. Prints [MATCH] Possible key-scheduling loop at line N for each hit.
4. Exits with code 0 if a match is found, 1 if not.
Draft it with AI assistance, then read every line and confirm the regex is not over-broad (test it against a benign file with no loops). Commit annotate_decomp.py alongside your notes.
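One possible starting point for `annotate_decomp.py` is sketched below. It is a draft to read and tighten, not a finished answer: the helper name `find_matches` is illustrative, accepting the hex spelling `0x100` alongside `256` is an assumption about how the decompiler may print the constant, and the pattern is deliberately loose so that the over-breadth check has something to find.

```python
import re
import sys

# "% 256" or "< 256", with 0x100 accepted as an assumed alternative spelling.
PATTERN = re.compile(r"%\s*(?:256|0x100)\b|<\s*(?:256|0x100)\b")


def find_matches(lines):
    """Return the 1-based numbers of lines that match PATTERN."""
    return [n for n, line in enumerate(lines, start=1) if PATTERN.search(line)]


def main(path):
    """Print one [MATCH] line per hit; return 0 if any hit was found, else 1."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        hits = find_matches(handle)
    for n in hits:
        print(f"[MATCH] Possible key-scheduling loop at line {n}")
    return 0 if hits else 1


# With no argument the script does nothing, so it can be imported or tested.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Run it as `python3 annotate_decomp.py /tmp/target.c`. Note that this pattern fires on any comparison against 256, including ordinary buffer-size checks, which is exactly the kind of false positive the benign-file test should surface.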
AI acceleration
Paste the decompiled function into a model. Prompt: "This is retdec pseudo-C output from a stripped binary. Identify the algorithm, name each variable by its role, and explain the loop in plain English." Use the response as a hypothesis — verify each rename against the data flow before accepting it. Note which renames you accepted and which you changed.
Connects forward
Module 09 picks up where this module leaves off: once you can read decompiled logic, you need to handle the case where the code is packed and the decompiler sees a stub rather than the real algorithm. Unpacking is a prerequisite to decompilation on most real samples.
Marketable proof
"I can open a stripped binary in a free decompiler, identify a custom cipher implementation from the loop structure, rename variables to match the algorithm's semantics, and document my findings in a format suitable for an IR case file."
Stretch
Modify `data/target.c` to add a second function that implements a simple Caesar cipher. Recompile, re-decompile, and confirm retdec produces two recognisable functions. Extend `annotate_decomp.py` to detect Caesar-style add/modulo loops as well as the modulo-256 swap loops it already flags.