
Module 03 — File Systems & Carving

Type 6 · Reconstruct: recover a deleted file from a raw disk image with SleuthKit (fls/icat/fsstat) and foremost, reconstructing the inodes and paths that prove the file existed and was deleted. (Secondary: Tool-Build, turning the carve/recover step into a reusable parser.)

Last reviewed: 2026-06

Digital Forensics & IR

Deleted files don't disappear; they just become invisible to the OS, and visible to you.

Difficulty: Intermediate  ·  Estimated time: ~5–7 hrs (study + lab)  ·  Prerequisites: Foundations

In 60 seconds

Deleting a file removes its directory pointer; the content clusters stay on disk until the allocator reuses them — and that gap is the investigator's window. SleuthKit walks the filesystem by layer (fls lists deleted entries, icat extracts content by inode); foremost carves file-shaped data straight from raw bytes by magic signature when no metadata survives. You need both: inode recovery when metadata exists, carving when it's gone. NTFS is especially rich — the MFT, dual timestamps, and journals reconstruct what happened and when.

Why this matters

When a user deletes a file or an attacker wipes their traces, the operating system marks that space as available — but the data usually remains on disk until overwritten. File system forensics is the discipline of reading what's really on the media, not what the OS presents. It is where evidence goes after it's been "deleted" and where malware that never ran from a normal path often hides. Understanding how NTFS and ext4 actually allocate and reclaim space is what separates an investigator who can say "I found a deleted file" from one who can prove when it was deleted, who deleted it, and what it contained.

This is exactly the move at the center of the M57-Patents scenario, the public Digital Corpora disk-image dataset that forensics courses use to teach recovery: a confidential spreadsheet leaves the company, and the investigation hinges on recovering deleted artifacts and carving content out of unallocated space across the imaged workstation drives — not on what the live filesystem shows. The same dataset ships bulk_extractor output so you can check your carving against a known-good reference. Those raw images are the kind of input you point fls/icat/foremost at, and they make a real case out of a technique that toy images make feel academic.

Objective

Use SleuthKit tools (fls, icat, fsstat) to parse a raw disk image and recover deleted file entries; use foremost to carve recovered content from unallocated space; document the inodes and file paths that prove a file's existence and deletion.

The core idea

Every filesystem is a data structure: a tree of metadata (inodes, MFT entries, directory entries) that points to clusters of actual file content. When a file is deleted, its directory entry is flagged as deleted or unlinked and the MFT entry (NTFS) or inode (ext) is marked as unallocated, but the content clusters remain physically on disk until the allocator reuses them. This gap, between "metadata says deleted" and "new data has overwritten those sectors", is the investigator's window. SleuthKit exploits it: fls lists directory entries including deleted ones (flagged with *), and icat extracts the content at a given inode number even if the file no longer exists in the directory tree. How much metadata survives depends on the filesystem: ext3 and ext4 typically clear the inode's block pointers on deletion, so icat may return nothing there even though the content is still on disk, and carving becomes the fallback.
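A minimal sketch of how the deleted entries in an fls listing could be picked out programmatically. It assumes the default short fls line shape (entry type, an optional * for deleted, the metadata address, then the name). The sample lines, inode numbers, and filenames are invented for illustration and do not come from a real image, and `parse_fls_line` and `deleted_entries` are hypothetical helper names.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line shape, e.g. "r/r * 7:\tbudget.xls" or "d/d 64-128-2:\tDocs".
# A leading run of "+" (recursion depth from fls -r) is tolerated.
FLS_LINE = re.compile(
    r"^(?:\++\s+)?(?P<type>\S+/\S+)\s+(?P<deleted>\*\s+)?"
    r"(?P<inode>[\d-]+)(?P<realloc>\(realloc\))?:\s+(?P<name>.+)$"
)


class FlsEntry(NamedTuple):
    entry_type: str    # e.g. "r/r" for a regular file, "d/d" for a directory
    deleted: bool      # True when the line carries the "*" flag
    inode: str         # kept as text: NTFS addresses look like "64-128-2"
    reallocated: bool  # True when fls reports "(realloc)"
    name: str


def parse_fls_line(line: str) -> Optional[FlsEntry]:
    """Parse one fls line; return None if it does not match the assumed shape."""
    m = FLS_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    return FlsEntry(
        entry_type=m["type"],
        deleted=m["deleted"] is not None,
        inode=m["inode"],
        reallocated=m["realloc"] is not None,
        name=m["name"],
    )


def deleted_entries(fls_output: str) -> List[FlsEntry]:
    """Return only the entries that fls flagged as deleted."""
    parsed = (parse_fls_line(line) for line in fls_output.splitlines())
    return [e for e in parsed if e is not None and e.deleted]


# Illustrative listing only (not captured from a real image).
SAMPLE = (
    "r/r 4:\tnotes.txt\n"
    "r/r * 7:\tbudget.xls\n"
    "d/d 64-128-2:\tDocs\n"
    "-/r * 12(realloc):\told.doc\n"
)

for entry in deleted_entries(SAMPLE):
    print(entry.inode, entry.name, "(reallocated)" if entry.reallocated else "")
```

The inode strings this yields are what you would hand to icat. A `(realloc)` flag is worth surfacing because it means the metadata entry has been reused, so the content at that address may belong to a different file.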

The mental model

SleuthKit is navigable once you think in layers: volume (mmls) → filesystem (fsstat) → metadata/inode (istat) → filename (fls) → data (icat). Each tool queries exactly one layer, which tells you what its output will and won't contain. Most SleuthKit confusion is a layer mismatch — asking fls for inode detail, or expecting icat to know a filename.

flowchart TB
    V["Volume — partitions"] -->|"mmls"| F["Filesystem — layout"]
    F -->|"fsstat"| M["Metadata — inode / MFT entry"]
    M -->|"istat"| N["Filename — directory entry"]
    N -->|"fls"| D["Data — content clusters"]
    D -->|"icat"| O["bytes out"]
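The layer-to-tool mapping above can be sketched as a small command builder. This is an illustration under stated assumptions, not part of SleuthKit: `tsk_command` is a hypothetical helper, `disk.img` and the sector offset 2048 are placeholder values, and the function only builds an argument list (it does not run anything). The flags used (`-o` for the filesystem's sector offset, and `-r`, `-d`, `-p` on fls) are standard SleuthKit options.

```python
from typing import List, Optional

# One tool per layer, following the mental model in the text.
LAYER_TOOL = {
    "volume": "mmls",
    "filesystem": "fsstat",
    "metadata": "istat",
    "filename": "fls",
    "data": "icat",
}

# istat and icat are addressed by inode; they make no sense without one.
NEEDS_INODE = {"metadata", "data"}


def tsk_command(
    layer: str,
    image: str,
    offset: Optional[int] = None,
    inode: Optional[str] = None,
    extra: Optional[List[str]] = None,
) -> List[str]:
    """Build the argv for the tool that answers questions at `layer`."""
    tool = LAYER_TOOL[layer]
    if layer in NEEDS_INODE and inode is None:
        raise ValueError(f"{tool} needs an inode (metadata address)")

    argv = [tool] + list(extra or [])
    # The filesystem offset (in sectors) comes from mmls output,
    # so it is only passed to the layers below the volume layer.
    if offset is not None and layer != "volume":
        argv += ["-o", str(offset)]
    argv.append(image)
    # fls optionally takes a directory inode; istat/icat require one.
    if inode is not None and layer in {"metadata", "filename", "data"}:
        argv.append(inode)
    return argv


# Placeholder values: the offset must come from your own mmls run.
print(tsk_command("volume", "disk.img"))
print(tsk_command("filename", "disk.img", offset=2048, extra=["-r", "-d", "-p"]))
print(tsk_command("data", "disk.img", offset=2048, inode="7"))
```

To actually execute one of these, the list could be passed to `subprocess.run(argv, capture_output=True)`. Keeping the builder separate makes the layer each command targets explicit, which is where most mismatches start.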

File carving is a different and complementary technique: instead of navigating the filesystem metadata, you scan the raw byte stream looking for known file signatures (magic bytes) and carve the data into a file regardless of whether any metadata exists for it. foremost uses a configuration file of header/footer byte patterns (PNG: \x89PNG\r\n\x1a\n; JPEG: \xff\xd8\xff; ZIP: PK\x03\x04) to locate and extract file-shaped content from unallocated space. Carving finds things the filesystem doesn't know about anymore, but it can't tell you the original filename, creation time, or path — the metadata is gone, only the content remains. You need both approaches: inode-based recovery when the metadata still exists, carving when it doesn't.
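To make the header/footer idea concrete, here is a deliberately small carver sketch. It is not how foremost is implemented; it only shows the signature-scanning technique on an in-memory buffer. The `SIGNATURES` table, the `carve` function, and the `max_size` cap are assumptions made for this example. The JPEG and PNG headers are the ones quoted above; the footers are the JPEG end-of-image marker and the PNG IEND chunk.

```python
from typing import Iterator, Tuple

# (header, footer) pairs. Headers match the magic bytes quoted in the text.
SIGNATURES = {
    "jpg": (b"\xff\xd8\xff", b"\xff\xd9"),
    "png": (b"\x89PNG\r\n\x1a\n", b"IEND\xaeB`\x82"),
}


def carve(data: bytes, max_size: int = 10 * 1024 * 1024) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (extension, start_offset, content) for each header/footer match.

    Naive on purpose: it takes the first footer after a header, so it will
    truncate files that embed the footer bytes (for example JPEG thumbnails)
    and it cannot reassemble fragmented files.
    """
    for ext, (header, footer) in SIGNATURES.items():
        start = data.find(header)
        while start != -1:
            window_end = min(len(data), start + max_size)
            foot = data.find(footer, start + len(header), window_end)
            if foot != -1:
                end = foot + len(footer)
                yield ext, start, data[start:end]
                start = data.find(header, end)
            else:
                # Header with no footer in range: skip it and keep scanning.
                start = data.find(header, start + 1)


# Synthetic "unallocated space": zero padding around two fake files.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"JFIFdata" + b"\xff\xd9"
fake_png = b"\x89PNG\r\n\x1a\n" + b"chunks" + b"IEND\xaeB`\x82"
image = b"\x00" * 512 + fake_jpeg + b"\x00" * 100 + fake_png + b"\x00" * 50

for ext, offset, content in carve(image):
    print(f"{ext} at byte offset {offset}, {len(content)} bytes")
```

Note what the output contains: an offset and bytes, nothing else. There is no filename, path, or timestamp to report, which is exactly the limitation of carving described above.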

The gotcha

Carving recovers content, not context. A carved JPEG has no filename, no path, no creation time — the metadata that proves when and by whom it was deleted is gone. And the window closes: once the allocator overwrites those clusters, the data is unrecoverable. Inode recovery and carving answer different questions; reach for both and know which one your finding rests on.

Go deeper: why NTFS is the artifact-rich filesystem

The Master File Table (MFT) is a structured database where every file and directory — current and recently deleted — has an entry, carrying multiple attribute streams: $DATA (content), $STANDARD_INFORMATION (timestamps), $FILE_NAME (filename-level timestamps that differ from $SI and matter for timestomping detection, Module 11). $MFT itself is an extractable file. NTFS also keeps transaction logs ($LogFile, $UsnJrnl) recording filesystem operations — a gold mine for reconstructing what happened, when, and in what order. SleuthKit reads MFT entries directly from the raw image, bypassing Windows entirely.
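As a small illustration of why the two timestamp sets matter, the sketch below converts NTFS FILETIME values (100-nanosecond intervals since 1601-01-01 UTC) and applies one commonly cited heuristic: a $STANDARD_INFORMATION creation time that is earlier than the $FILE_NAME creation time deserves a closer look. This is an assumption-laden example, not a detection method: the helper names and the sample values are hypothetical, and a mismatch is a lead to investigate rather than proof, since ordinary file operations can also make the two sets differ.

```python
from datetime import datetime, timedelta, timezone

# NTFS stores times as 100 ns ticks since this epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME tick count to an aware UTC datetime.

    Integer division to microseconds avoids float rounding; the final
    100 ns digit is dropped because datetime has microsecond resolution.
    """
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def si_predates_fn(si_created: int, fn_created: int) -> bool:
    """Heuristic flag: $SI creation time earlier than $FN creation time."""
    return si_created < fn_created


# Hypothetical values for illustration only (not read from a real MFT).
UNIX_EPOCH_AS_FILETIME = 116444736000000000
si_created = UNIX_EPOCH_AS_FILETIME                      # 1970-01-01
fn_created = UNIX_EPOCH_AS_FILETIME + 10_000_000 * 3600  # one hour later

print("SI created:", filetime_to_datetime(si_created).isoformat())
print("FN created:", filetime_to_datetime(fn_created).isoformat())
print("worth a closer look:", si_predates_fn(si_created, fn_created))
```

In practice the raw values would come from istat output or a parsed $MFT, never from a model's guess.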

AI caveat

A model is good for translating tool output into plain English ("what does this fsstat tell me about the volume?") and drafting foremost config entries — but it cannot read your image and will hallucinate inode numbers and byte offsets. Always verify those values by running the tools yourself.

Learn (~4 hrs)

Filesystem internals (~1.5 hrs)

  • Brian Carrier, File System Forensic Analysis (book overview): the definitive text; if you have access, chapters 8–10 (FAT/NTFS) are the canonical reference. Otherwise, Carrier's tool documentation serves as a free proxy.
  • NTFS Documentation (ntfs.com): brief but precise on MFT attributes; understand $DATA, $STANDARD_INFORMATION, and $FILE_NAME before the lab.

SleuthKit in practice (~1.5 hrs)

  • SleuthKit User Guide and Tool Documentation: the official reference for all SleuthKit tools (fls, icat, fsstat, istat, mmls), covering each tool's usage and output format. Keep this open during the lab.
  • foremost, SourceForge Project Documentation: the carver's documentation and configuration format; read the header/footer configuration section to understand how you'd add a new file type.
  • Digital Corpora, M57-Patents scenario: real disk images where the case turns on recovering a deleted/exfiltrated document. Grab one workstation image (~15 min) and try fls -d and foremost against it; the included bulk_extractor output gives you an answer key to calibrate against.

FAT32 internals (for the lab image) (~1 hr)

  • FAT32 File System Specification (Microsoft): the authoritative spec; sections 3–5 cover the FAT structure and directory entries. Worth a skim so you know what fsstat is reporting.
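For the FAT32 lab image, the directory-entry layout covered in the spec can be sketched as a tiny parser. It is a simplified illustration: it handles only 32-byte short (8.3) entries, skips long-filename entries, and ignores timestamps. `parse_dir_entry` and `FatDirEntry` are hypothetical names, the sample entry is synthetic, and the "_" placeholder for the overwritten first character is a display choice made here.

```python
import struct
from typing import NamedTuple, Optional

DELETED_MARKER = 0xE5  # first name byte of a deleted entry
NEVER_USED = 0x00      # first name byte of an entry that was never used
ATTR_LONG_NAME = 0x0F  # attribute value that marks a long-filename entry


class FatDirEntry(NamedTuple):
    name: str
    deleted: bool
    first_cluster: int
    size: int


def parse_dir_entry(raw: bytes) -> Optional[FatDirEntry]:
    """Parse one 32-byte short directory entry; None for unused/LFN slots."""
    if len(raw) != 32:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    if raw[0] == NEVER_USED or (raw[11] & 0x3F) == ATTR_LONG_NAME:
        return None

    deleted = raw[0] == DELETED_MARKER
    stem_bytes = raw[0:8]
    if deleted:
        # Deletion overwrote the first character; show a placeholder instead.
        stem_bytes = b"_" + stem_bytes[1:]
    stem = stem_bytes.decode("ascii", errors="replace").rstrip()
    ext = raw[8:11].decode("ascii", errors="replace").rstrip()
    name = f"{stem}.{ext}" if ext else stem

    # FAT32 splits the starting cluster across two little-endian 16-bit fields.
    (cluster_hi,) = struct.unpack_from("<H", raw, 20)
    (cluster_lo,) = struct.unpack_from("<H", raw, 26)
    (size,) = struct.unpack_from("<I", raw, 28)
    return FatDirEntry(name, deleted, (cluster_hi << 16) | cluster_lo, size)


# Synthetic entry: BUDGET.XLS, archive attribute, cluster 5, 4096 bytes.
live = struct.pack(
    "<8s3sB8sH4sHI",
    b"BUDGET  ", b"XLS", 0x20, bytes(8), 0, bytes(4), 5, 4096,
)
# "Deleting" it changes one byte; cluster and size fields are untouched.
deleted = bytes([DELETED_MARKER]) + live[1:]

print(parse_dir_entry(live))
print(parse_dir_entry(deleted))
```

The point of the comparison is that the two entries differ by a single byte, so the starting cluster and size needed to find the content are still present after deletion.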

Key concepts

  • File deletion removes the directory entry and marks the inode/MFT entry unallocated; content sectors remain.
  • SleuthKit's five-layer model: volume → filesystem → metadata → filename → data.
  • fls -d lists deleted directory entries; icat extracts content at a given inode.
  • File carving scans raw bytes for magic signatures; it recovers content without metadata.
  • foremost uses a header/footer config to carve specific file types from unallocated space.
  • NTFS MFT entries carry dual timestamps ($SI vs. $FN) that matter for anti-forensics detection.
  • $LogFile and $UsnJrnl record filesystem operations — often more valuable than the files themselves.
  • Real disk-image datasets (Digital Corpora's M57-Patents) hinge on recovering deleted/carved files from unallocated space — practice against them, not just toy images.

AI acceleration

AI is useful for translating SleuthKit command output into plain English ("what does this fsstat output tell me about the volume?") and for drafting foremost configuration entries for new file types. Where AI is not a substitute: it cannot read your disk image, and it will hallucinate inode numbers and specific offsets. Use it to explain the output you've already captured; always verify inode and offset values by running the tools yourself.

Check yourself

  • When you delete a file, what is actually removed and what remains on disk — and what closes the recovery window?
  • You recovered a JPEG with foremost but can't state its original filename or deletion time. Why not, and which technique would tell you?
  • Which SleuthKit tool answers "what files (including deleted ones) were in this directory?" versus "give me the bytes at this inode"?
