Hacktricks-skills forensic-data-carving

How to perform file and data carving on disk images, firmware, and binary files using forensic tools like Autopsy, Binwalk, Foremost, Scalpel, Bulk Extractor, and PhotoRec. Use this skill whenever the user needs to recover deleted files, extract embedded content from binaries, analyze disk images, carve data from firmware, or perform digital forensics on any storage media. Trigger on mentions of: data recovery, file carving, disk image analysis, firmware extraction, deleted file recovery, forensic analysis, embedded content extraction, or any scenario where hidden or deleted data needs to be recovered from files or images.

Install

Source: clone the upstream repo
git clone https://github.com/abelrguezr/hacktricks-skills
Manifest: skills/generic-methodologies-and-resources/basic-forensic-methodology/partitions-file-systems-carving/file-data-carving-recovery-tools/SKILL.MD

Forensic Data Carving & Recovery

This skill covers tools and techniques for extracting files and data from disk images, firmware, and binary files. Use the appropriate tool based on your evidence type and recovery goals.

Quick Decision Guide

| Scenario | Recommended Tool |
| --- | --- |
| Full disk image analysis | Autopsy (GUI + CLI) |
| Firmware/embedded content | Binwalk |
| Simple file carving | Foremost or Scalpel |
| Network artifacts + parallel scanning | Bulk Extractor |
| Maximum file type coverage | PhotoRec |
| Failing/unstable drives | ddrescue (image first) |
| EXT3/4 deleted files | extundelete / ext4magic |
| Visual binary analysis | binvis |
| Classify carved artifacts | YARA-X |

Tool Selection & Usage

Autopsy (Full-featured forensic platform)

Best for: Complete disk image analysis with GUI and CLI support.

Installation:

# Download from https://www.autopsy.com/download/
# Or via package manager on supported systems

CLI workflow (Autopsy ≥4.21):

# Create a case
autopsycli case --create MyCase --base /cases

# Ingest evidence with data-carve module (8 parallel threads)
autopsycli ingest MyCase /evidence/disk01.E01 --threads 8

Key features:

  • Supports disk images and various evidence formats
  • Built-in carving module (SleuthKit v4.13+)
  • Parallel extraction on multi-core systems
  • Scriptable via CLI for CI/CD or batch processing

Binwalk (Firmware & embedded content)

Best for: Finding and extracting embedded files from firmware binaries.

Installation:

sudo apt install binwalk

Common commands:

# Display embedded data
binwalk firmware.bin

# Extract recognized objects (safe default)
binwalk -e firmware.bin

# Extract everything (use with caution)
binwalk --dd ".*" firmware.bin

⚠️ Security warning: Versions ≤2.3.3 have a Path Traversal vulnerability (CVE-2022-4510). Upgrade before analyzing untrusted samples.
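
The version check implied by this warning can be scripted. The sketch below is a minimal example, not part of Binwalk itself: `binwalk_is_vulnerable` is a hypothetical helper name, it assumes you have already obtained the installed version string (for example from your package manager), and it relies on GNU `sort -V` for version ordering. It treats every version at or below 2.3.3 as affected, as stated above.

```shell
# Hypothetical helper: succeeds (exit 0) when the given binwalk version
# is at or below 2.3.3, the last version named in the warning above.
binwalk_is_vulnerable() {
    local version="$1" last_affected="2.3.3"
    # sort -V orders version strings; if the larger of the two is
    # last_affected, then version <= last_affected.
    [ "$(printf '%s\n%s\n' "$version" "$last_affected" | sort -V | tail -n 1)" = "$last_affected" ]
}
```

A possible use is `binwalk_is_vulnerable "$installed_version" && echo "upgrade binwalk before analyzing untrusted samples"`, where `installed_version` is a variable you populate yourself.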


Foremost (Simple file carving)

Best for: Quick extraction of known file types from images.

Installation:

sudo apt-get install foremost

Usage:

# Carve with default file types
foremost -v -i file.img -o output

# Files appear in the "output" directory

Configuration: Edit /etc/foremost.conf to enable or disable specific file types.
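
Foremost groups carved files into one subdirectory per file type inside the output directory, next to an audit.txt report. As a rough sketch of how to get a quick overview after a run, the hypothetical helper `summarize_carved` below prints one line per subdirectory with its file count; the name and the output format are illustrative choices, not Foremost features.

```shell
# Hypothetical helper: print "<type> <count>" for every subdirectory
# of a carving output directory (e.g. output/jpg, output/pdf).
summarize_carved() {
    local out_dir="$1" sub
    for sub in "$out_dir"/*/; do
        # Skip the unexpanded glob when there are no subdirectories
        [ -d "$sub" ] || continue
        printf '%s %s\n' "$(basename "$sub")" "$(find "$sub" -type f | wc -l | tr -d ' ')"
    done
}
```

For example, `summarize_carved output` after the Foremost command above lists how many files of each type were recovered.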


Scalpel (Configurable file carving)

Best for: Custom file type extraction with detailed configuration.

Installation:

sudo apt-get install scalpel

Usage:

scalpel file.img -o output

Configuration: Edit /etc/scalpel/scalpel.conf to uncomment the desired file types.
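
Uncommenting rules by hand is easy to get wrong on a shared system, so one option is to generate a working copy instead. The sketch below assumes that rule lines in the config look like `#<whitespace>ext<whitespace>...` (a leading `#` followed by the extension); `enable_scalpel_type` is a hypothetical helper name. It writes the modified config to standard output and leaves the original file untouched.

```shell
# Hypothetical helper: print a copy of a Scalpel config with the rule
# lines for one extension uncommented. Assumes commented rules have
# the form "#<whitespace><ext><whitespace>...".
enable_scalpel_type() {
    local ext="$1" conf="$2"
    # Drop the leading '#' only on lines whose first field is the extension
    sed -E 's/^#([[:space:]]+'"$ext"'[[:space:]])/\1/' "$conf"
}
```

A possible workflow is `enable_scalpel_type jpg /etc/scalpel/scalpel.conf > custom.conf`, followed by `scalpel -c custom.conf -o output file.img`, where `-c` selects the alternate configuration file.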


Bulk Extractor (Network artifacts & parallel scanning)

Best for: Extracting network artifacts (URLs, emails, IPs, MACs) and carving in parallel.

Installation (v2.1.1+):

git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git && cd bulk_extractor
./bootstrap.sh && ./configure && make -j$(nproc) && sudo make install

Usage:

# Run all scanners with aggressive JPEG carving
bulk_extractor -o out_folder -S jpeg_carve_mode=2 -S write_bodyfile=y /evidence/disk.img

Post-processing:

  • bulk_diff - Compare artifacts between images
  • bulk_extractor_reader.py - Convert results to JSON for SIEM
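
For a quick look at the results before any post-processing, the feature files in the output folder (such as email.txt or url.txt) can be summarized with standard tools. The sketch below assumes the usual feature-file layout of `#` comment lines followed by tab-separated offset, feature, and context columns; `top_features` is a hypothetical helper name.

```shell
# Hypothetical helper: print the most frequent features in a
# bulk_extractor feature file as "<count> <feature>" lines.
top_features() {
    local feature_file="$1" limit="${2:-10}"
    grep -v '^#' "$feature_file" |   # drop comment/banner lines
        cut -f2 |                    # keep the feature column
        sort | uniq -c |             # count identical features
        sort -rn |                   # most frequent first
        sed -E 's/^ +//' |           # strip uniq's leading padding
        head -n "$limit"
}
```

For example, `top_features out_folder/email.txt 20` lists the twenty most common email addresses found in the image.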

PhotoRec (Maximum file type coverage)

Best for: Recovering the widest range of file types.

Download: https://www.cgsecurity.org/wiki/TestDisk_Download

Features:

  • GUI and CLI versions available
  • Select specific file types to search for
  • Works on raw images and physical drives

ddrescue (Imaging failing drives)

Best for: Creating reliable images from unstable or failing drives.

Installation:

sudo apt install gddrescue ddrescueview

Workflow:

# First pass - quick copy without retries
sudo ddrescue -f -n /dev/sdX suspect.img suspect.log

# Second pass - aggressive recovery with 3 retries
sudo ddrescue -d -r3 /dev/sdX suspect.img suspect.log

# Visualize status (green=good, red=bad)
ddrescueview suspect.log

Note: Version 1.28+ supports --cluster-size for high-capacity SSDs.
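
Besides the graphical view, the mapfile (suspect.log above) can be summarized on the command line. The sketch below assumes the documented GNU ddrescue mapfile layout: `#` comment lines, one status line, then block lines of the form `pos size status` with hexadecimal values, where `+` marks finished areas and `-` marks bad sectors. `mapfile_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: sum the rescued ('+') and bad-sector ('-') bytes
# recorded in a ddrescue mapfile.
mapfile_summary() {
    local mapfile="$1" pos size status rescued=0 bad=0
    while read -r pos size status; do
        case "$pos" in '#'*|'') continue ;; esac   # comments and blank lines
        case "$size" in 0x*) ;; *) continue ;; esac # skip the current-position status line
        case "$status" in
            '+') rescued=$((rescued + size)) ;;    # shell arithmetic accepts 0x... values
            '-') bad=$((bad + size)) ;;
        esac
    done < "$mapfile"
    printf 'rescued=%d bad=%d\n' "$rescued" "$bad"
}
```

Running `mapfile_summary suspect.log` between passes gives a rough measure of how much the retry pass recovered.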


EXT3/4 Recovery (Filesystem-specific)

Best for: Recovering deleted files from Linux EXT filesystems without full carving.

extundelete (journal-based):

extundelete disk.img --restore-all

ext4magic (directory scan):

ext4magic disk.img -M -f '*.jpg' -d ./recovered

⚠️ Note: If the filesystem was mounted after deletion, data blocks may be overwritten. Use carving tools (Foremost/Scalpel) as fallback.


binvis (Visual binary analysis)

Best for: Understanding unknown binaries, spotting patterns, identifying packers/steganography.

Access:

Features:

  • Visual structure viewer
  • String and resource extraction (PE/ELF)
  • Pattern detection for cryptanalysis
  • Steganography identification
  • Binary diffing

YARA-X (Artifact classification)

Best for: Rapidly classifying thousands of carved objects.

Installation: https://github.com/VirusTotal/yara-x

Usage:

# Scan carved objects with YARA rules
yr scan --recursive --threads 8 --print-meta rules/index.yar out_folder/

Performance: 10-30× faster than classic YARA.


Complementary Tools

| Tool | Purpose |
| --- | --- |
| viu | View images in terminal |
| pdftotext | Extract text from PDFs |
| FindAES | Search for AES keys (TrueCrypt/BitLocker) |

Recommended Workflow

1. Image the evidence first

# For stable drives
dd if=/dev/sdX of=evidence.img bs=4M status=progress

# For failing drives (see ddrescue section)

2. Choose carving approach

  • Full analysis: Autopsy
  • Quick extraction: Foremost/Scalpel
  • Firmware: Binwalk
  • Network artifacts: Bulk Extractor

3. Classify results

yr scan --recursive --threads 8 rules/index.yar carved_output/

4. Review and document

  • Use ddrescueview for imaging status
  • Use viu for quick image preview
  • Export results to JSON for SIEM if needed

Security Considerations

  1. Always work on copies - Never carve directly from original evidence
  2. Use read-only mounts - Mount images with the -o ro flag
  3. Isolate untrusted samples - Run carving in containers or VMs
  4. Keep tools updated - Especially Binwalk (CVE-2022-4510)
  5. Document chain of custody - Log all operations with timestamps
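
One lightweight way to log operations with timestamps is to route each command through a small wrapper. The sketch below is a minimal example under stated assumptions: `log_run` is a hypothetical helper name, and the log format (UTC timestamp, a tab, then the command line) is an illustrative choice rather than any forensic standard.

```shell
# Hypothetical helper: append a UTC timestamp and the command line to a
# log file, then run the command unchanged.
log_run() {
    local log_file="$1"
    shift
    printf '%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$log_file"
    "$@"
}
```

For example, `log_run custody.log foremost -v -i file.img -o output` records the invocation in custody.log before carving starts.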
