Hacktricks-skills structural-exploit-detection

Use this skill whenever analyzing suspicious files for 0-click exploit detection, forensic investigation of mobile malware, or validating file format structural integrity. Trigger on any request about PDF exploits, WebP vulnerabilities, font bytecode analysis, DNG/TIFF forensics, HEIF/AVIF parsing issues, or general file format exploit detection. This skill helps detect exploit chains by validating structural invariants rather than relying on byte signatures.

install
source · Clone the upstream repo
git clone https://github.com/abelrguezr/hacktricks-skills
manifest: skills/generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/structural-file-format-exploit-detection/SKILL.MD

Structural File-Format Exploit Detection

This skill provides practical techniques to detect 0-click mobile exploit files by validating structural invariants of their formats instead of relying on byte signatures. The approach generalizes across samples, polymorphic variants, and future exploits that abuse the same parser logic.

Key principle: Encode structural impossibilities and cross-field inconsistencies that only appear when a vulnerable decoder/parser state is reached.

When to Use This Skill

Use this skill when:

  • Investigating suspicious files that may contain 0-click exploits
  • Building forensic detection rules for mobile malware
  • Validating file format integrity in security gateways
  • Analyzing exploit chains (FORCEDENTRY, BLASTPASS, TRIANGULATION, LANDFALL, etc.)
  • Creating structural detection rules that work without payload signatures

Why Structure, Not Signatures

When weaponized samples are unavailable and payload bytes mutate, traditional IOC/YARA patterns fail. Structural detection inspects the container's declared layout versus what is mathematically or semantically possible for the format implementation.

Typical checks:

  • Validate table sizes and bounds derived from the spec and safe implementations
  • Flag illegal/undocumented opcodes or state transitions in embedded bytecode
  • Cross-check metadata vs actual encoded stream components
  • Detect contradictory fields that indicate parser confusion or integer overflow set-ups

Detection Patterns by Format

PDF/JBIG2 – FORCEDENTRY (CVE-2021-30860)

Target: JBIG2 symbol dictionaries embedded inside PDFs (often used in mobile MMS parsing).

Structural signals:

  • Contradictory dictionary state that cannot occur in benign content but is required to trigger the overflow in arithmetic decoding
  • Suspicious use of global segments combined with abnormal symbol counts during refinement coding

Detection logic:

if input_symbols_count == 0 and (ex_syms > 0 and ex_syms < 4):
    mark_malicious("JBIG2 impossible symbol dictionary state")

Practical triage:

  1. Identify and extract JBIG2 streams from the PDF using pdfid, pdf-parser, or peepdf
  2. Verify arithmetic coding flags and symbol dictionary parameters against the JBIG2 spec

Notes: Works without embedded payload signatures. Low false positive rate because the flagged state is mathematically inconsistent.
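
As a rough first pass for triage step 1, the sketch below looks for the /JBIG2Decode filter name in raw PDF bytes and restates the detection rule as a predicate over fields that are already parsed. The function names are illustrative assumptions, not part of any tool named here. The byte scan is deliberately coarse: it misses names written with #xx hex escapes and filters hidden inside compressed object streams.

```python
import re

# Literal filter name as it usually appears in a PDF stream dictionary.
JBIG2_FILTER = re.compile(rb"/JBIG2Decode\b")


def find_jbig2_filters(pdf_bytes: bytes) -> list[int]:
    """Return the byte offset of every /JBIG2Decode reference (hypothetical helper)."""
    return [m.start() for m in JBIG2_FILTER.finditer(pdf_bytes)]


def is_impossible_symbol_dict(input_symbols_count: int, ex_syms: int) -> bool:
    """Same condition as the detection rule, applied to already-parsed fields."""
    return input_symbols_count == 0 and 0 < ex_syms < 4
```

A non-empty result from find_jbig2_filters only says the file is worth extracting with a real PDF parser; the verdict still comes from the symbol dictionary check.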


WebP/VP8L – BLASTPASS (CVE-2023-4863)

Target: WebP lossless (VP8L) Huffman prefix-code tables.

Structural signals:

  • Total size of constructed Huffman tables exceeds the safe upper bound expected by reference/patched implementations, implying the overflow precondition

Detection logic:

total_size = sum(table_sizes)
if total_size > 2954:   # FIXED_TABLE_SIZE + MAX_TABLE_SIZE
    mark_malicious("VP8L oversized Huffman tables")

Practical triage:

  1. Check WebP container chunks: VP8X + VP8L
  2. Parse VP8L prefix codes and compute actual allocated table sizes

Notes: Robust against byte-level polymorphism of the payload. Bound is derived from upstream limits/patch analysis.
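
Triage step 1 can be sketched as a small RIFF chunk walker that reports which chunks a WebP file declares, assuming the standard RIFF layout (four-byte tag, little-endian 32-bit size, payload padded to an even length). The helper names are illustrative. The table-size predicate only restates the rule above; it expects table sizes that a real VP8L prefix-code parser has already computed, which this sketch does not attempt.

```python
import struct

VP8L_TABLE_BOUND = 2954  # bound quoted in the detection rule above


def list_webp_chunks(data: bytes) -> list[tuple[str, int, int]]:
    """Return (fourcc, payload_offset, payload_size) for each RIFF chunk."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a RIFF/WEBP container")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        fourcc = data[pos:pos + 4].decode("latin-1")
        (size,) = struct.unpack_from("<I", data, pos + 4)
        chunks.append((fourcc, pos + 8, size))
        # Chunk payloads are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    return chunks


def tables_exceed_bound(table_sizes: list[int]) -> bool:
    """Apply the size rule to table sizes computed by a VP8L parser."""
    return sum(table_sizes) > VP8L_TABLE_BOUND
```

Only files whose chunk list contains VP8L need to go through the more expensive prefix-code parse.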


TrueType – TRIANGULATION (CVE-2023-41990)

Target: TrueType bytecode inside fpgm/prep/glyf programs.

Structural signals:

  • Presence of undocumented/forbidden opcodes in Apple's interpreter used by the exploit chain

Detection logic:

for opcode in program_opcodes:
    if opcode in (0x8F, 0x90):
        mark_malicious("Undocumented TrueType bytecode")

Practical triage:

  1. Dump font tables using fontTools/ttx and scan fpgm/prep/glyf programs
  2. No need to fully emulate the interpreter to get value from presence checks

Notes: May produce rare false positives if nonstandard fonts include unknown opcodes; validate with secondary tooling.
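
A presence check still has to skip inline push data, otherwise a data byte equal to 0x8F or 0x90 would be misread as an instruction. The sketch below walks a raw program (the bytes of an fpgm, prep, or glyph instruction block, however they were extracted) and steps over the operands of the four TrueType push instruction families. The function name is an assumption for illustration.

```python
SUSPECT_OPCODES = {0x8F, 0x90}  # opcodes named in the detection rule above


def scan_tt_program(program: bytes) -> list[tuple[int, int]]:
    """Return (offset, opcode) for each suspect opcode, skipping push operands."""
    hits = []
    i = 0
    n = len(program)
    while i < n:
        op = program[i]
        i += 1
        if op == 0x40:  # NPUSHB: count byte, then that many bytes
            if i >= n:
                break
            i += 1 + program[i]
        elif op == 0x41:  # NPUSHW: count byte, then that many 16-bit words
            if i >= n:
                break
            i += 1 + 2 * program[i]
        elif 0xB0 <= op <= 0xB7:  # PUSHB[k]: k + 1 inline bytes
            i += op - 0xB0 + 1
        elif 0xB8 <= op <= 0xBF:  # PUSHW[k]: k + 1 inline words
            i += 2 * (op - 0xB8 + 1)
        elif op in SUSPECT_OPCODES:
            hits.append((i - 1, op))
    return hits
```

For example, scan_tt_program(bytes([0xB0, 0x8F, 0x8F])) reports only offset 2, because the first 0x8F is the operand of PUSHB.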


DNG/TIFF – CVE-2025-43300

Target: DNG/TIFF image metadata vs actual component count in encoded stream (e.g., JPEG-Lossless SOF3).

Structural signals:

  • Inconsistency between EXIF/IFD fields (SamplesPerPixel, PhotometricInterpretation) and the component count parsed from the image stream header used by the pipeline

Detection logic:

if samples_per_pixel == 2 and sof3_components == 1:
    mark_malicious("DNG/TIFF metadata vs. stream mismatch")

Practical triage:

  1. Parse primary IFD and EXIF tags
  2. Locate and parse the embedded JPEG-Lossless header (SOF3) and compare component counts

Notes: Reported exploited in the wild; excellent candidate for structural consistency checks.
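
Triage step 2 can be sketched as a JPEG marker walk that reads the component count (Nf) from the SOF3 frame header. The code assumes the embedded stream has already been located and starts at its SOI marker; locating it via strip or tile offsets is out of scope here. Function names are illustrative. The comparison keeps the exact rule above rather than flagging every mismatch, since legitimate DNG files can use a component count in the lossless JPEG that differs from SamplesPerPixel.

```python
import struct
from typing import Optional


def sof3_component_count(jpeg: bytes) -> Optional[int]:
    """Walk JPEG marker segments and return Nf from the SOF3 header, if present."""
    if jpeg[:2] != b"\xff\xd8":  # stream must start with SOI
        return None
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            return None  # lost marker sync; give up rather than guess
        marker = jpeg[pos + 1]
        (seg_len,) = struct.unpack_from(">H", jpeg, pos + 2)
        if marker == 0xC3:  # SOF3: lossless, Huffman coded
            # Segment body: precision(1) height(2) width(2) Nf(1)
            if pos + 10 > len(jpeg):
                return None
            return jpeg[pos + 9]
        if marker == 0xDA:  # SOS reached without seeing SOF3
            return None
        pos += 2 + seg_len
    return None


def matches_cve_2025_43300_pattern(samples_per_pixel: int, jpeg: bytes) -> bool:
    """Exact rule from the text: metadata claims 2 samples, stream declares 1."""
    return samples_per_pixel == 2 and sof3_component_count(jpeg) == 1
```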


DNG/TIFF – LANDFALL (CVE-2025-21042)

Target: DNG (TIFF-derived) images carrying an embedded ZIP archive appended at EOF to stage native payloads after parser RCE.

Structural signals:

  • File magic indicates TIFF/DNG (II*\x00 or MM\x00*) but filename mimics JPEG (e.g., .jpg/.jpeg WhatsApp naming)
  • Presence of a ZIP Local File Header or EOCD magic near EOF (PK\x03\x04 or PK\x05\x06) that is not referenced by any TIFF IFD data region
  • Unusually large trailing data beyond the last referenced IFD data block (hundreds of KB to MB), consistent with a bundled archive of .so modules

Detection logic:

if is_tiff_dng(magic):
    ext = file_extension()
    if ext in {".jpg", ".jpeg"}: mark_suspicious("Extension/magic mismatch: DNG vs JPEG")

    zip_off = rfind_any(["PK\x05\x06", "PK\x03\x04"], search_window_last_n_bytes=8*1024*1024)
    if zip_off >= 0:
        end_dng = approx_end_of_tiff_data()
        if zip_off > end_dng + 0x200:
            mark_malicious("DNG with appended ZIP payload (LANDFALL-style)")

Practical triage:

  1. Identify format vs name:
    file sample; exiftool -s -FileType -MIMEType sample
  2. Locate ZIP footer/header near EOF and carve:
    off=$(grep -abo -E $'PK\x05\x06|PK\x03\x04' sample.dng | tail -n1 | cut -d: -f1)
    dd if=sample.dng of=payload.zip bs=1 skip="$off"
    zipdetails -v payload.zip; unzip -l payload.zip
    
  3. Sanity-check TIFF data regions don't overlap the carved ZIP region:
    tiffdump -D sample.dng | egrep 'StripOffsets|TileOffsets|JPEGInterchangeFormat'
    
  4. One-shot carving (coarse):
    binwalk -eM sample.dng

Notes: Exploited in the wild against Samsung's libimagecodec.quram.so. The appended ZIP contained native modules (e.g., loader + SELinux policy editor) extracted/executed post-RCE.
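
One way to find where an appended archive begins, sketched under the assumption of a plain (non-ZIP64) archive with no trailing data after it, is to read the End of Central Directory record and subtract the central directory size and offset it declares. The function name is illustrative. Unlike the pseudocode above, this sketch does not compute the end of the TIFF data; it only confirms that a well-formed ZIP sits after the TIFF header.

```python
import struct
from typing import Optional

TIFF_MAGICS = (b"II*\x00", b"MM\x00*")
EOCD_SIG = b"PK\x05\x06"
LOCAL_HEADER_SIG = b"PK\x03\x04"


def find_appended_zip(data: bytes, window: int = 8 * 1024 * 1024) -> Optional[int]:
    """Return the offset where an appended ZIP starts inside a TIFF/DNG, or None."""
    if data[:4] not in TIFF_MAGICS:
        return None
    tail_start = max(0, len(data) - window)
    eocd = data.rfind(EOCD_SIG, tail_start)
    if eocd < 0 or eocd + 22 > len(data):  # minimum EOCD record is 22 bytes
        return None
    # EOCD: central directory size at +12, its archive-relative offset at +16.
    cd_size, cd_offset = struct.unpack_from("<II", data, eocd + 12)
    start = eocd - cd_size - cd_offset
    if start <= 0 or data[start:start + 4] != LOCAL_HEADER_SIG:
        return None
    return start
```

The returned offset can be compared against the last byte referenced by any IFD, or passed to dd as the skip value so that the whole archive is carved rather than only its footer.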


HEIF/AVIF – libheif & libde265 (CVE-2024-41311, CVE-2025-43967, CVE-2025-29482, CVE-2025-65586)

Target: HEIF/AVIF containers parsed by libheif (and ImageIO/OpenImageIO builds that bundle it).

Structural signals:

  • Overlay items (iloc/iref) whose source rectangles exceed the base image dimensions or whose offsets are negative/overflowing → triggers ImageOverlay::parse out-of-bounds (CVE-2024-41311)
  • Grid items referencing non-existent item IDs (ImageItem_Grid::get_decoder NULL deref, CVE-2025-43967)
  • SAO/loop-filter parameters or tile counts that force table allocations larger than the max allowed by libde265 (CVE-2025-29482)
  • Box length/extent sizes that point past EOF (typical in CVE-2025-65586 PoCs)

Detection logic:

# HEIF overlay bounds check
for overlay in heif_overlays:
    if overlay.x < 0 or overlay.y < 0: mark_malicious("HEIF overlay negative offset")
    if overlay.x + overlay.w > base.w or overlay.y + overlay.h > base.h:
        mark_malicious("HEIF overlay exceeds base image (CVE-2024-41311 pattern)")

# Grid item reference validation
for grid in heif_grids:
    if any(ref_id not in item_ids for ref_id in grid.ref_ids):
        mark_malicious("HEIF grid references missing item (CVE-2025-43967 pattern)")

# SAO / slice allocation guard
if sao_band_count > 32 or (tile_cols * tile_rows) > MAX_TILES or sao_eo_class not in range(4):
    mark_malicious("HEIF SAO/tiling exceeds safe bounds (CVE-2025-29482 pattern)")

Practical triage:

  1. Quick metadata sanity without full decode:
    heif-info sample.heic
    oiiotool --info --stats sample.heic
    
  2. Validate extents versus file size:
    heif-convert --verbose sample.heic /dev/null | grep -i extent
    
  3. Carve suspicious boxes for manual inspection:
    dd if=sample.heic bs=1 skip=$((box_off)) count=$((box_len)) of=box.bin
    

Notes: These checks catch malformed structure before heavy decode; useful for mail/MMS gateways that only need allow/deny decisions. libheif limits shift across versions; re-baseline constants when upstream changes.
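
The "box length points past EOF" signal can be checked without any HEIF library by walking top-level ISOBMFF boxes. The sketch below assumes the standard box header (32-bit big-endian size, four-byte type, size 1 meaning a 64-bit size follows, size 0 meaning the box runs to end of file). It does not descend into nested boxes such as meta or iloc, and the function name is illustrative.

```python
import struct


def find_overrunning_boxes(data: bytes) -> list[tuple[str, int, int]]:
    """Return (type, offset, declared_size) for top-level boxes that overrun EOF."""
    bad = []
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        name = box_type.decode("latin-1")
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            if pos + 16 > len(data):
                bad.append((name, pos, size))
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box extends to end of file
            size = len(data) - pos
        if size < header or pos + size > len(data):
            bad.append((name, pos, size))
            break  # cannot resynchronise after a bad length
        pos += size
    return bad
```

An empty result means every top-level box fits inside the file; it says nothing about extents declared inside iloc, which need a deeper parse.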


Implementation Patterns

A practical scanner should:

  • Auto-detect file type and dispatch only relevant analyzers (PDF/JBIG2, WebP/VP8L, TTF, DNG/TIFF, HEIF/AVIF)
  • Stream/partial-parse to minimize allocations and enable early termination
  • Run analyses in parallel (thread-pool) for bulk triage
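
A minimal sketch of the dispatch and parallel triage idea follows. It sniffs the format from the first 16 bytes and fans files out over a thread pool. The function names, the returned labels, and the HEIF brand set are assumptions for illustration (the brand list is not exhaustive), and real analyzers would be plugged in where the sketch only returns the detected format.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Optional

# Common HEIF/AVIF major brands; an assumed, non-exhaustive set.
HEIF_BRANDS = {b"heic", b"heix", b"mif1", b"msf1", b"avif", b"avis"}


def sniff_format(head: bytes) -> Optional[str]:
    """Map leading magic bytes to an analyzer label, or None if unsupported."""
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    if head[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    if head[:4] in (b"\x00\x01\x00\x00", b"true"):
        return "ttf"
    if head[4:8] == b"ftyp" and head[8:12] in HEIF_BRANDS:
        return "heif"
    return None


def sniff_file(path: Path) -> tuple[Path, Optional[str]]:
    """Read only the header bytes so large files are not loaded for sniffing."""
    with path.open("rb") as fh:
        return path, sniff_format(fh.read(16))


def triage(paths: list[Path], workers: int = 8) -> dict[Path, Optional[str]]:
    """Sniff many files in parallel; analyzers would be dispatched per label."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(sniff_file, paths))
```

Files that map to None can be skipped outright, which keeps bulk scans cheap.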

Example workflow with ElegantBouncer (open-source Rust implementation):

# Scan a path recursively with structural detectors
elegant-bouncer --scan /path/to/directory

# Optional TUI for parallel scanning and real-time alerts
elegant-bouncer --tui --scan /path/to/samples

DFIR Tips and Edge Cases

  • Embedded objects: PDFs may embed images (JBIG2) and fonts (TrueType); extract and recursively scan
  • Decompression safety: Use libraries that hard-limit tables/buffers before allocation
  • False positives: Keep rules conservative, favor contradictions that are impossible under the spec
  • Version drift: Re-baseline bounds (e.g., VP8L table sizes) when upstream parsers change limits

Related Tools

| Tool | Purpose |
| --- | --- |
| ElegantBouncer | Structural scanner for the detections above |
| pdfid/pdf-parser/peepdf | PDF object extraction and static analysis |
| pdfcpu | PDF linter/sanitizer |
| fontTools/ttx | Dump TrueType tables and bytecode |
| exiftool | Read TIFF/DNG/EXIF metadata |
| dwebp/webpmux | Parse WebP metadata and chunks |
| heif-info/heif-convert (libheif) | HEIF/AVIF structure inspection |
| oiiotool | Validate HEIF/AVIF via OpenImageIO |
| binwalk | Carve embedded files from containers |

Quick Reference: Detection Scripts

Use the bundled scripts for common detection tasks:

  • scripts/check_dng_appended_zip.sh - Detects appended ZIP payloads in DNG/TIFF files (LANDFALL pattern)
  • scripts/check_truetype_opcodes.py - Scans TrueType bytecode for undocumented opcodes (TRIANGULATION pattern)
  • scripts/validate_heif_structure.py - Validates HEIF overlay bounds and references

References