Hacktricks-skills document-steganography

Analyze documents for hidden steganographic content. Use this skill whenever the user needs to extract hidden data from PDFs, Office files (.docx/.xlsx/.pptx), or other document formats. Trigger on requests to find hidden files, extract embedded content, analyze document structures, or investigate suspicious documents in CTFs, forensics, or security research.

install

source · Clone the upstream repo

git clone https://github.com/abelrguezr/hacktricks-skills

manifest: skills/stego/documents/documents/SKILL.MD

source content

Document Steganography Analysis

This skill helps you extract hidden content from document containers like PDFs and Office OOXML files. Documents are often just containers with embedded files, streams, and hidden objects.

Quick Start

# For PDFs
python scripts/analyze_pdf.py <file.pdf>

# For Office files (.docx/.xlsx/.pptx)
python scripts/analyze_office.py <file.docx>

PDF Analysis

PDFs are structured containers with objects, streams, and optional embedded files. Hidden content often appears as:

Embedded attachments
Compressed object streams
Hidden objects (JavaScript, embedded images, odd streams)

Step-by-Step Process

Get basic info
```
pdfinfo file.pdf
```

List and extract attachments

pdfdetach -list file.pdf
pdfdetach -saveall file.pdf

Decompress and normalize (QDF mode)

qpdf --qdf --object-streams=disable file.pdf out.pdf

Search for suspicious content

strings out.pdf | grep -iE "(password|secret|hidden|flag|base64)"
grep -a "JVBER" out.pdf  # Base64-encoded embedded PDFs ("%PDF" encodes to "JVBER...")
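If `qpdf` is not installed, Flate-compressed streams can often be inflated directly. The following is a minimal sketch, assuming the streams use plain /FlateDecode with no chained filters or encryption; `inflate_streams` is a hypothetical helper name, not one of this skill's scripts.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the inflated body of every stream that zlib accepts."""
    inflated = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes left after the data.
            inflated.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not Flate data (or filtered some other way): skip it.
            continue
    return inflated
```

Read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in. Streams that fail to inflate are skipped silently, so anything odd still deserves a manual look.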

What to Look For

Objects with unusual stream lengths
JavaScript actions (`/JS` or `/JavaScript`)
Embedded images in unexpected places
Base64-encoded data in text streams
References to external files or URLs
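A quick way to triage the items above is to count the dictionary keys that usually accompany them. This is a rough sketch: the `MARKERS` list is an illustrative assumption rather than a complete catalogue, and `count_markers` is a hypothetical helper. Run it on the decompressed `out.pdf`, since keys inside compressed object streams are invisible to a raw byte scan, and note that names written with `#xx` hex escapes will not match.

```python
import re

# PDF name keys that often deserve a closer look (illustrative, not exhaustive).
MARKERS = (
    b"/JS", b"/JavaScript", b"/OpenAction", b"/AA",
    b"/EmbeddedFile", b"/Launch", b"/URI", b"/ObjStm",
)


def count_markers(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of each marker key in the raw PDF bytes."""
    counts = {}
    for marker in MARKERS:
        # The lookahead stops a key from being counted when it is only
        # the prefix of a longer name.
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9])"
        counts[marker.decode()] = len(re.findall(pattern, pdf_bytes))
    return counts
```

A non-zero count is only a pointer to where to read next, not evidence of hidden data on its own.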

Office OOXML Analysis

Office OOXML files (`.docx`, `.xlsx`, `.pptx`) are ZIP containers with XML and assets. Hidden payloads often appear in:

Media files (`word/media/`)
Relationship files (`_rels/`)
Custom XML parts
Unusual file structures
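"Unusual file structures" can be checked mechanically by listing parts that sit outside the folders Office normally writes. This is a sketch under assumptions: the `EXPECTED_TOP_LEVEL` set reflects typical Word, Excel and PowerPoint output and other producers may legitimately add more, and `unexpected_parts` is a hypothetical helper.

```python
import io
import zipfile

# Top-level entries normally present in Word, Excel and PowerPoint packages
# (an assumption; adjust for the producer you are dealing with).
EXPECTED_TOP_LEVEL = {
    "[Content_Types].xml", "_rels", "docProps", "customXml", "word", "xl", "ppt",
}


def unexpected_parts(package_bytes: bytes) -> list[tuple[str, int]]:
    """Return (name, uncompressed size) for parts outside the usual folders."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return [
            (info.filename, info.file_size)
            for info in package.infolist()
            if not info.is_dir()
            and info.filename.split("/", 1)[0] not in EXPECTED_TOP_LEVEL
        ]
```

An empty result does not clear the file, because payloads can just as easily sit inside `word/media/` or a custom XML part.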

Step-by-Step Process

List contents
```
7z l file.docx
```

Extract to inspect

7z x file.docx -oout/

Check key locations

# Main document content
cat out/word/document.xml

# Relationships (may point to hidden resources)
cat out/word/_rels/*.rels

# Embedded media
ls -la out/word/media/

Search for suspicious patterns

grep -r "http" out/
grep -r "base64" out/
grep -r "flag" out/
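The same search can run without extracting anything to disk, which is handy when you would rather not unpack an untrusted archive. This is a minimal sketch using only the standard library; `grep_package` and the `PATTERNS` expression are illustrative assumptions that mirror the three `grep` commands above.

```python
import io
import re
import zipfile

# Same ideas as the grep commands: URLs, the word "base64", and "flag".
PATTERNS = re.compile(rb"https?://|base64|flag", re.IGNORECASE)


def grep_package(package_bytes: bytes) -> list[tuple[str, bytes]]:
    """Return (part name, matched text) for every hit in every part."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for info in package.infolist():
            if info.is_dir():
                continue
            for match in PATTERNS.finditer(package.read(info)):
                hits.append((info.filename, match.group(0)))
    return hits
```

Expect many `http://` hits from XML namespace declarations in real documents. The `grep -r "http"` command has the same noise, so filter by part name or tighten the pattern as needed.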

What to Look For

External URLs in relationship files
Unusual file types in `media/`
Custom XML parts not in standard locations
Large or oddly-named files
References to non-existent resources
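External URLs and references to missing resources can both be read out of the `.rels` parts. The sketch below assumes the standard package relationships namespace and resolves each internal target relative to the folder of the part that owns the `.rels` file; `audit_relationships` is a hypothetical helper name.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def audit_relationships(package_bytes: bytes) -> list[tuple[str, str, str]]:
    """Return (rels part, target, problem) for external or dangling targets."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = set(package.namelist())
        for rels_name in sorted(n for n in names if n.endswith(".rels")):
            # word/_rels/document.xml.rels describes targets relative to word/.
            base_dir = posixpath.dirname(posixpath.dirname(rels_name))
            root = ET.fromstring(package.read(rels_name))
            for rel in root.iter(REL_NS + "Relationship"):
                target = rel.get("Target", "")
                if rel.get("TargetMode") == "External":
                    findings.append((rels_name, target, "external"))
                    continue
                if target.startswith("/"):
                    resolved = posixpath.normpath(target.lstrip("/"))
                else:
                    resolved = posixpath.normpath(posixpath.join(base_dir, target))
                if resolved not in names:
                    findings.append((rels_name, target, "missing: " + resolved))
    return findings
```

A target such as `../hidden/secret.xml` in `word/_rels/document.xml.rels` resolves to `hidden/secret.xml` at the package root, so that is where to look after extraction.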

Examples

Example 1: Extracting PDF Attachments

Input: User has `suspicious_report.pdf` and suspects hidden files

Process:

# List attachments
pdfdetach -list suspicious_report.pdf
# Output: 1: hidden_data.txt

# Extract all
pdfdetach -saveall suspicious_report.pdf
# Creates: hidden_data.txt

# Inspect
strings hidden_data.txt

Example 2: Finding Hidden Office Content

Input: User has `meeting_notes.docx` with suspicious behavior

Process:

# Extract
7z x meeting_notes.docx -oextracted/

# Check relationships
cat extracted/word/_rels/document.xml.rels
# May reveal: <Relationship Id="rId3" Type="..." Target="../hidden/secret.xml"/>

# Inspect hidden parts (the target is relative to word/, so ../hidden/ is extracted/hidden/)
ls -la extracted/hidden/

Example 3: CTF Flag Hunting

Input: User has `document.pdf` in a CTF challenge

Process:

# Decompress and normalize the PDF (QDF mode)
qpdf --qdf --object-streams=disable document.pdf flat.pdf

# Search for flag patterns
strings flat.pdf | grep -i "flag{"
strings flat.pdf | grep -i "ctf{"

# Check for embedded images
pdfimages -list flat.pdf
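Flags are frequently base64-encoded, in which case the plain `grep` above misses them. The following is a rough sketch that checks both the raw bytes and any decodable base64 runs. The 16-character threshold, the `flag{...}`/`ctf{...}` pattern and the `find_flags` name are all assumptions to adapt to the challenge's flag format.

```python
import base64
import binascii
import re

# Runs of 16+ base64 characters; the threshold is an arbitrary choice.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}")
FLAG_RE = re.compile(rb"(?:flag|ctf)\{[^}]*\}", re.IGNORECASE)


def find_flags(data: bytes) -> list[bytes]:
    """Return flag-like strings found in data or in its base64 runs."""
    hits = FLAG_RE.findall(data)
    for run in B64_RUN.findall(data):
        # Re-pad so runs whose padding was lost still decode.
        padded = run + b"=" * (-len(run) % 4)
        try:
            decoded = base64.b64decode(padded, validate=True)
        except (binascii.Error, ValueError):
            continue
        hits.extend(FLAG_RE.findall(decoded))
    return hits
```

Run it over the bytes of `flat.pdf`. A base64 run that directly touches other alphanumeric text will decode to garbage, so treat an empty result as inconclusive.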

Common Tools

Tool	Purpose
`pdfinfo`	Basic PDF metadata
`pdfdetach`	List/extract PDF attachments
`pdfimages`	List/extract images embedded in PDFs
`qpdf`	Decompress and normalize PDFs (QDF mode)
`7z`	Extract Office OOXML files
`strings`	Extract readable text from binaries
`grep`	Search for patterns

Tips

Always work on copies - Don't modify original files
Check file extensions - A `.pdf` might actually be something else
Look for inconsistencies - File size vs. content, metadata vs. actual content
Search broadly - Use `strings` and `grep` to find anything unusual
Check relationships - In OOXML, `_rels/` files often reveal hidden structure
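The extension check can be done by comparing the first bytes of a file against known signatures. This is a small sketch: the `SIGNATURES` table covers only a handful of common formats and `sniff` is a hypothetical helper, so treat "unknown" as "not in this table", not as "harmless".

```python
# Leading bytes of a few common formats (illustrative, far from complete).
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip (also docx/xlsx/pptx)",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF8": "gif",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2 (legacy doc/xls/ppt)",
    b"7z\xbc\xaf\x27\x1c": "7z",
    b"Rar!": "rar",
}


def sniff(header: bytes) -> str:
    """Guess a file type from its first bytes."""
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unknown"
```

Reading the first 16 bytes in binary mode is enough for this table, for example `sniff(open(path, "rb").read(16))`.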

When to Use This Skill

Use this skill when:

Investigating suspicious documents
Extracting hidden files from PDFs or Office documents
Participating in CTF challenges with document-based steganography
Performing forensic analysis on document files
Looking for embedded content, attachments, or hidden streams
Analyzing document structure for security research