Hacktricks-skills document-steganography
Analyze documents for hidden steganographic content. Use this skill whenever the user needs to extract hidden data from PDFs, Office files (.docx/.xlsx/.pptx), or other document formats. Trigger on requests to find hidden files, extract embedded content, analyze document structures, or investigate suspicious documents in CTFs, forensics, or security research.
git clone https://github.com/abelrguezr/hacktricks-skills
Document Steganography Analysis
This skill helps you extract hidden content from document containers like PDFs and Office OOXML files. Documents are often just containers with embedded files, streams, and hidden objects.
Quick Start
    # For PDFs
    python scripts/analyze_pdf.py <file.pdf>

    # For Office files (.docx/.xlsx/.pptx)
    python scripts/analyze_office.py <file.docx>
PDF Analysis
PDFs are structured containers with objects, streams, and optional embedded files. Hidden content often appears as:
- Embedded attachments
- Compressed object streams
- Hidden objects (JavaScript, embedded images, odd streams)
Step-by-Step Process
- Get basic info

      pdfinfo file.pdf

- List and extract attachments

      pdfdetach -list file.pdf
      pdfdetach -saveall file.pdf

- Flatten and decompress

      qpdf --qdf --object-streams=disable file.pdf out.pdf

- Search for suspicious content

      strings out.pdf | grep -iE "(password|secret|hidden|flag|base64)"
      # Find base64-encoded embedded PDFs ("JVBER" is the base64 form of "%PDF")
      grep -a "JVBER" out.pdf
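The `JVBER` search only shows that a base64-encoded PDF may be present; recovering it still takes a decode step. The sketch below is one way to do that with the Python standard library. The name `find_base64_pdfs` is hypothetical, and the code assumes the base64 run is not wrapped across lines.

```python
import base64
import re

# "JVBER" is how base64 encodes the PDF magic bytes "%PDF".
B64_PDF = re.compile(rb"JVBER[A-Za-z0-9+/]*={0,2}")


def find_base64_pdfs(data: bytes) -> list[bytes]:
    """Decode every unwrapped base64 run that starts with the PDF magic."""
    payloads = []
    for match in B64_PDF.finditer(data):
        run = match.group(0).rstrip(b"=")
        if len(run) % 4 == 1:
            # A single leftover character cannot encode a byte; drop it.
            run = run[:-1]
        run += b"=" * (-len(run) % 4)  # restore padding to a multiple of 4
        decoded = base64.b64decode(run)
        if decoded.startswith(b"%PDF"):
            payloads.append(decoded)
    return payloads
```

Calling `find_base64_pdfs(open("out.pdf", "rb").read())` returns a list of candidate payloads that can be written to disk and examined like any other PDF.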
What to Look For
- Objects with unusual stream lengths
- JavaScript actions (`/JS` or `/JavaScript`)
- Embedded images in unexpected places
- Base64-encoded data in text streams
- References to external files or URLs
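A rough first pass over these indicators can be scripted. The sketch below counts a few PDF names in the raw bytes and in any stream that zlib can inflate. The marker list and the names `inflate_streams` and `count_markers` are illustrative assumptions. The code does simple substring matching and does not handle encrypted files or filters other than Flate, so treat it as a triage aid and not a parser.

```python
import re
import zlib

# PDF names worth a closer look; this list is illustrative, not exhaustive.
MARKERS = (b"/JS", b"/JavaScript", b"/EmbeddedFile", b"/URI")

# Body of each "stream ... endstream" pair; the lookbehind skips "endstream".
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate_streams(data: bytes) -> list[bytes]:
    """Return the inflated body of every stream that zlib can decode."""
    bodies = []
    for match in STREAM.finditer(data):
        try:
            # decompressobj tolerates trailing bytes after the zlib data.
            bodies.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate-compressed; the raw bytes are scanned anyway
    return bodies


def count_markers(data: bytes) -> dict[str, int]:
    """Count marker substrings in the raw file plus its inflated streams."""
    haystack = data + b"\n" + b"\n".join(inflate_streams(data))
    return {marker.decode(): haystack.count(marker) for marker in MARKERS}
```

A non-zero count for `/JS`, `/JavaScript`, or `/EmbeddedFile` is a prompt to inspect the corresponding objects in the `qpdf --qdf` output by hand.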
Office OOXML Analysis
Office OOXML files (`.docx`, `.xlsx`, `.pptx`) are ZIP containers with XML and assets. Hidden payloads often appear in:
- Media files (`word/media/`)
- Relationship files (`_rels/`)
- Custom XML parts
- Unusual file structures
Step-by-Step Process
- List contents

      7z l file.docx

- Extract to inspect

      7z x file.docx -oout/

- Check key locations

      # Main document content
      cat out/word/document.xml

      # Relationships (may point to hidden resources)
      cat out/word/_rels/*.rels

      # Embedded media
      ls -la out/word/media/

- Search for suspicious patterns

      grep -r "http" out/
      grep -r "base64" out/
      grep -r "flag" out/
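Because an OOXML file is a ZIP archive, the list-and-search steps can also be done without extracting anything to disk. The sketch below uses Python's `zipfile` module and searches for the same three strings as the `grep` commands; `scan_ooxml` is a hypothetical helper name. Expect `http` to match in most XML parts, since OOXML namespace URIs start with it.

```python
import io
import re
import zipfile

# Same three search terms as the grep commands, matched case-insensitively.
PATTERNS = re.compile(rb"http|base64|flag", re.IGNORECASE)


def scan_ooxml(path_or_file) -> dict[str, list[str]]:
    """Map each archive member to the search terms found inside it."""
    hits = {}
    with zipfile.ZipFile(path_or_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            data = archive.read(info)
            found = sorted({m.group(0).lower().decode()
                            for m in PATTERNS.finditer(data)})
            if found:
                hits[info.filename] = found
    return hits
```

`scan_ooxml("file.docx")` returns a dictionary keyed by member name, which makes it easy to see which parts deserve a manual look.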
What to Look For
- External URLs in relationship files
- Unusual file types in `media/`
- Custom XML parts not in standard locations
- Large or oddly-named files
- References to non-existent resources
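Two of these checks, external URLs and references to missing resources, can be automated by parsing the `.rels` parts. The sketch below is a minimal version under a few assumptions: `audit_relationships` is a hypothetical name, relative targets are resolved against the directory of the part that owns the `.rels` file, and the XML is parsed with the standard library, which is not hardened against malicious input.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by OOXML package relationship parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def audit_relationships(path_or_file):
    """Return (external, dangling) lists of (rels_part, target) tuples."""
    external, dangling = [], []
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())
        for rels_name in sorted(n for n in names if n.endswith(".rels")):
            # word/_rels/document.xml.rels describes targets relative to word/
            base_dir = posixpath.dirname(posixpath.dirname(rels_name))
            root = ET.fromstring(archive.read(rels_name))
            for rel in root.iter(REL_NS + "Relationship"):
                target = rel.get("Target", "")
                if rel.get("TargetMode") == "External":
                    external.append((rels_name, target))
                    continue
                if target.startswith("/"):
                    resolved = target.lstrip("/")  # absolute within package
                else:
                    resolved = posixpath.normpath(
                        posixpath.join(base_dir, target))
                if resolved not in names:
                    dangling.append((rels_name, resolved))
    return external, dangling
```

Entries in `external` point outside the package; entries in `dangling` name parts that the relationships reference but the archive does not contain.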
Examples
Example 1: Extracting PDF Attachments
Input: User has `suspicious_report.pdf` and suspects hidden files

Process:

    # List attachments
    pdfdetach -list suspicious_report.pdf
    # Output: 1: hidden_data.txt

    # Extract all
    pdfdetach -saveall suspicious_report.pdf
    # Creates: hidden_data.txt

    # Inspect
    strings hidden_data.txt
Example 2: Finding Hidden Office Content
Input: User has `meeting_notes.docx`, which shows suspicious behavior

Process:

    # Extract
    7z x meeting_notes.docx -oextracted/

    # Check relationships
    cat extracted/word/_rels/document.xml.rels
    # May reveal: <Relationship Id="rId3" Type="..." Target="../hidden/secret.xml"/>

    # Inspect hidden parts (the target is relative to word/, so ../hidden/ is extracted/hidden/)
    ls -la extracted/hidden/
Example 3: CTF Flag Hunting
Input: User has `document.pdf` in a CTF challenge

Process:

    # Flatten PDF
    qpdf --qdf --object-streams=disable document.pdf flat.pdf

    # Search for flag patterns
    strings flat.pdf | grep -i "flag{"
    strings flat.pdf | grep -i "ctf{"

    # Check for embedded images
    pdfimages -list flat.pdf
Common Tools
| Tool | Purpose |
|---|---|
| `pdfinfo` | Basic PDF metadata |
| `pdfdetach` | List/extract PDF attachments |
| `qpdf` | Flatten and decompress PDFs |
| `7z` | Extract Office OOXML files |
| `strings` | Extract readable text from binaries |
| `grep` | Search for patterns |
Tips
- Always work on copies - Don't modify original files
- Check file extensions - A `.pdf` might actually be something else
- Look for inconsistencies - File size vs. content, metadata vs. actual content
- Search broadly - Use `strings` and `grep` to find anything unusual
- Check relationships - In OOXML, `_rels/` files often reveal hidden structure
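The extension check can be made concrete by comparing the first bytes of a file against well-known signatures. The sketch below covers only a handful of formats relevant to this skill; the table and the `sniff` helper are illustrative, and a dedicated tool such as `file` recognizes far more.

```python
# Leading-byte signatures for a few formats that show up in document stego.
MAGIC = {
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip (OOXML, ODF, or plain ZIP)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2 (legacy .doc/.xls/.ppt)",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff(data: bytes) -> str:
    """Guess a container type from its leading bytes."""
    for magic, label in MAGIC.items():
        if data.startswith(magic):
            return label
    return "unknown"
```

For example, `sniff(open("report.pdf", "rb").read(16))` returning a ZIP label would show that the file is not really a PDF and should be opened as an archive.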
When to Use This Skill
Use this skill when:
- Investigating suspicious documents
- Extracting hidden files from PDFs or Office documents
- Participating in CTF challenges with document-based steganography
- Performing forensic analysis on document files
- Looking for embedded content, attachments, or hidden streams
- Analyzing document structure for security research