Hacktricks-skills stego-analysis
Steganography analysis and hidden data extraction. Use this skill whenever the user needs to find or extract hidden data from files (images, audio, video, documents, archives) or text-based steganography. Trigger on requests involving: hidden messages, steganography, LSB extraction, metadata analysis, file forensics, embedded files, spectrogram analysis, or any task where data might be concealed within another file. Make sure to use this skill for CTF challenges, security audits, or any investigation where hidden content is suspected.
git clone https://github.com/abelrguezr/hacktricks-skills
skills/stego/stego/SKILL.MD
Steganography Analysis
A structured approach to finding and extracting hidden data from files and text. This skill treats steganography as a forensics problem: identify the real container, enumerate high-signal locations, then apply content-level extraction techniques.
Workflow & Triage
Follow this prioritized workflow for any steganography investigation:
1. Container Identification
First, determine what you're actually working with:

    # Check actual file type vs extension
    file <filename>
    file -b <filename>

    # Check magic bytes
    xxd <filename> | head -20

    # Verify extension matches content
    python3 scripts/check_file_type.py <filename>

Why this matters: Many stego challenges use mismatched extensions (e.g., a PNG disguised as a JPG). The actual file format determines which tools and techniques apply.
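
The extension check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the contents of `scripts/check_file_type.py`: the `MAGIC` table, `sniff_type`, and `extension_matches` are hypothetical names, and the signature list is deliberately short.

```python
from pathlib import Path
from typing import Optional

# A few well-known signatures; illustrative only, not exhaustive.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"BM": "bmp",
}


def sniff_type(data: bytes) -> Optional[str]:
    """Return a short type name based on the leading magic bytes, or None."""
    for signature, name in MAGIC.items():
        if data.startswith(signature):
            return name
    return None


def extension_matches(path: str) -> bool:
    """Compare the sniffed type with the file extension (jpeg is treated as jpg)."""
    p = Path(path)
    ext = p.suffix.lower().lstrip(".").replace("jpeg", "jpg")
    with p.open("rb") as fh:
        detected = sniff_type(fh.read(16))
    return detected is not None and detected == ext
```

Note that container formats share signatures: a DOCX file sniffs as `zip`, so a mismatch here is a prompt to look closer rather than proof of tampering.
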
2. Metadata & String Inspection
Extract all metadata and readable strings before diving into content analysis:

    # Extract metadata from all supported formats
    python3 scripts/extract_metadata.py <filename>

    # Extract printable strings
    strings <filename> > strings.txt
    strings -n 10 <filename> | grep -iE "(flag|secret|hidden|ctf|password|key)"

    # Look for appended data (common in stego)
    tail -c 10000 <filename> | strings

Why this matters: Hidden data often appears in metadata fields, appended after the file's actual content, or as readable strings. This is the fastest way to find low-hanging fruit.
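
For PNG files, "appended after the file's actual content" has a precise meaning: anything that follows the `IEND` chunk. The sketch below walks the chunk list to find that boundary. It assumes a structurally intact PNG and does not verify CRCs; `png_trailing_data` is a hypothetical helper name.

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def png_trailing_data(data: bytes) -> bytes:
    """Walk the PNG chunks and return any bytes that follow the IEND chunk."""
    if not data.startswith(PNG_SIG):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIG)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4
        if chunk_type == b"IEND":
            return data[pos:]
    raise ValueError("no IEND chunk found (truncated or corrupt PNG)")
```

A non-empty return value is worth passing to `file` or `binwalk`, since appended payloads are frequently complete files in their own right.
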
3. Format-Specific Analysis
Based on the file type, apply targeted techniques:
Image Steganography
Images are the most common stego container. Prioritize these checks:
Quick Wins

    # Check for appended or embedded data
    binwalk -e <image>
    foremost -i <image>

    # List all chunks (PNG)
    pngcheck -v <image>

    # Check for multiple images in one file (prints byte offsets of signatures)
    file <image>
    grep -aobE "PNG|JFIF|GIF8[79]a" <image>

LSB & Bit-Plane Analysis
For PNG/BMP files, check for Least Significant Bit manipulation:

    # Extract LSB from image
    python3 scripts/extract_lsb.py <image> --bits 1-4 --output lsb_output/

    # Visualize bit planes
    python3 scripts/visualize_bitplanes.py <image>

    # Scan PNG/BMP for common LSB patterns
    zsteg <image>

    # steghide handles JPEG, BMP, WAV and AU (not PNG) and prompts for a passphrase
    steghide extract -sf <image>

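
The core of LSB extraction is independent of the image library, so it can be sketched on a plain byte sequence of channel values. This is an illustration, not the contents of `scripts/extract_lsb.py`: `extract_lsb` and `embed_lsb` are hypothetical names, and the MSB-first bit packing is an assumption (tools differ in bit order, channel order, and scan direction).

```python
def extract_lsb(samples: bytes, bits: int = 1) -> bytes:
    """Collect the `bits` lowest bits of every byte and pack them MSB-first."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    mask = (1 << bits) - 1
    out = bytearray()
    acc = 0
    nbits = 0
    for value in samples:
        acc = (acc << bits) | (value & mask)
        nbits += bits
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    return bytes(out)


def embed_lsb(samples: bytes, payload: bytes) -> bytes:
    """Write payload bits into the lowest bit of each byte (used here for testing)."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("payload too large for cover data")
    out = bytearray(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)
```

With Pillow installed, the channel bytes of an image can be obtained with `Image.open(path).convert("RGB").tobytes()` and passed straight to `extract_lsb`.
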
JPEG-Specific

    # Check for hidden data in JPEG
    exiftool <image>
    jpeginfo <image>

    # Extract from JPEG comments
    exiftool -Comment <image>

GIF Multi-Frame

    # Extract all frames from GIF (the output directory must exist)
    mkdir -p frames
    convert <gif> -coalesce frames/frame_%03d.png

    # Check for hidden data in frames
    file frames/*.png

Audio Steganography
Audio files hide data in samples, spectrograms, or as DTMF tones.
Spectrogram Analysis

    # Generate spectrogram (hidden messages often visible)
    sox <audio> -n spectrogram -o spectrogram.png

    # Alternative with ffmpeg
    ffmpeg -i <audio> -lavfi showspectrumpic spectrogram.png

Look for: Text, QR codes, or images visible in the frequency spectrum.
Sample-Level Analysis

    # Extract LSB from audio samples
    python3 scripts/extract_audio_lsb.py <audio> --bits 1-2

    # Locate silent regions, where low-amplitude hidden data stands out
    ffmpeg -i <audio> -af silencedetect=noise=-50dB:d=0.5 -f null -

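
For uncompressed WAV files, sample-level LSB extraction needs only the standard library. The sketch below is an assumption about how such a script might work, not the contents of `scripts/extract_audio_lsb.py`: `wav_lsb_bytes` is a hypothetical name, it reads one bit per sample across interleaved channels, and it packs bits MSB-first.

```python
import io
import wave


def wav_lsb_bytes(wav_file, max_bytes: int = 4096) -> bytes:
    """Read PCM frames and pack the least significant bit of each sample into bytes.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wav:
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())
    out = bytearray()
    acc = 0
    nbits = 0
    # WAV PCM is little-endian, so the first byte of each sample holds its LSB.
    for i in range(0, len(frames) - width + 1, width):
        acc = (acc << 1) | (frames[i] & 1)
        nbits += 1
        if nbits == 8:
            out.append(acc)
            acc = 0
            nbits = 0
            if len(out) >= max_bytes:
                break
    return bytes(out)
```

The `max_bytes` cap keeps the output manageable on long recordings; hidden payloads usually start at the first sample, so the head of the output is the part to inspect.
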
DTMF Tones

    # Rough dominant frequency (a hint only; this does not decode DTMF)
    sox <audio> -n stat 2>&1 | grep -i "frequency"

    # Visualize to spot DTMF patterns
    sox <audio> -n spectrogram -o dtmf_spectrogram.png

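
Each DTMF key is the sum of one row tone and one column tone, so decoding reduces to finding the strongest frequency in each group. A common way to do that is the Goertzel recurrence, sketched below. All function names are hypothetical, the block must contain a single key press, and there is no silence threshold, so the caller is assumed to split the recording into tone segments first.

```python
import math

LOW = (697, 770, 852, 941)        # row frequencies in Hz
HIGH = (1209, 1336, 1477, 1633)   # column frequencies in Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, rate, freq):
    """Return the signal power at `freq` using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, rate):
    """Pick the strongest row and column tone in one block of samples."""
    row = max(range(4), key=lambda i: goertzel_power(samples, rate, LOW[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, rate, HIGH[i]))
    return KEYPAD[row][col]


def make_tone(key, rate=8000, duration=0.05):
    """Synthesize a DTMF tone for testing (not part of the detection itself)."""
    for r, keys in enumerate(KEYPAD):
        if key in keys:
            f_low, f_high = LOW[r], HIGH[keys.index(key)]
            break
    else:
        raise ValueError("not a DTMF key: %r" % key)
    n = int(rate * duration)
    return [
        math.sin(2 * math.pi * f_low * t / rate)
        + math.sin(2 * math.pi * f_high * t / rate)
        for t in range(n)
    ]
```

Blocks of roughly 50 ms at 8 kHz give enough frequency resolution to separate adjacent DTMF tones, which is why `make_tone` defaults to that duration.
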
Text Steganography
Text that renders normally but contains hidden data through encoding tricks.
Unicode & Zero-Width Characters

    # Detect zero-width characters
    python3 scripts/detect_zwc.py <textfile>

    # Remove zero-width characters
    python3 scripts/remove_zwc.py <textfile> > cleaned.txt

    # Check for homoglyphs (lookalike characters)
    python3 scripts/check_homoglyphs.py <textfile>

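
Zero-width detection is a character scan, and a two-symbol scheme decodes directly to bits. The sketch below is illustrative rather than a copy of `scripts/detect_zwc.py`: the function names are hypothetical, and the mapping of U+200B to 0 and U+200C to 1 is an assumption that varies between encoders.

```python
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE",
}


def find_zero_width(text):
    """Return (index, character name) for every zero-width character in the text."""
    return [(i, ZERO_WIDTH[ch]) for i, ch in enumerate(text) if ch in ZERO_WIDTH]


def decode_zero_width(text, zero="\u200b", one="\u200c"):
    """Decode a two-symbol zero-width scheme into bytes (the mapping is an assumption)."""
    bits = "".join("0" if ch == zero else "1" for ch in text if ch in (zero, one))
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

If the decoded bytes look like noise, swapping the `zero` and `one` arguments is a cheap next attempt.
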
Whitespace Encoding

    # Detect whitespace-based encoding
    python3 scripts/decode_whitespace.py <textfile>

    # Show lines with trailing spaces or tabs
    # (cat -A marks each line end with $ and each tab with ^I)
    cat -A <textfile> | grep -E '( |\^I)\$$'

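
One simple whitespace scheme stores bits in the trailing whitespace of each line. The sketch below decodes that scheme only, assuming space means 0 and tab means 1; it is not the contents of `scripts/decode_whitespace.py`, `decode_trailing_whitespace` is a hypothetical name, and tools such as SNOW use different encodings.

```python
def decode_trailing_whitespace(text, zero=" ", one="\t"):
    """Read trailing spaces and tabs on each line as bits and pack them into bytes."""
    bits = []
    for line in text.split("\n"):
        line = line.rstrip("\r")            # tolerate CRLF line endings
        body = line.rstrip(" \t")
        for ch in line[len(body):]:         # only the trailing whitespace run
            bits.append("0" if ch == zero else "1")
    joined = "".join(bits)
    usable = len(joined) - len(joined) % 8
    return bytes(int(joined[i:i + 8], 2) for i in range(0, usable, 8))
```
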
Base64 & Encoded Strings

    # Find base64 strings
    strings <textfile> | grep -E '^[A-Za-z0-9+/]{20,}={0,2}$' | base64 -d 2>/dev/null

    # Check for hex encoding
    strings <textfile> | grep -E '^[0-9a-fA-F]{20,}$' | xxd -r -p

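
The shell pipeline above only matches whole lines. A Python sketch can find base64 runs anywhere in the data and decode each one separately. `decode_base64_candidates` is a hypothetical helper, and the 20-character minimum mirrors the grep pattern rather than any standard.

```python
import base64
import binascii
import re
from typing import List

B64_RE = re.compile(rb"[A-Za-z0-9+/]{20,}={0,2}")


def decode_base64_candidates(data: bytes) -> List[bytes]:
    """Find base64-looking runs in raw bytes and return those that decode cleanly."""
    results = []
    for match in B64_RE.finditer(data):
        token = match.group()
        if len(token) % 4:
            continue  # valid padded base64 always has a length divisible by 4
        try:
            results.append(base64.b64decode(token, validate=True))
        except (binascii.Error, ValueError):
            continue
    return results
```

Long alphanumeric runs that are not base64 can still decode without error, so treat the results as candidates and check them with `file` or by eye.
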
Document Steganography
PDFs and Office files are containers first. Focus on embedded content.
PDF Analysis

    # Extract all embedded (attached) files
    mkdir -p extracted
    pdftk <pdf> unpack_files output extracted/

    # Page count, then the full object structure (annotations, optional-content layers)
    qpdf --show-npages <pdf>
    qpdf --json <pdf> | python3 -m json.tool

    # Extract streams and objects
    python3 scripts/extract_pdf_streams.py <pdf>

    # Look for hidden text
    pdftotext -layout <pdf> text.txt
    strings <pdf> | grep -iE "(flag|secret|hidden)"

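
Most PDF streams are zlib-compressed, which is why `strings` on the raw file can miss text that is plainly there. The sketch below inflates every stream it can; it is a rough illustration and not the contents of `scripts/extract_pdf_streams.py`. It assumes an unencrypted PDF, ignores each stream's declared `/Filter` and `/Length`, and `inflate_pdf_streams` is a hypothetical name.

```python
import re
import zlib
from typing import List

# Match the body between a "stream" keyword and the next "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_pdf_streams(data: bytes) -> List[bytes]:
    """Return the decompressed contents of every stream that inflates cleanly."""
    results = []
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that precede "endstream".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            continue  # not a Flate stream (e.g. a raw JPEG), skip it
    return results
```

Running the same keyword grep over the inflated output often finds text that the raw-file search missed.
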
Office Files (DOCX, XLSX, PPTX)

    # Office files are ZIP archives
    unzip -l <file>
    unzip <file> -d extracted/

    # Check relationships and embedded objects
    # (the top-level file is _rels/.rels, which a *.rels glob does not match)
    find extracted/ -name "*.rels" -exec cat {} +
    cat 'extracted/[Content_Types].xml'

    # List embedded files
    find extracted/ -type f \( -name "*.png" -o -name "*.jpg" -o -name "*.pdf" \)

ZIP Archives

    # Check for hidden entries
    zipinfo -l <archive>

    # Extract and check all files
    unzip -o <archive> -d extracted/
    file extracted/*

    # For password-protected entries, dump a hash for John the Ripper
    zip2john <archive> > zip.hash

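
The same inspection works from Python for both plain ZIP archives and Office files, using the standard `zipfile` module. The helper names below are hypothetical. The encrypted flag is read from bit 0 of each entry's general-purpose flags, and the archive comment is included because it is an easy place to stash a short message.

```python
import io
import zipfile
from typing import List, Tuple


def list_zip_entries(source) -> List[Tuple[str, int, bool]]:
    """Return (name, uncompressed size, encrypted?) for every entry in the archive."""
    with zipfile.ZipFile(source) as zf:
        return [
            (info.filename, info.file_size, bool(info.flag_bits & 0x1))
            for info in zf.infolist()
        ]


def zip_comment(source) -> bytes:
    """Return the archive-level comment, which can carry hidden text."""
    with zipfile.ZipFile(source) as zf:
        return zf.comment
```

Both functions accept a path or a binary file-like object, so they can be pointed at carved data held in memory as well as files on disk.
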
Malware & Delivery Steganography
Payloads hidden in valid-looking files with marker-delimited text.
Common Patterns

    # Check for text payloads in images
    strings <image> | grep -A5 -B5 "BEGIN\|END\|PAYLOAD\|DATA"

    # Look for base64 blocks
    strings <image> | grep -E '^[A-Za-z0-9+/]{50,}={0,2}$'

    # Check for NOP sleds or long null runs (prints byte offsets; needs GNU grep)
    LC_ALL=C grep -obUaP '\x90{10,}|\x00{10,}' <image>

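
Once a pair of markers has been spotted, pulling out everything between them is a short regular-expression job. The sketch below is generic: `extract_marked_payloads` is a hypothetical helper, and the marker strings in the usage line are invented for illustration, since real samples each use their own delimiters.

```python
import re
from typing import List


def extract_marked_payloads(data: bytes, begin: bytes, end: bytes) -> List[bytes]:
    """Return the bytes found between each begin/end marker pair, whitespace-trimmed."""
    pattern = re.compile(re.escape(begin) + rb"(.*?)" + re.escape(end), re.DOTALL)
    return [match.group(1).strip() for match in pattern.finditer(data)]
```

For example, `extract_marked_payloads(raw, b"<<BEGIN>>", b"<<END>>")` returns one entry per delimited block, each of which can then be tested for base64 or hex encoding.
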
Network-Based Steganography

    # Extract HTTP payload bodies from a PCAP
    tshark -r <pcap> -Y "http" -T fields -e http.file_data

    # List the requested URIs
    tshark -r <pcap> -Y "http.request" -T fields -e http.request.full_uri

Common Tools Reference
Installation
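
    # Core tools (Debian/Ubuntu package names)
    sudo apt install binwalk foremost steghide libimage-exiftool-perl pngcheck
    sudo apt install sox ffmpeg qpdf pdftk

    # zsteg is distributed as a Ruby gem, not an apt package
    sudo gem install zsteg

    # Python tools
    pip install pillow pydub pycryptodome
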
Quick Reference
| Tool | Purpose |
|---|---|
| `binwalk` | Find embedded files in containers |
| `foremost` | Carve files from raw data |
| `zsteg` | PNG steganography scanner |
| `steghide` | Extract hidden data (password-protected) |
| `exiftool` | Read/modify metadata |
| `sox` | Audio analysis and spectrogram |
| `strings` | Extract readable text from binaries |
| `file` | Identify file type by magic bytes |
Debugging & Verification
When Nothing Works
- Re-check the file type: `file <filename>` (is it what you think?)
- Look at raw bytes: `xxd <filename> | head -50` (any obvious patterns?)
- Check file size: `ls -la` (is it suspiciously large for its type?)
- Try multiple tools: Different tools detect different patterns
- Check for compression: `file <filename>` might show "compressed data"
Verify Extraction

    # Check if extracted data is valid
    file extracted_file
    strings extracted_file | head -20

    # For images
    file extracted_image && identify extracted_image

    # For text
    file extracted_text && head -20 extracted_text

Tips for Success
- Start simple: Metadata and strings often reveal the answer before complex analysis
- Document everything: Keep notes on what you've tried and what you found
- Check file sizes: Unusually large files often contain appended data
- Look for patterns: Repeated bytes, unusual sequences, or obvious markers
- Combine techniques: Sometimes you need to extract, then analyze the extracted content
- Don't ignore the obvious: Sometimes the hidden data is in plain sight (comments, metadata)
Next Steps
After initial analysis:
- If you found embedded files, analyze them recursively
- If you extracted encoded data, decode it and check for more hidden content
- If spectrogram shows text, transcribe it and check for encoding
- If you found a password-protected container, try common CTF passwords or brute force
Remember: steganography is often layered. What you extract might contain more hidden data.