# Python JSON (claude-skill-registry)

UTF-8 JSON file I/O utilities to avoid Windows encoding issues (CP-1252 vs UTF-8).

Install:

```shell
git clone https://github.com/majiayu000/claude-skill-registry
```

Or copy just this skill into `~/.claude/skills`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/PythonJson" ~/.claude/skills/majiayu000-claude-skill-registry-python-json && rm -rf "$T"
```

Source: `skills/data/PythonJson/SKILL.md`

## Python JSON
Python on Windows defaults to the system encoding (usually CP-1252) when opening files, not UTF-8. This causes mojibake when reading JSON files containing non-ASCII characters such as £, €, or accented letters.
## Instructions
When helping users with JSON file operations in Python, follow these guidelines:
- **Never use raw file operations for JSON:** Always use a utility module that abstracts the encoding issue
- **Always specify UTF-8 encoding:** When opening files for JSON, use `encoding='utf-8'` explicitly
- **Use `ensure_ascii=False` when writing:** This preserves Unicode characters instead of escaping them to `\uXXXX`
- **Create a json_io utility module:** For any project that reads/writes JSON files, create a dedicated utility
- **RFC 8259 compliance:** JSON text must be encoded in UTF-8 per the JSON specification
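The guidelines above amount to a small before/after contrast. The sketch below (using a temporary file rather than a real project path) shows the safe pattern end to end:

```python
import json
import os
import tempfile

# Risky on Windows: open() with no encoding= may pick CP-1252 and
# corrupt non-ASCII text on read.
#   with open('data.json') as f:
#       data = json.load(f)

# Safe: explicit UTF-8 on both ends, ensure_ascii=False on write.
path = os.path.join(tempfile.mkdtemp(), 'data.json')

with open(path, 'w', encoding='utf-8') as f:
    json.dump({'price': '£9.99'}, f, ensure_ascii=False)

with open(path, 'r', encoding='utf-8') as f:
    data = json.load(f)

print(data['price'])  # → £9.99, round-tripped intact
```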
## Examples

### Example 1: Project needs JSON file operations

**User:** Help me read and write JSON files in my Python project

**Claude:** I'll create a json_io utility module to abstract the Windows encoding issue:

- Create `utils/json_io.py` with `load_json` and `save_json` functions
- Both functions explicitly use UTF-8 encoding
- `save_json` uses `ensure_ascii=False` to preserve Unicode

[Provides implementation using the json_io pattern from reference material below]
### Example 2: Mojibake in JSON data

**User:** My JSON file shows £ as Â£ when I read it

**Claude:** This is the UTF-8 double-encoding bug on Windows. The £ character (U+00A3, bytes C2 A3 in UTF-8) is being read as CP-1252, where those bytes represent Â£.

Solution: Use explicit UTF-8 encoding when opening files:

- `with open(path, 'r', encoding='utf-8') as f`

I'll create a json_io utility module to prevent this issue project-wide.

[Provides implementation using the json_io pattern from reference material below]
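The failure mode in this example can be reproduced in a few lines without touching the filesystem: encode £ as UTF-8, then decode those bytes as CP-1252, which is what a naive `open()` effectively does on Windows:

```python
# '£' is U+00A3; in UTF-8 it becomes the two bytes C2 A3.
raw = '£'.encode('utf-8')        # b'\xc2\xa3'

# Decoding those bytes as CP-1252 maps C2 -> 'Â' and A3 -> '£'.
mojibake = raw.decode('cp1252')  # 'Â£'

print(raw, mojibake)
```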
### Example 3: Writing JSON with special characters

**User:** When I save JSON, my £ symbols become \u00a3

**Claude:** Python's `json.dump()` defaults to `ensure_ascii=True`, which escapes non-ASCII characters. To preserve them:

- Use `ensure_ascii=False` in `json.dump()`
- Always specify `encoding='utf-8'` when opening the file

[Provides implementation using the json_io pattern from reference material below]
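The effect of `ensure_ascii` is easiest to see with `json.dumps()`, which returns the serialized string directly:

```python
import json

data = {'price': '£99.99'}

# Default: non-ASCII characters are escaped to \uXXXX sequences.
print(json.dumps(data))                      # {"price": "\u00a399.99"}

# With ensure_ascii=False, the £ survives as-is.
print(json.dumps(data, ensure_ascii=False))  # {"price": "£99.99"}
```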
## Reference Implementation Details

The sections below contain proven working code that the examples above reference.

**Reference Files in This Folder:**

- `json_io.py`: Complete UTF-8 JSON I/O utility (copy to `utils/`)
### json_io Pattern

**Purpose:** Abstract JSON file operations to always use UTF-8 encoding

**Code Example**

```python
"""
JSON file I/O utilities with proper UTF-8 encoding.

This module abstracts JSON file operations to handle the well-known issue
of Python defaulting to the system encoding on Windows instead of UTF-8
(the required encoding per RFC 8259).
"""
import json
from pathlib import Path
from typing import Any, Union


def load_json(path: Union[str, Path]) -> Any:
    """
    Load JSON from a file with proper UTF-8 encoding.

    Args:
        path: Path to the JSON file

    Returns:
        Parsed JSON data
    """
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


def save_json(path: Union[str, Path], data: Any, indent: int = 2) -> None:
    """
    Save data to a JSON file with proper UTF-8 encoding.

    Args:
        path: Path to the JSON file
        data: Data to serialize
        indent: Indentation level (default: 2)
    """
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=indent, ensure_ascii=False)
```
**Key Points:**

- Always use `encoding='utf-8'` when opening files
- Use `ensure_ascii=False` to preserve Unicode characters in output
- Accept both `str` and `Path` objects for flexibility
- Default indent of 2 for human-readable output
### Usage Pattern

```python
from utils.json_io import load_json, save_json

# Reading
data = load_json('config.json')

# Writing
save_json('output.json', {'price': '£99.99'})
```
## The Problem Explained

### Windows Default Encoding

When no `encoding=` argument is given, Python's `open()` falls back to the locale encoding (`locale.getencoding()` on Python 3.11+, `locale.getpreferredencoding(False)` on earlier versions). On Windows, this is typically CP-1252 (Windows-1252), not UTF-8.
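You can check what your interpreter will use, assuming Python 3.11+ for `locale.getencoding()` (older versions fall back to `locale.getpreferredencoding(False)`):

```python
import locale
import sys

# The codec open() uses when no encoding= argument is given
# (with UTF-8 mode disabled).
if sys.version_info >= (3, 11):
    default_encoding = locale.getencoding()
else:
    default_encoding = locale.getpreferredencoding(False)

print(default_encoding)  # typically 'cp1252' on Windows, 'UTF-8' on Linux/macOS
```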
### The Double-Encoding Bug

When a UTF-8 file is read with CP-1252 encoding:

| Character | UTF-8 bytes | CP-1252 interpretation |
|---|---|---|
| £ (U+00A3) | C2 A3 | Â£ |
| € (U+20AC) | E2 82 AC | â‚¬ |
| é (U+00E9) | C3 A9 | Ã© |
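The table rows can be regenerated directly, which also serves as a quick sanity check on your own platform:

```python
# For each character: show its code point, UTF-8 bytes, and what those
# bytes look like when misread as CP-1252.
for ch in ('£', '€', 'é'):
    raw = ch.encode('utf-8')
    misread = raw.decode('cp1252')
    print(f"{ch} (U+{ord(ch):04X}) | {raw.hex(' ').upper()} | {misread}")
```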
### RFC 8259 Requirement

Per RFC 8259 (The JavaScript Object Notation (JSON) Data Interchange Format):

> JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8.
## Troubleshooting

### Mojibake Characters

**Symptom:** Characters like `Â£`, `â‚¬`, `Ã©` appear in your data

**Cause:** UTF-8 file read with CP-1252 encoding

**Solution:** Use the json_io utility module, or add `encoding='utf-8'` to file open calls
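If mojibake has already crept into stored data, one layer of the damage can often be reversed by re-encoding with CP-1252 to recover the original UTF-8 bytes. The helper below is a sketch, not part of the json_io module:

```python
def fix_mojibake(text: str) -> str:
    """Undo one layer of UTF-8-read-as-CP-1252 damage.

    'Â£' -> bytes C2 A3 (via CP-1252) -> decoded as UTF-8 -> '£'
    """
    return text.encode('cp1252').decode('utf-8')


print(fix_mojibake('Â£9.99'))  # → £9.99
```

This only works when every mojibake character maps back to a CP-1252 byte (a few byte values, such as 0x81, have no CP-1252 mapping), so treat it as a recovery tool; the real fix is reading with UTF-8 in the first place.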
### \uXXXX Escapes in Output

**Symptom:** Non-ASCII characters appear as `\u00a3` in JSON output

**Cause:** `json.dump()` defaults to `ensure_ascii=True`

**Solution:** Pass `ensure_ascii=False` to `json.dump()`
## Best Practices Summary

- Create a `utils/json_io.py` module in every Python project that handles JSON files
- Never use raw `json.load(open(...))` without explicit UTF-8 encoding
- Use `ensure_ascii=False` when writing JSON to preserve readability
- Consider setting the `PYTHONUTF8=1` environment variable for a system-wide UTF-8 default (Python 3.7+)
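Whether UTF-8 mode is active (via `PYTHONUTF8=1` or the `-X utf8` flag) can be checked at runtime:

```python
import sys

# 1 when UTF-8 mode is enabled, 0 otherwise. With UTF-8 mode on,
# open() defaults to UTF-8 even on Windows, making the explicit
# encoding='utf-8' argument a belt-and-braces measure.
print(sys.flags.utf8_mode)
```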