Claude-skill-registry io-utilities

Guide for using IO utilities in speedy_utils, including fast JSONL reading, multi-format loading, and file serialization.

Install

Source · Clone the upstream repo:

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/:

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/io-utilities" ~/.claude/skills/majiayu000-claude-skill-registry-io-utilities && rm -rf "$T"

Manifest: skills/data/io-utilities/SKILL.md

Source Content

IO Utilities Guide

This skill provides comprehensive guidance for using the IO utilities in speedy_utils.

When to Use This Skill

Use this skill when you need to:

  • Read and write data in various formats (JSON, JSONL, Pickle, CSV, TXT).
  • Efficiently process large JSONL files with streaming and multi-threading.
  • Automatically handle file compression (gzip, bz2, xz, zstd).
  • Load data based on file extension automatically.
  • Serialize Pydantic models and other objects easily.

Prerequisites

  • speedy_utils installed.
  • Optional dependencies for specific features:
    • orjson: for faster JSON parsing.
    • zstandard: for .zst file support.
    • pandas: for CSV/TSV loading.
    • pyarrow: for faster CSV reading with pandas.

Core Capabilities

Fast JSONL Processing (fast_load_jsonl)

  • Streams data line-by-line for memory efficiency.
  • Supports automatic decompression.
  • Uses orjson if available for speed.
  • Supports multi-threaded processing for large files.
  • Shows a progress bar with tqdm.
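
A minimal sketch of materializing the stream, assuming only the documented call signature; do this only when the decompressed data is known to fit in memory (the file name is illustrative):

from speedy_utils import fast_load_jsonl

# Materialize the lazy stream into a list; only do this when the
# decompressed data fits comfortably in memory.
records = list(fast_load_jsonl('events.jsonl.zst', progress=True))
print(len(records))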

Universal Loading (load_by_ext)

  • Detects file type by extension.
  • Supports glob patterns (e.g., data/*.json) and lists of files.
  • Uses parallel processing for multiple files.
  • Supports memoization via do_memoize=True (see the sketch below).
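
A brief sketch of the memoization flag; do_memoize is the only keyword this guide names, and the path is illustrative:

from speedy_utils import load_by_ext

# The first call parses the file; with do_memoize=True, repeated
# calls can reuse the cached result instead of re-reading from disk.
config = load_by_ext('config.json', do_memoize=True)
config_again = load_by_ext('config.json', do_memoize=True)  # served from cache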

Serialization (dump_json_or_pickle, load_json_or_pickle)

  • Unified interface for JSON and Pickle.
  • Handles Pydantic models automatically (see the sketch below).
  • Creates parent directories if they don't exist.
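
A hedged sketch of the Pydantic handling; the User model is invented for illustration, and the exact on-disk representation is up to the library:

from pydantic import BaseModel
from speedy_utils import dump_json_or_pickle, load_json_or_pickle

class User(BaseModel):  # hypothetical model, for illustration only
    name: str
    age: int

# The model is serialized automatically, and 'cache/' is created
# if it doesn't exist; loading returns the deserialized data.
dump_json_or_pickle(User(name="Ada", age=36), 'cache/user.json')
data = load_json_or_pickle('cache/user.json')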

Usage Examples

Example 1: Streaming Large JSONL

Read a large compressed JSONL file line by line.

from speedy_utils import fast_load_jsonl

# Iterates lazily, low memory usage
for item in fast_load_jsonl('large_data.jsonl.gz', progress=True):
    process(item)
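
Because the iterator is lazy, you can also peek at the first few records without parsing the whole file (islice comes from the standard library; the path is the same illustrative one as above):

from itertools import islice
from speedy_utils import fast_load_jsonl

# Take only the first 5 records; the rest of the file is never read.
head = list(islice(fast_load_jsonl('large_data.jsonl.gz'), 5))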

Example 2: Loading Any File

Load a file without worrying about the format.

from speedy_utils import load_by_ext

data = load_by_ext('config.json')
df = load_by_ext('data.csv')
items = load_by_ext('dataset.pkl')

Example 3: Parallel Loading

Load multiple files in parallel.

from speedy_utils import load_by_ext

# Returns a list of results, one for each file
all_data = load_by_ext('logs/*.jsonl')
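
Since load_by_ext also accepts an explicit list of paths (per the capabilities above), the same parallel loading works without a glob; the file names are illustrative:

from speedy_utils import load_by_ext

# Explicit list of files instead of a glob pattern.
all_data = load_by_ext(['logs/day1.jsonl', 'logs/day2.jsonl'])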

Example 4: Dumping Data

Save data to disk, creating directories as needed.

from speedy_utils import dump_json_or_pickle

data = {"key": "value"}
dump_json_or_pickle(data, 'output/processed/result.json')
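
Choosing a .pkl extension switches the same call to Pickle; a minimal round trip, assuming load_json_or_pickle mirrors the dump call:

from speedy_utils import dump_json_or_pickle, load_json_or_pickle

# Same data, Pickle on disk this time; load it back by path.
dump_json_or_pickle({"key": "value"}, 'output/processed/result.pkl')
restored = load_json_or_pickle('output/processed/result.pkl')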

Guidelines

  1. Prefer JSONL for Large Datasets:
     • Use fast_load_jsonl for datasets that don't fit in memory.
     • It handles compression transparently, so keep files compressed (.jsonl.gz or .jsonl.zst) to save space.
  2. Use load_by_ext for Scripts:
     • When writing scripts that might accept different input formats, use load_by_ext to be flexible.
  3. Error Handling:
     • fast_load_jsonl has an on_error parameter (raise, warn, skip) to handle malformed lines gracefully; see the sketch after this list.
  4. Performance:
     • Install orjson for significantly faster JSON operations.
     • load_by_ext uses the pyarrow engine for CSVs if available, which is much faster.
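
A minimal sketch of guideline 3, using only the on_error values named above (the path and process are illustrative):

from speedy_utils import fast_load_jsonl

# Skip malformed lines instead of raising, so one bad record
# doesn't abort the whole pass over the file.
for item in fast_load_jsonl('noisy_logs.jsonl', on_error='skip'):
    process(item)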

Limitations

  • Memory Usage: load_by_ext loads the entire file into memory. Use fast_load_jsonl for streaming.
  • Glob Expansion: load_by_ext with glob patterns loads all matching files into memory at once (in a list). Be careful with massive datasets.