Claude-skill-registry data-structures
Python data structure conventions for this codebase. Apply when choosing between Pydantic models, dataclasses, and other data containers.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-structures" ~/.claude/skills/majiayu000-claude-skill-registry-data-structures && rm -rf "$T"
manifest:
skills/data/data-structures/SKILL.md
Data Structure Conventions
Use Pydantic models or dataclasses instead of raw dictionaries or tuples.
Quick Reference
| Use Case | Choice | Example |
|---|---|---|
| Validation, serialization, API boundaries | Pydantic | Request/response models |
| Simple internal data containers | `dataclass` | Internal DTOs |
| Immutable value objects, hashable keys | `dataclass(frozen=True)` | Cache keys, IDs |
| Configuration from environment | Pydantic | App settings |
| Performance-critical hot paths | `dataclass` | Lower overhead than Pydantic |
Forbidden Patterns
| Pattern | Reason |
|---|---|
| Raw `dict` returns | No IDE support, no validation, error-prone |
| `tuple` returns | Positional access is unclear |
| `NamedTuple` | Only for backward compatibility when refactoring tuple returns |
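The backward-compatibility exception can be sketched as follows, assuming the transitional type is `typing.NamedTuple`. The function and field names are hypothetical. Existing callers that unpack positionally keep working while new callers move to named access.

```python
from typing import NamedTuple


class UserStats(NamedTuple):
    """Transitional return type: still a tuple, but with named fields."""

    post_count: int
    engagement_score: float
    status: str


def get_user_stats(user_id: int) -> UserStats:
    # Hypothetical values; a real implementation would look these up.
    return UserStats(post_count=42, engagement_score=0.95, status="active")


# Legacy call site: positional unpacking is unchanged.
count, score, status = get_user_stats(1)

# Migrated call site: named access documents intent.
stats = get_user_stats(1)
print(stats.post_count, stats.status)
```

Once every call site uses named access, the type can be swapped for a dataclass without touching the callers again.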
Pydantic Models
Use for validation, serialization, and API boundaries.
    # CORRECT - Pydantic model with Field descriptions
    from pydantic import BaseModel, Field


    class SearchResult(BaseModel):
        """A single search result from the retrieval system."""

        document_id: str = Field(description="Unique identifier for the document")
        content: str = Field(description="The matched text content")
        score: float = Field(ge=0.0, le=1.0, description="Relevance score")
        metadata: dict[str, str] = Field(default_factory=dict)


    # INCORRECT - raw dictionary
    def search(query: str) -> dict:
        # No type safety, no validation
        return {"id": "123", "content": "...", "score": 0.95}
When to Use Pydantic
- API request/response models
- Data requiring validation constraints (`ge`, `le`, `min_length`, etc.)
- Serialization to/from JSON
- External data boundaries (user input, file parsing, API responses)
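A minimal sketch of the validation and serialization bullets, assuming Pydantic v2 is installed. The `Product` model and its fields are hypothetical and exist only for illustration. It shows a constraint rejecting bad input and a JSON round trip.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    """Hypothetical model used only to illustrate constraints."""

    name: str = Field(min_length=1)
    price: float = Field(ge=0.0)


# Valid input is parsed and type-checked.
product = Product(name="widget", price=9.5)

# Invalid input raises ValidationError instead of passing through silently.
error_count = 0
try:
    Product(name="", price=-1.0)
except ValidationError as exc:
    error_count = len(exc.errors())  # one entry per failing field

# JSON round trip using the Pydantic v2 method names.
payload = product.model_dump_json()
restored = Product.model_validate_json(payload)
```

Both field failures are reported together, which is usually more helpful at an API boundary than stopping at the first one.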
Configuration with Pydantic Settings
Use `pydantic_settings.BaseSettings` for environment-based configuration.
    # CORRECT - typed settings from environment
    from pydantic_settings import BaseSettings


    class Settings(BaseSettings):
        """Application settings loaded from environment."""

        openai_api_key: str
        database_url: str
        debug: bool = False
        max_workers: int = 4

        model_config = {"env_prefix": "APP_"}


    # Usage: reads APP_OPENAI_API_KEY, APP_DATABASE_URL, etc.
    settings = Settings()
Dataclasses
Use for simple internal data containers where validation isn't needed.
    # CORRECT - simple dataclass
    from dataclasses import dataclass


    @dataclass
    class Point:
        """A 2D point."""

        x: float
        y: float


    # CORRECT - frozen for immutability and hashing
    @dataclass(frozen=True)
    class UserId:
        """Immutable user identifier, safe for use as dict key."""

        value: int


    @dataclass
    class User:
        """Placeholder user record so the annotation below resolves."""

        name: str


    # Can be used as dict key or in sets
    cache: dict[UserId, User] = {}
When to Use Dataclasses
- Internal data transfer objects
- Simple value containers
- When Pydantic overhead isn't justified
- When you need hashable objects (`frozen=True`)
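The hashability point can be checked directly with the standard library. In this sketch `MutablePoint` and `UserId` are illustrative names: a plain dataclass defines `__eq__` and therefore becomes unhashable, while a frozen one gets a field-based hash and rejects assignment.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass
class MutablePoint:
    """Plain dataclass: eq=True without frozen=True sets __hash__ to None."""

    x: float
    y: float


@dataclass(frozen=True)
class UserId:
    """Frozen dataclass: hashable and immutable."""

    value: int


# A plain dataclass instance cannot be hashed.
try:
    hash(MutablePoint(0.0, 0.0))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

# Equal frozen instances hash the same, so either one finds the entry.
names = {UserId(1): "alice"}
found = names[UserId(1)]

# Assignment after construction is rejected.
try:
    UserId(1).value = 2
    mutated = True
except FrozenInstanceError:
    mutated = False
```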
Decision Flow
    Is the data from an external source (API, user input, file)?
    ├── Yes → Use Pydantic BaseModel (validation + serialization)
    └── No → Is serialization needed?
        ├── Yes → Use Pydantic BaseModel
        └── No → Is validation needed?
            ├── Yes → Use Pydantic BaseModel
            └── No → Is immutability/hashability needed?
                ├── Yes → Use dataclass(frozen=True)
                └── No → Use dataclass
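The same flow can be written as a small function, which makes the order of the checks explicit. The function name, its parameters, and the returned strings are hypothetical. This is a sketch of the decision logic, not a helper that exists in the codebase.

```python
def choose_container(
    *,
    external: bool,
    needs_serialization: bool = False,
    needs_validation: bool = False,
    needs_hashable: bool = False,
) -> str:
    """Return the recommended container, following the flow top to bottom."""
    # Any of the first three questions answered "yes" leads to Pydantic.
    if external or needs_serialization or needs_validation:
        return "pydantic.BaseModel"
    # Purely internal data: only immutability/hashability changes the answer.
    if needs_hashable:
        return "dataclass(frozen=True)"
    return "dataclass"


api_payload = choose_container(external=True)
cache_key = choose_container(external=False, needs_hashable=True)
internal_dto = choose_container(external=False)
```

Note that hashability is consulted only after the three Pydantic questions, so an external payload that also needs to be hashable still resolves to a Pydantic model under this flow.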
Examples
Returning Multiple Values
    from dataclasses import dataclass


    # INCORRECT - tuple return
    def get_user_stats(user_id: int) -> tuple[int, float, str]:
        return (42, 0.95, "active")  # What do these values mean?


    # CORRECT - dataclass return
    @dataclass
    class UserStats:
        """Statistics for a user."""

        post_count: int
        engagement_score: float
        status: str


    def get_user_stats(user_id: int) -> UserStats:
        return UserStats(post_count=42, engagement_score=0.95, status="active")
API Response Model
    # CORRECT - Pydantic for API boundaries
    from pydantic import BaseModel, Field


    class UserResponse(BaseModel):
        """API response for user data."""

        id: int = Field(description="User ID")
        name: str = Field(min_length=1, description="Display name")
        email: str = Field(description="Email address")
        is_active: bool = Field(default=True, description="Account status")

        model_config = {"extra": "forbid"}  # Reject unknown fields
Immutable Cache Key
    # CORRECT - frozen dataclass as cache key
    from dataclasses import dataclass
    from functools import lru_cache


    @dataclass
    class Result:
        """Placeholder result type so the return annotation resolves."""

        content: str


    @dataclass(frozen=True)
    class QueryKey:
        """Immutable key for query caching."""

        query: str
        top_k: int
        filters: tuple[str, ...]  # Use tuple, not list, for hashability


    @lru_cache(maxsize=1000)
    def cached_search(key: QueryKey) -> list[Result]:
        ...
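To see the cache key at work, the sketch below fills in a placeholder body for `cached_search` and inspects `cache_info()`. The returned strings and the `lang:en` filter are made up for illustration, and a real implementation would query the retrieval system instead.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class QueryKey:
    """Immutable key for query caching."""

    query: str
    top_k: int
    filters: tuple[str, ...]  # tuple keeps the whole key hashable


@lru_cache(maxsize=1000)
def cached_search(key: QueryKey) -> list[str]:
    # Placeholder body standing in for an expensive search call.
    return [f"{key.query}-{i}" for i in range(key.top_k)]


# Two separately constructed but equal keys map to the same cache entry.
first = cached_search(QueryKey("cats", 2, ("lang:en",)))
second = cached_search(QueryKey("cats", 2, ("lang:en",)))
info = cached_search.cache_info()
```

Because `lru_cache` hands back the same list object on a hit, callers should treat the result as read-only or copy it before mutating.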