Claude-skill-registry data-structures

Python data structure conventions for this codebase. Apply when choosing between Pydantic models, dataclasses, and other data containers.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-structures" ~/.claude/skills/majiayu000-claude-skill-registry-data-structures && rm -rf "$T"
manifest: skills/data/data-structures/SKILL.md
source content

Data Structure Conventions

Use Pydantic models or dataclasses instead of raw dictionaries or tuples.

Quick Reference

Use CaseChoiceExample
Validation, serialization, API boundariesPydantic
BaseModel
Request/response models
Simple internal data containers
dataclass
Internal DTOs
Immutable value objects, hashable keys
dataclass(frozen=True)
Cache keys, IDs
Configuration from environmentPydantic
BaseSettings
App settings
Performance-critical hot paths
dataclass
Lower overhead than Pydantic

Forbidden Patterns

PatternReason
Raw
dict
returns
No IDE support, no validation, error-prone
tuple
returns
Positional access is unclear
NamedTuple
Only for backward compatibility when refactoring tuple returns

Pydantic Models

Use for validation, serialization, and API boundaries.

# CORRECT - Pydantic model with Field descriptions
from pydantic import BaseModel, Field

class SearchResult(BaseModel):
    """A single search result from the retrieval system."""

    document_id: str = Field(description="Unique identifier for the document")
    content: str = Field(description="The matched text content")
    score: float = Field(ge=0.0, le=1.0, description="Relevance score")
    metadata: dict[str, str] = Field(default_factory=dict)

# INCORRECT - raw dictionary
def search(query: str) -> dict:  # No type safety, no validation
    return {"id": "123", "content": "...", "score": 0.95}

When to Use Pydantic

  • API request/response models
  • Data requiring validation constraints (
    ge
    ,
    le
    ,
    min_length
    , etc.)
  • Serialization to/from JSON
  • External data boundaries (user input, file parsing, API responses)

Configuration with Pydantic Settings

Use

pydantic_settings.BaseSettings
for environment-based configuration.

# CORRECT - typed settings from environment
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    """Application settings loaded from environment."""

    openai_api_key: str
    database_url: str
    debug: bool = False
    max_workers: int = 4

    model_config = {"env_prefix": "APP_"}

# Usage: reads APP_OPENAI_API_KEY, APP_DATABASE_URL, etc.
settings = Settings()

Dataclasses

Use for simple internal data containers where validation isn't needed.

# CORRECT - simple dataclass
from dataclasses import dataclass

@dataclass
class Point:
    """A 2D point."""
    x: float
    y: float

# CORRECT - frozen for immutability and hashing
@dataclass(frozen=True)
class UserId:
    """Immutable user identifier, safe for use as dict key."""
    value: int

# Can be used as dict key or in sets
cache: dict[UserId, User] = {}

When to Use Dataclasses

  • Internal data transfer objects
  • Simple value containers
  • When Pydantic overhead isn't justified
  • When you need hashable objects (
    frozen=True
    )

Decision Flow

Is the data from external source (API, user input, file)?
├── Yes → Use Pydantic BaseModel (validation + serialization)
└── No → Is serialization needed?
    ├── Yes → Use Pydantic BaseModel
    └── No → Is validation needed?
        ├── Yes → Use Pydantic BaseModel
        └── No → Is immutability/hashability needed?
            ├── Yes → Use dataclass(frozen=True)
            └── No → Use dataclass

Examples

Returning Multiple Values

# INCORRECT - tuple return
def get_user_stats(user_id: int) -> tuple[int, float, str]:
    return (42, 0.95, "active")  # What do these values mean?

# CORRECT - dataclass return
@dataclass
class UserStats:
    """Statistics for a user."""
    post_count: int
    engagement_score: float
    status: str

def get_user_stats(user_id: int) -> UserStats:
    return UserStats(post_count=42, engagement_score=0.95, status="active")

API Response Model

# CORRECT - Pydantic for API boundaries
from pydantic import BaseModel, Field

class UserResponse(BaseModel):
    """API response for user data."""

    id: int = Field(description="User ID")
    name: str = Field(min_length=1, description="Display name")
    email: str = Field(description="Email address")
    is_active: bool = Field(default=True, description="Account status")

    model_config = {"extra": "forbid"}  # Reject unknown fields

Immutable Cache Key

# CORRECT - frozen dataclass as cache key
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class QueryKey:
    """Immutable key for query caching."""
    query: str
    top_k: int
    filters: tuple[str, ...]  # Use tuple, not list, for hashability

@lru_cache(maxsize=1000)
def cached_search(key: QueryKey) -> list[Result]:
    ...