Claude-skill-registry dataclass-optimization

Python dataclass best practices: slots, frozen, validation. Trigger when optimizing dataclasses or creating config classes.

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/dataclass-optimization" ~/.claude/skills/majiayu000-claude-skill-registry-dataclass-optimization && rm -rf "$T"
manifest: skills/data/dataclass-optimization/SKILL.md
source content

Python Dataclass Optimization Patterns

Experiment Overview

| Item | Details |
|------|---------|
| Date | 2025-12-18 |
| Goal | Apply dataclass best practices for memory efficiency and safety |
| Environment | Python 3.10+ |
| Status | Success - 5 patterns verified |

Context

Python dataclasses (PEP 557) have several underused features that can significantly improve memory usage and code safety. The patterns below are based on an analysis of a KDnuggets article and on practical application.

Pattern 1: slots=True for Memory Efficiency

Problem: Default dataclasses use __dict__ for attribute storage, wasting memory.

Before (~152 bytes per instance):

@dataclass
class Config:
    n_envs: int = 64
    learning_rate: float = 1e-4

After (~56 bytes per instance):

@dataclass(slots=True)
class Config:
    n_envs: int = 64
    learning_rate: float = 1e-4

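Benefit: smaller instances and faster attribute access. The sizes above work out to roughly a 60% cut in per-instance overhead; treat them as indicative, since exact numbers vary with Python version and field count.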

When to use: Almost always. Only skip if you need dynamic attributes or inheritance from non-slotted classes.


Pattern 2: frozen=True for Immutable Configs

Problem: Configuration objects can be accidentally modified after creation.

Before (mutable, risky):

@dataclass
class RiskLimits:
    max_drawdown: float = 0.15
    max_position_weight: float = 0.20

# Bug: accidental modification
limits = RiskLimits()
limits.max_drawdown = 0.50  # Silently corrupts config!

After (immutable, safe):

@dataclass(frozen=True, slots=True)
class RiskLimits:
    max_drawdown: float = 0.15
    max_position_weight: float = 0.20

limits = RiskLimits()
limits.max_drawdown = 0.50  # Raises FrozenInstanceError

When to use: Configuration objects, immutable data records, anything that shouldn't change after creation.

When NOT to use: Classes with methods that modify state (like update_metrics()).


Pattern 3: compare=False for Metadata Fields

Problem: Timestamps and metadata shouldn't affect equality comparison.

Before (timestamps break equality):

from dataclasses import dataclass
from datetime import datetime

@dataclass
class TradeRecord:
    symbol: str
    entry_time: datetime
    entry_price: float

# Two identical trades appear different due to microsecond differences
trade1 = TradeRecord("AAPL", datetime.now(), 150.0)
trade2 = TradeRecord("AAPL", datetime.now(), 150.0)
trade1 == trade2  # False! (different timestamps)

After (timestamps excluded from comparison):

from dataclasses import dataclass, field
from datetime import datetime

@dataclass(slots=True)
class TradeRecord:
    symbol: str
    entry_time: datetime = field(compare=False)
    entry_price: float

trade1 = TradeRecord("AAPL", datetime.now(), 150.0)
trade2 = TradeRecord("AAPL", datetime.now(), 150.0)
trade1 == trade2  # True! (compares only symbol and price)

When to use: Timestamps, IDs, logging metadata, any field that's not part of the "identity" of the object.


Pattern 4: __post_init__ for Validation

Problem: Invalid configurations cause errors deep in code, hard to debug.

Before (no validation):

@dataclass(slots=True)
class PPOConfig:
    n_envs: int = 64
    learning_rate: float = 1e-4
    gamma: float = 0.99

# Invalid config passes silently, fails during training
config = PPOConfig(n_envs=-1, gamma=2.0)  # No error here!

After (early validation):

@dataclass(slots=True)
class PPOConfig:
    n_envs: int = 64
    learning_rate: float = 1e-4
    gamma: float = 0.99

    def __post_init__(self):
        if self.n_envs <= 0:
            raise ValueError(f"n_envs must be positive, got {self.n_envs}")
        if not 0 < self.learning_rate < 1:
            raise ValueError(f"learning_rate must be in (0, 1), got {self.learning_rate}")
        if not 0 < self.gamma <= 1:
            raise ValueError(f"gamma must be in (0, 1], got {self.gamma}")

config = PPOConfig(n_envs=-1)  # Raises ValueError immediately!

When to use: Configuration classes, any dataclass where invalid values could cause problems.


Pattern 5: default_factory for Mutable Defaults

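Problem: A mutable default would be shared across instances (the classic Python gotcha). For list, dict, and set defaults, @dataclass refuses to build the class; other mutable objects can still slip through and be shared.

Before (rejected at class definition):

from dataclasses import dataclass
from typing import List

@dataclass
class SignalQuality:
    rejection_reasons: List[str] = []  # WRONG! ValueError: mutable default ... use default_factory

# The same default on a plain (non-dataclass) class attribute is accepted
# and silently shared: appending through one instance changes every instance.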

After (correct - new list per instance):

from dataclasses import dataclass, field
from typing import List

@dataclass(slots=True)
class SignalQuality:
    rejection_reasons: List[str] = field(default_factory=list)

sq1 = SignalQuality()
sq1.rejection_reasons.append("low_confidence")
sq2 = SignalQuality()
print(sq2.rejection_reasons)  # [] - Correct!

When to use: Any mutable default (list, dict, set, custom objects).


Failed Attempts (Critical)

| Attempt | Why it Failed | Lesson Learned |
|---------|---------------|----------------|
| frozen=True on class with update_metrics() method | Can't modify attributes in frozen class | Only freeze immutable data structures |
| slots=True with class inheritance | Slots don't work well with multiple inheritance | Use composition over inheritance, or skip slots for inherited classes |
| Validation that accesses other fields before they're set | __post_init__ runs after all fields are set, but field order matters | Order validation checks carefully |
| compare=False on primary key fields | Breaks dict/set membership | Only exclude truly metadata fields |

Decision Matrix

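| Dataclass Type | slots | frozen | compare=False | __post_init__ |
|----------------|-------|--------|---------------|---------------|
| Config/Settings | Yes | Yes | N/A | Yes (validation) |
| Immutable Record | Yes | Yes | On timestamps | Optional |
| Mutable State | Yes | No | On metadata | Optional |
| Data Transfer Object | Yes | Optional | On IDs | Yes |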

Combining Patterns

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, List

@dataclass(frozen=True, slots=True)
class RiskLimits:
    """Immutable configuration with validation."""
    max_portfolio_var: float = 0.02
    max_position_weight: float = 0.20
    max_drawdown: float = 0.15

    def __post_init__(self):
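        # Include the offending value so a bad config is easy to trace.
        if not 0 < self.max_portfolio_var <= 1:
            raise ValueError(f"max_portfolio_var must be in (0, 1], got {self.max_portfolio_var}")
        if not 0 < self.max_position_weight <= 1:
            raise ValueError(f"max_position_weight must be in (0, 1], got {self.max_position_weight}")
        if not 0 < self.max_drawdown <= 1:
            raise ValueError(f"max_drawdown must be in (0, 1], got {self.max_drawdown}")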


@dataclass(slots=True)
class TradeRecord:
    """Mutable record with excluded metadata."""
    symbol: str
    entry_time: datetime = field(compare=False)
    entry_price: float
    exit_time: Optional[datetime] = field(default=None, compare=False)
    exit_price: Optional[float] = None
    notes: List[str] = field(default_factory=list, compare=False)

Key Insights

  • slots=True is almost always beneficial - default to using it
  • frozen=True is for data that shouldn't change, not for all dataclasses
  • compare=False on timestamps prevents subtle bugs in equality checks
  • __post_init__ catches invalid configs early, before they cause downstream errors
  • default_factory is mandatory for mutable defaults - @dataclass rejects list, dict, and set defaults with a ValueError, but other mutable objects are shared without warning

References