# Claude-skill-registry-data · Type Hints for ML Code

Apply appropriate type hints for ML/PyTorch code. Use when adding type annotations to ML code or addressing mypy errors.

## Install

Source · Clone the upstream repo:

```shell
git clone https://github.com/majiayu000/claude-skill-registry-data
```

Claude Code · Install into `~/.claude/skills/`:

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/ml-type-hints" ~/.claude/skills/majiayu000-claude-skill-registry-data-type-hints-for-ml-code && rm -rf "$T"
```

Manifest: `data/ml-type-hints/SKILL.md` (source content follows)
# Type Hints for ML Code - Constitutional AI

## When to Apply

Automatically activate when:

- Adding type hints to ML code
- Addressing mypy type errors
- Working with PyTorch tensors, models, or optimizers
- Dealing with HuggingFace transformers types
## Project Type Checking Status

Current mypy status:

- 41 errors remaining (documented in MYPY_ANALYSIS_REPORT.md)
- Status: Accepted as reasonable for ML research code
- CI behavior: `continue-on-error: true` (doesn't fail builds)
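A sketch of how this CI behavior might be wired up, assuming a GitHub Actions workflow; the step name and mypy invocation here are hypothetical and the project's actual workflow may differ:

```yaml
# Hypothetical workflow step: report type errors without failing the build
- name: Type check (advisory)
  run: mypy constitutional_ai/ --ignore-missing-imports
  continue-on-error: true
```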
Key insight: Perfect type coverage is not the goal for ML code. Prioritize correctness and readability over type perfection.
## When to Use Type Hints

### ✅ Do Use Type Hints For

- **Public API functions**

  ```python
  def load_model(model_name: str) -> tuple[AutoModelForCausalLM, AutoTokenizer]:
      """Load model and tokenizer."""
      pass
  ```

- **Configuration dataclasses**

  ```python
  @dataclass
  class TrainingConfig:
      learning_rate: float
      batch_size: int
      num_epochs: int
  ```

- **Clear input/output types**

  ```python
  def evaluate_text(text: str, framework: ConstitutionalFramework) -> dict[str, Any]:
      """Evaluate text against principles."""
      pass
  ```

- **Helper functions with simple types**

  ```python
  def format_prompt(prompt: str, examples: list[str]) -> str:
      """Format prompt with examples."""
      pass
  ```
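The `TrainingConfig` dataclass above can be exercised directly; a minimal sketch (the field values here are illustrative):

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    learning_rate: float
    batch_size: int
    num_epochs: int

# Fields become typed constructor arguments, and mypy checks call sites
cfg = TrainingConfig(learning_rate=3e-4, batch_size=32, num_epochs=3)
print(cfg.batch_size)  # 32
```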
### ❌ Don't Force Type Hints For

- Complex tensor operations (mypy struggles with tensor shapes)
- Dynamic PyTorch internals (intentionally uses `Any`)
- NumPy operations (overload resolution issues)
- Training loop internals (too complex, low value)
## PyTorch Type Patterns

### Tensor Types

```python
import torch
from torch import Tensor

# ✅ Basic tensor type
def forward(inputs: Tensor) -> Tensor:
    return inputs * 2

# ✅ Optional tensor (common in ML)
def process_batch(
    inputs: Tensor,
    labels: Tensor | None = None,
) -> Tensor:
    pass

# ⚠️ Avoid overly specific tensor types (mypy can't verify shapes)
# This is too specific and won't type-check well:
def bad_example(inputs: Tensor[int, 32, 768]) -> Tensor[int, 32, 10]:
    pass
```
Model and Optimizer Types
from torch.nn import Module from torch.optim import Optimizer from transformers import PreTrainedModel, PreTrainedTokenizer # ✅ Use base classes for flexibility def train_model( model: Module, # or PreTrainedModel for HuggingFace optimizer: Optimizer, dataloader: DataLoader ) -> None: pass # ✅ HuggingFace specific types def generate_text( prompt: str, model: PreTrainedModel, tokenizer: PreTrainedTokenizer, max_length: int = 100 ) -> str: pass
### Device Types

```python
import torch
from torch import Tensor
from torch import device as Device

# ✅ Accept either a Device or a device string
def move_to_device(tensor: Tensor, device: Device | str) -> Tensor:
    return tensor.to(device)

# Common pattern:
device: str | Device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
## Handling "Any" Type

### When `Any` is Acceptable

```python
from typing import Any

from torch import Tensor

# ✅ For complex nested structures
def process_model_output(output: Any) -> dict[str, Any]:
    """Process model output (structure varies by model)."""
    pass

# ✅ For highly dynamic operations
def advanced_tensor_op(tensors: list[Tensor]) -> Any:
    """Complex operation with unpredictable output type."""
    pass

# ✅ For configuration dictionaries
config: dict[str, Any] = {
    'learning_rate': 0.001,
    'model_name': 'gpt2',
    'device': 'cuda',
}
```
### Prefer Specific Types When Possible

```python
# ❌ Too vague
def process_data(data: Any) -> Any:
    pass

# ✅ More specific
def process_data(data: list[dict[str, float]]) -> dict[str, Tensor]:
    pass
```
## Optional and Union Types

```python
import torch
from torch import Tensor

# Modern syntax (Python 3.10+): use `str | None`
def load_checkpoint(path: str | None = None) -> dict[str, Tensor] | None:
    if path is None:
        return None
    return torch.load(path)

# Equivalent older syntax
from typing import Optional

def load_checkpoint(path: Optional[str] = None) -> Optional[dict[str, Tensor]]:
    pass
```
## Common ML Type Patterns

### Return Multiple Values

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer,
)

# ✅ Tuple with type hints
def load_model(name: str) -> tuple[PreTrainedModel, PreTrainedTokenizer]:
    model = AutoModelForCausalLM.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    return model, tokenizer

# Usage
model, tokenizer = load_model("gpt2")
```
### Dataclass for Complex Returns

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class EvaluationResult:
    any_flagged: bool
    flagged_principles: list[str]
    weighted_score: float
    details: dict[str, Any]

def evaluate_text(text: str) -> EvaluationResult:
    """Evaluate text and return structured result."""
    pass
```
### Generator Types

```python
from collections.abc import Iterator

from torch import Tensor

def batch_generator(data: list[Tensor], batch_size: int) -> Iterator[list[Tensor]]:
    """Generate batches from data."""
    # Each slice is a list of tensors, so the iterator yields list[Tensor]
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]
```
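The same generator pattern can be exercised with plain ints in place of tensors (no torch needed), which also shows that each yielded batch is a list:

```python
from collections.abc import Iterator

def batch_generator(data: list[int], batch_size: int) -> Iterator[list[int]]:
    """Yield successive batches; the final batch may be shorter."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

print(list(batch_generator(list(range(7)), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```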
## Known Mypy Challenges in Project

### Category 1: NumPy Overload Resolution (14 errors)

```python
# Mypy struggles with NumPy's 6+ overload variants
import numpy as np

# This may show a mypy error, but it's correct at runtime
scores = np.array([1.0, 2.0, 3.0])  # mypy: "cannot infer type"
```

Resolution: Use `# type: ignore[misc]` or accept the error (CI allows it).
### Category 2: Optional Attribute Access (7 errors)

```python
# Mypy warns about potential None access
def process_model(model: PreTrainedModel | None):
    if model is not None:
        output = model.generate(...)  # mypy may still warn

# Resolution: Assert non-None or use type: ignore
```
### Category 3: Tensor Type Inference (8 errors)

```python
# PyTorch intentionally uses dynamic typing
loss = criterion(outputs, targets)  # mypy: "Cannot determine type"
```

Resolution: Accept as a limitation - PyTorch is dynamically typed by design.
## Type Hints Best Practices for This Project

### 1. Prioritize Public APIs

```python
# ✅ Type hints for exported functions
def setup_default_framework() -> ConstitutionalFramework:
    """Public API - should have types."""
    pass

# ⚠️ Optional for internal helpers
def _internal_helper(data):
    """Internal - types optional."""
    pass
```
### 2. Use `Any` Strategically

```python
# ✅ Good use of Any
def process_principle_config(config: dict[str, Any]) -> ConstitutionalPrinciple:
    """Config structure varies - Any is appropriate."""
    pass

# ❌ Overuse of Any
def add(a: Any, b: Any) -> Any:
    """Too vague - use specific types."""
    return a + b
```
### 3. Document Complex Types

```python
from typing import TypeAlias

# Define aliases for complex types
TrainingBatch: TypeAlias = dict[str, Tensor]
EvalResults: TypeAlias = dict[str, float | bool | list[str]]

def train_step(batch: TrainingBatch) -> EvalResults:
    """Type aliases make signatures readable."""
    pass
```
### 4. Handle Protocol/Abstract Types

```python
from typing import Any, Protocol

class Evaluator(Protocol):
    """Protocol for evaluation functions."""
    def evaluate(self, text: str) -> dict[str, Any]:
        ...

def run_evaluation(text: str, evaluator: Evaluator) -> dict[str, Any]:
    """Accept any object matching the Evaluator protocol."""
    return evaluator.evaluate(text)
```
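Because `Protocol` uses structural typing, any class with a matching `evaluate` method satisfies `Evaluator` without inheriting from it. A runnable sketch with a hypothetical `KeywordEvaluator`:

```python
from typing import Any, Protocol

class Evaluator(Protocol):
    def evaluate(self, text: str) -> dict[str, Any]: ...

class KeywordEvaluator:
    """Hypothetical evaluator; note it does not inherit from Evaluator."""
    def evaluate(self, text: str) -> dict[str, Any]:
        return {"flagged": "forbidden" in text}

def run_evaluation(text: str, evaluator: Evaluator) -> dict[str, Any]:
    return evaluator.evaluate(text)

print(run_evaluation("all clear", KeywordEvaluator()))  # {'flagged': False}
```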
## Mypy Configuration (from pyproject.toml)

```toml
[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = false  # Not required for ML code
ignore_missing_imports = true  # Many ML libraries lack stubs
```

Key settings:

- `disallow_untyped_defs = false` - types are helpful but not required
- `ignore_missing_imports = true` - PyTorch and transformers lack complete type stubs
## When to Use `# type: ignore`

### Acceptable Use Cases

```python
# ✅ Known mypy limitation with NumPy
scores = np.array(data)  # type: ignore[misc]

# ✅ PyTorch dynamic typing
loss = criterion(outputs, targets)  # type: ignore[arg-type]

# ✅ Third-party library without stubs
from some_ml_lib import advanced_feature  # type: ignore
```
### Avoid Overuse

```python
# ❌ Don't ignore fixable errors
def add(a: int, b: int) -> str:
    return a + b  # type: ignore  # Fix the return type instead!

# ✅ Fix the actual issue
def add(a: int, b: int) -> int:
    return a + b
```
## Type Checking Commands

```shell
# Run mypy on all source code
mypy constitutional_ai/ --ignore-missing-imports

# Run on a specific module
mypy constitutional_ai/framework.py

# Show error codes (useful for targeted ignores)
mypy constitutional_ai/ --show-error-codes

# Strict mode (educational, will fail)
mypy constitutional_ai/ --strict
```
## Summary: Type Hint Philosophy for ML Code

- Use type hints where they add clarity (public APIs, config)
- Skip type hints where they fight the framework (PyTorch internals)
- Accept mypy errors for known ML ecosystem limitations (documented in MYPY_ANALYSIS_REPORT.md)
- Use `Any` strategically, not lazily (complex structures = OK, simple functions = not OK)
- CI should pass with 41 known errors (`continue-on-error: true`)
- Prioritize correctness over type perfection (tests are the real validation)
## Quick Reference

| Type | When to Use | Example |
|---|---|---|
| `int`, `float`, `str` | Primitives | `batch_size: int` |
| `list[T]`, `dict[K, V]` | Collections | `examples: list[str]` |
| `Tensor` | PyTorch tensors | `inputs: Tensor` |
| `Module` | PyTorch models | `model: Module` |
| `PreTrainedModel` | HF models | `model: PreTrainedModel` |
| `Any` | Unknown/dynamic | `config: dict[str, Any]` |
| `X \| None` | Optional values | `labels: Tensor \| None` |
| `tuple[X, Y]` | Multiple returns | `-> tuple[PreTrainedModel, PreTrainedTokenizer]` |
Remember: The goal is helpful type hints, not perfect type coverage. This is research code.