Claude-skill-registry data-modeler

Design data models with Pydantic schemas, comprehensive validation rules,

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-modeler" ~/.claude/skills/majiayu000-claude-skill-registry-data-modeler && rm -rf "$T"

manifest: skills/data/data-modeler/SKILL.md

Purpose

The data-modeler skill provides comprehensive guidance for designing robust data models using Pydantic, Python's most popular data validation library. This skill helps the Architecture Designer agent create type-safe, validated data structures that serve as the foundation for feature implementations.

This skill emphasizes:

Type Safety: Complete type annotations for all fields
Validation: Comprehensive validators for business rules
Documentation: Clear field descriptions and constraints
Relationships: Proper modeling of entity relationships
Serialization: Correct handling of JSON/dict conversion

The data-modeler skill ensures that data models are not just simple data containers, but intelligent objects that enforce business rules, validate data integrity, and provide clear contracts for data interchange.

When to Use

This skill auto-activates when the agent describes:

"Design data models for..."
"Create Pydantic schemas for..."
"Define data structures with..."
"Model the data with..."
"Create validation rules for..."
"Define entity relationships..."
"Specify field constraints for..."
"Design request/response schemas..."

Provided Capabilities

1. Pydantic Schema Design

What it provides:

BaseModel class structure
Field definitions with types and constraints
Default values and factory functions
Optional vs required fields
Nested model composition
Model inheritance patterns

Guidance:

Use
```
Field()
```
for metadata and constraints
Provide
```
description
```
for all fields
Set appropriate
```
default
```
or
```
default_factory
```
Use
```
Optional[T]
```
for nullable fields
Validate field names follow conventions

Example:

from pydantic import BaseModel, Field, validator
from typing import Optional, List
from datetime import datetime
from enum import Enum

class UserRole(str, Enum):
    """User role enumeration."""
    ADMIN = "admin"
    USER = "user"
    GUEST = "guest"

class Address(BaseModel):
    """Nested address model."""
    street: str = Field(..., description="Street address", min_length=1, max_length=200)
    city: str = Field(..., description="City name", min_length=1, max_length=100)
    state: str = Field(..., description="State/province code", min_length=2, max_length=2)
    postal_code: str = Field(..., description="Postal/ZIP code", regex=r"^\d{5}(-\d{4})?$")
    country: str = Field(default="US", description="Country code (ISO 3166-1 alpha-2)")

    class Config:
        schema_extra = {
            "example": {
                "street": "123 Main St",
                "city": "Springfield",
                "state": "IL",
                "postal_code": "62701",
                "country": "US"
            }
        }

class User(BaseModel):
    """User data model with comprehensive validation."""

    # Identity fields
    id: Optional[int] = Field(None, description="User ID (auto-generated)")
    username: str = Field(..., description="Unique username", min_length=3, max_length=50)
    email: str = Field(..., description="Email address (validated)")

    # Profile fields
    full_name: str = Field(..., description="User's full name", min_length=1, max_length=200)
    role: UserRole = Field(default=UserRole.USER, description="User role")
    is_active: bool = Field(default=True, description="Account active status")

    # Nested model
    address: Optional[Address] = Field(None, description="Mailing address")

    # Lists
    tags: List[str] = Field(default_factory=list, description="User tags")

    # Timestamps
    created_at: datetime = Field(default_factory=datetime.utcnow, description="Creation timestamp")
    updated_at: Optional[datetime] = Field(None, description="Last update timestamp")

    class Config:
        """Pydantic model configuration."""
        # Allow ORM models to be parsed
        orm_mode = True

        # Use enum values in JSON
        use_enum_values = True

        # Example for documentation
        schema_extra = {
            "example": {
                "username": "johndoe",
                "email": "john@example.com",
                "full_name": "John Doe",
                "role": "user",
                "address": {
                    "street": "123 Main St",
                    "city": "Springfield",
                    "state": "IL",
                    "postal_code": "62701"
                },
                "tags": ["verified", "premium"]
            }
        }

2. Field-Level Validators

What it provides:

```
@validator
```
decorator usage
Value transformation
Cross-field validation
Custom error messages
Pre and post validation

Validation Types:

Format validation: Email, URL, phone, regex
Range validation: min/max for numbers, length for strings
Business rules: Custom logic validation
Referential integrity: Cross-field checks

Example:

from pydantic import BaseModel, Field, validator, root_validator
import re

class UserRegistration(BaseModel):
    """User registration with comprehensive validation."""

    username: str = Field(..., min_length=3, max_length=50)
    email: str = Field(...)
    password: str = Field(..., min_length=8)
    password_confirm: str = Field(..., min_length=8)
    age: int = Field(..., ge=13, le=120)
    phone: Optional[str] = Field(None)

    @validator('username')
    def validate_username(cls, v):
        """Validate username format."""
        if not re.match(r'^[a-zA-Z0-9_-]+$', v):
            raise ValueError('Username must contain only letters, numbers, hyphens, and underscores')

        # Check against reserved names
        reserved = ['admin', 'root', 'system']
        if v.lower() in reserved:
            raise ValueError(f'Username "{v}" is reserved')

        return v.lower()  # Normalize to lowercase

    @validator('email')
    def validate_email(cls, v):
        """Validate email format."""
        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(email_regex, v):
            raise ValueError('Invalid email format')

        return v.lower()  # Normalize to lowercase

    @validator('password')
    def validate_password_strength(cls, v):
        """Validate password strength."""
        if not re.search(r'[A-Z]', v):
            raise ValueError('Password must contain at least one uppercase letter')
        if not re.search(r'[a-z]', v):
            raise ValueError('Password must contain at least one lowercase letter')
        if not re.search(r'\d', v):
            raise ValueError('Password must contain at least one digit')
        if not re.search(r'[!@#$%^&*(),.?":{}|<>]', v):
            raise ValueError('Password must contain at least one special character')

        return v

    @validator('phone')
    def validate_phone(cls, v):
        """Validate phone number format."""
        if v is None:
            return v

        # Remove all non-digit characters
        digits = re.sub(r'\D', '', v)

        if len(digits) != 10:
            raise ValueError('Phone number must be 10 digits')

        # Return formatted phone
        return f'({digits[:3]}) {digits[3:6]}-{digits[6:]}'

    @root_validator
    def validate_passwords_match(cls, values):
        """Validate that passwords match (cross-field validation)."""
        password = values.get('password')
        password_confirm = values.get('password_confirm')

        if password != password_confirm:
            raise ValueError('Passwords do not match')

        return values

3. Model-Level Validators

What it provides:

```
@root_validator
```
for cross-field validation
Pre-validation transformations
Post-validation checks
Complex business rule enforcement

Example:

from pydantic import BaseModel, Field, root_validator
from datetime import date, datetime
from typing import Optional

class EventBooking(BaseModel):
    """Event booking with complex validation."""

    event_name: str = Field(...)
    start_date: date = Field(...)
    end_date: date = Field(...)
    attendees: int = Field(..., ge=1, le=1000)
    room_capacity: int = Field(..., ge=1)
    is_catering: bool = Field(default=False)
    catering_headcount: Optional[int] = Field(None, ge=1)

    @root_validator(pre=True)
    def convert_date_strings(cls, values):
        """Pre-validation: Convert date strings to date objects."""
        for field in ['start_date', 'end_date']:
            if field in values and isinstance(values[field], str):
                values[field] = datetime.strptime(values[field], '%Y-%m-%d').date()
        return values

    @root_validator
    def validate_dates(cls, values):
        """Validate date logic."""
        start = values.get('start_date')
        end = values.get('end_date')

        if start and end:
            # End must be after start
            if end < start:
                raise ValueError('End date must be after start date')

            # Maximum event duration: 30 days
            if (end - start).days > 30:
                raise ValueError('Event duration cannot exceed 30 days')

            # Must be future dates
            if start < date.today():
                raise ValueError('Event cannot be in the past')

        return values

    @root_validator
    def validate_capacity(cls, values):
        """Validate room capacity vs attendees."""
        attendees = values.get('attendees')
        capacity = values.get('room_capacity')

        if attendees and capacity:
            if attendees > capacity:
                raise ValueError(f'Attendees ({attendees}) exceeds room capacity ({capacity})')

        return values

    @root_validator
    def validate_catering(cls, values):
        """Validate catering requirements."""
        is_catering = values.get('is_catering')
        catering_headcount = values.get('catering_headcount')
        attendees = values.get('attendees')

        if is_catering:
            # Catering headcount required if catering enabled
            if not catering_headcount:
                raise ValueError('Catering headcount required when catering is enabled')

            # Catering headcount cannot exceed attendees
            if catering_headcount > attendees:
                raise ValueError('Catering headcount cannot exceed number of attendees')
        else:
            # No catering headcount if catering disabled
            if catering_headcount:
                raise ValueError('Catering headcount specified but catering is disabled')

        return values

4. Type Annotations and Constraints

What it provides:

Proper use of typing module
Generic types (List, Dict, Set, Tuple)
Union types and Optional
Literal types for constants
Custom types

Example:

from pydantic import BaseModel, Field, constr, conint, confloat, conlist
from typing import List, Dict, Set, Optional, Union, Literal, Any
from datetime import datetime

# Custom constrained types
Username = constr(regex=r'^[a-zA-Z0-9_-]+$', min_length=3, max_length=50)
PositiveInt = conint(gt=0)
Percentage = confloat(ge=0.0, le=100.0)
NonEmptyList = conlist(str, min_items=1)

class ProductStatus(str, Enum):
    """Product status enum."""
    DRAFT = "draft"
    ACTIVE = "active"
    ARCHIVED = "archived"

class Product(BaseModel):
    """Product model with advanced type annotations."""

    # Basic types with constraints
    id: Optional[int] = None
    name: constr(min_length=1, max_length=200)
    sku: constr(regex=r'^[A-Z]{3}-\d{6}$')  # Format: ABC-123456

    # Numeric types with constraints
    price: confloat(gt=0.0, le=1000000.0)
    discount_percentage: Percentage = 0.0
    stock_quantity: PositiveInt

    # Enum
    status: ProductStatus = ProductStatus.DRAFT

    # Collections
    tags: List[str] = Field(default_factory=list)
    categories: Set[str] = Field(default_factory=set)
    attributes: Dict[str, Any] = Field(default_factory=dict)

    # Union types
    metadata: Union[Dict[str, str], None] = None

    # Literal type (specific values only)
    measurement_unit: Literal["kg", "lb", "oz", "g"]

    # Nested models
    dimensions: Optional['ProductDimensions'] = None

    # Timestamps
    created_at: datetime = Field(default_factory=datetime.utcnow)
    updated_at: Optional[datetime] = None

class ProductDimensions(BaseModel):
    """Product dimensions (nested model)."""
    length: confloat(gt=0)
    width: confloat(gt=0)
    height: confloat(gt=0)
    unit: Literal["cm", "in", "m"]

    @property
    def volume(self) -> float:
        """Calculate volume."""
        return self.length * self.width * self.height

# Enable forward reference
Product.update_forward_refs()

5. Relationship Mappings

What it provides:

One-to-one relationships
One-to-many relationships
Many-to-many relationships
Foreign key references
Embedded vs referenced documents

Relationship Patterns:

One-to-One:

class UserProfile(BaseModel):
    """User profile (one-to-one with User)."""
    user_id: int = Field(..., description="Foreign key to User")
    bio: Optional[str] = Field(None, max_length=500)
    avatar_url: Optional[str] = None

class User(BaseModel):
    """User with one-to-one profile."""
    id: int
    username: str
    profile: Optional[UserProfile] = None  # Embedded relationship

One-to-Many:

class Comment(BaseModel):
    """Comment (many comments per post)."""
    id: int
    post_id: int = Field(..., description="Foreign key to Post")
    content: str
    created_at: datetime

class Post(BaseModel):
    """Post with many comments."""
    id: int
    title: str
    content: str
    comments: List[Comment] = Field(default_factory=list)  # Embedded list

Many-to-Many:

class Tag(BaseModel):
    """Tag entity."""
    id: int
    name: str

class Article(BaseModel):
    """Article with many tags."""
    id: int
    title: str
    tag_ids: List[int] = Field(default_factory=list)  # Reference by ID
    # OR
    tags: List[Tag] = Field(default_factory=list)  # Embedded tags

6. Serialization Strategies

What it provides:

JSON serialization/deserialization
```
dict()
```
conversion with exclusions
```
json()
```
output with formatting
Custom serializers for complex types
Alias usage for field naming

Example:

from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional

class ApiResponse(BaseModel):
    """API response with serialization control."""

    id: int
    name: str
    internal_code: str = Field(..., alias="code")  # Use 'code' in JSON
    created_at: datetime
    secret_key: Optional[str] = None  # Should not be exposed
    _internal_state: str = "processing"  # Private field (not serialized)

    class Config:
        # Allow field aliases
        allow_population_by_field_name = True

        # Custom JSON encoders
        json_encoders = {
            datetime: lambda v: v.isoformat()
        }

# Usage
response = ApiResponse(
    id=1,
    name="Test",
    code="ABC123",
    created_at=datetime.utcnow(),
    secret_key="secret"
)

# Serialize to dict (exclude secret)
data = response.dict(exclude={'secret_key'})
# {'id': 1, 'name': 'Test', 'internal_code': 'ABC123', 'created_at': datetime(...)}

# Serialize to JSON with alias
json_str = response.json(by_alias=True, exclude={'secret_key'})
# {"id": 1, "name": "Test", "code": "ABC123", "created_at": "2025-10-29T..."}

# Include/exclude specific fields
data = response.dict(include={'id', 'name'})
# {'id': 1, 'name': 'Test'}

Usage Guide

Step 1: Identify Data Entities

Requirements → Entities → Attributes → Relationships

Step 2: Define Base Models

Create BaseModel → Add fields → Set types → Add descriptions

Step 3: Add Constraints

Field(...) → min/max → regex → custom constraints

Step 4: Implement Validators

@validator → business rules → error messages → transformations

Step 5: Model Relationships

Identify relationships → Choose embedding vs reference → Add foreign keys

Step 6: Configure Serialization

Config class → JSON encoders → Aliases → ORM mode

Step 7: Add Examples

schema_extra → Example data → Documentation

Step 8: Test Models

Create instances → Validate data → Test edge cases → Check errors

Best Practices

Use Field() for Metadata
- Always provide descriptions
- Set constraints (min/max, regex)
- Provide examples in Config
Comprehensive Validation
- Validate at field level when possible
- Use root_validator for cross-field logic
- Provide clear error messages
Type Safety
- Use specific types, not Any
- Use Optional[T] for nullable fields
- Use Enum for fixed choices
Documentation
- Description for every field
- Examples in schema_extra
- Docstrings for complex logic
Serialization Control
- Use aliases for API compatibility
- Exclude sensitive fields
- Custom encoders for complex types
Model Organization
- Group related models in same file
- Use inheritance for shared fields
- Keep models focused and cohesive

Resources

data-model-guide.md

Comprehensive data modeling guide including:

Entity-relationship principles
Normalization guidelines
Pydantic model structure best practices
Validation strategies and patterns
Type hints and annotations guide
Field constraints catalog

pydantic-patterns.md

Pydantic-specific patterns and examples:

BaseModel advanced usage
Custom validators (field and model level)
Field validators with dependencies
Root validators for complex logic
Config options and settings
Nested models and composition
Serialization and deserialization
ORM mode and database integration
Complete code examples

Example Usage

Input (from Architecture Designer agent):

"Design data models for a user management system with user profiles, addresses, roles, and audit logging."

Output (data-modeler skill provides):

# Comprehensive data models with validation

from pydantic import BaseModel, Field, validator, root_validator
from typing import Optional, List
from datetime import datetime
from enum import Enum

class UserRole(str, Enum):
    """User roles."""
    ADMIN = "admin"
    MANAGER = "manager"
    USER = "user"

class Address(BaseModel):
    """User address."""
    street: str = Field(..., min_length=1, max_length=200)
    city: str = Field(..., min_length=1, max_length=100)
    state: str = Field(..., min_length=2, max_length=2)
    postal_code: str = Field(..., regex=r'^\d{5}(-\d{4})?$')

class UserProfile(BaseModel):
    """User profile information."""
    bio: Optional[str] = Field(None, max_length=500)
    avatar_url: Optional[str] = None
    phone: Optional[str] = Field(None, regex=r'^\(\d{3}\) \d{3}-\d{4}$')

    @validator('avatar_url')
    def validate_avatar_url(cls, v):
        if v and not v.startswith(('http://', 'https://')):
            raise ValueError('Avatar URL must be HTTP/HTTPS')
        return v

class User(BaseModel):
    """Main user model."""
    id: Optional[int] = None
    username: str = Field(..., min_length=3, max_length=50)
    email: str = Field(...)
    full_name: str = Field(..., min_length=1, max_length=200)
    role: UserRole = UserRole.USER
    is_active: bool = True

    # Relationships
    profile: Optional[UserProfile] = None
    addresses: List[Address] = Field(default_factory=list)

    # Audit fields
    created_at: datetime = Field(default_factory=datetime.utcnow)
    created_by: Optional[int] = None
    updated_at: Optional[datetime] = None
    updated_by: Optional[int] = None

    @validator('email')
    def validate_email(cls, v):
        import re
        if not re.match(r'^[^@]+@[^@]+\.[^@]+$', v):
            raise ValueError('Invalid email format')
        return v.lower()

    class Config:
        orm_mode = True
        schema_extra = {
            "example": {
                "username": "johndoe",
                "email": "john@example.com",
                "full_name": "John Doe",
                "role": "user",
                "profile": {
                    "bio": "Software engineer",
                    "phone": "(555) 123-4567"
                },
                "addresses": [{
                    "street": "123 Main St",
                    "city": "Springfield",
                    "state": "IL",
                    "postal_code": "62701"
                }]
            }
        }

Integration

Used By:

@architecture-designer (Primary) - Phase 2 sub-agent for architecture design

Integrates With:

architecture-planner skill - Data models designed after component structure
api-designer skill - Data models used in API request/response schemas

Workflow Position:

Analysis Specialist completes requirements analysis
Architecture Designer receives analysis
architecture-planner skill designs component structure (Step 3)
data-modeler skill designs data models (Step 4)
api-designer skill designs API contracts using these models (Step 5)
Results synthesized into PRP

Version: 2.0.0 Auto-Activation: Yes Phase: 2 - Design & Planning Created: 2025-10-29