Claude-skill-registry-data ml-reviewer

install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry-data
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/ml-reviewer" ~/.claude/skills/majiayu000-claude-skill-registry-data-ml-reviewer && rm -rf "$T"
manifest: data/ml-reviewer/SKILL.md

ML Reviewer Skill

Purpose

Reviews Machine Learning and Deep Learning code for PyTorch, TensorFlow, scikit-learn, and MLOps best practices.

When to Use

  • ML/DL project code review
  • Mentions of "PyTorch", "TensorFlow", "Keras", "scikit-learn", or "model training"
  • Inspecting model performance or training optimization
  • Projects with ML framework dependencies

Project Detection

  • torch, tensorflow, keras, sklearn in requirements.txt/pyproject.toml
  • .pt, .pth, .h5, .pkl model files
  • train.py, model.py, dataset.py files
  • Jupyter notebooks with ML imports
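
The detection signals above could be checked with something like the following. This is a minimal sketch, not the skill's actual implementation: the function names and the `ML_PACKAGES` mapping are illustrative assumptions, and `scikit-learn` is added to the mapping because that is the name under which sklearn is normally listed in dependency files.

```python
import re
from pathlib import Path

# Assumed mapping from dependency names to framework labels (illustrative).
ML_PACKAGES = {
    "torch": "PyTorch",
    "tensorflow": "TensorFlow",
    "keras": "Keras",
    "sklearn": "scikit-learn",
    "scikit-learn": "scikit-learn",
}
MODEL_SUFFIXES = {".pt", ".pth", ".h5", ".pkl"}


def frameworks_in_requirements(text: str) -> set[str]:
    """Return the ML frameworks named in requirements.txt-style text."""
    found = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # The package name ends at the first version specifier, extra, or marker.
        name = re.split(r"[\s=<>!~\[;@]", line, maxsplit=1)[0].lower()
        if name in ML_PACKAGES:
            found.add(ML_PACKAGES[name])
    return found


def has_model_files(root: Path) -> bool:
    """True if any file under root has one of the model-file suffixes."""
    return any(p.suffix in MODEL_SUFFIXES for p in root.rglob("*") if p.is_file())
```

For example, `frameworks_in_requirements(Path("requirements.txt").read_text())` would return the set of framework labels found in that file. Parsing pyproject.toml and scanning notebooks for imports are left out of this sketch.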

Workflow

Step 1: Analyze Project

**Framework**: PyTorch / TensorFlow / scikit-learn
**Python**: 3.10+
**CUDA**: 11.x / 12.x
**Task**: Classification / Regression / NLP / CV
**Stage**: Research / Production

Step 2: Select Review Areas

AskUserQuestion:

"Which areas to review?"
Options:
- Full ML pattern check (recommended)
- Model architecture review
- Training loop optimization
- Data pipeline efficiency
- MLOps/deployment patterns
multiSelect: true

Detection Rules

PyTorch Patterns

| Check | Recommendation | Severity |
|-------|----------------|----------|
| Missing model.eval() | Inconsistent inference | HIGH |
| Missing torch.no_grad() | Memory leak in inference | HIGH |
| In-place operations in autograd | Gradient computation error | CRITICAL |
| DataLoader num_workers=0 | CPU bottleneck | MEDIUM |
| Missing gradient clipping | Exploding gradients | MEDIUM |

# BAD: Missing eval() and no_grad()
def predict(model, x):
    return model(x)  # Dropout/BatchNorm inconsistent!

# GOOD: Proper inference mode
def predict(model, x):
    model.eval()
    with torch.no_grad():
        return model(x)

# BAD: In-place operation on a leaf tensor that requires grad
x = torch.randn(10, requires_grad=True)
x += 1  # In-place! Raises a RuntimeError on a leaf tensor

# GOOD: Out-of-place operation
x = torch.randn(10, requires_grad=True)
y = x + 1  # New tensor; x stays a leaf, so x.grad is populated by backward()

# BAD: DataLoader bottleneck
loader = DataLoader(dataset, batch_size=32)  # num_workers=0

# GOOD: Parallel data loading
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    pin_memory=True,  # For GPU
    persistent_workers=True,
)

# BAD: No gradient clipping
optimizer.step()

# GOOD: Clip gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
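
One way a reviewer could flag the DataLoader rule automatically is to walk the syntax tree of the file under review. The sketch below is an assumption about how such a check might look, not the skill's real detection logic. The function name is hypothetical, and it only inspects the `num_workers` keyword, so a positional `num_workers` or a `**kwargs` call would be flagged incorrectly.

```python
import ast


def find_single_process_dataloaders(source: str) -> list[int]:
    """Return line numbers of DataLoader(...) calls that omit num_workers
    (the default is 0) or set it to 0 explicitly."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both DataLoader(...) and torch.utils.data.DataLoader(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "DataLoader":
            continue
        workers = next(
            (kw.value for kw in node.keywords if kw.arg == "num_workers"), None
        )
        if workers is None or (isinstance(workers, ast.Constant) and workers.value == 0):
            flagged.append(node.lineno)
    return sorted(flagged)
```

Calling `find_single_process_dataloaders(open("train.py").read())` would list the lines to report at MEDIUM severity.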

TensorFlow/Keras Patterns

| Check | Recommendation | Severity |
|-------|----------------|----------|
| Missing @tf.function | Performance loss | MEDIUM |
| Eager mode in production | Slow inference | HIGH |
| Large model in memory | OOM risk | HIGH |
| Missing mixed precision | Training inefficiency | MEDIUM |

# BAD: No @tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# GOOD: Use @tf.function
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# BAD: Missing mixed precision
model.fit(x_train, y_train, epochs=10)

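# GOOD: Enable mixed precision (set the policy before the model is built)
tf.keras.mixed_precision.set_global_policy('mixed_float16')
model = build_model()  # hypothetical helper; layers created after the policy is set pick it up
model.fit(x_train, y_train, epochs=10)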

scikit-learn Patterns

| Check | Recommendation | Severity |
|-------|----------------|----------|
| fit_transform on test data | Data leakage | CRITICAL |
| Missing cross-validation | Overfitting risk | HIGH |
| No feature scaling | Model performance | MEDIUM |
| Hardcoded random_state | Reproducibility | LOW |

# BAD: Data leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)  # LEAK! Re-fitting

# GOOD: transform only on test
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # No re-fit

# BAD: No cross-validation
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

# GOOD: Use cross-validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std():.3f})")

# BAD: No feature scaling
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

# GOOD: Use Pipeline with scaling
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])
pipeline.fit(X_train, y_train)
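
To see why re-fitting the scaler on test data is flagged CRITICAL, it can help to work through the arithmetic on a single feature. The sketch below uses the standard library as a stand-in for StandardScaler (which also standardizes with the mean and population standard deviation). The `fit` and `transform` helpers and the toy numbers are illustrative assumptions.

```python
from statistics import fmean, pstdev


def fit(values):
    """Learn mean and population std, like StandardScaler.fit on one feature."""
    return fmean(values), pstdev(values)


def transform(values, mean, std):
    """Standardize values with previously learned statistics."""
    return [(v - mean) / std for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [11.0, 12.0, 13.0, 14.0, 15.0]  # same spread, shifted by +10

train_stats = fit(train)

# Correct: reuse the training statistics, so the shift stays visible.
test_ok = transform(test, *train_stats)

# Leaky: re-fitting on the test set recentres it and hides the shift.
test_refit = transform(test, *fit(test))

print(round(fmean(test_ok), 3))     # about 7.071, the shift is preserved
print(round(fmean(test_refit), 3))  # 0.0, the shift has vanished
```

With the re-fitted scaler the shifted test set looks identical to the training set, so the evaluation no longer reflects what the model would see on new data.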

Data Pipeline

| Check | Problem | Solution |
|-------|---------|----------|
| Loading full dataset to memory | OOM | Use generators/tf.data |
| No data augmentation | Overfitting | Add augmentation |
| Unbalanced classes | Biased model | Oversample/undersample/weights |
| No validation split | No early stopping | Use validation set |

# BAD: Full dataset in memory
images = []
for path in all_image_paths:
    images.append(load_image(path))  # OOM for large datasets!

# GOOD: Use generator
def data_generator(paths, batch_size):
    for i in range(0, len(paths), batch_size):
        batch_paths = paths[i:i+batch_size]
        yield np.array([load_image(p) for p in batch_paths])

# GOOD: Use tf.data
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(load_and_preprocess)
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)

# BAD: No class weights for imbalanced data
model.fit(X_train, y_train)

# GOOD: Add class weights
from sklearn.utils.class_weight import compute_class_weight
classes = np.unique(y_train)
weights = compute_class_weight('balanced', classes=classes, y=y_train)  # training labels only
class_weights = dict(zip(classes, weights))  # map each label to its weight
model.fit(X_train, y_train, class_weight=class_weights)
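
For reference, the 'balanced' heuristic assigns each class the weight n_samples / (n_classes * count of that class). The following pure-Python sketch reproduces that formula on a toy label list. The helper name and the 9:1 example are illustrative assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c])."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


y_train = [0] * 90 + [1] * 10  # 9:1 imbalance
weights = balanced_class_weights(y_train)
print(weights)  # class 0 gets roughly 0.556, class 1 gets 5.0
```

The minority class ends up weighted nine times more heavily than the majority class, which matches the 9:1 imbalance in the labels.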

GPU/Performance

| Check | Recommendation | Severity |
|-------|----------------|----------|
| CPU tensor operations | Use GPU tensors | HIGH |
| Frequent GPU-CPU transfer | Batch transfers | HIGH |
| No gradient accumulation | OOM for large batch | MEDIUM |
| Missing torch.cuda.empty_cache() | Memory fragmentation | LOW |

# BAD: CPU operations
x = torch.randn(1000, 1000)
y = torch.randn(1000, 1000)
z = x @ y  # CPU computation

# GOOD: GPU operations
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(1000, 1000, device=device)
y = torch.randn(1000, 1000, device=device)
z = x @ y  # GPU computation

# BAD: Frequent CPU-GPU transfer
for x, y in dataloader:
    x = x.cuda()
    y = y.cuda()
    loss = model(x, y)
    print(loss.item())  # Sync every iteration!

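# GOOD: Batch logging
losses = []
for step, (x, y) in enumerate(dataloader, start=1):
    x, y = x.to(device), y.to(device)
    loss = model(x, y)
    losses.append(loss.detach())  # stay on the GPU, drop the autograd graph
    if step % log_interval == 0:
        print(torch.stack(losses).mean().item())  # one sync per interval
        losses.clear()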

# Gradient accumulation for large effective batch
accumulation_steps = 4
for i, (x, y) in enumerate(dataloader):
    loss = model(x, y) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
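
The division by `accumulation_steps` is what makes the accumulated gradient match a single large batch. The sketch below checks this numerically for a one-parameter least-squares model in plain Python. The data and the `mean_grad` helper are illustrative assumptions, and the equality only holds for a mean-reduced loss with equally sized micro-batches.

```python
def mean_grad(w, batch):
    """d/dw of mean((w*x - y)**2) over the batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [
    (1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 8.0),
    (5.0, 9.0), (6.0, 13.0), (7.0, 15.0), (8.0, 15.0),
]
w = 0.5
accumulation_steps = 4
micro = len(data) // accumulation_steps  # micro-batch size of 2

accumulated = 0.0
for i in range(accumulation_steps):
    batch = data[i * micro:(i + 1) * micro]
    # Mirrors `loss = model(x, y) / accumulation_steps; loss.backward()`:
    # each backward() adds the scaled micro-batch gradient to .grad.
    accumulated += mean_grad(w, batch) / accumulation_steps

full = mean_grad(w, data)  # gradient of one batch of size 8
print(abs(accumulated - full) < 1e-9)  # True
```

Without the division, the accumulated gradient would be `accumulation_steps` times too large, which acts like an unintended learning-rate increase.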

MLOps/Experiment Tracking

| Check | Recommendation | Severity |
|-------|----------------|----------|
| No experiment tracking | Reproducibility | HIGH |
| Hardcoded hyperparameters | Config management | MEDIUM |
| No model versioning | Deployment issues | MEDIUM |
| Missing seed setting | Non-reproducible | HIGH |

# BAD: No seed setting
model = train_model(X, y)

# GOOD: Set all seeds
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True

set_seed(42)

# BAD: Hardcoded hyperparameters
lr = 0.001
batch_size = 32
epochs = 100

# GOOD: Use config file or hydra
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train")
def train(cfg: DictConfig):
    model = build_model(cfg.model)
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)

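# GOOD: Use experiment tracking
import wandb
from omegaconf import OmegaConf
# Convert the Hydra DictConfig to a plain dict before handing it to wandb
wandb.init(project="my-project", config=OmegaConf.to_container(cfg, resolve=True))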
for epoch in range(epochs):
    loss = train_epoch(model, dataloader)
    wandb.log({"loss": loss, "epoch": epoch})
wandb.finish()

Response Template

## ML Code Review Results

**Project**: [name]
**Framework**: PyTorch/TensorFlow/scikit-learn
**Task**: Classification/Regression/NLP/CV
**Files Analyzed**: X

### Model Architecture
| Status | File | Issue |
|--------|------|-------|
| MEDIUM | models/resnet.py | Missing dropout for regularization |
| LOW | models/transformer.py | Consider gradient checkpointing |

### Training Loop
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | Missing model.eval() in validation (line 45) |
| HIGH | train.py | No gradient clipping (line 67) |

### Data Pipeline
| Status | File | Issue |
|--------|------|-------|
| CRITICAL | data/dataset.py | fit_transform on test data (line 23) |
| HIGH | data/loader.py | DataLoader num_workers=0 |

### MLOps
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | No seed setting for reproducibility |
| MEDIUM | train.py | Hardcoded hyperparameters |

### Recommended Actions
1. [ ] Add model.eval() and torch.no_grad() for inference
2. [ ] Fix data leakage in preprocessing
3. [ ] Set random seeds for reproducibility
4. [ ] Add experiment tracking (wandb/mlflow)
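
A section of the template above could be rendered from collected findings roughly as follows. This is a sketch under assumptions: the `Finding` record and `render_section` function are hypothetical names, and sorting rows by severity is a design choice made here rather than something the template prescribes.

```python
from dataclasses import dataclass

# Lower number sorts first, so CRITICAL rows lead each table.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


@dataclass
class Finding:  # hypothetical record type for one review finding
    severity: str
    file: str
    issue: str


def render_section(title: str, findings: list[Finding]) -> str:
    """Render one '### title' section with a Status/File/Issue table."""
    rows = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    lines = [f"### {title}", "| Status | File | Issue |", "|--------|------|-------|"]
    lines += [f"| {f.severity} | {f.file} | {f.issue} |" for f in rows]
    return "\n".join(lines)
```

For example, `render_section("Data Pipeline", [Finding("HIGH", "data/loader.py", "DataLoader num_workers=0")])` produces the heading, the table header, and one row in the same layout as the template.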

Best Practices

  1. Training: eval mode, no_grad, gradient clipping, mixed precision
  2. Data: No leakage, proper splits, augmentation, balanced classes
  3. Performance: GPU operations, batch transfers, gradient accumulation
  4. MLOps: Seed setting, experiment tracking, config management
  5. Testing: Unit tests for data pipeline, model output shape tests
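
As an illustration of the data pipeline tests mentioned in item 5, the pytest-style sketch below checks a train/test split for overlap, coverage, and seed determinism. The `train_test_split_indices` helper is a hypothetical stand-in for a project's own split logic, and a model output shape test would follow the same pattern with a framework tensor.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle indices with a fixed seed, then split."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def test_split_has_no_overlap_and_covers_all_rows():
    train, test = train_test_split_indices(100)
    assert not set(train) & set(test)  # no row appears in both sets
    assert sorted(train + test) == list(range(100))  # no row is lost


def test_split_sizes_match_fraction():
    train, test = train_test_split_indices(100, test_fraction=0.2)
    assert (len(train), len(test)) == (80, 20)


def test_same_seed_gives_same_split():
    assert train_test_split_indices(50, seed=7) == train_test_split_indices(50, seed=7)
```

Running `pytest` on a file containing these functions would collect and run all three tests.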

Integration

  • python-reviewer skill: General Python code quality
  • python-data-reviewer skill: Data preprocessing patterns
  • test-generator skill: ML test generation
  • docker-reviewer skill: ML containerization

Notes

  • Based on PyTorch 2.x, TensorFlow 2.x, scikit-learn 1.x
  • Supports distributed training patterns (DDP, FSDP)
  • Includes MLOps patterns (wandb, mlflow, hydra)