Claude-skill-registry-data ml-reviewer
Install

Source · Clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry-data
```

Claude Code · Install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry-data "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/ml-reviewer" ~/.claude/skills/majiayu000-claude-skill-registry-data-ml-reviewer && rm -rf "$T"
```
Manifest: `data/ml-reviewer/SKILL.md` (source content follows)
ML Reviewer Skill
Purpose
Reviews Machine Learning and Deep Learning code for PyTorch, TensorFlow, scikit-learn, and MLOps best practices.
When to Use
- ML/DL project code review
- "PyTorch", "TensorFlow", "Keras", "scikit-learn", "model training" mentions
- Model performance, training optimization inspection
- Projects with ML framework dependencies
Project Detection
- `torch`, `tensorflow`, `keras`, `sklearn` in requirements.txt/pyproject.toml
- `.pt`, `.pth`, `.h5`, `.pkl` model files
- `train.py`, `model.py`, `dataset.py` files
- Jupyter notebooks with ML imports
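As a rough illustration of how these signals could be scanned for, here is a minimal sketch; the `looks_like_ml_project` helper is hypothetical, not part of the skill itself:

```python
# Hypothetical helper illustrating the detection rules above.
from pathlib import Path

ML_PACKAGES = {"torch", "tensorflow", "keras", "sklearn", "scikit-learn"}
MODEL_SUFFIXES = {".pt", ".pth", ".h5", ".pkl"}
ML_FILENAMES = {"train.py", "model.py", "dataset.py"}

def looks_like_ml_project(root: str) -> bool:
    root_path = Path(root)
    # Dependency files that mention an ML framework
    for dep_file in ("requirements.txt", "pyproject.toml"):
        path = root_path / dep_file
        if path.is_file():
            text = path.read_text(errors="ignore").lower()
            if any(pkg in text for pkg in ML_PACKAGES):
                return True
    # Serialized model artifacts or typical training-script names
    return any(
        p.suffix in MODEL_SUFFIXES or p.name in ML_FILENAMES
        for p in root_path.rglob("*")
    )
```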
Workflow
Step 1: Analyze Project
**Framework**: PyTorch / TensorFlow / scikit-learn
**Python**: 3.10+
**CUDA**: 11.x / 12.x
**Task**: Classification / Regression / NLP / CV
**Stage**: Research / Production
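For the PyTorch case, the framework and CUDA fields can be probed directly; a minimal sketch, assuming `torch` is importable:

```python
# Illustrative probe for the template fields above (assumes PyTorch is installed).
import platform

import torch

print(f"Python: {platform.python_version()}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.version.cuda if torch.cuda.is_available() else 'not available'}")
```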
Step 2: Select Review Areas
AskUserQuestion:
"Which areas to review?" Options: - Full ML pattern check (recommended) - Model architecture review - Training loop optimization - Data pipeline efficiency - MLOps/deployment patterns multiSelect: true
Detection Rules
PyTorch Patterns
| Check | Risk | Severity |
|---|---|---|
| Missing model.eval() | Dropout/BatchNorm stay in training mode | HIGH |
| Missing torch.no_grad() | Needless autograd memory at inference | HIGH |
| In-place operations in autograd | Gradient computation error | CRITICAL |
| DataLoader num_workers=0 | CPU bottleneck | MEDIUM |
| Missing gradient clipping | Exploding gradients | MEDIUM |
```python
# BAD: Missing eval() and no_grad()
def predict(model, x):
    return model(x)  # Dropout/BatchNorm behave as in training!

# GOOD: Proper inference mode
def predict(model, x):
    model.eval()
    with torch.no_grad():
        return model(x)

# BAD: In-place operation breaking autograd
x = torch.randn(10, requires_grad=True)
x += 1  # In-place on a leaf tensor: PyTorch raises a RuntimeError

# GOOD: Out-of-place operation
x = torch.randn(10, requires_grad=True)
x = x + 1

# BAD: DataLoader bottleneck
loader = DataLoader(dataset, batch_size=32)  # num_workers defaults to 0

# GOOD: Parallel data loading
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    pin_memory=True,  # for GPU training
    persistent_workers=True,
)

# BAD: No gradient clipping
optimizer.step()

# GOOD: Clip gradients before stepping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```
TensorFlow/Keras Patterns
| Check | Risk | Severity |
|---|---|---|
| Missing @tf.function | Performance loss | MEDIUM |
| Eager mode in production | Slow inference | HIGH |
| Large model in memory | OOM risk | HIGH |
| Missing mixed precision | Training inefficiency | MEDIUM |
```python
# BAD: No @tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# GOOD: Compile the step with @tf.function
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# BAD: Missing mixed precision
model.fit(x_train, y_train, epochs=10)

# GOOD: Enable mixed precision
# NOTE: set the policy before building the model so layers pick it up
tf.keras.mixed_precision.set_global_policy('mixed_float16')
model.fit(x_train, y_train, epochs=10)
```
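The "Large model in memory" row has no snippet above; one common mitigation, shown here as a sketch, is to let TensorFlow grow GPU memory on demand instead of pre-allocating the whole device:

```python
# Sketch: enable on-demand GPU memory growth (call before creating any GPU tensors).
import tensorflow as tf

for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```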
scikit-learn Patterns
| Check | Risk | Severity |
|---|---|---|
| fit_transform on test data | Data leakage | CRITICAL |
| Missing cross-validation | Overfitting risk | HIGH |
| No feature scaling | Model performance | MEDIUM |
| Hardcoded random_state | Reproducibility | LOW |
```python
# BAD: Data leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)  # LEAK! Re-fitting on test data

# GOOD: transform only on test
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # no re-fit

# BAD: No cross-validation
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

# GOOD: Use cross-validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std():.3f})")

# BAD: Pipeline without scaling
model = LogisticRegression()
model.fit(X_train, y_train)

# GOOD: Use Pipeline with scaling
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X_train, y_train)
```
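The random_state row is not illustrated above; one reading of it is that seeds should live in one place rather than being scattered as literals. A sketch under that assumption, with toy data via `load_iris`:

```python
# Sketch: thread a single seed through the workflow (SEED placement is illustrative).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

SEED = 42  # one definition, logged alongside the experiment

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=SEED)
model = LogisticRegression(random_state=SEED, max_iter=1000).fit(X_train, y_train)
```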
Data Pipeline
| Check | Problem | Solution |
|---|---|---|
| Loading full dataset to memory | OOM | Use generators/tf.data |
| No data augmentation | Overfitting | Add augmentation |
| Unbalanced classes | Biased model | Oversample/undersample/weights |
| No validation split | No early stopping | Use validation set |
```python
# BAD: Full dataset in memory
images = []
for path in all_image_paths:
    images.append(load_image(path))  # OOM for large datasets!

# GOOD: Use a generator
def data_generator(paths, batch_size):
    for i in range(0, len(paths), batch_size):
        batch_paths = paths[i:i + batch_size]
        yield np.array([load_image(p) for p in batch_paths])

# GOOD: Use tf.data
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(load_and_preprocess)
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)

# BAD: No class weights for imbalanced data
model.fit(X_train, y_train)

# GOOD: Add class weights
from sklearn.utils.class_weight import compute_class_weight
weights = compute_class_weight('balanced', classes=np.unique(y), y=y)
class_weights = dict(enumerate(weights))
model.fit(X_train, y_train, class_weight=class_weights)
```
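The augmentation and validation-split rows are not covered above; a minimal Keras sketch of both, using MNIST as stand-in data (the tiny model is purely illustrative):

```python
# Sketch: augmentation layers + validation split for early stopping (illustrative).
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None]  # add a channel axis for the image layers

model = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),        # active only in training mode
    tf.keras.layers.RandomTranslation(0.1, 0.1),
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# validation_split gives EarlyStopping a held-out set to monitor
model.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=50,
    callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)],
)
```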
GPU/Performance
| Check | Risk | Severity |
|---|---|---|
| CPU tensor operations | Slow compute, idle GPU | HIGH |
| Frequent GPU-CPU transfer | Sync stalls every iteration | HIGH |
| No gradient accumulation | OOM for large effective batch | MEDIUM |
| Missing torch.cuda.empty_cache() | Memory fragmentation | LOW |
```python
# BAD: CPU operations
x = torch.randn(1000, 1000)
y = torch.randn(1000, 1000)
z = x @ y  # CPU computation

# GOOD: GPU operations
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(1000, 1000, device=device)
y = torch.randn(1000, 1000, device=device)
z = x @ y  # GPU computation

# BAD: Frequent CPU-GPU transfer
for x, y in dataloader:
    x = x.cuda()
    y = y.cuda()
    loss = model(x, y)
    print(loss.item())  # forces a GPU sync every iteration!

# GOOD: Batch logging
losses = []
for step, (x, y) in enumerate(dataloader):
    x, y = x.to(device), y.to(device)
    loss = model(x, y)
    losses.append(loss)
    if step % log_interval == 0:
        print(torch.stack(losses).mean().item())
        losses.clear()

# Gradient accumulation for a large effective batch size
accumulation_steps = 4
for i, (x, y) in enumerate(dataloader):
    loss = model(x, y) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
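The empty_cache row from the table above has no snippet; as a sketch, releasing cached allocator blocks can help between distinct phases (it does not free live tensors and is rarely needed inside a steady training loop):

```python
# Sketch: return cached, unused GPU blocks to the driver between phases.
import gc

import torch

if torch.cuda.is_available():
    model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for a finished phase
    del model                  # drop references so the memory becomes cache-only
    gc.collect()
    torch.cuda.empty_cache()   # frees cached blocks, not live tensors
```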
MLOps/Experiment Tracking
| Check | Risk | Severity |
|---|---|---|
| No experiment tracking | Reproducibility | HIGH |
| Hardcoded hyperparameters | Config management | MEDIUM |
| No model versioning | Deployment issues | MEDIUM |
| Missing seed setting | Non-reproducible | HIGH |
```python
# BAD: No seed setting
model = train_model(X, y)

# GOOD: Set all seeds
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True

set_seed(42)

# BAD: Hardcoded hyperparameters
lr = 0.001
batch_size = 32
epochs = 100

# GOOD: Use a config file or Hydra
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train")
def train(cfg: DictConfig):
    model = build_model(cfg.model)
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)

# GOOD: Use experiment tracking
import wandb

wandb.init(project="my-project", config=cfg)
for epoch in range(epochs):
    loss = train_epoch(model, dataloader)
    wandb.log({"loss": loss, "epoch": epoch})
wandb.finish()
```
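The model-versioning row is not illustrated above; a sketch using MLflow's model registry (assumes a registry-backed tracking server; the `my-model` name and logged values are placeholders):

```python
# Sketch: version a trained model with MLflow (names and metrics are placeholders).
import mlflow
import mlflow.pytorch
import torch

model = torch.nn.Linear(4, 2)  # stand-in for a trained model

with mlflow.start_run():
    mlflow.log_params({"lr": 0.001, "batch_size": 32})
    mlflow.log_metric("val_loss", 0.42)
    # Logs the weights as a run artifact and creates a new registry version.
    mlflow.pytorch.log_model(model, "model", registered_model_name="my-model")
```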
Response Template
## ML Code Review Results

**Project**: [name]
**Framework**: PyTorch/TensorFlow/scikit-learn
**Task**: Classification/Regression/NLP/CV
**Files Analyzed**: X

### Model Architecture
| Status | File | Issue |
|--------|------|-------|
| MEDIUM | models/resnet.py | Missing dropout for regularization |
| LOW | models/transformer.py | Consider gradient checkpointing |

### Training Loop
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | Missing model.eval() in validation (line 45) |
| HIGH | train.py | No gradient clipping (line 67) |

### Data Pipeline
| Status | File | Issue |
|--------|------|-------|
| CRITICAL | data/dataset.py | fit_transform on test data (line 23) |
| HIGH | data/loader.py | DataLoader num_workers=0 |

### MLOps
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | No seed setting for reproducibility |
| MEDIUM | train.py | Hardcoded hyperparameters |

### Recommended Actions
1. [ ] Add model.eval() and torch.no_grad() for inference
2. [ ] Fix data leakage in preprocessing
3. [ ] Set random seeds for reproducibility
4. [ ] Add experiment tracking (wandb/mlflow)
Best Practices
- Training: eval mode, no_grad, gradient clipping, mixed precision
- Data: No leakage, proper splits, augmentation, balanced classes
- Performance: GPU operations, batch transfers, gradient accumulation
- MLOps: Seed setting, experiment tracking, config management
- Testing: Unit tests for data pipeline, model output shape tests (see the sketch below)
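A minimal sketch of the shape test from the last bullet, in pytest style; `build_model` here is a hypothetical stand-in for the project's real model factory:

```python
# Hypothetical pytest shape test; build_model is a stand-in factory.
import torch

def build_model():
    return torch.nn.Linear(16, 10)  # replace with the project's model

def test_model_output_shape():
    model = build_model()
    model.eval()
    x = torch.randn(8, 16)  # batch of 8 dummy inputs
    with torch.no_grad():
        out = model(x)
    assert out.shape == (8, 10)
```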
Integration
- `python-reviewer` skill: General Python code quality
- `python-data-reviewer` skill: Data preprocessing patterns
- `test-generator` skill: ML test generation
- `docker-reviewer` skill: ML containerization
Notes
- Based on PyTorch 2.x, TensorFlow 2.x, scikit-learn 1.x
- Supports distributed training patterns (DDP, FSDP); a minimal DDP setup is sketched below
- Includes MLOps patterns (wandb, mlflow, hydra)
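As a sketch of the DDP pattern referenced above (assumes launch via `torchrun`, which sets the rank environment variables; the model is a stand-in):

```python
# Minimal DDP setup sketch; run with `torchrun --nproc_per_node=N script.py`.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")          # torchrun provides rank/world size
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(16, 10).to(local_rank)   # stand-in model
model = DDP(model, device_ids=[local_rank])
# ...train as usual; gradients are all-reduced across processes...
dist.destroy_process_group()
```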