# claude-skill-registry · automl-optimizer

## Install

Clone the upstream repo:

```bash
git clone https://github.com/majiayu000/claude-skill-registry
```

Claude Code · install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/automl-optimizer" ~/.claude/skills/majiayu000-claude-skill-registry-automl-optimizer && rm -rf "$T"
```

Manifest: `skills/data/automl-optimizer/SKILL.md`
# AutoML Optimizer

## Overview
Automates the tedious process of hyperparameter tuning and model selection. Instead of manually trying different configurations, define a search space and let AutoML find the optimal configuration through intelligent exploration.
## Why AutoML?

**Manual Tuning Problems:**
- Time-consuming (hours/days of trial and error)
- Subjective (depends on intuition)
- Incomplete (can't try all combinations)
- Not reproducible (hard to document search process)
**AutoML Benefits:**
- ✅ Systematic exploration of search space
- ✅ Intelligent sampling (Bayesian optimization)
- ✅ All experiments tracked automatically
- ✅ Find optimal configuration faster
- ✅ Reproducible (search process documented)
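The benefits above boil down to replacing ad-hoc tweaking with a recorded search loop: sample a configuration, score it, keep every result. A minimal stdlib-only sketch of that idea, with a toy scoring function standing in for a cross-validated model score (none of this is the skill's API):

```python
import random

def validation_score(params):
    # Toy stand-in for a cross-validated model score:
    # peaks at max_depth=6, learning_rate=0.1
    return 1.0 - abs(params['max_depth'] - 6) * 0.02 \
               - abs(params['learning_rate'] - 0.1)

random.seed(0)
trials = []
for _ in range(50):
    params = {
        'max_depth': random.randint(3, 10),
        'learning_rate': random.uniform(0.01, 0.3),
    }
    # Every trial is recorded, so the search is reproducible and auditable
    trials.append((validation_score(params), params))

best_score, best_params = max(trials, key=lambda t: t[0])
```

Real AutoML frameworks improve on this random loop with Bayesian sampling, but the tracked-trials structure is the same.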
## AutoML Strategies

### Strategy 1: Hyperparameter Optimization (Optuna)

```python
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score
from specweave import OptunaOptimizer

# Define search space
def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
    }

    # Train model
    model = XGBClassifier(**params)

    # Cross-validation score
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='roc_auc')
    return scores.mean()

# Run optimization
optimizer = OptunaOptimizer(
    objective=objective,
    n_trials=100,
    direction='maximize',
    increment="0042"
)
best_params = optimizer.optimize()

# Creates:
# .specweave/increments/0042.../experiments/optuna-study/
# ├── study.db (Optuna database)
# ├── optimization_history.png
# ├── param_importances.png
# ├── parallel_coordinate.png
# └── best_params.json
```
Optimization Report:

```markdown
# Optuna Optimization Report

## Search Space
- n_estimators: [100, 1000]
- max_depth: [3, 10]
- learning_rate: [0.01, 0.3] (log scale)
- subsample: [0.5, 1.0]
- colsample_bytree: [0.5, 1.0]

## Trials: 100
- Completed: 98
- Pruned: 2 (early stopping)
- Failed: 0

## Best Trial (#47)
- ROC AUC: 0.892 ± 0.012
- Parameters:
  - n_estimators: 673
  - max_depth: 6
  - learning_rate: 0.094
  - subsample: 0.78
  - colsample_bytree: 0.91

## Parameter Importance
1. learning_rate (0.42) - Most important
2. n_estimators (0.28)
3. max_depth (0.18)
4. colsample_bytree (0.08)
5. subsample (0.04) - Least important

## Improvement over Default
- Default params: ROC AUC = 0.856
- Optimized params: ROC AUC = 0.892
- Improvement: +4.2%
```
### Strategy 2: Algorithm Selection + Tuning

```python
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from specweave import AutoMLPipeline

# Define candidate algorithms with search spaces
pipeline = AutoMLPipeline(increment="0042")

# Add candidates
pipeline.add_candidate(
    name="xgboost",
    model=XGBClassifier,
    search_space={
        'n_estimators': (100, 1000),
        'max_depth': (3, 10),
        'learning_rate': (0.01, 0.3)
    }
)

pipeline.add_candidate(
    name="lightgbm",
    model=LGBMClassifier,
    search_space={
        'n_estimators': (100, 1000),
        'max_depth': (3, 10),
        'learning_rate': (0.01, 0.3)
    }
)

pipeline.add_candidate(
    name="random_forest",
    model=RandomForestClassifier,
    search_space={
        'n_estimators': (100, 500),
        'max_depth': (3, 20),
        'min_samples_split': (2, 20)
    }
)

pipeline.add_candidate(
    name="logistic_regression",
    model=LogisticRegression,
    search_space={
        'C': (0.001, 100),
        'penalty': ['l1', 'l2']
    }
)

# Run AutoML (tries all algorithms + hyperparameters)
results = pipeline.fit(
    X_train, y_train,
    n_trials_per_model=50,
    cv_folds=5,
    metric='roc_auc'
)

# Best model automatically selected
best_model = pipeline.best_model_
best_params = pipeline.best_params_
```
AutoML Comparison:

| Model | Trials | Best Score | Mean Score | Std | Best Params |
|---------------------|--------|------------|------------|-------|----------------------------------|
| xgboost             | 50     | 0.892      | 0.876      | 0.012 | n_est=673, depth=6, lr=0.094     |
| lightgbm            | 50     | 0.889      | 0.873      | 0.011 | n_est=542, depth=7, lr=0.082     |
| random_forest       | 50     | 0.871      | 0.858      | 0.015 | n_est=384, depth=12, min_split=5 |
| logistic_regression | 50     | 0.845      | 0.840      | 0.008 | C=1.234, penalty=l2              |

**Winner: XGBoost** (ROC AUC = 0.892)
### Strategy 3: Neural Architecture Search (NAS)

```python
from specweave import NeuralArchitectureSearch

# For deep learning
nas = NeuralArchitectureSearch(increment="0042")

# Define search space
search_space = {
    'num_layers': (2, 5),
    'layer_sizes': (32, 512),
    'activation': ['relu', 'tanh', 'elu'],
    'dropout': (0.0, 0.5),
    'optimizer': ['adam', 'sgd', 'rmsprop'],
    'learning_rate': (0.0001, 0.01)
}

# Search for the best architecture
best_architecture = nas.search(
    X_train, y_train,
    search_space=search_space,
    n_trials=100,
    max_epochs=50
)

# Returns: best neural network architecture
```
## AutoML Frameworks Integration

### Optuna (Recommended)

```python
import optuna
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score
from specweave import configure_optuna

# Auto-configures Optuna to log to the increment
configure_optuna(increment="0042")

def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
    }
    model = XGBClassifier(**params)
    score = cross_val_score(model, X, y, cv=5).mean()
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

# Automatically logged to the increment folder
```
### Auto-sklearn

```python
from specweave import AutoSklearnOptimizer

# Automated model selection + feature engineering
optimizer = AutoSklearnOptimizer(
    time_left_for_this_task=3600,  # 1 hour
    increment="0042"
)

optimizer.fit(X_train, y_train)

# Auto-sklearn tries:
# - Multiple algorithms
# - Feature preprocessing combinations
# - Ensemble methods
# Returns the best pipeline
```
### H2O AutoML

```python
from specweave import H2OAutoMLOptimizer

optimizer = H2OAutoMLOptimizer(
    max_runtime_secs=3600,  # 1 hour
    max_models=50,
    increment="0042"
)

optimizer.fit(X_train, y_train)

# H2O tries many algorithms in parallel
# Returns a leaderboard + the best model
```
## Best Practices

### 1. Start with Default Baseline

```python
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score
from specweave import OptunaOptimizer

# Always compare AutoML to default hyperparameters
baseline_model = XGBClassifier()  # Default params
baseline_score = cross_val_score(baseline_model, X, y, cv=5).mean()

# Then optimize
optimizer = OptunaOptimizer(objective, n_trials=100)
optimized_params = optimizer.optimize()
optimized_score = cross_val_score(XGBClassifier(**optimized_params), X, y, cv=5).mean()

improvement = (optimized_score - baseline_score) / baseline_score * 100
print(f"Improvement: {improvement:.1f}%")

# Only keep the optimized params if the improvement is significant (>2-3%)
```
### 2. Use Cross-Validation

```python
# ❌ Wrong: single train/test split
score = model.score(X_test, y_test)

# ✅ Correct: cross-validation
scores = cross_val_score(model, X_train, y_train, cv=5)
score = scores.mean()

# Prevents overfitting to a specific train/test split
```
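Cross-validation is just k rotating train/validation splits over the same data. A stdlib-only sketch of the index bookkeeping behind `cv=5` (illustrative; in practice use scikit-learn's `cross_val_score` as above):

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder
        end = start + fold_size if fold < k - 1 else n_samples
        val_idx = indices[start:end]
        train_idx = indices[:start] + indices[end:]
        yield train_idx, val_idx

# Each of the 10 samples lands in exactly one validation fold
folds = list(kfold_indices(10, 5))
```

The score reported to the optimizer is the mean across folds, which is why a single lucky split can't dominate the search.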
### 3. Set Reasonable Search Budgets

```python
# Quick exploration (development)
optimizer.optimize(n_trials=20)   # ~5-10 minutes

# Moderate search (iteration)
optimizer.optimize(n_trials=100)  # ~30-60 minutes

# Thorough search (final model)
optimizer.optimize(n_trials=500)  # ~2-4 hours

# Don't overdo it: diminishing returns after ~100-200 trials
```
### 4. Prune Unpromising Trials

```python
# Optuna can stop bad trials early
study = optuna.create_study(
    direction='maximize',
    pruner=optuna.pruners.MedianPruner()
)

# If a trial is performing worse than the median at epoch N, stop it
# Saves time by not fully training bad models
```
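The median rule itself is simple: at each checkpoint, compare the current trial's intermediate score to the median of earlier trials at the same step. A stdlib sketch of that decision (the idea behind a median pruner, not Optuna's internals):

```python
from statistics import median

def should_prune(history, step, current_value):
    """history: list of per-trial intermediate-score lists.
    Prune if current_value falls below the median of trials
    that already reached this step."""
    peers = [trial[step] for trial in history if len(trial) > step]
    if len(peers) < 2:
        return False  # not enough data to judge yet
    return current_value < median(peers)

# Three earlier trials, scored at epochs 0, 1, 2
history = [[0.60, 0.70, 0.75],
           [0.55, 0.65, 0.72],
           [0.58, 0.68, 0.74]]

# A new trial well below the median at epoch 1 gets pruned
prune = should_prune(history, 1, 0.50)
```

In Optuna the objective reports intermediate values via `trial.report(value, step)` and checks `trial.should_prune()`; the pruner runs this comparison for you.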
### 5. Document Search Space Rationale

```python
# Document why you chose specific ranges
search_space = {
    # XGBoost recommends max_depth 3-10 for most tasks
    'max_depth': (3, 10),

    # Learning rate: 0.01-0.3 covers slow to fast learning
    # Log scale to spend more trials on smaller values
    'learning_rate': (0.01, 0.3, 'log'),

    # n_estimators: balance accuracy vs training time
    'n_estimators': (100, 1000)
}
```
## Integration with SpecWeave

### Automatic Experiment Tracking

```python
from specweave import OptunaOptimizer

# All AutoML trials logged automatically
optimizer = OptunaOptimizer(objective, increment="0042")
optimizer.optimize(n_trials=100)

# Creates:
# .specweave/increments/0042.../experiments/
# ├── optuna-trial-001/
# ├── optuna-trial-002/
# ├── ...
# ├── optuna-trial-100/
# └── optuna-summary.md
```
### Living Docs Integration

```bash
/sw:sync-docs update
```

Updates:
````markdown
<!-- .specweave/docs/internal/architecture/ml-optimization.md -->

## Hyperparameter Optimization (Increment 0042)

### Optimization Strategy
- Framework: Optuna (Bayesian optimization)
- Trials: 100
- Search space: 5 hyperparameters
- Metric: ROC AUC (5-fold CV)

### Results
- Best score: 0.892 ± 0.012
- Improvement over default: +4.2%
- Most important param: learning_rate (0.42)

### Selected Hyperparameters
```python
{
    'n_estimators': 673,
    'max_depth': 6,
    'learning_rate': 0.094,
    'subsample': 0.78,
    'colsample_bytree': 0.91
}
```

### Recommendation
XGBoost with optimized hyperparameters for production deployment.
````

## Commands

```bash
# Run AutoML optimization
/ml:optimize 0042 --trials 100

# Compare algorithms
/ml:compare-algorithms 0042

# Show optimization history
/ml:optimization-report 0042
```
## Common Patterns

### Pattern 1: Coarse-to-Fine Optimization

```python
# Step 1: Coarse search (wide ranges, few trials)
coarse_space = {
    'n_estimators': (100, 1000, 'int'),
    'max_depth': (3, 10, 'int'),
    'learning_rate': (0.01, 0.3, 'log')
}
coarse_results = optimizer.optimize(coarse_space, n_trials=50)

# Step 2: Fine search (narrow ranges around the best trial)
best_params = coarse_results['best_params']
fine_space = {
    'n_estimators': (best_params['n_estimators'] - 100,
                     best_params['n_estimators'] + 100),
    'max_depth': (max(3, best_params['max_depth'] - 1),
                  min(10, best_params['max_depth'] + 1)),
    'learning_rate': (best_params['learning_rate'] * 0.5,
                      best_params['learning_rate'] * 1.5, 'log')
}
fine_results = optimizer.optimize(fine_space, n_trials=50)
```
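The narrowing step is the whole trick: take the winner of the coarse pass and shrink the range around it. A stdlib-only sketch on a 1-D toy objective (the `random_search` helper is hypothetical, not the specweave API):

```python
import random

def toy_objective(lr):
    # Toy objective: best value near lr = 0.09
    return -(lr - 0.09) ** 2

def random_search(low, high, n_trials, seed=0):
    """Sample uniformly in [low, high], return the best candidate."""
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(n_trials)]
    return max(candidates, key=toy_objective)

# Step 1: coarse pass over the full range
coarse_best = random_search(0.01, 0.3, n_trials=50)

# Step 2: fine pass on a window around the coarse winner,
# spending the same budget on a much smaller region
fine_best = random_search(coarse_best * 0.5, coarse_best * 1.5, n_trials=50)
```

The same budget split one way (100 coarse trials) usually loses to this 50/50 split, because the fine pass concentrates samples where the objective is already known to be good.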
### Pattern 2: Multi-Objective Optimization

```python
import time

import optuna
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

# Optimize for multiple objectives (accuracy + speed)
def multi_objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
    }
    model = XGBClassifier(**params)

    # Objective 1: accuracy
    accuracy = cross_val_score(model, X, y, cv=5).mean()

    # Objective 2: training time
    start = time.time()
    model.fit(X_train, y_train)
    training_time = time.time() - start

    return accuracy, training_time  # Maximize accuracy, minimize time

# Optuna will find Pareto-optimal solutions
study = optuna.create_study(directions=['maximize', 'minimize'])
study.optimize(multi_objective, n_trials=100)
```
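With two objectives there is no single best trial; the result is the set of non-dominated (Pareto-optimal) trade-offs. A stdlib sketch of that dominance check on `(accuracy, training_time)` pairs, where higher accuracy and lower time are better (illustrative data, not Optuna's implementation):

```python
def dominates(a, b):
    """a dominates b if a is at least as accurate and at least as fast,
    and strictly better on at least one objective."""
    acc_a, time_a = a
    acc_b, time_b = b
    return (acc_a >= acc_b and time_a <= time_b) and \
           (acc_a > acc_b or time_a < time_b)

def pareto_front(trials):
    # Keep trials no other trial dominates
    return [t for t in trials
            if not any(dominates(other, t) for other in trials if other != t)]

# (accuracy, training_time in seconds)
trials = [(0.89, 120.0), (0.87, 30.0), (0.85, 45.0), (0.88, 60.0)]
front = pareto_front(trials)
```

Here `(0.85, 45.0)` drops out because `(0.87, 30.0)` is both more accurate and faster; the remaining trials are genuine trade-offs you pick between based on deployment constraints.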
## Summary
AutoML accelerates ML development by:
- ✅ Automating tedious hyperparameter tuning
- ✅ Exploring search space systematically
- ✅ Finding optimal configurations faster
- ✅ Tracking all experiments automatically
- ✅ Documenting optimization process
Don't spend days manually tuning; let AutoML do it in hours.