git clone https://github.com/vibeforge1111/vibeship-spawner-skills
ai/neural-architecture-search/skill.yaml

id: neural-architecture-search
name: Neural Architecture Search
category: ai
description: Use when automating model architecture design, optimizing hyperparameters, or exploring neural network configurations - covers NAS algorithms, search spaces, Bayesian optimization, and AutoML tools
patterns:
  golden_rules:
    - rule: "Define search space carefully"
      reason: "Too large = intractable, too small = miss optima"
    - rule: "Use efficient evaluation strategies"
      reason: "Full training per candidate is prohibitively expensive"
    - rule: "Start with proxy tasks"
      reason: "Smaller dataset, fewer epochs for quick filtering"
    - rule: "Consider the deployment target"
      reason: "Best accuracy != best latency/memory tradeoff"
    - rule: "Track all experiments"
      reason: "Reproducibility is critical for NAS"
    - rule: "Set compute budgets upfront"
      reason: "NAS can consume unlimited resources"
three_pillars:
  search_space:
    description: "What architectures are possible?"
    components:
      - "Layer types"
      - "Number of layers"
      - "Connections"
      - "Hyperparameters"
      - "Operations"
  search_strategy:
    description: "How to explore the space?"
    methods:
      - "Random search"
      - "Grid search"
      - "Bayesian optimization"
      - "Evolutionary algorithms"
      - "Reinforcement learning"
      - "Gradient-based (DARTS)"
  performance_estimation:
    description: "How to evaluate efficiently?"
    methods:
      - "Full training (slow but accurate)"
      - "Early stopping (train partially)"
      - "Weight sharing (one-shot methods)"
      - "Surrogate models (predict performance)"
      - "Learning curve extrapolation"
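As a concrete sketch of the first two pillars, here is a minimal search-space definition plus a uniform random sampler (the random-search baseline). All layer names, value ranges, and the dict layout are illustrative choices, not part of the skill itself:

```python
import math
import random

# Illustrative search space: layer types, depth, width, and one
# continuous hyperparameter (all values here are example choices).
SEARCH_SPACE = {
    "num_layers": [2, 4, 8, 12],
    "layer_type": ["conv3x3", "conv5x5", "depthwise", "identity"],
    "width": [32, 64, 128, 256],
    "learning_rate": (1e-4, 1e-1),  # continuous range, sampled log-uniformly
}

def sample_architecture(rng: random.Random) -> dict:
    """Draw one candidate uniformly at random (the random-search strategy)."""
    num_layers = rng.choice(SEARCH_SPACE["num_layers"])
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "num_layers": num_layers,
        # one operation choice per layer position
        "layers": [rng.choice(SEARCH_SPACE["layer_type"])
                   for _ in range(num_layers)],
        "width": rng.choice(SEARCH_SPACE["width"]),
        "learning_rate": math.exp(rng.uniform(math.log(lo), math.log(hi))),
    }

rng = random.Random(0)  # fixed seed: reproducibility is one of the golden rules
candidate = sample_architecture(rng)
```

Keeping the space explicit as data (rather than scattered through code) also makes the "constrain based on domain knowledge" advice easy to act on: shrinking a list shrinks the space.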
search_strategies:
  random_search:
    description: "Surprisingly effective baseline"
    note: "Often outperforms grid search for the same compute"
  bayesian_optimization:
    description: "Model the objective with a surrogate (e.g. a Gaussian process)"
    acquisition: "Expected Improvement balances exploration and exploitation"
    best_for: "Expensive evaluations, continuous spaces"
  darts:
    description: "Differentiable architecture search"
    key_insight: "Continuous relaxation of discrete choices"
    formula: "mixed_op(x) = sum_i(softmax(alpha)_i * op_i(x))"
  evolutionary:
    description: "Population-based search"
    best_for: "Large discrete search spaces"
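The DARTS formula above can be sketched numerically. This toy version uses scalars in place of feature maps and three stand-in operations; it illustrates only the continuous relaxation, not DARTS's bi-level optimization of weights and alphas:

```python
import math

# Toy candidate operations on a scalar input (stand-ins for conv/pool/skip).
OPS = [
    lambda x: x,        # identity / skip connection
    lambda x: 2.0 * x,  # a "conv"-like linear op
    lambda x: 0.0,      # the zero op
]

def softmax(alphas):
    m = max(alphas)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas):
    """DARTS relaxation: mixed_op(x) = sum_i softmax(alpha)_i * op_i(x)."""
    return sum(w * op(x) for w, op in zip(softmax(alphas), OPS))

# Equal alphas give uniform weights, so the mixture averages the three ops.
y = mixed_op(3.0, [0.0, 0.0, 0.0])

# After the search, the discrete architecture keeps argmax_i alpha_i per edge.
alphas = [0.1, 2.0, -1.0]
chosen = max(range(len(alphas)), key=lambda i: alphas[i])
```

Because the mixture is differentiable in the alphas, they can be trained by gradient descent alongside the network weights, which is the key insight the section names.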
efficient_evaluation:
  successive_halving:
    description: "Start many configs, keep the top 1/eta fraction each round"
    hyperband: "Hyperband is the multi-fidelity extension that also varies the starting budget"
  weight_sharing:
    description: "Train a supernet, extract subnets"
    pros: "Train once, evaluate many"
    cons: "Weight coupling between architectures"
  early_stopping:
    description: "Stop unpromising trials early"
    tools: "Optuna pruning, Hyperband"
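Successive halving itself fits in a few lines: evaluate every survivor at the current budget, keep the top 1/eta fraction, multiply the budget by eta, and repeat. The `evaluate` function below is a synthetic stand-in for "train config c for `budget` epochs and return validation accuracy":

```python
import math
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3, max_budget=27):
    """Keep the top 1/eta of survivors each round while growing the budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1 and budget <= max_budget:
        # rank survivors by score at the current (partial-training) budget
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]

# Synthetic objective: "accuracy" peaks at lr = 1e-2 and rises with budget.
def evaluate(lr, budget):
    return -abs(math.log10(lr) + 2.0) + 0.01 * budget

rng = random.Random(0)
candidates = [10 ** rng.uniform(-4.0, -1.0) for _ in range(27)]  # learning rates
best_lr = successive_halving(candidates, evaluate)
```

With eta=3 and 27 candidates this runs rounds of 27, 9, and 3 survivors, so most configurations only ever see the cheapest budget, which is exactly the compute saving the section describes.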
anti_patterns:
  - pattern: "Search space too large"
    problem: "Never converges"
    solution: "Constrain based on domain knowledge"
  - pattern: "No early stopping"
    problem: "Wastes compute on bad configs"
    solution: "Successive halving, Optuna pruning"
  - pattern: "Full training per trial"
    problem: "Prohibitively expensive"
    solution: "Weight sharing, proxy tasks"
  - pattern: "Ignoring transfer"
    problem: "Starting from scratch each time"
    solution: "Warm-start from similar tasks"
  - pattern: "No reproducibility"
    problem: "Can't replicate results"
    solution: "Log all configs, set seeds"
  - pattern: "Overfitting to proxy"
    problem: "Best on proxy != best on target"
    solution: "Validate on full task periodically"
implementation_checklist:
  before_starting:
    - "Define target metrics (accuracy, latency, memory)"
    - "Set compute budget (GPU hours, $ limit)"
    - "Design search space based on domain knowledge"
    - "Choose performance estimation strategy"
    - "Set up experiment tracking (W&B, MLflow)"
  during_search:
    - "Monitor for convergence"
    - "Check for mode collapse (all configs similar)"
    - "Validate best configs on full training"
    - "Track resource usage"
  after_search:
    - "Retrain best architecture from scratch"
    - "Compare to hand-designed baseline"
    - "Ablation study on found architecture"
    - "Document reproducibility steps"
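The "log all configs, set seeds" items can be sketched with just the standard library. The record schema, field names, and stubbed training step here are illustrative assumptions; a real setup would hand this off to W&B or MLflow as the checklist suggests:

```python
import hashlib
import json
import random

def run_trial(config: dict, seed: int) -> dict:
    """Seed the RNG, run the (stubbed) trial, and return a loggable record."""
    random.seed(seed)           # in real code, also seed numpy / torch / cudnn
    accuracy = random.random()  # stand-in for a real train-and-evaluate step
    return {
        "config": config,
        "seed": seed,
        # hashing the sorted config makes duplicate trials easy to spot
        "config_hash": hashlib.sha256(
            json.dumps(config, sort_keys=True).encode()
        ).hexdigest()[:12],
        "accuracy": accuracy,
    }

record = run_trial({"num_layers": 4, "width": 64}, seed=42)
print(json.dumps(record))  # in practice, append each record to a trials log
```

Rerunning a trial with the same config and seed reproduces the same record, which is what "document reproducibility steps" is meant to guarantee.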
handoffs:
  - skill: transformer-architecture
    trigger: "searching transformer architectures"
  - skill: model-optimization
    trigger: "optimizing found architecture"
  - skill: distributed-training
    trigger: "scaling NAS to multiple GPUs"
ecosystem:
  tools:
    - "Optuna - Modern hyperparameter optimization"
    - "Ray Tune - Distributed hyperparameter tuning"
    - "Auto-sklearn - AutoML for scikit-learn"
    - "NNI - Neural Network Intelligence (Microsoft)"
    - "Keras Tuner - Hyperparameter tuning for Keras"
  frameworks:
    - "DARTS - Differentiable Architecture Search"
    - "ProxylessNAS - Hardware-aware NAS"
    - "Once-for-All - Train once, deploy anywhere"
sources:
  papers:
    - "Elsken et al. (2019). Neural Architecture Search: A Survey"
    - "Liu et al. (2019). DARTS: Differentiable Architecture Search"
    - "Li et al. (2018). Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization"
  resources:
    - "AutoML.org NAS Overview"
    - "Optuna Documentation"