skilllibrary · model-merging
Merge multiple fine-tuned LLM checkpoints using mergekit with methods like linear interpolation, SLERP, TIES, DARE, task arithmetic, and frankenmerging. Use when combining specialized model capabilities without retraining — e.g., merging a code model with a chat model. Do not use for training, LoRA adapter composition, or inference serving.
install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/12-ai-llm-training-architecture-and-research/model-merging" ~/.claude/skills/merceralex397-collab-skilllibrary-model-merging && rm -rf "$T"
manifest:
12-ai-llm-training-architecture-and-research/model-merging/SKILL.md
Purpose
Combine multiple fine-tuned model checkpoints into a single model using weight-space merging techniques via mergekit, selecting the right merge strategy and validating results with benchmark evaluation.
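For orientation, a minimal two-model linear merge config is sketched below. The model paths are placeholders and the weights are arbitrary starting points, not recommendations.

```yaml
# Minimal linear merge: weighted average of two same-architecture checkpoints.
# Paths are placeholders; substitute your own fine-tuned models.
models:
  - model: ./models/code-ft    # hypothetical code-specialist checkpoint
    parameters:
      weight: 0.6
  - model: ./models/chat-ft    # hypothetical chat-specialist checkpoint
    parameters:
      weight: 0.4
merge_method: linear
dtype: float16
```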
When to use this skill
Use this skill when:
- Combining two or more fine-tuned models (e.g., code + chat + math specialists) into one checkpoint
- Writing a mergekit YAML config for linear, SLERP, TIES, DARE, or passthrough (frankenmerge) methods
- Performing task arithmetic: computing task vectors (`fine_tuned - base`) and adding/subtracting them (a config sketch follows this list)
- Deciding merge strategy based on model similarity, task overlap, and parameter conflict density
- Evaluating whether a merged model retains capabilities from each source
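Task arithmetic is also exposed directly in mergekit as a merge method. A sketch, assuming both specialists were fine-tuned from the same base checkpoint (all paths are placeholders):

```yaml
# Task arithmetic via mergekit: each model contributes (model - base_model),
# scaled by its weight, on top of the base. Paths are placeholders.
models:
  - model: ./models/code-ft    # task vector A, scaled by 0.7
    parameters:
      weight: 0.7
  - model: ./models/math-ft    # task vector B, scaled by 0.5
    parameters:
      weight: 0.5
merge_method: task_arithmetic
base_model: ./models/base      # shared base the task vectors are computed against
dtype: float16
```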
Do not use this skill when
- The task involves training or fine-tuning a model from scratch — use `fine-tuning` or `pretraining-pipeline`
- The task is about combining LoRA adapters at inference time (adapter stacking) — not weight merging
- The task is about model architecture design — use `model-architecture`
- Models have incompatible architectures (different hidden sizes, vocab sizes, or tokenizers)
Operating procedure
- Verify compatibility: All source models must share identical architecture (hidden_size, num_layers, vocab_size). Tokenizer must match or be explicitly handled. Check with `model.config.to_dict()`.
- Select merge method:
  - Linear (`merge_method: linear`): Weighted average of parameters. Simple, works well for similar models. Config: `weight: 0.6` per model.
  - SLERP (`merge_method: slerp`): Spherical interpolation between exactly 2 models. Smoother blending, better for dissimilar fine-tunes. Set `t: 0.5` for an equal blend.
  - TIES (`merge_method: ties`): Trim small deltas, elect sign by majority, merge. Best when models have conflicting parameter updates. Set `density: 0.5` to keep the top 50% of delta magnitudes.
  - DARE (`merge_method: dare_ties`): Randomly drop delta elements and rescale survivors. Effective for merging many models. Set `density: 0.3` for aggressive sparsification.
  - Task arithmetic: Compute `task_vector = fine_tuned_weights - base_weights`, then `merged = base + α * task_vector_A + β * task_vector_B`.
  - Frankenmerge (`merge_method: passthrough`): Interleave layers from different models to create a deeper model. E.g., take layers 0-15 from model A and layers 8-23 from model B to make a 32-layer hybrid (a passthrough sketch appears after this procedure).
- Write mergekit config: Create a YAML file specifying `merge_method`, `slices` (layer ranges), `models` with paths and weights, and `parameters` (density, t). A complete example follows this procedure.
- Execute merge: Run `mergekit-yaml config.yaml ./output_dir --cuda --trust-remote-code`.
- Validate output: Run perplexity eval on a held-out set. Compare benchmark scores (MMLU, HumanEval, etc.) against source models. Check for catastrophic capability loss.
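As a concrete reference for the config-writing and execution steps, here is a sketch of a three-way TIES merge. Model paths, weights, and densities are illustrative placeholders rather than recommendations.

```yaml
# ties_merge.yaml: three specialists merged against their shared base.
# Run with: mergekit-yaml ties_merge.yaml ./output_dir --cuda --trust-remote-code
models:
  - model: ./models/code-ft
    parameters:
      density: 0.5    # keep the top 50% of delta magnitudes
      weight: 0.4
  - model: ./models/chat-ft
    parameters:
      density: 0.5
      weight: 0.4
  - model: ./models/math-ft
    parameters:
      density: 0.5
      weight: 0.2
merge_method: ties
base_model: ./models/base   # deltas are computed relative to this checkpoint
parameters:
  normalize: true           # rescale model weights so they sum to 1
dtype: float16
```

The same structure covers DARE: switch `merge_method` to `dare_ties` and lower `density` (e.g., 0.3).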
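For the frankenmerge case, a passthrough sketch showing how `slices` expresses layer ranges. It assumes two 24-layer source models and mirrors the 0-15 / 8-23 interleaving described above; paths are placeholders.

```yaml
# passthrough_merge.yaml: interleave layer ranges to build a deeper hybrid.
# layer_range is [start, end), so [0, 16] copies layers 0-15 and [8, 24] copies 8-23,
# yielding a 32-layer output model.
slices:
  - sources:
      - model: ./models/model-a
        layer_range: [0, 16]
  - sources:
      - model: ./models/model-b
        layer_range: [8, 24]
merge_method: passthrough
dtype: float16
```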
Decision rules
- Use SLERP for 2-model merges where models are fine-tuned from the same base — it preserves geometry better than linear (a minimal SLERP sketch follows this list)
- Use TIES or DARE when merging 3+ models or when source models show conflicting parameter updates
- Use linear merge as a baseline; if it works, prefer it for simplicity
- Frankenmerging changes model depth — only use when you want a larger model and accept potential layer-boundary artifacts
- Task arithmetic requires access to the shared base model; without it, fall back to TIES
- If merged model perplexity degrades >10% vs best source, try adjusting weights or switching methods
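For the first rule, a minimal SLERP sketch for two fine-tunes of the same base. Paths are placeholders and the layer range assumes 32-layer models.

```yaml
# slerp_merge.yaml: spherical interpolation between exactly two models.
# t: 0.5 gives an equal blend; values nearer 0 or 1 favor one parent.
slices:
  - sources:
      - model: ./models/code-ft
        layer_range: [0, 32]
      - model: ./models/chat-ft
        layer_range: [0, 32]
merge_method: slerp
base_model: ./models/code-ft   # SLERP requires one model designated as the base
parameters:
  t: 0.5
dtype: bfloat16
```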
Output requirements
- Merge Config — Complete mergekit YAML with method, models, weights, slice ranges, and parameters
- Source Model Inventory — List of source models with their base, fine-tune domain, and architecture hash
- Evaluation Comparison — Table of benchmark scores: each source model vs merged model
- Method Rationale — Why the chosen merge method suits these specific models
References
- mergekit library: https://github.com/arcee-ai/mergekit
- Yadav et al., "TIES-Merging: Resolving Interference When Merging Models" (arXiv:2306.01708)
- Yu et al., "Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch" — DARE (arXiv:2311.03099)
- Ilharco et al., "Editing Models with Task Arithmetic" (arXiv:2212.04089)
- Goddard et al., "Arcee's MergeKit: A Toolkit for Merging Large Language Models" (arXiv:2403.13257)
Related skills
- model-architecture — to verify source models share compatible architecture
- fine-tuning — the upstream process that produces models to be merged
- safety-alignment — merged models may lose alignment; re-evaluate safety post-merge
- serving-architecture — for deploying the merged checkpoint
Failure handling
- If architecture mismatch is detected (different `hidden_size`, `num_layers`, or `vocab_size`), abort and report which fields differ.
- If SLERP is requested with more than 2 models, switch to TIES or DARE with a warning.
- If merged model perplexity exceeds 2x the best source model, flag as failed merge and recommend retraining or different method.
- If mergekit OOMs, retry with `--lazy-unpickle` and `--low-cpu-memory` flags, or merge on CPU with `--no-cuda`.