Skilllibrary model-merging

Merge multiple fine-tuned LLM checkpoints using mergekit with methods like linear interpolation, SLERP, TIES, DARE, task arithmetic, and frankenmerging. Use when combining specialized model capabilities without retraining — e.g., merging a code model with a chat model. Do not use for training, LoRA adapter composition, or inference serving.

install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/12-ai-llm-training-architecture-and-research/model-merging" ~/.claude/skills/merceralex397-collab-skilllibrary-model-merging && rm -rf "$T"
manifest: 12-ai-llm-training-architecture-and-research/model-merging/SKILL.md
source content

Purpose

Combine multiple fine-tuned model checkpoints into a single model using weight-space merging techniques via mergekit, selecting the right merge strategy and validating results with benchmark evaluation.

When to use this skill

Use this skill when:

  • Combining two or more fine-tuned models (e.g., code + chat + math specialists) into one checkpoint
  • Writing a mergekit YAML config for linear, SLERP, TIES, DARE, or passthrough (frankenmerge) methods
  • Performing task arithmetic: computing task vectors (fine_tuned - base) and adding or subtracting them
  • Deciding merge strategy based on model similarity, task overlap, and parameter conflict density
  • Evaluating whether a merged model retains capabilities from each source

Do not use this skill when

  • The task involves training or fine-tuning a model from scratch — use fine-tuning or pretraining-pipeline
  • The task is about combining LoRA adapters at inference time (adapter stacking) — not weight merging
  • The task is about model architecture design — use model-architecture
  • Models have incompatible architectures (different hidden sizes, vocab sizes, or tokenizers)

Operating procedure

  1. Verify compatibility: All source models must share an identical architecture (hidden_size, num_layers, vocab_size). Tokenizers must match or be explicitly handled. Check with model.config.to_dict(); see the compatibility sketch after this list.
  2. Select merge method (example configs follow this list):
    • Linear (merge_method: linear): Weighted average of parameters. Simple and works well for similar models. Config: weight: 0.6 per model.
    • SLERP (merge_method: slerp): Spherical linear interpolation between exactly two models. Smoother blending, better for dissimilar fine-tunes. Set t: 0.5 for an equal blend.
    • TIES (merge_method: ties): Trim small deltas, elect a sign by majority, then merge. Best when models have conflicting parameter updates. Set density: 0.5 to keep the top 50% of delta magnitudes.
    • DARE (merge_method: dare_ties): Randomly drop delta elements and rescale the survivors. Effective for merging many models. Set density: 0.3 for aggressive sparsification.
    • Task arithmetic: Compute task_vector = fine_tuned_weights - base_weights, then merged = base + α * task_vector_A + β * task_vector_B (sketch below).
    • Frankenmerge (merge_method: passthrough): Interleave layers from different models to create a deeper model. E.g., take layers 0-15 from model A and layers 8-23 from model B to make a 32-layer hybrid.
  3. Write mergekit config: Create a YAML file specifying merge_method, slices (layer ranges), models with paths and weights, and parameters (density, t); a worked sketch follows this list.
  4. Execute merge: Run mergekit-yaml config.yaml ./output_dir --cuda --trust-remote-code.
  5. Validate output: Run a perplexity eval on a held-out set (sketch below). Compare benchmark scores (MMLU, HumanEval, etc.) against the source models. Check for catastrophic capability loss.
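
Compatibility check (step 1): a minimal sketch using the Hugging Face transformers API. The model names are placeholders, and note that most configs expose the layer count as num_hidden_layers rather than num_layers.

    # Compare the architecture fields that must match before any weight-space merge.
    from transformers import AutoConfig

    MODELS = ["org/code-specialist", "org/chat-specialist"]   # placeholder checkpoints
    FIELDS = ["hidden_size", "num_hidden_layers", "vocab_size"]

    configs = {name: AutoConfig.from_pretrained(name).to_dict() for name in MODELS}
    reference = configs[MODELS[0]]
    for name, cfg in configs.items():
        for field in FIELDS:
            if cfg.get(field) != reference.get(field):
                raise SystemExit(
                    f"{name}: {field}={cfg.get(field)} differs from "
                    f"{MODELS[0]} ({reference.get(field)}) -- aborting merge"
                )
    print("architecture fields match:", FIELDS)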
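
Writing the mergekit config (steps 2-3): a hedged sketch that builds a TIES config as a Python dict and dumps it to YAML. The keys follow mergekit's documented schema; the model paths, weights, and densities are illustrative only.

    import yaml

    # TIES merge of two specialists fine-tuned from the same base model.
    config = {
        "merge_method": "ties",
        "base_model": "org/base-model",                        # shared base (placeholder)
        "models": [
            {"model": "org/code-specialist",
             "parameters": {"weight": 0.5, "density": 0.5}},   # keep top 50% of delta magnitudes
            {"model": "org/chat-specialist",
             "parameters": {"weight": 0.5, "density": 0.5}},
        ],
        "parameters": {"normalize": True},
        "dtype": "bfloat16",
    }

    with open("config.yaml", "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)
    # Then run: mergekit-yaml config.yaml ./output_dir --cuda --trust-remote-code

For a two-model SLERP, the same file would instead set merge_method: slerp, name a base_model, and supply parameters: {t: 0.5}; for a frankenmerge, use merge_method: passthrough with slices entries giving per-model layer_range values.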
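
Task arithmetic (step 2): a rough sketch applied directly to state dicts. It loads full-precision weights into CPU memory, so it is only practical for small models; the checkpoint names and scaling factors are placeholders.

    import torch
    from transformers import AutoModelForCausalLM

    def weights(name):
        # Load a checkpoint's parameters as a plain state dict on CPU.
        return AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32).state_dict()

    base = weights("org/base-model")
    spec_a = weights("org/code-specialist")
    spec_b = weights("org/chat-specialist")
    alpha, beta = 0.7, 0.5                                     # per-task scaling factors

    # merged = base + alpha * (A - base) + beta * (B - base)
    merged = {k: w + alpha * (spec_a[k] - w) + beta * (spec_b[k] - w) for k, w in base.items()}

    model = AutoModelForCausalLM.from_pretrained("org/base-model", torch_dtype=torch.float32)
    model.load_state_dict(merged)
    model.save_pretrained("./task-arithmetic-merge")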
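
Validation (step 5): a minimal perplexity comparison across source and merged checkpoints. The held-out texts and model paths are placeholders; a real evaluation should also cover task benchmarks such as MMLU and HumanEval.

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(model_path, texts, device="cuda"):
        tok = AutoTokenizer.from_pretrained(model_path)
        model = AutoModelForCausalLM.from_pretrained(
            model_path, torch_dtype=torch.float16
        ).to(device).eval()
        losses = []
        with torch.no_grad():
            for text in texts:
                ids = tok(text, return_tensors="pt", truncation=True,
                          max_length=2048).input_ids.to(device)
                losses.append(model(ids, labels=ids).loss.item())   # mean token cross-entropy
        return math.exp(sum(losses) / len(losses))                  # rough per-document average

    held_out = ["replace with held-out evaluation documents"]       # placeholder data
    for path in ["org/code-specialist", "org/chat-specialist", "./output_dir"]:
        print(path, round(perplexity(path, held_out), 2))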

Decision rules

  • Use SLERP for 2-model merges where models are fine-tuned from the same base — it preserves geometry better than linear
  • Use TIES or DARE when merging 3+ models or when source models show conflicting parameter updates
  • Use linear merge as a baseline; if it works, prefer it for simplicity
  • Frankenmerging changes model depth — only use when you want a larger model and accept potential layer-boundary artifacts
  • Task arithmetic, TIES, and DARE all need the shared base model to compute deltas; if it is unavailable, fall back to linear or SLERP
  • If merged model perplexity degrades >10% vs best source, try adjusting weights or switching methods

Output requirements

  1. Merge Config — Complete mergekit YAML with method, models, weights, slice ranges, and parameters
  2. Source Model Inventory — List of source models with their base, fine-tune domain, and architecture hash
  3. Evaluation Comparison — Table of benchmark scores: each source model vs the merged model
  4. Method Rationale — Why the chosen merge method suits these specific models

References

  • mergekit library: https://github.com/arcee-ai/mergekit
  • Yadav et al., "TIES-Merging: Resolving Interference When Merging Models" (arXiv:2306.01708)
  • Yu et al., "Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch" — DARE (arXiv:2311.03099)
  • Ilharco et al., "Editing Models with Task Arithmetic" (arXiv:2212.04089)
  • Goddard et al., "Arcee's MergeKit: A Toolkit for Merging LLMs" (arXiv:2403.13257)

Related skills

  • model-architecture — to verify source models share compatible architecture
  • fine-tuning — the upstream process that produces the models to be merged
  • safety-alignment — merged models may lose alignment; re-evaluate safety post-merge
  • serving-architecture — for deploying the merged checkpoint

Failure handling

  • If an architecture mismatch is detected (different hidden_size, num_layers, or vocab_size), abort and report which fields differ.
  • If SLERP is requested with more than 2 models, switch to TIES or DARE with a warning.
  • If the merged model's perplexity exceeds 2x that of the best source model, flag the merge as failed and recommend retraining or a different method.
  • If mergekit OOMs, retry with the --lazy-unpickle and --low-cpu-memory flags, or merge on CPU with --no-cuda.