Learn-skills.dev model-fallback

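Automatic model fallback and failover. When a request to the primary model fails, times out, hits a rate limit, or exhausts its quota, the skill switches to a backup model to keep the service running. It supports priority-based model selection across multiple providers, with health monitoring, automatic retries, and error recovery.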

install

source · Clone the upstream repo

git clone https://github.com/NeverSight/learn-skills.dev

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/NeverSight/learn-skills.dev "$T" && mkdir -p ~/.claude/skills && cp -r "$T/data/skills-md/aaaaqwq/claude-code-skills/model-fallback" ~/.claude/skills/neversight-learn-skills-dev-model-fallback-661dbd && rm -rf "$T"

manifest: data/skills-md/aaaaqwq/claude-code-skills/model-fallback/SKILL.md

source content

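Automatic Model Fallback and Failover

Overview

This skill provides a complete solution for automatic model fallback and failover: when the primary model is unavailable, it switches to a backup model so the AI assistant keeps serving requests.

Core Features

1. Smart model selection

Selects the most suitable model based on task type and model status: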

Priority order:
  1. anapi/opus-4.5         # most capable, highest priority
  2. zai/glm-4.7            # optimized for Chinese, cost-effective
  3. openrouter-vip/gpt-5.2-codex  # coding-focused
  4. github-copilot/claude-sonnet-4-5  # free fallback
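
As a rough illustration of priority-based selection, the sketch below walks the list in order and returns the first model that is not marked unavailable. The function name `select_model` and the `unavailable` argument are hypothetical and are not taken from the skill's scripts.

```python
from typing import Iterable, Optional

# Priority order from the list above (earlier entry = higher priority).
PRIORITY_ORDER = [
    "anapi/opus-4.5",
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5",
]


def select_model(unavailable: Iterable[str] = ()) -> Optional[str]:
    """Return the highest-priority model that is not marked unavailable."""
    blocked = set(unavailable)
    for model in PRIORITY_ORDER:
        if model not in blocked:
            return model
    return None  # every configured model is down


print(select_model())                    # anapi/opus-4.5
print(select_model(["anapi/opus-4.5"]))  # zai/glm-4.7
```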

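2. Failure detection and handling

The following failures are detected automatically and trigger a switch:

Timeout (timeout): the request exceeds the configured time limit
Rate limit (rate_limit): API call frequency exceeds the limit
Quota exceeded (quota_exceeded): the token quota is used up
Authentication error (authentication): the API key is invalid or expired
Service unavailable (service_unavailable): the provider's service is down
Network error (network_error): the connection fails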
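
One way to map a failed request onto these six categories is sketched below. The status-code table is an assumption, not something the skill documents: providers differ, and quota exhaustion in particular is often reported as a 429 with a distinguishing message, so the 402 entry is only a placeholder.

```python
from typing import Optional

# Hypothetical mapping from HTTP status codes to the failure categories above.
STATUS_TO_CATEGORY = {
    401: "authentication",
    403: "authentication",
    402: "quota_exceeded",   # placeholder; many providers signal this differently
    408: "timeout",
    429: "rate_limit",
    503: "service_unavailable",
    504: "timeout",
}


def classify_failure(status: Optional[int] = None,
                     exc: Optional[BaseException] = None) -> Optional[str]:
    """Map a failed request to a failure category, or None if unrecognized."""
    # TimeoutError is a subclass of OSError, so it must be checked first.
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, OSError):  # includes ConnectionError and its subclasses
        return "network_error"
    if status is not None:
        return STATUS_TO_CATEGORY.get(status)
    return None
```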

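3. Retry mechanism

Retries use exponential backoff:

Retry configuration:
  Max attempts: 3
  Initial delay: 1000ms
  Max delay: 10000ms
  Backoff multiplier: 2.0
  Use fallback model: true

Retry timeline:

1st failure → wait 1s → retry
2nd failure → wait 2s → retry
3rd failure → wait 4s → switch model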
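
The timeline follows from the configuration: each delay is the previous one times the backoff multiplier, capped at the maximum delay. A minimal sketch of that arithmetic (the function name `backoff_delays` is made up for this example):

```python
from typing import List


def backoff_delays(max_attempts: int = 3,
                   initial_delay_ms: int = 1000,
                   max_delay_ms: int = 10000,
                   multiplier: float = 2.0) -> List[int]:
    """Delay in ms applied after each failed attempt, capped at max_delay_ms."""
    delays = []
    delay = float(initial_delay_ms)
    for _ in range(max_attempts):
        delays.append(int(min(delay, max_delay_ms)))
        delay *= multiplier
    return delays


print(backoff_delays())                # [1000, 2000, 4000]
print(backoff_delays(max_attempts=6))  # [1000, 2000, 4000, 8000, 10000, 10000]
```

With the defaults above, a request spends about 7 seconds waiting before the switch; the cap only takes effect from the fifth attempt onward.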

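4. Health monitoring

Continuously monitors the health of every model provider:

# Checked every 5 minutes
Checks:
  - API endpoint connectivity
  - Response time
  - Error rate
  - Quota usage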
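
The four checks could be combined into a single health verdict along the following lines. This is only a sketch: the `HealthSample` type and all three thresholds are assumptions, since the skill does not state what values its monitor uses.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    """One probe result for a provider, covering the four checks listed above."""
    reachable: bool     # API endpoint connectivity
    latency_ms: float   # response time
    error_rate: float   # fraction of failed requests, 0.0 to 1.0
    quota_used: float   # fraction of quota consumed, 0.0 to 1.0


def is_healthy(sample: HealthSample,
               max_latency_ms: float = 60000,
               max_error_rate: float = 0.5,
               max_quota_used: float = 0.95) -> bool:
    """Return True only if every check passes (thresholds are placeholders)."""
    return (sample.reachable
            and sample.latency_ms <= max_latency_ms
            and sample.error_rate <= max_error_rate
            and sample.quota_used < max_quota_used)
```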

Quick Start

1. Configure model fallback

Configuration file:

~/.openclaw/agents/main/agent/agent.json

{
  "model": "anapi/opus-4.5",
  "modelFallback": [
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5"
  ],
  "retry": {
    "maxAttempts": 3,
    "initialDelayMs": 1000,
    "maxDelayMs": 10000,
    "backoffMultiplier": 2.0,
    "useFallbackOnFailure": true
  }
}

2. Start monitoring

# Start the background monitor
~/.openclaw/scripts/monitor-models.sh start

# Show the monitor status
~/.openclaw/scripts/monitor-models.sh status

# Stop the monitor
~/.openclaw/scripts/monitor-models.sh stop

3. Trigger a switch manually

# Run the fallback check script
~/.openclaw/scripts/model-fallback.sh

Workflow

Normal request flow

User request
    ↓
Select the primary model (anapi/opus-4.5)
    ↓
Send the request to the API
    ↓
Success → return the result

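Failover flow

User request
    ↓
Select the current model
    ↓
Send the request → failure/timeout
    ↓
Retry (up to 3 times)
    ↓
Still failing?
    ↓
Switch to the fallback model (zai/glm-4.7)
    ↓
Send the request
    ↓
Success → return the result and log the switch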
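
The failover flow can be sketched as a nested loop: retry the current model with backoff, then move to the next model in the chain. Here `send` is a hypothetical callable that performs one request and raises on failure, and the injectable `sleep` parameter exists only to make the sketch testable; neither is part of the skill's scripts.

```python
import time
from typing import Callable, Sequence, Tuple


def call_with_fallback(send: Callable[[str], str],
                       models: Sequence[str],
                       max_attempts: int = 3,
                       initial_delay_s: float = 1.0,
                       max_delay_s: float = 10.0,
                       multiplier: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Tuple[str, str]:
    """Try each model in order, retrying with backoff before moving on.

    Returns (model_used, result); raises RuntimeError if every model fails.
    """
    last_error = None
    for model in models:
        delay = initial_delay_s
        for _ in range(max_attempts):
            try:
                return model, send(model)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                sleep(min(delay, max_delay_s))
                delay *= multiplier
    raise RuntimeError("all models failed") from last_error
```

Called as `call_with_fallback(send, ["anapi/opus-4.5", "zai/glm-4.7"])`, this waits 1 s, 2 s, and 4 s on the primary before trying the fallback, matching the retry timeline.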

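Monitoring flow

Monitor daemon (every 5 minutes)
    ↓
Check the health of all models
    ↓
├─ Primary model healthy → keep the current model
└─ Primary model unhealthy → switch to the best available model
    ↓
Update the status file
    ↓
Write the log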
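
One pass of the monitoring flow might look like the sketch below. The `is_healthy` probe, the logger name, and the status-file fields (`checkedAt`, `active`, `health`) are all assumptions; the real layout of model-status.json is not documented here.

```python
import json
import logging
import time
from pathlib import Path
from typing import Callable, Optional, Sequence

log = logging.getLogger("model-monitor")


def monitor_cycle(models: Sequence[str],
                  is_healthy: Callable[[str], bool],
                  status_path: Path) -> Optional[str]:
    """Run one health-check pass; `models` is in priority order, primary first."""
    health = {model: bool(is_healthy(model)) for model in models}
    # The first healthy entry is the primary when it is healthy,
    # otherwise the best available fallback.
    active = next((m for m in models if health[m]), None)
    status = {"checkedAt": int(time.time()), "active": active, "health": health}
    status_path.write_text(json.dumps(status, indent=2), encoding="utf-8")
    log.info("active model: %s", active)
    return active


def run_forever(models: Sequence[str],
                is_healthy: Callable[[str], bool],
                status_path: Path,
                interval_s: float = 300) -> None:
    """Repeat the cycle every 5 minutes (300 s), as in the flow above."""
    while True:
        monitor_cycle(models, is_healthy, status_path)
        time.sleep(interval_s)
```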

Error Handling Strategies

1. Timeout

{
  "timeout": {
    "switchModel": true,
    "retryCount": 2,
    "timeoutMs": 60000,
    "fallbackTo": "zai/glm-4.7"
  }
}

Behavior:

A request that runs longer than 60 seconds is treated as a timeout
If it still times out after 2 retries, the model is switched

2. Rate Limit

{
  "rateLimit": {
    "switchModel": true,
    "cooldownMs": 60000,
    "alert": true,
    "fallbackTo": "zai/glm-4.7"
  }
}

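Behavior:

Triggered by an HTTP 429 response
Switches to the fallback model immediately
Tries the rate-limited model again after a 60-second cooldown

3. Quota Exceeded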
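
The cooldown can be tracked with a small helper that remembers when each rate-limited model becomes eligible again. `CooldownTracker` is a hypothetical name, and the injectable `clock` is there only so the behavior can be checked without waiting a full minute.

```python
import time
from typing import Callable, Dict


class CooldownTracker:
    """Track which models are cooling down after a 429 response."""

    def __init__(self, cooldown_ms: int = 60000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown_s = cooldown_ms / 1000.0
        self._clock = clock
        self._until: Dict[str, float] = {}

    def mark_rate_limited(self, model: str) -> None:
        """Record a 429; the model stays unavailable for the cooldown period."""
        self._until[model] = self._clock() + self._cooldown_s

    def is_available(self, model: str) -> bool:
        """True if the model was never rate limited or its cooldown has elapsed."""
        return self._clock() >= self._until.get(model, float("-inf"))
```

A caller would invoke `mark_rate_limited` when a 429 arrives, route traffic to the fallback, and consult `is_available` before sending the next request to the original model.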

{
  "quotaExceeded": {
    "switchModel": true,
    "alert": true,
    "fallbackTo": "zai/glm-4.7",
    "checkInterval": 3600000
  }
}

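Behavior:

Switches models when the quota is used up
Checks once an hour whether the primary model has recovered
Sends an alert notification

4. Authentication Error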

{
  "authenticationError": {
    "switchModel": true,
    "alert": true,
    "disableModel": true
  }
}

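Behavior:

Switches immediately when the API key is invalid
Disables the failed model (no automatic recovery)
Sends an urgent alert

Smart Routing Rules

Automatically selects the most suitable model based on task type: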
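
Taken together, the four strategies amount to a lookup from failure category to action. The sketch below mirrors the values in the JSON snippets above; the `Action` type, the `next_step` helper, and the choice to treat unknown categories as "raise" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    switch_model: bool
    retries_before_switch: int = 0
    recheck_after_ms: Optional[int] = None  # None = no automatic recovery
    alert: bool = False
    disable_model: bool = False


# Values mirror the four JSON snippets above.
POLICIES = {
    "timeout": Action(switch_model=True, retries_before_switch=2),
    "rate_limit": Action(switch_model=True, recheck_after_ms=60000, alert=True),
    "quota_exceeded": Action(switch_model=True, recheck_after_ms=3600000, alert=True),
    "authentication": Action(switch_model=True, alert=True, disable_model=True),
}


def next_step(category: str, retries_used: int) -> str:
    """Return 'retry', 'switch', or 'raise' for a failure of the given category."""
    policy = POLICIES.get(category)
    if policy is None:
        return "raise"  # unknown category: surface the error to the caller
    if retries_used < policy.retries_before_switch:
        return "retry"
    return "switch" if policy.switch_model else "raise"
```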

{
  "routing": {
    "strategy": "priority-fallback",
    "rules": [
      {
        "name": "coding-task",
        "match": {
          "contentContains": ["代码", "code", "编程", "函数"]
        },
        "preferModels": [
          "openrouter-vip/gpt-5.2-codex",
          "anapi/opus-4.5"
        ]
      },
      {
        "name": "chinese-task",
        "match": {
          "language": "zh"
        },
        "preferModels": [
          "zai/glm-4.7",
          "anapi/opus-4.5"
        ]
      },
      {
        "name": "vision-task",
        "match": {
          "hasImage": true
        },
        "preferModels": [
          "anapi/opus-4.5"
        ]
      }
    ]
  }
}
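
To make the matching semantics concrete, here is a minimal sketch that evaluates these rules against a request. The request shape (`content`, `language`, `hasImage` keys), case-insensitive substring matching, and first-match-wins ordering are all assumptions; the skill does not specify how its router resolves overlapping rules.

```python
from typing import Any, Dict, List, Optional

RULES: List[Dict[str, Any]] = [
    {"name": "coding-task",
     "match": {"contentContains": ["代码", "code", "编程", "函数"]},
     "preferModels": ["openrouter-vip/gpt-5.2-codex", "anapi/opus-4.5"]},
    {"name": "chinese-task",
     "match": {"language": "zh"},
     "preferModels": ["zai/glm-4.7", "anapi/opus-4.5"]},
    {"name": "vision-task",
     "match": {"hasImage": True},
     "preferModels": ["anapi/opus-4.5"]},
]


def rule_matches(match: Dict[str, Any], request: Dict[str, Any]) -> bool:
    """Every key in `match` must be satisfied by the request."""
    for key, expected in match.items():
        if key == "contentContains":
            text = request.get("content", "").lower()
            if not any(word.lower() in text for word in expected):
                return False
        elif request.get(key) != expected:
            return False
    return True


def preferred_models(request: Dict[str, Any]) -> Optional[List[str]]:
    """Return preferModels of the first matching rule, or None if none match."""
    for rule in RULES:
        if rule_matches(rule["match"], request):
            return rule["preferModels"]
    return None
```

A request that matches no rule returns `None`, in which case the caller would fall back to the default priority order.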

Monitoring and Logs

Log file locations

~/.openclaw/logs/model-fallback.log    # switch log
~/.openclaw/logs/model-monitor.log     # monitor log
~/.openclaw/logs/model-status.json     # status report

Follow logs in real time

# Follow the switch log
tail -f ~/.openclaw/logs/model-fallback.log

# Follow the monitor log
tail -f ~/.openclaw/logs/model-monitor.log

# Follow all logs
tail -f ~/.openclaw/logs/*.log

Status report

# Show the current status
~/.openclaw/scripts/monitor-models.sh status

# Pretty-print the status as JSON
python3 -m json.tool ~/.openclaw/logs/model-status.json

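Scripts

model-fallback.sh

The fallback switching script. It:

Tests every configured model
Selects the best available model
Performs the model switch
Writes the switch log

Usage:

~/.openclaw/scripts/model-fallback.sh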

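monitor-models.sh

The health-monitoring daemon. It:

Checks model health periodically
Triggers failover automatically
Generates the status report
Manages the PID file

Usage:

~/.openclaw/scripts/monitor-models.sh {start|stop|restart|status|check}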

test-model-fallback.sh

The test script. It:

Validates the configuration
Tests the switching logic
Simulates failure scenarios
Generates a test report

Usage:

~/clawd/scripts/test-model-fallback.sh

Configuration Reference

Full agent.json configuration

{
  "model": "anapi/opus-4.5",
  "modelFallback": [
    "zai/glm-4.7",
    "openrouter-vip/gpt-5.2-codex",
    "github-copilot/claude-sonnet-4-5"
  ],
  "retry": {
    "maxAttempts": 3,
    "initialDelayMs": 1000,
    "maxDelayMs": 10000,
    "backoffMultiplier": 2.0,
    "useFallbackOnFailure": true
  },
  "errorHandling": {
    "rateLimit": {
      "switchModel": true,
      "cooldownMs": 60000,
      "alert": true
    },
    "timeout": {
      "switchModel": true,
      "retryCount": 2,
      "timeoutMs": 60000
    },
    "quotaExceeded": {
      "switchModel": true,
      "alert": true,
      "fallbackTo": "zai/glm-4.7"
    },
    "authenticationError": {
      "switchModel": true,
      "alert": true,
      "disableModel": true
    }
  },
  "models": {
    "anapi/opus-4.5": {
      "provider": "anapi",
      "alias": "opus45",
      "maxTokens": 200000,
      "timeoutMs": 60000,
      "priority": 1,
      "supports": ["vision", "tools", "long-context"],
      "costFactor": "high"
    },
    "zai/glm-4.7": {
      "provider": "zai",
      "alias": "zai47",
      "maxTokens": 200000,
      "timeoutMs": 60000,
      "priority": 2,
      "supports": ["tools", "long-context"],
      "costFactor": "medium",
      "bestFor": ["chinese", "general-purpose"]
    },
    "openrouter-vip/gpt-5.2-codex": {
      "provider": "openrouter-vip",
      "alias": "codex52",
      "maxTokens": 100000,
      "timeoutMs": 30000,
      "priority": 3,
      "supports": ["coding"],
      "costFactor": "low",
      "bestFor": ["coding", "code-generation"]
    },
    "github-copilot/claude-sonnet-4-5": {
      "provider": "github-copilot",
      "alias": "sonnet",
      "maxTokens": 200000,
      "timeoutMs": 60000,
      "priority": 4,
      "supports": ["tools", "long-context"],
      "costFactor": "free",
      "bestFor": ["fallback", "general-purpose"]
    }
  },
  "monitoring": {
    "enabled": true,
    "checkIntervalMs": 300000,
    "logFile": "$HOME/.openclaw/logs/model-fallback.log",
    "alertOnFailure": true
  }
}
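
Because the model chain, the per-model definitions, and the retry settings live in separate parts of the file, a quick consistency check can catch typos before the Gateway loads it. The checks below are a sketch of what such a validator might look for; they are not the checks performed by test-model-fallback.sh, and `validate_config` is a hypothetical name.

```python
from typing import Any, Dict, List


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means none were found."""
    problems: List[str] = []
    models = config.get("models", {})
    chain = [config.get("model")] + list(config.get("modelFallback", []))

    # Every model in the primary/fallback chain needs a definition.
    for name in chain:
        if name not in models:
            problems.append(f"{name!r} is referenced but not defined under 'models'")
    if len(set(chain)) != len(chain):
        problems.append("the primary/fallback chain contains duplicates")

    # The chain order should agree with the declared priorities.
    priorities = [models[n]["priority"] for n in chain
                  if n in models and "priority" in models[n]]
    if priorities != sorted(priorities):
        problems.append("fallback order does not follow ascending 'priority'")

    retry = config.get("retry", {})
    if retry.get("initialDelayMs", 0) > retry.get("maxDelayMs", float("inf")):
        problems.append("retry.initialDelayMs exceeds retry.maxDelayMs")
    return problems
```

To run it against a real file, load the configuration with `json.load` and pass the resulting dictionary to `validate_config`; the full configuration shown above should produce an empty list.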

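Troubleshooting

Problem 1: The model does not switch automatically

Check:

# Show the fallback list in the configuration file
grep -A 4 modelFallback ~/.openclaw/agents/main/agent/agent.json

# Show the most recent log entries
tail -20 ~/.openclaw/logs/model-fallback.log

# Run the switching script manually
~/.openclaw/scripts/model-fallback.sh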

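Problem 2: The monitor is not running

Check:

# Look for the process (the bracket keeps grep from matching itself)
ps aux | grep '[m]onitor-models'

# Show the PID file
cat ~/.openclaw/logs/model-monitor.pid

# Restart the monitor
~/.openclaw/scripts/monitor-models.sh restart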

Problem 3: No model is available

Check:

# Show the status report
~/.openclaw/scripts/monitor-models.sh status

# Check the API keys
cat ~/.openclaw/agents/main/agent/auth-profiles.json

# Test network connectivity
ping -c 3 anapi.9w7.cn
ping -c 3 open.bigmodel.cn

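Performance Tuning

Reduce switching frequency

Raise maxAttempts and initialDelayMs above the defaults (3 and 1000) so that transient errors are retried longer before a switch:

{
  "retry": {
    "maxAttempts": 5,
    "initialDelayMs": 2000
  }
}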

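Optimize response time

Route latency-sensitive tasks to the fastest models first. github-copilot/claude-sonnet-4-5 is listed first because it usually responds fastest:

{
  "routing": {
    "rules": [
      {
        "name": "quick-response",
        "match": {
          "priority": "speed"
        },
        "preferModels": [
          "github-copilot/claude-sonnet-4-5",
          "zai/glm-4.7"
        ]
      }
    ]
  }
}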

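Integrating with OpenClaw

Once configured, the model fallback feature is integrated into the OpenClaw Gateway automatically:

Restart the Gateway:

openclaw gateway restart

Verify the configuration:

openclaw status | grep Model

Follow the logs:

journalctl -u openclaw-gateway -f | grep model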

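Best Practices

1. Check regularly

Run a full check once a week:

~/clawd/scripts/test-model-fallback.sh

2. Watch the logs

Review the switch log daily (the search string 切换模型 means "switch model"):

grep "切换模型" ~/.openclaw/logs/model-fallback.log | tail -10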

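3. Update the configuration

When you add a new model, update agent.json. Here new-model-here is a placeholder for the new model's ID, and the primary model stays in the "model" field rather than in the fallback list:

{
  "modelFallback": [
    "zai/glm-4.7",
    "new-model-here",
    "github-copilot/claude-sonnet-4-5"
  ]
}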

4. Back up the configuration

Back up the configuration file regularly:

cp ~/.openclaw/agents/main/agent/agent.json \
   ~/.openclaw/agents/main/agent/agent.json.backup
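
The command above overwrites the previous backup each time. If keeping several generations is useful, a timestamped copy is one option; the sketch below is an illustration and not part of the skill, and the name `backup_config` and the suffix format are assumptions.

```python
import shutil
import time
from pathlib import Path


def backup_config(config_path: Path) -> Path:
    """Copy the config next to itself with a timestamp suffix; return the new path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = config_path.with_name(f"{config_path.name}.{stamp}.backup")
    # copy2 copies the contents and also tries to preserve metadata such as mtime.
    shutil.copy2(config_path, target)
    return target
```

For example, `backup_config(Path.home() / ".openclaw/agents/main/agent/agent.json")` would write a file such as agent.json.20250101-120000.backup in the same directory.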

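Supported Models

The currently configured models and their characteristics:

Model	Provider	Priority	Max tokens	Strengths
opus-4.5	anapi	1	200k	Most capable; vision
glm-4.7	zai	2	200k	Optimized for Chinese
gpt-5.2-codex	openrouter-vip	3	100k	Coding-focused
claude-sonnet-4-5	github-copilot	4	200k	Free fallback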

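TODO

Add more providers
Implement cost-based model selection
Collect model performance metrics
Implement predictive model switching
Integrate alert notifications (Telegram/email)
Add a WebUI monitoring dashboard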