Commonly-used-high-value-skills python-performance

Best practices for Python performance optimization, memory profiling, and concurrent programming. Source: skills.sh, 12.8K installs.

install

source · Clone the upstream repo

git clone https://github.com/seaworld008/Commonly-used-high-value-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/seaworld008/Commonly-used-high-value-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/developer-engineering/python-performance" ~/.claude/skills/seaworld008-commonly-used-high-value-skills-python-performance-f1a257 && rm -rf "$T"

manifest: skills/developer-engineering/python-performance/SKILL.md

Python Performance Optimization

Python is an incredibly productive language, but code written without attention to performance can be slow. This skill provides a systematic approach to identifying bottlenecks, optimizing memory usage, and choosing the right concurrency model.

Trigger Conditions

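The application responds slowly in production.
CPU or memory usage is extremely high while the system processes large data streams.
Background task queues are backing up, and throughput cannot keep up with business demand.
Specific core logic needs load testing and performance tuning.
The project is moving from a prototype to a high-concurrency, high-performance production environment.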

Core Capabilities

1. Profiling

Measure before you optimize.

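cProfile: Standard-library profiler for function-level analysis of a whole program; finds the functions that consume the most time.
line_profiler: Line-by-line analysis of individual functions to pinpoint the hot lines of code.
py-spy: Non-intrusive sampling profiler for production processes that generates flame graphs.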
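Besides the command-line form, cProfile can be driven from code to profile a single call path. A minimal sketch: slow_sum and workload are hypothetical stand-ins for real code, and pstats sorts the captured report by cumulative time.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    # Hypothetical hot spot: builds a full list before summing it
    return sum([i * i for i in range(n)])


def workload() -> int:
    # Hypothetical workload that calls the hot spot repeatedly
    return sum(slow_sum(10_000) for _ in range(50))


def profile_workload() -> tuple[int, str]:
    profiler = cProfile.Profile()
    profiler.enable()
    result = workload()
    profiler.disable()

    # Sort by cumulative time and capture the top 10 entries as text
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    total, report = profile_workload()
    print(total)
    print(report)
```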

2. Memory Profiling and Leak Detection

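memory_profiler: Reports a script's memory increments line by line.
objgraph: Visualizes reference relationships between objects in memory to track down unexpected references and reference cycles that keep objects alive.
tracemalloc: Standard-library tool that traces memory allocations to locate the source of a leak.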
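A common tracemalloc workflow is to take a snapshot before and after the suspect code and diff the two by source line. A minimal sketch, assuming a hypothetical fill_cache function that plays the role of the leak.

```python
import tracemalloc


def fill_cache(cache: list[str], n: int) -> None:
    # Hypothetical leak: a long-lived list keeps growing and is never cleared
    for i in range(n):
        cache.append(f"payload-{i}" * 10)


def top_allocation_growth(limit: int = 3) -> list[str]:
    tracemalloc.start()
    cache: list[str] = []
    before = tracemalloc.take_snapshot()
    fill_cache(cache, 50_000)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()

    # Diff the snapshots grouped by source line; the biggest change comes first
    diffs = after.compare_to(before, "lineno")
    return [str(diff) for diff in diffs[:limit]]


if __name__ == "__main__":
    for line in top_allocation_growth():
        print(line)
```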

3. Asynchronous Programming with asyncio

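Targets I/O-bound tasks (network requests, database operations, file reads and writes). Ordinary file I/O still blocks the event loop, so run it through asyncio.to_thread or a thread pool.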

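Event loop management: Understand the single-threaded concurrency model and avoid blocking code inside coroutines; one blocking call stalls every other task on the loop.
asyncio.gather / asyncio.wait / asyncio.as_completed: Patterns for scheduling multiple coroutines concurrently.
Third-party libraries: Prefer clients with native async support, such as aiohttp, httpx, and motor.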
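The sketch below combines two of these patterns: asyncio.as_completed to consume results as they finish, and asyncio.to_thread (Python 3.9+) to keep an unavoidably blocking call off the event loop. blocking_call and async_call are hypothetical stand-ins that sleep instead of doing real I/O.

```python
import asyncio
import time


def blocking_call(seconds: float) -> str:
    # Hypothetical stand-in for a synchronous SDK call or a file read
    time.sleep(seconds)
    return "blocking-done"


async def async_call(delay: float, name: str) -> str:
    # Hypothetical stand-in for a request made with a native async client
    await asyncio.sleep(delay)
    return name


async def run_jobs() -> list[str]:
    # Push the blocking call onto a worker thread so the event loop stays free
    blocking_task = asyncio.create_task(asyncio.to_thread(blocking_call, 0.2))

    coros = [async_call(delay, f"job-{delay}") for delay in (0.3, 0.1, 0.2)]
    finished: list[str] = []
    # as_completed yields results in completion order, not submission order
    for next_done in asyncio.as_completed(coros):
        finished.append(await next_done)

    finished.append(await blocking_task)
    return finished


if __name__ == "__main__":
    print(asyncio.run(run_jobs()))
```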

4. Choosing Between Multiprocessing and Multithreading

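Multithreading: Suited to I/O-bound work. Because of the GIL, threads in standard CPython cannot use multiple CPU cores for compute-heavy Python code.
Multiprocessing: Suited to CPU-bound work. Each worker is a separate process with its own interpreter and GIL, so computation can use multiple cores.
ThreadPoolExecutor / ProcessPoolExecutor: Standard-library pool managers (concurrent.futures) that simplify concurrent code.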
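A minimal sketch of the two pool executors from concurrent.futures: processes for CPU-bound work, threads for I/O-bound work. cpu_bound and io_bound are hypothetical placeholders for real workloads.

```python
import math
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n: int) -> int:
    # Pure-Python arithmetic holds the GIL, so it only scales across processes
    return sum(math.isqrt(i) for i in range(n))


def io_bound(delay: float) -> float:
    # time.sleep releases the GIL, as blocking socket or disk waits do
    time.sleep(delay)
    return delay


def run_cpu_jobs(sizes: list[int]) -> list[int]:
    # By default the pool starts roughly one worker process per CPU core
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_bound, sizes))


def run_io_jobs(delays: list[float], workers: int = 8) -> list[float]:
    # Threads are cheap and overlap well while blocked on I/O
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(io_bound, delays))


if __name__ == "__main__":
    print(run_cpu_jobs([2_000_000] * 4))
    print(run_io_jobs([0.1] * 8))
```

With the spawn or forkserver start methods, child processes re-import the main module, so the `if __name__ == "__main__"` guard keeps them from launching pools of their own.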

5. Acceleration with Cython and Numba

Use these when pure Python cannot meet performance requirements.

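Numba (@jit): A just-in-time compiler, especially well suited to loop-heavy numerical code (NumPy workloads).
Cython: Compiles Python code into C/C++ extension modules; with explicit type declarations, speedups of tens of times are possible.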

6. Understanding the GIL (Global Interpreter Lock)

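How it works: Understand why the GIL exists and how it affects concurrency. In standard CPython it allows only one thread at a time to execute Python bytecode.
Workarounds: Use C extensions that release the GIL, use multiprocessing, or push computation down to native libraries written in C, C++, or Rust (such as pandas or Polars).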
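The first workaround can be seen with a standard-library C extension. CPython's hashlib documentation states that the GIL is released while hashing data larger than 2047 bytes, so plain threads can hash large buffers in parallel. A minimal sketch; the buffer sizes and worker count are arbitrary choices.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def digest(chunk: bytes) -> str:
    # hashlib drops the GIL while hashing buffers over 2047 bytes,
    # so these calls can run on several cores at once
    return hashlib.sha256(chunk).hexdigest()


def parallel_digests(chunks: list[bytes], workers: int = 4) -> list[str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, chunks))


if __name__ == "__main__":
    # Four 8 MiB random buffers
    chunks = [os.urandom(8 * 1024 * 1024) for _ in range(4)]
    for hexdigest in parallel_digests(chunks):
        print(hexdigest[:16])
```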

7. Data Structure Selection and Algorithmic Optimization

Built-in Collections: Use `dict` for hash-based lookups (average O(1)), `set` for deduplication, and `deque` as an efficient double-ended queue.
List Comprehensions: Prefer them over appending inside a traditional `for` loop.
Generators: Use generators to process very large datasets with less memory.
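The collection and iteration tips above, sketched as small helpers. The function names are hypothetical; each comment notes which built-in does the work.

```python
from collections import deque


def dedupe_preserving_order(items):
    # dict keys keep insertion order (3.7+) and give average O(1) lookups
    return list(dict.fromkeys(items))


def has_duplicates(items):
    # A set keeps one copy of each hashable value
    return len(set(items)) != len(items)


def last_n(iterable, n):
    # deque(maxlen=n) drops items from the left in O(1) as new ones arrive
    return list(deque(iterable, maxlen=n))


def squares_of_evens(numbers):
    # A comprehension skips the per-iteration lookup and call of .append
    return [x * x for x in numbers if x % 2 == 0]


def total_line_length(lines):
    # A generator expression streams values instead of building a list
    return sum(len(line) for line in lines)
```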

Common Commands / Templates

Profiling with cProfile

python -m cProfile -s cumtime script.py

Generating a flame graph with py-spy

# Sample for 30 seconds and write an SVG flame graph
py-spy record -o profile.svg --duration 30 --pid <PID>

Asyncio concurrent request template

import asyncio
import httpx

async def fetch_url(client, url):
    response = await client.get(url)
    return response.status_code

async def main(urls):
    async with httpx.AsyncClient() as client:
        tasks = [fetch_url(client, url) for url in urls]
        results = await asyncio.gather(*tasks)
        print(f"Finished {len(results)} requests")

if __name__ == "__main__":
    urls = ["https://example.com" for _ in range(100)]
    asyncio.run(main(urls))
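For larger URL lists you may want to cap how many requests are in flight at once. A minimal sketch using asyncio.Semaphore; fake_fetch is a hypothetical stand-in for client.get that sleeps instead of touching the network, so the sketch runs offline.

```python
import asyncio
import random


async def fake_fetch(url: str, sem: asyncio.Semaphore, stats: dict[str, int]) -> int:
    # Hypothetical stand-in for `await client.get(url)`: sleeps instead of doing I/O
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(random.uniform(0.01, 0.03))
        stats["in_flight"] -= 1
        return 200


async def fetch_all_bounded(urls: list[str], limit: int = 10) -> tuple[list[int], int]:
    # At most `limit` coroutines can hold the semaphore at any moment
    sem = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    codes = await asyncio.gather(*(fake_fetch(url, sem, stats) for url in urls))
    return list(codes), stats["peak"]


if __name__ == "__main__":
    urls = ["https://example.com" for _ in range(100)]
    codes, peak = asyncio.run(fetch_all_bounded(urls, limit=10))
    print(f"Finished {len(codes)} requests, peak concurrency {peak}")
```

In the httpx template, the same idea means passing the semaphore into fetch_url and wrapping the client.get call in `async with sem`.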

Accelerating numerical code with Numba

from numba import jit
import numpy as np

@jit(nopython=True)
def sum_of_squares(arr):
    res = 0.0  # start as a float so the accumulator keeps one numeric type
    for i in range(len(arr)):
        res += arr[i] ** 2
    return res

data = np.random.rand(10_000_000)
# The first call includes compilation overhead; subsequent calls are much faster
print(sum_of_squares(data))

Boundaries and Limitations

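Over-optimization: Don't sacrifice readability and maintainability for tiny performance gains ("Premature optimization is the root of all evil").
Library limitations: Some C extension libraries are not thread-safe; used carelessly from multiple threads, they can cause serious memory errors or deadlocks.
GIL constraints: Python is a poor fit for workloads with extremely frequent lock contention.
Cold-start overhead: JIT tools such as Numba add noticeable latency on the first run, while the code is compiled.
Hardware limits: Performance tuning cannot fix system bottlenecks caused by insufficient network bandwidth, disk I/O, or database performance.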

Generated by Skill Master - Professional Edition