LLMs-Universal-Life-Science-and-Clinical-Skills- Python_Pandas_Best_Practices

<!--

install

source · Clone the upstream repo

```
git clone https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills-
```

Claude Code · Install into ~/.claude/skills/

```
T=$(mktemp -d) && git clone --depth=1 https://github.com/mdbabumiamssm/LLMs-Universal-Life-Science-and-Clinical-Skills- "$T" && mkdir -p ~/.claude/skills && cp -r "$T/Skills/Software_Engineering/Data_Science/Python_Pandas_Best_Practices" ~/.claude/skills/mdbabumiamssm-llms-universal-life-science-and-clinical-skills-python-pandas-best && rm -rf "$T"
```

manifest: Skills/Software_Engineering/Data_Science/Python_Pandas_Best_Practices/SKILL.md

source content

name: 'pandas-best-practices'
description: 'Standards for efficient, readable, and performant data manipulation using Python''s Pandas library.'
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command
  - write_file

Pandas Best Practices

This skill provides guidelines for working with tabular data in Python. It focuses on vectorization, memory management, and method chaining to write "Modern Pandas" code.

When to Use This Skill

Data Cleaning: Preprocessing clinical or genomic datasets.
Analysis: Performing aggregations, merges, or statistical summaries.
Performance: Optimizing slow-running scripts that process large CSVs/DataFrames.

Core Capabilities

Vectorization: Replacing `for` loops with vectorized array operations.
Method Chaining: Writing readable, fluent data transformation pipelines.
Memory Optimization: Using appropriate dtypes (`category`, nullable integers such as `Int64`) to reduce RAM usage.
Modern Indexing: Using `.loc` and `.iloc` correctly and avoiding `SettingWithCopyWarning`.
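The following minimal sketch shows vectorization, memory-saving dtypes, and `.loc`/`.iloc` indexing on a small, made-up patient table. The column names (`weight_kg`, `height_m`, `bmi`, `high_bmi`, `age_next_year`) are hypothetical. The BMI cutoff of 30 is only an illustrative threshold.

```python
import pandas as pd

# Hypothetical patient table; all names and values are made up for illustration.
df = pd.DataFrame({
    "patient_id": ["P1", "P2", "P3", "P4"],
    "patient_group": ["control", "treated", "treated", "control"],
    "age": [34, 51, None, 47],
    "weight_kg": [70.0, 82.5, 65.0, 90.1],
    "height_m": [1.75, 1.80, 1.62, 1.70],
})

# Vectorization: whole-column arithmetic and comparison instead of a Python for loop.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
df["high_bmi"] = df["bmi"] >= 30  # illustrative cutoff

# Memory optimization: repeated strings become categorical; ages become nullable
# integers (Int64 keeps the missing age as <NA> instead of forcing float64).
df = df.astype({"patient_group": "category", "age": "Int64"})
print(df.memory_usage(deep=True))

# Indexing: select rows and columns in a single .loc call, then take an explicit
# .copy() before modifying the subset, which avoids SettingWithCopyWarning.
treated = df.loc[df["patient_group"] == "treated", ["patient_id", "age"]].copy()
treated["age_next_year"] = treated["age"] + 1

# .iloc selects by position: first row, first column.
first_id = df.iloc[0, 0]
```

The explicit `.copy()` makes it unambiguous that `treated` is an independent DataFrame. Later assignments therefore never need to guess whether they modify `df`.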

Workflow

Inspect Data: Check `df.info()` and `df.head()`.
Define Pipeline: Plan transformations (filter -> group -> aggregate).
Implement Chain: Write the logic as a chain of methods.
Optimize: Check for `for` loops or `apply` calls that can be vectorized.
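The four workflow steps can be sketched end to end as follows, under some assumptions. The `labs` table, its columns, and the grouping by `site` are hypothetical. Dividing glucose in mg/dL by 18 is used only as an approximate conversion to mmol/L.

```python
import pandas as pd

# Hypothetical lab table; column names and values are made up for illustration.
labs = pd.DataFrame({
    "patient_id": ["P1", "P1", "P2", "P3", "P3", None],
    "site": ["A", "A", "B", "A", "A", "B"],
    "glucose_mg_dl": [95.0, 110.0, 180.0, 130.0, None, 99.0],
})

# 1. Inspect: dtypes, non-null counts, and the first rows.
labs.info()
print(labs.head())

# 2-3. Define and implement the pipeline as one chain:
# filter -> derive -> group -> aggregate.
summary = (
    labs
    .dropna(subset=["patient_id", "glucose_mg_dl"])              # drop incomplete rows
    .assign(glucose_mmol_l=lambda d: d["glucose_mg_dl"] / 18.0)  # vectorized, approximate conversion
    .groupby("site", as_index=False)
    .agg(n_tests=("glucose_mmol_l", "count"),
         mean_mmol_l=("glucose_mmol_l", "mean"))
)
print(summary)

# 4. Optimize: this .apply calls a Python lambda once per value ...
slow = labs["glucose_mg_dl"].apply(lambda x: x / 18.0)
# ... and gives the same result as the vectorized expression, which does not.
fast = labs["glucose_mg_dl"] / 18.0
```

Using `.assign` with a lambda keeps the derived column inside the chain. It also means no intermediate variable has to be named or mutated.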

Example Usage

User: "Calculate the mean age by patient group, but exclude patients with missing IDs."

Agent Action:

Reads `references/rules.md`.

Generates:

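```python
# Assumes df is a pandas DataFrame with patient_id, patient_group, and age columns.
result = (
    df
    .dropna(subset=['patient_id'])    # drop rows with a missing patient ID
    .groupby('patient_group')['age']
    .mean()                           # missing ages are skipped by default
    .reset_index()                    # return a DataFrame rather than a Series
)
```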
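To sanity-check the generated chain, it can be run on a small made-up DataFrame; the values below are hypothetical. Note that `dropna(subset=['patient_id'])` removes whole rows that lack an ID, whereas `.mean()` only skips missing ages within a group.

```python
import pandas as pd

# Hypothetical input: one row has no patient ID, one patient has no recorded age.
df = pd.DataFrame({
    "patient_id": ["P1", "P2", None, "P4", "P5"],
    "patient_group": ["control", "control", "control", "treated", "treated"],
    "age": [40, 50, 99, 30, None],
})

result = (
    df
    .dropna(subset=["patient_id"])    # the row with age 99 is removed here
    .groupby("patient_group")["age"]
    .mean()                           # P5's missing age is skipped, not counted as 0
    .reset_index()
)
print(result)
```

The control mean is (40 + 50) / 2 = 45.0 because the row with a missing ID is excluded. The treated mean is 30.0 because P5's missing age is ignored.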