SciAgent-Skills cobrapy-metabolic-modeling

Constraint-based reconstruction and analysis (COBRA) of genome-scale metabolic models. FBA, FVA, gene/reaction knockouts, flux sampling, production envelopes, gapfilling, and media optimization. Use for metabolic engineering strain design, essential gene identification, and flux distribution analysis. For kinetic modeling use tellurium; for pathway visualization use Escher.

install

source · Clone the upstream repo

git clone https://github.com/jaechang-hits/SciAgent-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/systems-biology-multiomics/cobrapy-metabolic-modeling" ~/.claude/skills/jaechang-hits-sciagent-skills-cobrapy-metabolic-modeling && rm -rf "$T"

manifest: skills/systems-biology-multiomics/cobrapy-metabolic-modeling/SKILL.md

source content

COBRApy — Constraint-Based Metabolic Modeling

Overview

COBRApy is a Python package for constraint-based reconstruction and analysis (COBRA) of genome-scale metabolic models. It provides flux balance analysis (FBA), flux variability analysis (FVA), gene and reaction knockout screens, flux sampling, production envelopes, gapfilling, and media optimization on SBML-format metabolic networks.

When to Use

Predicting microbial growth rates under different nutrient conditions (FBA)
Identifying essential genes or reactions via single and double knockout screens
Determining flux ranges and alternative optimal solutions (FVA)
Sampling feasible flux distributions to characterize metabolic flexibility
Designing minimal growth media or optimizing carbon sources
Computing production envelopes for metabolic engineering targets
Gapfilling incomplete draft models using a universal reaction database
For kinetic modeling or dynamic ODE-based models, use Tellurium instead
For pathway visualization on metabolic maps, use Escher instead

Prerequisites

Python packages:
```
cobra
```
(includes GLPK solver),
```
numpy
```
,
```
pandas
```
Optional solvers: CPLEX or Gurobi (faster for large models, require license)
Data: SBML (.xml), JSON, or YAML metabolic model files; available from BiGG Models, AGORA, or ModelSEED

pip install cobra

Quick Start

from cobra.io import load_model

model = load_model("textbook")  # E. coli core model
print(f"Model: {model.id} — {len(model.reactions)} rxns, {len(model.metabolites)} mets, {len(model.genes)} genes")

solution = model.optimize()
print(f"Growth rate: {solution.objective_value:.4f} /h")
print(f"Status: {solution.status}")
# Model: e_coli_core — 95 rxns, 72 mets, 137 genes
# Growth rate: 0.8739 /h
# Status: optimal

Core API

1. Model I/O

Load bundled models and read/write standard formats.

from cobra.io import load_model, read_sbml_model, write_sbml_model, load_json_model, save_json_model

# Bundled: "textbook" (95 rxns), "ecoli" (2583 rxns), "salmonella"
model = load_model("textbook")
# model = read_sbml_model("my_model.xml")   # from SBML file
# model = load_json_model("my_model.json")  # from JSON file

write_sbml_model(model, "output_model.xml")
save_json_model(model, "output_model.json")
print(f"Saved model: {model.id}")

2. Model Structure and Components

Access reactions, metabolites, and genes via DictList containers.

from cobra.io import load_model
model = load_model("textbook")

# Inspect a reaction
rxn = model.reactions.get_by_id("PFK")
print(f"Reaction: {rxn.id} — {rxn.name}")
print(f"Equation: {rxn.reaction}")
print(f"Bounds: {rxn.bounds}, GPR: {rxn.gene_reaction_rule}")

# Inspect a metabolite
met = model.metabolites.get_by_id("atp_c")
print(f"Metabolite: {met.id}, Formula: {met.formula}, Compartment: {met.compartment}")

# Query and list exchange reactions
atp_rxns = model.reactions.query("atp", attribute="name")
print(f"ATP-related reactions: {len(atp_rxns)}, Exchange reactions: {len(model.exchanges)}")

3. Flux Balance Analysis (FBA)

Predict optimal flux distributions by maximizing an objective.

from cobra.io import load_model
from cobra.flux_analysis import pfba

model = load_model("textbook")

# Standard FBA
solution = model.optimize()
print(f"Growth: {solution.objective_value:.4f} /h, Active fluxes: {(solution.fluxes.abs() > 1e-6).sum()}")

# Parsimonious FBA — same growth, minimal total flux
pfba_sol = pfba(model)
print(f"pFBA total flux: {pfba_sol.fluxes.abs().sum():.1f} vs standard: {solution.fluxes.abs().sum():.1f}")

# Change objective; slim_optimize for speed
from cobra.io import load_model
model = load_model("textbook")

with model:
    model.objective = "ATPM"
    print(f"Max ATPM flux: {model.optimize().objective_value:.2f}")

print(f"Growth (slim): {model.slim_optimize():.4f}")  # no flux vector, faster

4. Flux Variability Analysis (FVA)

Determine feasible flux ranges at or near optimality.

from cobra.io import load_model
from cobra.flux_analysis import flux_variability_analysis

model = load_model("textbook")

fva = flux_variability_analysis(model, fraction_of_optimum=1.0)
fva_90 = flux_variability_analysis(model, fraction_of_optimum=0.9)
fva["range"] = fva["maximum"] - fva["minimum"]
fva_90["range"] = fva_90["maximum"] - fva_90["minimum"]
print(f"Mean range at 100%: {fva['range'].mean():.2f}, at 90%: {fva_90['range'].mean():.2f}")

# Loopless FVA on specific reactions
from cobra.io import load_model
from cobra.flux_analysis import flux_variability_analysis

model = load_model("textbook")
fva_ll = flux_variability_analysis(
    model, loopless=True, reaction_list=["PFK", "PGI", "FBA", "TPI", "GAPD"],
)
print(fva_ll)

5. Gene and Reaction Deletions

Screen for essential genes/reactions via knockout simulations.

from cobra.io import load_model
from cobra.flux_analysis import single_gene_deletion, double_gene_deletion

model = load_model("textbook")
wt_growth = model.slim_optimize()

# Single gene deletions
gene_results = single_gene_deletion(model)
gene_results["growth_fraction"] = gene_results["growth"] / wt_growth
essential = gene_results[gene_results["growth_fraction"] < 0.01]
print(f"Essential genes: {len(essential)} / {len(model.genes)}")

# Double deletions (synthetic lethality) — use multiprocessing
double_results = double_gene_deletion(model, processes=4)
print(f"Double deletion results: {double_results.shape}")

6. Growth Media and Minimal Media

Modify nutrient availability and compute minimal media.

from cobra.io import load_model
from cobra.medium import minimal_medium

model = load_model("textbook")

# View current medium
for rxn_id, flux in sorted(model.medium.items()):
    print(f"  {rxn_id}: {flux}")

# Anaerobic switch via context manager
with model:
    medium = model.medium
    medium["EX_o2_e"] = 0.0
    model.medium = medium
    print(f"Anaerobic growth: {model.slim_optimize():.4f} /h")

# Minimal medium
min_med = minimal_medium(model, minimize_components=True, open_exchanges=True)
print(f"Minimal medium: {len(min_med)} components")

7. Flux Sampling

Sample feasible flux distributions for variability analysis.

from cobra.io import load_model
from cobra.sampling import sample

model = load_model("textbook")
samples = sample(model, n=500, method="optgp")
print(f"Samples shape: {samples.shape}")  # (500, n_reactions)
print(f"PFK flux: mean={samples['PFK'].mean():.2f}, std={samples['PFK'].std():.2f}")

8. Production Envelopes and Gapfilling

Compute phenotype phase planes and fill model gaps.

from cobra.io import load_model
from cobra.flux_analysis import production_envelope

model = load_model("textbook")
envelope = production_envelope(
    model,
    reactions=model.reactions.get_by_id("EX_ac_e"),
    carbon_sources=model.reactions.get_by_id("EX_glc__D_e"),
)
print(f"Envelope: {len(envelope)} points")
print(envelope[["flux_minimum", "flux_maximum", "carbon_yield_minimum", "carbon_yield_maximum"]].head())

# Gapfilling: restore growth after reaction removal
from cobra.io import load_model
from cobra.flux_analysis.gapfilling import gapfill

model = load_model("textbook")
universal = load_model("textbook")  # In practice, use a universal reaction DB
with model:
    model.remove_reactions([model.reactions.get_by_id("PFK")])
    print(f"Growth after removing PFK: {model.slim_optimize():.4f}")
    for rxn in gapfill(model, universal)[0]:
        print(f"  Gapfill suggests: {rxn.id}")

Key Concepts

DictList Objects

Reactions, metabolites, and genes are stored in

DictList

— ordered, indexable, and accessible by ID.

rxn = model.reactions[0]                    # by index
rxn = model.reactions.get_by_id("PFK")      # by ID
matches = model.reactions.query("phospho")  # keyword search

Exchange Reactions

EX_

prefix reactions represent system boundary. Positive flux = secretion; negative = uptake. Managed via

model.medium

dict.

Gene-Reaction Rules (GPR)

Boolean expressions linking genes to reactions:

(b0726 and b0727) or b1234

. Gene knockout propagates through GPR logic.

Context Managers

with model:

snapshots state and reverts all changes on exit (bounds, objective, medium, knockouts). Nesting supported.

Common Workflows

Workflow 1: Gene Knockout Screen

Goal: Identify essential, growth-reducing, and neutral genes.

from cobra.io import load_model
from cobra.flux_analysis import single_gene_deletion

model = load_model("textbook")
wt_growth = model.slim_optimize()

results = single_gene_deletion(model)
results["growth_fraction"] = results["growth"] / wt_growth

essential = results[results["growth_fraction"] < 0.01]
reduced = results[(results["growth_fraction"] >= 0.01) & (results["growth_fraction"] < 0.9)]
neutral = results[results["growth_fraction"] >= 0.9]

print(f"Essential: {len(essential)}, Reduced: {len(reduced)}, Neutral: {len(neutral)}")
for idx in essential.index:
    print(f"  Essential gene: {list(idx)[0]}")

Workflow 2: Media Optimization

Goal: Find minimal medium at different growth targets; compare aerobic vs anaerobic.

from cobra.io import load_model
from cobra.medium import minimal_medium
import pandas as pd

model = load_model("textbook")
results = []
for frac in [0.1, 0.5, 0.8, 1.0]:
    with model:
        target = model.slim_optimize() * frac
        model.reactions.get_by_id("Biomass_Ecoli_core").lower_bound = target
        try:
            mm = minimal_medium(model, minimize_components=True, open_exchanges=True)
            results.append({"growth_frac": frac, "n_components": len(mm)})
        except Exception:
            results.append({"growth_frac": frac, "n_components": None})
print(pd.DataFrame(results).to_string(index=False))

for label, o2 in [("Aerobic", 1000.0), ("Anaerobic", 0.0)]:
    with model:
        medium = model.medium
        medium["EX_o2_e"] = o2
        model.medium = medium
        print(f"{label} growth: {model.slim_optimize():.4f} /h")

Workflow 3: Production Strain Design

Goal: Design a strain with maximized target metabolite production. Combines modules 3, 5, and 8.

Load model and compute wild-type production envelope for target metabolite
Screen single gene knockouts for overproducers (filter where target secretion increases)
Combine top knockouts and validate with FBA under production conditions
Gapfill if knockouts create infeasibilities
Compute final production envelope and compare to wild-type

Key Parameters

Parameter	Module/Function	Default	Range / Options	Effect
`fraction_of_optimum`	`flux_variability_analysis`	`1.0`	`0.0` - `1.0`	Fraction of max objective to maintain; lower = wider flux ranges
`loopless`	`flux_variability_analysis`	`False`	`True` , `False`	Eliminate thermodynamically infeasible loops; slower
`method`	`sample`	`"optgp"`	`"optgp"` , `"achr"`	Sampling algorithm; optgp supports parallelism
`n`	`sample`	required	`100` - `10000`	Number of flux samples to draw
`processes`	`sample` , `double_gene_deletion`	`1`	`1` - `N_cores`	Parallel worker processes
`minimize_components`	`minimal_medium`	`False`	`True` , `False`	True = fewest nutrients (MILP); False = minimize total flux
`open_exchanges`	`minimal_medium`	`False`	`True` , `False`	Allow all exchanges as nutrient candidates
`carbon_sources`	`production_envelope`	`None`	Reaction object	Compute carbon yield alongside flux envelope
`thinning`	`sample`	`100`	`1` - `1000`	Steps between kept samples; higher = less correlated

Best Practices

Use context managers for temporary changes:

with model:

reverts all modifications on exit.

with model:
    model.reactions.PFK.knock_out()
    print(model.slim_optimize())  # modified
# model is restored here

Validate with
```
slim_optimize()
```
before analysis: Quick feasibility check before expensive operations (FVA, sampling).
Check
```
solution.status
```
after optimization: Always verify
```
"optimal"
```
before interpreting fluxes.
Use loopless FVA when thermodynamic feasibility matters: Standard FVA can include infeasible internal cycles that inflate flux ranges.
Parallelize expensive operations: Sampling and double deletions support
```
processes
```
parameter.
Prefer SBML for model exchange: Community standard supported by all COBRA tools.
Use
```
slim_optimize()
```
in loops: Skips full flux vector construction, significantly faster for screening.
Validate flux samples: Use
```
sampler.validate(samples)
```
to check stoichiometric and bound constraints.

Common Recipes

Recipe: Parameter Scan (Glucose Uptake Rate)

from cobra.io import load_model

model = load_model("textbook")
print("Glucose_uptake | Growth_rate")
for glc_uptake in [1, 2, 5, 10, 15, 20]:
    with model:
        model.reactions.get_by_id("EX_glc__D_e").lower_bound = -glc_uptake
        growth = model.slim_optimize()
        print(f"  {glc_uptake:>13} | {growth:.4f}")

Recipe: Batch Condition Analysis

from cobra.io import load_model
import pandas as pd

model = load_model("textbook")
conditions = [
    {"name": "Rich aerobic", "EX_o2_e": 1000, "EX_glc__D_e": 10},
    {"name": "Anaerobic", "EX_o2_e": 0, "EX_glc__D_e": 10},
    {"name": "Low glucose", "EX_o2_e": 1000, "EX_glc__D_e": 1},
]
results = []
for c in conditions:
    with model:
        medium = model.medium
        medium["EX_o2_e"], medium["EX_glc__D_e"] = c["EX_o2_e"], c["EX_glc__D_e"]
        model.medium = medium
        results.append({"condition": c["name"], "growth": round(model.slim_optimize(), 4)})
print(pd.DataFrame(results).to_string(index=False))

Recipe: Model Validation Checklist

from cobra.io import load_model
from cobra.flux_analysis import find_blocked_reactions

model = load_model("textbook")

# Feasibility, mass balance, dead-ends, blocked reactions
print(f"Growth feasible: {model.slim_optimize() > 0}")
print(f"Missing formula: {sum(1 for m in model.metabolites if m.formula is None)}")
print(f"Dead-end metabolites: {sum(1 for m in model.metabolites if len(m.reactions) == 1)}")
print(f"Blocked reactions: {len(find_blocked_reactions(model))} / {len(model.reactions)}")

Troubleshooting

Problem	Cause	Solution
`solution.status == "infeasible"`	Constraints cannot be simultaneously satisfied	Check medium has required nutrients; verify reaction bounds; use `model.medium` to restore defaults
`solution.status == "unbounded"`	No upper bound on fluxes	Set finite upper bounds on exchange reactions
Very slow optimization	Large model + default GLPK solver	Install CPLEX or Gurobi: `model.solver = "cplex"`
`ValueError` setting bounds	`lower_bound > upper_bound` temporarily	Set as tuple: `rxn.bounds = (new_lb, new_ub)`
Gene deletion returns NaN	Knockout makes model infeasible	Expected for essential genes; classify as essential
`IOError` reading SBML	Invalid SBML or missing namespace	Validate at sbml.org; try `cobra.io.sbml.validate_sbml_model(path)`
Flux samples fail validation	Numerical solver tolerance	Increase `thinning` parameter; try `method="achr"`

Bundled Resources

1 reference file:

```
references/api_workflows.md
```
— Consolidates API quick reference and advanced workflows. Covers: detailed function signatures, solver configuration (GLPK/CPLEX/Gurobi), advanced analysis (
```
find_blocked_reactions
```
,
```
find_essential_genes
```
/
```
find_essential_reactions
```
), model manipulation (adding reactions/metabolites/genes), flux sample validation, and 5 workflow examples (knockout with visualization, media design, flux space exploration, production strain design, model validation). Relocated inline: basic FBA/FVA/deletion/sampling (Core API modules 3-7). Omitted: geometric FBA internals, MIP gap configuration — consult COBRApy docs.

Related Skills

escher — interactive metabolic map visualization; use JSON models from COBRApy
bioservices — retrieve models from BiGG, BioModels, or KEGG for import into COBRApy
networkx — metabolic network topology analysis; export stoichiometry matrix as graph

References

COBRApy documentation — official API reference and tutorials
BiGG Models — curated genome-scale metabolic model repository
COBRApy GitHub — source code and issue tracker
Ebrahim et al. (2013) "COBRApy: COnstraints-Based Reconstruction and Analysis for Python" — BMC Systems Biology