SciAgent-Skills networkx-graph-analysis

Graph and network analysis toolkit: create, manipulate, and analyze complex networks. Four graph classes (Graph, DiGraph, MultiGraph, MultiDiGraph), centrality measures, shortest paths, community detection, graph generators, I/O (GraphML, GML, edge list, pandas, NumPy), visualization with matplotlib. For large-scale graphs (100K+ nodes) use igraph or graph-tool; for graph neural networks use PyG.

install

source · Clone the upstream repo

git clone https://github.com/jaechang-hits/SciAgent-Skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scientific-computing/networkx-graph-analysis" ~/.claude/skills/jaechang-hits-sciagent-skills-networkx-graph-analysis && rm -rf "$T"

manifest: skills/scientific-computing/networkx-graph-analysis/SKILL.md

source content

NetworkX Graph Analysis

Overview

NetworkX is a Python library for creating, manipulating, and analyzing complex networks and graphs. It provides data structures for undirected, directed, and multi-edge graphs along with a comprehensive collection of graph algorithms, generators, and I/O utilities. Use NetworkX when working with relationship data in social networks, biological interaction networks, transportation systems, citation graphs, or any domain involving pairwise entity relationships.

When to Use

Analyzing protein-protein interaction networks, gene regulatory networks, or metabolic pathways
Computing centrality measures (degree, betweenness, PageRank) to identify important nodes
Finding shortest paths or optimal routes in transportation or communication networks
Detecting communities or clusters in social networks or co-expression data
Generating synthetic networks (scale-free, small-world, random) for simulation or null models
Reading and writing graph data in standard formats (GraphML, GML, edge lists, JSON)
Visualizing network topology with node/edge attribute mapping
Checking graph properties: connectivity, planarity, isomorphism, DAG structure
For large-scale graphs (100K+ nodes) where speed is critical, use `igraph` or `graph-tool` instead
For billion-edge graphs or GPU-accelerated analytics, use `graph-tool` with OpenMP or `cuGraph`
For graph neural networks and deep learning on graphs, use `torch-geometric-graph-neural-networks`
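
The property checks named in the list above (connectivity, planarity, isomorphism, DAG structure) each map to a single NetworkX call. A minimal sketch on small built-in graphs; the graphs and variable names are chosen here for illustration only:

```python
import networkx as nx

G = nx.cycle_graph(5)                      # 5-node ring

# Connectivity (undirected graphs)
print(nx.is_connected(G))

# Planarity: check_planarity returns (bool, embedding-or-None)
is_planar, _ = nx.check_planarity(G)
k5_planar, _ = nx.check_planarity(nx.complete_graph(5))  # K5 is non-planar
print(is_planar, k5_planar)

# Isomorphism: relabeling nodes does not change the structure
H = nx.relabel_nodes(G, {i: f"n{i}" for i in G.nodes()})
print(nx.is_isomorphic(G, H))

# DAG structure (directed graphs only)
D = nx.DiGraph([("a", "b"), ("b", "c")])
print(nx.is_directed_acyclic_graph(D))
```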

Prerequisites

Python packages: `networkx`, `matplotlib`, `scipy`, `pandas`, `numpy`
Optional: `pydot` or `pygraphviz` (Graphviz layouts)

pip install networkx matplotlib scipy pandas numpy

Quick Start

import networkx as nx

# Load a built-in example graph (Zachary's karate club)
G = nx.karate_club_graph()
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 34, Edges: 78

# Compute centrality and find most central node
bc = nx.betweenness_centrality(G)
top_node = max(bc, key=bc.get)
print(f"Most central node: {top_node}, betweenness: {bc[top_node]:.3f}")

# Detect communities
from networkx.algorithms import community
comms = community.greedy_modularity_communities(G)
print(f"Communities found: {len(comms)}")

Core API

Module 1: Graph Creation and Types

import networkx as nx

# Undirected graph (most common)
G = nx.Graph()
G.add_node("protein_A", type="kinase", weight=1.5)
G.add_nodes_from(["protein_B", "protein_C"])
G.add_edge("protein_A", "protein_B", weight=0.9, interaction="phosphorylation")
G.add_edges_from([("protein_B", "protein_C"), ("protein_A", "protein_C")])
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 3, Edges: 3

# Directed graph (gene regulation, citations)
D = nx.DiGraph()
D.add_edges_from([("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneA")])
print(f"TF1 out-degree: {D.out_degree('TF1')}")  # 2

# MultiGraph (multiple relationship types between same nodes)
M = nx.MultiGraph()
M.add_edge("A", "B", key="binding", affinity=0.8)
M.add_edge("A", "B", key="regulation", effect="inhibition")
print(f"Edges between A-B: {M.number_of_edges('A', 'B')}")  # 2

Module 2: Node and Edge Operations

import networkx as nx
G = nx.karate_club_graph()

# Query structure
print(f"Degree of node 0: {G.degree(0)}")
print(f"Neighbors of node 0: {list(G.neighbors(0))[:5]}")
print(f"Has edge 0-1: {G.has_edge(0, 1)}")

# Set and get attributes
G.nodes[0]["role"] = "instructor"
nx.set_node_attributes(G, {0: "high", 33: "high"}, "importance")
G[0][1]["weight"] = 0.95

# Iterate with data
for u, v, data in G.edges(data=True):
    if "weight" in data:
        print(f"  Edge {u}-{v}: weight={data['weight']}")
        break

# Subgraphs (returns read-only view; use .copy() for mutable)
H = G.subgraph([0, 1, 2, 3, 4, 5]).copy()
print(f"Subgraph: {H.number_of_nodes()} nodes, {H.number_of_edges()} edges")

Module 3: Graph Analysis (Centrality)

import networkx as nx
G = nx.karate_club_graph()

degree_c = nx.degree_centrality(G)
# Note: `weight` is interpreted as a distance (cost), not a tie strength
between_c = nx.betweenness_centrality(G, weight="weight")
# For large graphs, approximate: nx.betweenness_centrality(G, k=100)
close_c = nx.closeness_centrality(G)
eigen_c = nx.eigenvector_centrality(G, max_iter=1000)
pr = nx.pagerank(G, alpha=0.85)

# Compare top nodes across measures
for name, metric in [("Degree", degree_c), ("Betweenness", between_c),
                     ("Closeness", close_c), ("PageRank", pr)]:
    top = max(metric, key=metric.get)
    print(f"{name:12s}: top node={top}, score={metric[top]:.4f}")

Module 4: Path and Connectivity

import networkx as nx
G = nx.karate_club_graph()

# Shortest path
path = nx.shortest_path(G, source=0, target=33)
length = nx.shortest_path_length(G, source=0, target=33)
print(f"Shortest path 0->33: {path} (length {length})")
print(f"Average shortest path length: {nx.average_shortest_path_length(G):.3f}")

# Connected components
print(f"Connected: {nx.is_connected(G)}")
components = list(nx.connected_components(G))
print(f"Components: {len(components)}, largest: {len(max(components, key=len))}")

# For directed graphs: strong/weak connectivity
D = nx.DiGraph([(0,1),(1,2),(2,0),(3,4)])
print(f"Strongly connected: {list(nx.strongly_connected_components(D))}")

# Connectivity measures
print(f"Node connectivity: {nx.node_connectivity(G)}")
print(f"Edge connectivity: {nx.edge_connectivity(G)}")

Module 5: Community Detection

Partition networks into densely connected groups.

import networkx as nx
from networkx.algorithms import community
import itertools

G = nx.karate_club_graph()

# Greedy modularity maximization
comms_greedy = community.greedy_modularity_communities(G)
mod_score = community.modularity(G, comms_greedy)
print(f"Greedy: {len(comms_greedy)} communities, modularity={mod_score:.4f}")

# Label propagation (fast, semi-synchronous variant)
comms_lpa = community.label_propagation_communities(G)
print(f"Label propagation: {len(list(comms_lpa))} communities")

# Girvan-Newman (hierarchical, edge betweenness removal)
gn = community.girvan_newman(G)
# Get first level of partition
first_level = next(gn)
print(f"Girvan-Newman first split: {len(first_level)} groups")
print(f"  Sizes: {[len(c) for c in first_level]}")

Module 6: I/O and Serialization

import networkx as nx
import pandas as pd
import json

G = nx.karate_club_graph()

# Edge list (simple text format)
nx.write_edgelist(G, "karate.edgelist")
G_loaded = nx.read_edgelist("karate.edgelist", nodetype=int)

# GraphML (preserves all attributes, XML-based)
nx.write_graphml(G, "karate.graphml")
G_xml = nx.read_graphml("karate.graphml")

# JSON (node-link format, web-friendly for d3.js)
data = nx.node_link_data(G)
with open("karate.json", "w") as f:
    json.dump(data, f)

# Pandas integration
df = pd.DataFrame({"source": [1,2,3], "target": [2,3,4], "weight": [0.5,1.0,0.75]})
G_pd = nx.from_pandas_edgelist(df, "source", "target", edge_attr="weight")
df_out = nx.to_pandas_edgelist(G_pd)
print(f"Pandas round-trip: {len(df_out)} edges")

# NumPy/SciPy matrices
A = nx.to_numpy_array(G)
print(f"Adjacency matrix shape: {A.shape}")
A_sparse = nx.to_scipy_sparse_array(G, format="csr")  # Memory-efficient

Module 7: Visualization

import networkx as nx
import matplotlib.pyplot as plt

G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)

# Color by degree, size by betweenness centrality
bc = nx.betweenness_centrality(G)
fig, ax = plt.subplots(figsize=(10, 8))
nx.draw(G, pos=pos, ax=ax,
        node_color=[G.degree(n) for n in G.nodes()], cmap=plt.cm.viridis,
        node_size=[3000 * bc[n] + 100 for n in G.nodes()],
        edge_color="gray", alpha=0.8, with_labels=True, font_size=8)
plt.tight_layout()
plt.savefig("network.png", dpi=300, bbox_inches="tight")
plt.savefig("network.pdf", bbox_inches="tight")  # Vector format
print("Saved network.png and network.pdf")

Module 8: Generators

import networkx as nx

# Erdos-Renyi random graph: n nodes, edge probability p
G_er = nx.erdos_renyi_graph(n=200, p=0.05, seed=42)
print(f"ER: {G_er.number_of_nodes()} nodes, {G_er.number_of_edges()} edges")

# Barabasi-Albert scale-free (power-law degree distribution)
G_ba = nx.barabasi_albert_graph(n=200, m=3, seed=42)

# Watts-Strogatz small-world
G_ws = nx.watts_strogatz_graph(n=200, k=6, p=0.1, seed=42)
print(f"WS clustering: {nx.average_clustering(G_ws):.3f}")

# Stochastic block model (community structure)
sizes, probs = [50, 50, 50], [[0.25,0.05,0.02],[0.05,0.35,0.07],[0.02,0.07,0.40]]
G_sbm = nx.stochastic_block_model(sizes, probs, seed=42)

# Built-in datasets and classic graphs
G_karate = nx.karate_club_graph()       # Zachary's karate club
G_grid = nx.grid_2d_graph(5, 7)         # 2D lattice
G_tree = nx.random_labeled_tree(n=50, seed=42)  # Random tree (nx.random_tree before NetworkX 3.2)
G_geo = nx.random_geometric_graph(n=100, radius=0.2, seed=42)
# See references/algorithms_generators.md for full generator catalog

Key Concepts

Graph Types

Class	Directed	Multi-edge	Self-loops	Use Case
`Graph`	No	No	Yes	Undirected networks: social, PPI
`DiGraph`	Yes	No	Yes	Gene regulation, citations, web
`MultiGraph`	No	Yes	Yes	Multiple relationship types
`MultiDiGraph`	Yes	Yes	Yes	Transportation with routes
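
Of the four classes in the table, `MultiDiGraph` is the one not exercised in Module 1. A small sketch follows; the transport network, edge keys, and the `minutes` attribute are hypothetical:

```python
import networkx as nx

# Hypothetical transport network with parallel, directed routes
T = nx.MultiDiGraph()
T.add_edge("A", "B", key="bus", minutes=30)
T.add_edge("A", "B", key="train", minutes=12)
T.add_edge("B", "A", key="bus", minutes=35)

n_ab = T.number_of_edges("A", "B")   # parallel edges A -> B
n_ba = T.number_of_edges("B", "A")   # direction matters
print(n_ab, n_ba)

# Edge data is addressed by (u, v, key)
print(T["A"]["B"]["train"]["minutes"])

# Collapse to a simple DiGraph, keeping the fastest route per ordered pair
S = nx.DiGraph()
for u, v, data in T.edges(data=True):
    if not S.has_edge(u, v) or data["minutes"] < S[u][v]["minutes"]:
        S.add_edge(u, v, minutes=data["minutes"])
print(S["A"]["B"]["minutes"])
```

Collapsing explicitly like this makes the choice of which parallel edge survives deliberate, whereas `nx.DiGraph(T)` keeps one edge's data without letting you pick.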

Attribute Patterns

Attributes are stored as dictionaries at graph, node, and edge levels:

import networkx as nx
G = nx.Graph(name="example")              # Graph-level attribute
G.add_node(1, label="hub", weight=1.5)    # Node attributes
G.add_edge(1, 2, weight=0.8, type="ppi")  # Edge attributes

# Bulk set/get
nx.set_node_attributes(G, {1: "red", 2: "blue"}, "color")
colors = nx.get_node_attributes(G, "color")  # {1: 'red', 2: 'blue'}

Layout Algorithms

Layout	Function	Best For
Spring (force-directed)	`spring_layout(G, seed=42)`	General networks
Circular	`circular_layout(G)`	Regular graphs, cycles
Kamada-Kawai	`kamada_kawai_layout(G)`	Small-medium networks
Spectral	`spectral_layout(G)`	Highlighting clusters
Shell (concentric)	`shell_layout(G, nlist=[[...],[...]])`	Layered/hierarchical
Planar	`planar_layout(G)`	Planar graphs only
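
Every layout function in the table returns a dictionary mapping each node to a coordinate array, which is then passed as `pos` to the drawing functions. A short sketch, assuming NumPy is installed; putting the five highest-degree nodes in the inner shell is an arbitrary illustrative choice:

```python
import networkx as nx

G = nx.karate_club_graph()

# Circular layout: all nodes evenly spaced on one ring
pos_circ = nx.circular_layout(G)

# Shell layout: nlist gives the nodes of each concentric shell, inner first
by_degree = sorted(G.nodes(), key=lambda n: G.degree(n), reverse=True)
pos_shell = nx.shell_layout(G, nlist=[by_degree[:5], by_degree[5:]])

# Both are {node: array([x, y])}
print(len(pos_circ), len(pos_shell))
print(pos_shell[by_degree[0]])
```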

Common Workflows

Workflow 1: Social Network Analysis

Goal: Identify influential actors, detect communities, and visualize.

import networkx as nx
import matplotlib.pyplot as plt
from networkx.algorithms import community

# Step 1: Load network and basic stats
G = nx.karate_club_graph()
print(f"Network: {G.number_of_nodes()} actors, {G.number_of_edges()} ties")
print(f"Density: {nx.density(G):.4f}, Clustering: {nx.average_clustering(G):.4f}")

# Step 2: Identify influential nodes
bc = nx.betweenness_centrality(G)
top_bc = sorted(bc.items(), key=lambda x: x[1], reverse=True)[:5]
print("Top 5 by betweenness:", [(n, f"{s:.3f}") for n, s in top_bc])

# Step 3: Detect communities
comms = community.greedy_modularity_communities(G)
print(f"Communities: {len(comms)}, modularity: {community.modularity(G, comms):.4f}")

# Step 4: Visualize with community coloring
pos = nx.spring_layout(G, seed=42)
fig, ax = plt.subplots(figsize=(10, 8))
for i, comm in enumerate(comms):
    nx.draw_networkx_nodes(G, pos, nodelist=list(comm), ax=ax,
                           node_color=[plt.cm.Set2(i)]*len(comm), node_size=400)
nx.draw_networkx_edges(G, pos, ax=ax, alpha=0.3)
nx.draw_networkx_labels(G, pos, ax=ax, font_size=8)
plt.axis("off")
plt.tight_layout()
plt.savefig("social_network_analysis.png", dpi=300, bbox_inches="tight")
print("Saved social_network_analysis.png")

Workflow 2: Biological Interaction Network

Goal: Build a PPI network from tabular data, analyze topology, and identify hub proteins.

import networkx as nx
import pandas as pd

# Step 1: Load interaction data from DataFrame
interactions = pd.DataFrame({
    "protein_a": ["TP53","TP53","BRCA1","BRCA1","MDM2","ATM","ATM","CHEK2","RB1","CDK2"],
    "protein_b": ["MDM2","BRCA1","ATM","CHEK2","RB1","CHEK2","BRCA2","CDC25A","CDK2","CCNA2"],
    "score": [0.99, 0.95, 0.92, 0.88, 0.91, 0.97, 0.85, 0.90, 0.87, 0.93]
})
G = nx.from_pandas_edgelist(interactions, "protein_a", "protein_b",
                             edge_attr="score")
print(f"PPI network: {G.number_of_nodes()} proteins, {G.number_of_edges()} interactions")

# Step 2: Network statistics
print(f"Connected: {nx.is_connected(G)}")
print(f"Diameter: {nx.diameter(G)}")
print(f"Avg path length: {nx.average_shortest_path_length(G):.2f}")
print(f"Transitivity: {nx.transitivity(G):.4f}")

# Step 3: Hub identification (multiple centrality measures)
degree_c = nx.degree_centrality(G)
between_c = nx.betweenness_centrality(G)
close_c = nx.closeness_centrality(G)

results = pd.DataFrame({
    "protein": list(G.nodes()),
    "degree_centrality": [degree_c[n] for n in G.nodes()],
    "betweenness": [between_c[n] for n in G.nodes()],
    "closeness": [close_c[n] for n in G.nodes()],
}).sort_values("betweenness", ascending=False)
print("\nHub proteins:")
print(results.head(5).to_string(index=False))

# Step 4: Export for downstream analysis
nx.write_graphml(G, "ppi_network.graphml")
results.to_csv("protein_centrality.csv", index=False)
print("Exported ppi_network.graphml and protein_centrality.csv")

Key Parameters

Parameter	Module	Default	Range / Options	Effect
`weight`	Paths/Centrality	`None`	Edge attribute name	Use weighted edges for path/centrality calculations
`alpha`	`pagerank`	`0.85`	`0.0` - `1.0`	Damping factor; lower = more uniform distribution
`k`	`betweenness_centrality`	`None`	`int`	Sample k nodes for approximation on large graphs
`max_iter`	`eigenvector_centrality`	`100`	`int`	Max iterations for convergence
`seed`	Generators/Layouts	`None`	`int`	Random seed for reproducibility
`n` / `p` / `m`	ER/BA generators	varies	`int` / `float`	Node count, edge probability, edges per new node
`k` / `p`	Watts-Strogatz	varies	`int` / `float`	Nearest neighbors, rewiring probability
`nodetype`	`read_edgelist`	`None`	`int`, `float`, `str`	Type conversion for node identifiers (left as strings when `None`)
`edge_attr`	`from_pandas_edgelist`	`None`	Column name(s)	Edge attribute columns to include from DataFrame
`format`	`to_scipy_sparse_array`	`"csr"`	`"csr"`, `"csc"`, `"coo"`	Sparse matrix format
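
To make the `weight` and `k` rows concrete, here is a small sketch. The three-node graph and its weights are invented for illustration, and `k=10` is far smaller than what a genuinely large graph would need:

```python
import networkx as nx

G = nx.Graph()
G.add_edge("A", "B", weight=1.0)
G.add_edge("B", "C", weight=1.0)
G.add_edge("A", "C", weight=5.0)   # direct but expensive

# weight=None (default): fewest hops
hops = nx.shortest_path(G, "A", "C")
# weight="weight": lowest total edge weight
cheapest = nx.shortest_path(G, "A", "C", weight="weight")
cost = nx.shortest_path_length(G, "A", "C", weight="weight")
print(hops, cheapest, cost)

# k: estimate betweenness from k sampled source nodes; seed fixes the sample
K = nx.karate_club_graph()
approx = nx.betweenness_centrality(K, k=10, seed=42)
print(len(approx))
```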

Best Practices

Always set random seeds for reproducible generators and layouts: `seed=42` in both `erdos_renyi_graph()` and `spring_layout()`.
Use approximate algorithms for large graphs: `nx.betweenness_centrality(G, k=500)` samples k nodes instead of all pairs.
Prefer `from_pandas_edgelist` over manual `add_edge` loops for bulk data loading -- handles attributes cleanly and is faster.
Copy subgraphs before modification: `G.subgraph(nodes)` returns a read-only view; call `.copy()` for a mutable independent graph.
Use GraphML or GML for persistent storage to preserve all node/edge attributes. Edge lists lose metadata unless explicitly handled.
Convert graph types explicitly: `D.to_undirected()` (DiGraph -> Graph), `nx.Graph(M)` (MultiGraph -> Graph, collapses multi-edges).
Use sparse matrices for large adjacency exports: `to_scipy_sparse_array()` is far more memory-efficient than `to_numpy_array()`.
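
Two of the practices above, copying subgraph views and converting graph types explicitly, can be sketched on toy graphs. All node names here are illustrative:

```python
import networkx as nx

G = nx.path_graph(5)

# A subgraph view is frozen: mutation raises NetworkXError
view = G.subgraph([0, 1, 2])
try:
    view.add_edge(0, 2)
    view_is_frozen = False
except nx.NetworkXError:
    view_is_frozen = True
print(f"View is read-only: {view_is_frozen}")

# .copy() gives an independent graph; edits do not touch G
H = G.subgraph([0, 1, 2]).copy()
H.add_edge(0, 2)
print(G.has_edge(0, 2), H.has_edge(0, 2))

# MultiGraph -> Graph collapses parallel edges
M = nx.MultiGraph([("A", "B"), ("A", "B"), ("B", "C")])
simple = nx.Graph(M)
print(M.number_of_edges(), simple.number_of_edges())

# DiGraph -> Graph merges reciprocal edges
D = nx.DiGraph([("x", "y"), ("y", "x")])
U = D.to_undirected()
print(U.number_of_edges())
```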

Anti-pattern -- Don't use `nx.info()`: it was deprecated and then removed in NetworkX 3.0; use `G.number_of_nodes()`, `G.number_of_edges()`, and `nx.density(G)` directly.

Anti-pattern -- Don't assume node ordering: Algorithms may return results in different orders. Always index by node key, not position.
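
A brief sketch of both anti-patterns and their replacements, on a toy graph whose insertion order deliberately differs from sorted order:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("c", "a"), ("a", "b")])   # nodes inserted as c, a, b

# Summary statistics without nx.info()
print(G.number_of_nodes(), G.number_of_edges(), f"{nx.density(G):.3f}")

dc = nx.degree_centrality(G)

# Fragile: assumes the first entry belongs to a particular node
# first_score = list(dc.values())[0]

# Robust: look the score up by node key
score_a = dc["a"]   # "a" is adjacent to both other nodes
print(score_a)

# When order matters (tables, matrices), make it explicit
for node in sorted(G.nodes()):
    print(node, f"{dc[node]:.2f}")
```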

Common Recipes

Recipe: Minimum Spanning Tree

Extract the minimum spanning tree and compare to the original graph.

import random
import networkx as nx

# Create weighted graph with reproducible random weights
G = nx.erdos_renyi_graph(50, 0.15, seed=42)
rng = random.Random(42)  # one seeded generator, reused for every edge
for u, v in G.edges():
    G[u][v]["weight"] = round(rng.random(), 2)

mst = nx.minimum_spanning_tree(G, weight="weight")
print(f"Original: {G.number_of_edges()} edges")
print(f"MST: {mst.number_of_edges()} edges")
total_weight = sum(d["weight"] for _, _, d in mst.edges(data=True))
print(f"MST total weight: {total_weight:.2f}")

Recipe: Graph Coloring and Cliques

Find cliques and compute graph coloring.

import networkx as nx

G = nx.karate_club_graph()

# Find all maximal cliques
cliques = list(nx.find_cliques(G))
print(f"Maximal cliques: {len(cliques)}")
largest_clique = max(cliques, key=len)
print(f"Largest clique size: {len(largest_clique)}, nodes: {largest_clique}")

# Greedy graph coloring
coloring = nx.greedy_color(G, strategy="largest_first")
n_colors = max(coloring.values()) + 1
print(f"Chromatic number (greedy upper bound): {n_colors}")

Recipe: DAG and Topological Sort

Build a directed acyclic graph and find execution order.

import networkx as nx

# Task dependency DAG
D = nx.DiGraph()
D.add_edges_from([
    ("download_data", "preprocess"),
    ("download_data", "validate"),
    ("preprocess", "analyze"),
    ("validate", "analyze"),
    ("analyze", "visualize"),
    ("analyze", "report"),
    ("visualize", "report"),
])

print(f"Is DAG: {nx.is_directed_acyclic_graph(D)}")
order = list(nx.topological_sort(D))
print(f"Execution order: {order}")

# Find all paths from start to end
paths = list(nx.all_simple_paths(D, "download_data", "report"))
print(f"Paths to report: {len(paths)}")
for p in paths:
    print(f"  {' -> '.join(p)}")

Troubleshooting

Problem	Cause	Solution
`NetworkXError: Graph is not connected`	Algorithm requires connected graph	Extract largest component: `G.subgraph(max(nx.connected_components(G), key=len)).copy()`
`PowerIterationFailedConvergence`	Eigenvector/PageRank did not converge	Increase `max_iter` (e.g., 1000) or check for disconnected components
Very slow centrality computation	O(n*m) complexity on large graphs	Use `k` parameter for sampling: `betweenness_centrality(G, k=500)`
`nx.NetworkXNotImplemented`	Algorithm not available for graph type	Convert graph type: `G.to_undirected()` or `G.to_directed()`
Memory error on large graphs	Dense adjacency matrix	Use `to_scipy_sparse_array()` instead of `to_numpy_array()`
Node IDs read as strings from file	`read_edgelist` defaults to `str`	Pass `nodetype=int` : `nx.read_edgelist(f, nodetype=int)`
Community detection returns frozen sets	Normal return type for communities	Convert: `[list(c) for c in communities]`
Self-loops in generated graphs	Configuration model allows self-loops	Remove: `G.remove_edges_from(nx.selfloop_edges(G))`
Visualization too cluttered	Too many nodes/edges	Filter to subgraph, adjust `alpha` , increase figure size, or use interactive tools (Plotly, PyVis)
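
Three of the fixes in the table (stray self-loops, disconnected input, frozenset communities) can be combined into one short sketch. The five-edge graph is made up for illustration:

```python
import networkx as nx
from networkx.algorithms import community

# Two components plus one self-loop on node 4
G = nx.Graph([(0, 1), (1, 2), (2, 0), (3, 4), (4, 4)])

# Remove self-loops
G.remove_edges_from(list(nx.selfloop_edges(G)))

# nx.diameter(G) would raise NetworkXError because G is disconnected,
# so restrict the analysis to the largest connected component
largest = max(nx.connected_components(G), key=len)
core = G.subgraph(largest).copy()
print(sorted(core.nodes()), nx.diameter(core))

# Communities come back as frozensets; convert to lists if needed
comms = [sorted(c) for c in community.greedy_modularity_communities(G)]
print(comms)
```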

Bundled Resources

Migrated from original entry (STUB: 436-line main file + 2,014 lines across 5 reference files, main/total = 17.8%).

references/algorithms_generators.md

Covers: Detailed algorithm parameters for traversal (DFS/BFS), cycles, cliques, graph coloring, isomorphism, matching/covering, tree algorithms (MST variants). Full generator catalog: classic graphs, lattice/grid, tree, bipartite, degree sequence, graph operations (union, compose, complement, products). Relocated inline: Core algorithms (centrality, paths, connectivity, community, flow) -> Core API Modules 3-5. Core generators (ER, BA, WS, SBM) -> Module 8. Omitted: A* heuristic customization, Bellman-Ford negative weights -- consult official docs.

Original file disposition:

`algorithms.md` (383 lines): Top algorithms relocated to Core API Modules 3-5 + Recipes. Remaining (traversal, cliques, coloring, isomorphism, matching, cycles, trees) -> this reference.
`generators.md` (378 lines): Core generators relocated to Module 8. Full catalog (classic, lattice, tree, bipartite, degree sequence, operators) -> this reference.

references/io_visualization.md

Covers: All I/O formats (adjacency list, GEXF, Pajek, LEDA, Cytoscape JSON, DOT/Graphviz, Matrix Market, CSV, database/SQL, compressed gzip). Format selection guide. Advanced visualization: Plotly interactive, PyVis HTML, Graphviz layouts, 3D networks, bipartite layout, community coloring, subgraph highlighting, multi-panel figures, edge labels, directed arrows. Relocated inline: Core I/O (edge list, GraphML, JSON, pandas, NumPy/SciPy) -> Module 6. Basic matplotlib -> Module 7. Omitted: `write_gpickle` / `read_gpickle` (deprecated), `read_shp` / `write_shp` (removed in NetworkX 3.0; use geopandas).

Original file disposition:

`io.md` (441 lines): Core formats relocated to Module 6. Remaining formats + format selection guide -> this reference.
`visualization.md` (529 lines): Basic matplotlib relocated to Module 7. Advanced techniques (Plotly, PyVis, 3D, bipartite, community coloring) -> this reference.

Fully consolidated original file

`graph-basics.md` (283 lines): Fully consolidated into main SKILL.md. Graph types -> Key Concepts. Node/edge operations, attributes, subgraphs -> Core API Modules 1-2. Diagnostics -> Common Workflows. Memory/floating-point considerations -> Best Practices + Troubleshooting. Omitted: `nx.info()` (deprecated).

Related Skills

torch-geometric-graph-neural-networks -- graph neural networks (GCN, GAT, GraphSAGE) for node/graph classification and link prediction on graph-structured data
matplotlib-scientific-plotting -- advanced figure customization beyond NetworkX's built-in `nx.draw`
plotly-interactive-visualization -- interactive network plots with hover, zoom, and pan
pandas (planned) -- DataFrame operations for preparing edge/node data before graph construction
scipy (planned) -- sparse matrix operations and numerical algorithms used by NetworkX internally

References

NetworkX documentation -- official docs and API reference
NetworkX tutorial -- official getting started guide
NetworkX GitHub -- source code and issue tracker
NetworkX gallery -- example gallery with visualizations
Hagberg, A., Schult, D., & Swart, P. (2008). Exploring network structure, dynamics, and function using NetworkX. SciPy 2008.