SciAgent-Skills networkx-graph-analysis

Graph and network analysis toolkit: create, manipulate, and analyze complex networks. Four graph classes (Graph, DiGraph, MultiGraph, MultiDiGraph), centrality measures, shortest paths, community detection, graph generators, I/O (GraphML, GML, edge list, pandas, NumPy), visualization with matplotlib. For large-scale graphs (100K+ nodes) use igraph or graph-tool; for graph neural networks use PyG.

install
source · Clone the upstream repo
git clone https://github.com/jaechang-hits/SciAgent-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scientific-computing/networkx-graph-analysis" ~/.claude/skills/jaechang-hits-sciagent-skills-networkx-graph-analysis && rm -rf "$T"
manifest: skills/scientific-computing/networkx-graph-analysis/SKILL.md
source content

NetworkX Graph Analysis

Overview

NetworkX is a Python library for creating, manipulating, and analyzing complex networks and graphs. It provides data structures for undirected, directed, and multi-edge graphs along with a comprehensive collection of graph algorithms, generators, and I/O utilities. Use NetworkX when working with relationship data in social networks, biological interaction networks, transportation systems, citation graphs, or any domain involving pairwise entity relationships.

When to Use

  • Analyzing protein-protein interaction networks, gene regulatory networks, or metabolic pathways
  • Computing centrality measures (degree, betweenness, PageRank) to identify important nodes
  • Finding shortest paths or optimal routes in transportation or communication networks
  • Detecting communities or clusters in social networks or co-expression data
  • Generating synthetic networks (scale-free, small-world, random) for simulation or null models
  • Reading and writing graph data in standard formats (GraphML, GML, edge lists, JSON)
  • Visualizing network topology with node/edge attribute mapping
  • Checking graph properties: connectivity, planarity, isomorphism, DAG structure
  • For large-scale graphs (100K+ nodes) where speed is critical, use igraph or graph-tool instead
  • For billion-edge graphs or GPU-accelerated analytics, use graph-tool with OpenMP or cuGraph
  • For graph neural networks and deep learning on graphs, use torch-geometric-graph-neural-networks

Prerequisites

  • Python packages: networkx, matplotlib, scipy, pandas, numpy
  • Optional: pydot or pygraphviz (Graphviz layouts)
pip install networkx matplotlib scipy pandas numpy

Quick Start

import networkx as nx

# Load a built-in example graph (Zachary's karate club)
G = nx.karate_club_graph()
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 34, Edges: 78

# Compute centrality and find most central node
bc = nx.betweenness_centrality(G)
top_node = max(bc, key=bc.get)
print(f"Most central node: {top_node}, betweenness: {bc[top_node]:.3f}")

# Detect communities
from networkx.algorithms import community
comms = community.greedy_modularity_communities(G)
print(f"Communities found: {len(comms)}")

Core API

Module 1: Graph Creation and Types

import networkx as nx

# Undirected graph (most common)
G = nx.Graph()
G.add_node("protein_A", type="kinase", weight=1.5)
G.add_nodes_from(["protein_B", "protein_C"])
G.add_edge("protein_A", "protein_B", weight=0.9, interaction="phosphorylation")
G.add_edges_from([("protein_B", "protein_C"), ("protein_A", "protein_C")])
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 3, Edges: 3

# Directed graph (gene regulation, citations)
D = nx.DiGraph()
D.add_edges_from([("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneA")])
print(f"TF1 out-degree: {D.out_degree('TF1')}")  # 2

# MultiGraph (multiple relationship types between same nodes)
M = nx.MultiGraph()
M.add_edge("A", "B", key="binding", affinity=0.8)
M.add_edge("A", "B", key="regulation", effect="inhibition")
print(f"Edges between A-B: {M.number_of_edges('A', 'B')}")  # 2

Module 2: Node and Edge Operations

import networkx as nx
G = nx.karate_club_graph()

# Query structure
print(f"Degree of node 0: {G.degree(0)}")
print(f"Neighbors of node 0: {list(G.neighbors(0))[:5]}")
print(f"Has edge 0-1: {G.has_edge(0, 1)}")

# Set and get attributes
G.nodes[0]["role"] = "instructor"
nx.set_node_attributes(G, {0: "high", 33: "high"}, "importance")
G[0][1]["weight"] = 0.95

# Iterate with data
for u, v, data in G.edges(data=True):
    if "weight" in data:
        print(f"  Edge {u}-{v}: weight={data['weight']}")
        break

# Subgraphs (returns read-only view; use .copy() for mutable)
H = G.subgraph([0, 1, 2, 3, 4, 5]).copy()
print(f"Subgraph: {H.number_of_nodes()} nodes, {H.number_of_edges()} edges")

Module 3: Graph Analysis (Centrality)

import networkx as nx
G = nx.karate_club_graph()

degree_c = nx.degree_centrality(G)
# weight values are interpreted as distances when computing shortest paths
between_c = nx.betweenness_centrality(G, weight="weight")
# For large graphs, approximate: nx.betweenness_centrality(G, k=100)
close_c = nx.closeness_centrality(G)
eigen_c = nx.eigenvector_centrality(G, max_iter=1000)
pr = nx.pagerank(G, alpha=0.85)

# Compare top nodes across measures
for name, metric in [("Degree", degree_c), ("Betweenness", between_c),
                     ("Closeness", close_c), ("PageRank", pr)]:
    top = max(metric, key=metric.get)
    print(f"{name:12s}: top node={top}, score={metric[top]:.4f}")
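
Because betweenness is built on shortest paths, the weight attribute acts as a distance, so a large weight makes an edge less attractive. The sketch below uses a made-up three-node graph with a hypothetical strength attribute and converts it with distance = 1 / strength. That transform is one common choice and an assumption here, not something NetworkX prescribes.

```python
import networkx as nx

# Hypothetical interaction strengths (higher = stronger tie)
G = nx.Graph()
G.add_edge("A", "B", strength=0.9)
G.add_edge("B", "C", strength=0.9)
G.add_edge("A", "C", strength=0.1)

# Assumption: turn strength into a distance so strong ties are "short"
for u, v, data in G.edges(data=True):
    data["distance"] = 1.0 / data["strength"]

bc_unweighted = nx.betweenness_centrality(G)
bc_distance = nx.betweenness_centrality(G, weight="distance")

print("Unweighted:", bc_unweighted)
print("Distance-weighted:", bc_distance)
```

In the unweighted triangle every pair is adjacent, so no node lies between two others. With distances, the weak A-C tie becomes a long edge and the A-C shortest path runs through B.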

Module 4: Path and Connectivity

import networkx as nx
G = nx.karate_club_graph()

# Shortest path
path = nx.shortest_path(G, source=0, target=33)
length = nx.shortest_path_length(G, source=0, target=33)
print(f"Shortest path 0->33: {path} (length {length})")
print(f"Average shortest path length: {nx.average_shortest_path_length(G):.3f}")

# Connected components
print(f"Connected: {nx.is_connected(G)}")
components = list(nx.connected_components(G))
print(f"Components: {len(components)}, largest: {len(max(components, key=len))}")

# For directed graphs: strong/weak connectivity
D = nx.DiGraph([(0,1),(1,2),(2,0),(3,4)])
print(f"Strongly connected: {list(nx.strongly_connected_components(D))}")

# Connectivity measures
print(f"Node connectivity: {nx.node_connectivity(G)}")
print(f"Edge connectivity: {nx.edge_connectivity(G)}")
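
The path functions above count hops by default. A minimal sketch on a hypothetical four-node graph (node names and weights are invented for illustration) shows how passing weight="weight" can change the chosen route:

```python
import networkx as nx

# Hypothetical toy graph: weights are treated as distances/costs
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 1.0),
    ("B", "C", 1.0),
    ("C", "D", 1.0),
    ("A", "D", 5.0),  # direct but expensive edge
])

hop_path = nx.shortest_path(G, "A", "D")                        # fewest edges
weighted_path = nx.shortest_path(G, "A", "D", weight="weight")  # lowest total weight
weighted_len = nx.shortest_path_length(G, "A", "D", weight="weight")

print(f"Fewest hops: {hop_path}")
print(f"Lowest cost: {weighted_path} (total {weighted_len})")
```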

Module 5: Community Detection

Partition networks into densely connected groups.

import networkx as nx
from networkx.algorithms import community
import itertools

G = nx.karate_club_graph()

# Greedy modularity maximization
comms_greedy = community.greedy_modularity_communities(G)
mod_score = community.modularity(G, comms_greedy)
print(f"Greedy: {len(comms_greedy)} communities, modularity={mod_score:.4f}")

# Label propagation (fast, non-deterministic)
comms_lpa = community.label_propagation_communities(G)
print(f"Label propagation: {len(list(comms_lpa))} communities")

# Girvan-Newman (hierarchical, edge betweenness removal)
gn = community.girvan_newman(G)
# Get first level of partition
first_level = next(gn)
print(f"Girvan-Newman first split: {len(first_level)} groups")
print(f"  Sizes: {[len(c) for c in first_level]}")
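
Girvan-Newman yields one partition per hierarchy level, so a common follow-up is to walk a few levels and keep the one with the highest modularity. The sketch below does that. The depth of 4 levels is an arbitrary choice for illustration.

```python
import itertools

import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

# Take the first few levels of the hierarchy (4 is an arbitrary cutoff)
levels = list(itertools.islice(community.girvan_newman(G), 4))

# Score each level by modularity and keep the best one
scored = [(community.modularity(G, level), level) for level in levels]
best_score, best_partition = max(scored, key=lambda pair: pair[0])

for score, level in scored:
    print(f"{len(level)} groups: modularity={score:.4f}")
print(f"Best level has {len(best_partition)} groups")
```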

Module 6: I/O and Serialization

import networkx as nx
import pandas as pd
import json

G = nx.karate_club_graph()

# Edge list (simple text format)
nx.write_edgelist(G, "karate.edgelist")
G_loaded = nx.read_edgelist("karate.edgelist", nodetype=int)

# GraphML (preserves all attributes, XML-based)
nx.write_graphml(G, "karate.graphml")
G_xml = nx.read_graphml("karate.graphml")

# JSON (node-link format, web-friendly for d3.js)
data = nx.node_link_data(G)
with open("karate.json", "w") as f:
    json.dump(data, f)

# Pandas integration
df = pd.DataFrame({"source": [1,2,3], "target": [2,3,4], "weight": [0.5,1.0,0.75]})
G_pd = nx.from_pandas_edgelist(df, "source", "target", edge_attr="weight")
df_out = nx.to_pandas_edgelist(G_pd)
print(f"Pandas round-trip: {len(df_out)} edges")

# NumPy/SciPy matrices
A = nx.to_numpy_array(G)
print(f"Adjacency matrix shape: {A.shape}")
A_sparse = nx.to_scipy_sparse_array(G, format="csr")  # Memory-efficient
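
Round-tripping a graph is a quick way to check what a format preserves. This sketch, which uses a made-up three-node graph and a temporary directory, reads a GraphML file back with and without node_type=int and rebuilds a graph from node-link JSON in memory:

```python
import json
import os
import tempfile

import networkx as nx

# Hypothetical toy graph with one node attribute and weighted edges
G = nx.Graph()
G.add_edge(1, 2, weight=0.5)
G.add_edge(2, 3, weight=1.0)
G.nodes[1]["label"] = "hub"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.graphml")
    nx.write_graphml(G, path)
    G_str = nx.read_graphml(path)                 # node IDs come back as str
    G_int = nx.read_graphml(path, node_type=int)  # node IDs converted to int

# JSON node-link: serialize to a string, then rebuild the graph
payload = json.dumps(nx.node_link_data(G))
G_json = nx.node_link_graph(json.loads(payload))

print(sorted(G_str.nodes()), sorted(G_int.nodes()))
print(G_int[1][2]["weight"], G_json[1][2]["weight"])
```

Recent NetworkX releases may emit a FutureWarning about the edge key name used by node_link_data. The round trip stays consistent as long as both calls use the same setting.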

Module 7: Visualization

import networkx as nx
import matplotlib.pyplot as plt

G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)

# Color by degree, size by betweenness centrality
bc = nx.betweenness_centrality(G)
fig, ax = plt.subplots(figsize=(10, 8))
nx.draw(G, pos=pos, ax=ax,
        node_color=[G.degree(n) for n in G.nodes()], cmap=plt.cm.viridis,
        node_size=[3000 * bc[n] + 100 for n in G.nodes()],
        edge_color="gray", alpha=0.8, with_labels=True, font_size=8)
plt.tight_layout()
plt.savefig("network.png", dpi=300, bbox_inches="tight")
plt.savefig("network.pdf", bbox_inches="tight")  # Vector format
print("Saved network.png and network.pdf")

Module 8: Generators

import networkx as nx

# Erdos-Renyi random graph: n nodes, edge probability p
G_er = nx.erdos_renyi_graph(n=200, p=0.05, seed=42)
print(f"ER: {G_er.number_of_nodes()} nodes, {G_er.number_of_edges()} edges")

# Barabasi-Albert scale-free (power-law degree distribution)
G_ba = nx.barabasi_albert_graph(n=200, m=3, seed=42)

# Watts-Strogatz small-world
G_ws = nx.watts_strogatz_graph(n=200, k=6, p=0.1, seed=42)
print(f"WS clustering: {nx.average_clustering(G_ws):.3f}")

# Stochastic block model (community structure)
sizes, probs = [50, 50, 50], [[0.25,0.05,0.02],[0.05,0.35,0.07],[0.02,0.07,0.40]]
G_sbm = nx.stochastic_block_model(sizes, probs, seed=42)

# Built-in datasets and classic graphs
G_karate = nx.karate_club_graph()       # Zachary's karate club
G_grid = nx.grid_2d_graph(5, 7)         # 2D lattice
G_tree = nx.random_labeled_tree(50, seed=42)  # Random tree (NetworkX >= 3.2; random_tree was removed in 3.4)
G_geo = nx.random_geometric_graph(n=100, radius=0.2, seed=42)
# See references/algorithms_generators.md for full generator catalog

Key Concepts

Graph Types

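| Class | Directed | Multi-edge | Self-loops | Use Case |
|-------|----------|------------|------------|----------|
| Graph | No | No | Yes | Undirected networks: social, PPI |
| DiGraph | Yes | No | Yes | Gene regulation, citations, web |
| MultiGraph | No | Yes | Yes | Multiple relationship types |
| MultiDiGraph | Yes | Yes | Yes | Transportation with routes |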
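
MultiDiGraph is the one class not exercised in the Core API examples. A minimal sketch, assuming a hypothetical transport network with invented route keys and travel times, shows parallel directed edges, what happens when they are collapsed into a DiGraph, and a self-loop on a plain Graph:

```python
import networkx as nx

# Hypothetical transport network: two parallel routes from A to B
MD = nx.MultiDiGraph()
MD.add_edge("A", "B", key="bus", minutes=30)
MD.add_edge("A", "B", key="train", minutes=12)
MD.add_edge("B", "A", key="bus", minutes=35)

parallel = MD.number_of_edges("A", "B")          # parallel edges A -> B
train_time = MD["A"]["B"]["train"]["minutes"]    # look up one edge by key

# Collapsing to a DiGraph keeps one edge per ordered node pair
D = nx.DiGraph(MD)
collapsed = D.number_of_edges()

# Self-loops are accepted by every graph class
G = nx.Graph()
G.add_edge("x", "x")
loops = nx.number_of_selfloops(G)

print(parallel, train_time, collapsed, loops)
```

When collapsing, only one attribute dict per node pair survives, so aggregate attributes beforehand if all parallel edges matter.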

Attribute Patterns

Attributes are stored as dictionaries at graph, node, and edge levels:

import networkx as nx
G = nx.Graph(name="example")              # Graph-level attribute
G.add_node(1, label="hub", weight=1.5)    # Node attributes
G.add_edge(1, 2, weight=0.8, type="ppi")  # Edge attributes

# Bulk set/get
nx.set_node_attributes(G, {1: "red", 2: "blue"}, "color")
colors = nx.get_node_attributes(G, "color")  # {1: 'red', 2: 'blue'}
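
The same dictionary pattern applies to edges and to the graph itself. A short sketch with invented attribute names and values:

```python
import networkx as nx

G = nx.Graph(name="example")
G.add_edges_from([(1, 2), (2, 3)])

# Graph-level attributes live in the G.graph dict
G.graph["organism"] = "hypothetical"

# Bulk-set edge attributes with a dict keyed by (u, v) tuples
nx.set_edge_attributes(G, {(1, 2): 0.8, (2, 3): 0.4}, "weight")
weights = nx.get_edge_attributes(G, "weight")

# A scalar value is applied to every node
nx.set_node_attributes(G, "protein", "kind")

# .get() avoids a KeyError for attributes that were never set
missing = G.nodes[1].get("color", "unset")

print(G.graph)
print(weights)
print(G.nodes[3]["kind"], missing)
```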

Layout Algorithms

| Layout | Function | Best For |
|--------|----------|----------|
| Spring (force-directed) | spring_layout(G, seed=42) | General networks |
| Circular | circular_layout(G) | Regular graphs, cycles |
| Kamada-Kawai | kamada_kawai_layout(G) | Small-medium networks |
| Spectral | spectral_layout(G) | Highlighting clusters |
| Shell (concentric) | shell_layout(G, nlist=[[...],[...]]) | Layered/hierarchical |
| Planar | planar_layout(G) | Planar graphs only |
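
Every layout function returns a dictionary mapping each node to a 2D coordinate array, so layouts can be computed once and reused across drawing calls. In the sketch below, the split into a two-node core and an outer ring for shell_layout is a hypothetical grouping chosen only for illustration.

```python
import networkx as nx

G = nx.karate_club_graph()

# Each layout returns {node: array([x, y])}
pos_spring = nx.spring_layout(G, seed=42)
pos_circle = nx.circular_layout(G)

# Shell layout: hypothetical two-node core, everything else on the outer ring
core = [0, 33]
ring = [n for n in G.nodes() if n not in core]
pos_shell = nx.shell_layout(G, nlist=[core, ring])

for name, pos in [("spring", pos_spring), ("circular", pos_circle), ("shell", pos_shell)]:
    x, y = pos[0]
    print(f"{name}: node 0 at ({x:.2f}, {y:.2f})")
```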

Common Workflows

Workflow 1: Social Network Analysis

Goal: Identify influential actors, detect communities, and visualize.

import networkx as nx
import matplotlib.pyplot as plt
from networkx.algorithms import community

# Step 1: Load network and basic stats
G = nx.karate_club_graph()
print(f"Network: {G.number_of_nodes()} actors, {G.number_of_edges()} ties")
print(f"Density: {nx.density(G):.4f}, Clustering: {nx.average_clustering(G):.4f}")

# Step 2: Identify influential nodes
bc = nx.betweenness_centrality(G)
top_bc = sorted(bc.items(), key=lambda x: x[1], reverse=True)[:5]
print("Top 5 by betweenness:", [(n, f"{s:.3f}") for n, s in top_bc])

# Step 3: Detect communities
comms = community.greedy_modularity_communities(G)
print(f"Communities: {len(comms)}, modularity: {community.modularity(G, comms):.4f}")

# Step 4: Visualize with community coloring
pos = nx.spring_layout(G, seed=42)
fig, ax = plt.subplots(figsize=(10, 8))
for i, comm in enumerate(comms):
    nx.draw_networkx_nodes(G, pos, nodelist=list(comm), ax=ax,
                           node_color=[plt.cm.Set2(i)]*len(comm), node_size=400)
nx.draw_networkx_edges(G, pos, ax=ax, alpha=0.3)
nx.draw_networkx_labels(G, pos, ax=ax, font_size=8)
plt.axis("off")
plt.tight_layout()
plt.savefig("social_network_analysis.png", dpi=300, bbox_inches="tight")
print("Saved social_network_analysis.png")

Workflow 2: Biological Interaction Network

Goal: Build a PPI network from tabular data, analyze topology, and identify hub proteins.

import networkx as nx
import pandas as pd

# Step 1: Load interaction data from DataFrame
interactions = pd.DataFrame({
    "protein_a": ["TP53","TP53","BRCA1","BRCA1","MDM2","ATM","ATM","CHEK2","RB1","CDK2"],
    "protein_b": ["MDM2","BRCA1","ATM","CHEK2","RB1","CHEK2","BRCA2","CDC25A","CDK2","CCNA2"],
    "score": [0.99, 0.95, 0.92, 0.88, 0.91, 0.97, 0.85, 0.90, 0.87, 0.93]
})
G = nx.from_pandas_edgelist(interactions, "protein_a", "protein_b",
                             edge_attr="score")
print(f"PPI network: {G.number_of_nodes()} proteins, {G.number_of_edges()} interactions")

# Step 2: Network statistics
print(f"Connected: {nx.is_connected(G)}")
print(f"Diameter: {nx.diameter(G)}")
print(f"Avg path length: {nx.average_shortest_path_length(G):.2f}")
print(f"Transitivity: {nx.transitivity(G):.4f}")

# Step 3: Hub identification (multiple centrality measures)
degree_c = nx.degree_centrality(G)
between_c = nx.betweenness_centrality(G)
close_c = nx.closeness_centrality(G)

results = pd.DataFrame({
    "protein": list(G.nodes()),
    "degree_centrality": [degree_c[n] for n in G.nodes()],
    "betweenness": [between_c[n] for n in G.nodes()],
    "closeness": [close_c[n] for n in G.nodes()],
}).sort_values("betweenness", ascending=False)
print("\nHub proteins:")
print(results.head(5).to_string(index=False))

# Step 4: Export for downstream analysis
nx.write_graphml(G, "ppi_network.graphml")
results.to_csv("protein_centrality.csv", index=False)
print("Exported ppi_network.graphml and protein_centrality.csv")

Key Parameters

| Parameter | Module | Default | Range / Options | Effect |
|-----------|--------|---------|-----------------|--------|
| weight | Paths/Centrality | None ("weight" for pagerank) | Edge attribute name | Use weighted edges for path/centrality calculations |
| alpha | pagerank | 0.85 | 0.0-1.0 | Damping factor; lower = more uniform distribution |
| k | betweenness_centrality | None | int | Sample k nodes for approximation on large graphs |
| max_iter | eigenvector_centrality | 100 | int | Max iterations for convergence |
| seed | Generators/Layouts | None | int | Random seed for reproducibility |
| n / p / m | ER/BA generators | required | int / float | Node count, edge probability, edges per new node |
| k / p | Watts-Strogatz | required | int / float | Nearest neighbors, rewiring probability |
| nodetype | read_edgelist | None (IDs stay str) | int, float, str | Type conversion for node identifiers |
| edge_attr | from_pandas_edgelist | None | Column name(s) | Edge attribute columns to include from DataFrame |
| format | to_scipy_sparse_array | "csr" | "csr", "csc", "coo" | Sparse matrix format |
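
To make the k and seed parameters concrete, the sketch below compares exact betweenness with a sampled estimate on a synthetic Barabasi-Albert graph. The graph size and k=50 are arbitrary choices for illustration, and the overlap it prints will vary with them.

```python
import networkx as nx

G = nx.barabasi_albert_graph(n=300, m=3, seed=42)

exact = nx.betweenness_centrality(G)

# k source nodes are sampled; seed fixes which ones are drawn
approx = nx.betweenness_centrality(G, k=50, seed=42)
approx_again = nx.betweenness_centrality(G, k=50, seed=42)

top_exact = sorted(exact, key=exact.get, reverse=True)[:10]
top_approx = sorted(approx, key=approx.get, reverse=True)[:10]
overlap = len(set(top_exact) & set(top_approx))

print(f"Top-10 overlap between exact and sampled: {overlap}")
print(f"Reproducible with fixed seed: {approx == approx_again}")
```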

Best Practices

  1. Always set random seeds for reproducible generators and layouts: pass seed=42 to both erdos_renyi_graph() and spring_layout().

  2. Use approximate algorithms for large graphs: nx.betweenness_centrality(G, k=500) samples k source nodes instead of using every node.

  3. Prefer from_pandas_edgelist over manual add_edge loops for bulk data loading -- it handles attributes cleanly and is faster.

  4. Copy subgraphs before modification: G.subgraph(nodes) returns a read-only view; call .copy() for a mutable independent graph.

  5. Use GraphML or GML for persistent storage to preserve all node/edge attributes. Edge lists lose metadata unless explicitly handled.

  6. Convert graph types explicitly: D.to_undirected() (DiGraph -> Graph), nx.Graph(M) (MultiGraph -> Graph, collapses multi-edges).

  7. Use sparse matrices for large adjacency exports: to_scipy_sparse_array() is far more memory-efficient than to_numpy_array().

  8. Anti-pattern -- Don't use nx.info(): it was removed in NetworkX 3.0; use G.number_of_nodes(), G.number_of_edges(), and nx.density(G) directly.

  9. Anti-pattern -- Don't assume node ordering: Algorithms may return results in different orders. Always index by node key, not position.
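
Practices 4 and 6 can be checked directly. A minimal sketch on small made-up graphs shows a subgraph view rejecting edits, a copy accepting them, and the edge counts after converting between graph types:

```python
import networkx as nx

G = nx.path_graph(5)

# A subgraph view is frozen: edits raise NetworkXError
view = G.subgraph([0, 1, 2])
try:
    view.add_edge(0, 2)
    view_is_frozen = False
except nx.NetworkXError as err:
    view_is_frozen = True
    print(f"View rejected the edit: {err}")

# A copy is independent of G and can be modified freely
H = G.subgraph([0, 1, 2]).copy()
H.add_edge(0, 2)
print(G.has_edge(0, 2), H.has_edge(0, 2))

# MultiGraph -> Graph keeps one edge per node pair
M = nx.MultiGraph()
M.add_edge("A", "B", key="binding")
M.add_edge("A", "B", key="regulation")
simple_edges = nx.Graph(M).number_of_edges()

# DiGraph -> Graph merges reciprocal edges into one
D = nx.DiGraph([("TF1", "geneA"), ("geneA", "TF1")])
undirected_edges = D.to_undirected().number_of_edges()

print(simple_edges, undirected_edges)
```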

Common Recipes

Recipe: Minimum Spanning Tree

Extract the minimum spanning tree and compare to the original graph.

import random

import networkx as nx

# Create weighted graph
G = nx.erdos_renyi_graph(50, 0.15, seed=42)
rng = random.Random(42)  # one seeded generator shared by all edges
for u, v in G.edges():
    G[u][v]["weight"] = round(rng.random(), 2)

mst = nx.minimum_spanning_tree(G, weight="weight")
print(f"Original: {G.number_of_edges()} edges")
print(f"MST: {mst.number_of_edges()} edges")
total_weight = sum(d["weight"] for _, _, d in mst.edges(data=True))
print(f"MST total weight: {total_weight:.2f}")

Recipe: Graph Coloring and Cliques

Find cliques and compute graph coloring.

import networkx as nx

G = nx.karate_club_graph()

# Find all maximal cliques
cliques = list(nx.find_cliques(G))
print(f"Maximal cliques: {len(cliques)}")
largest_clique = max(cliques, key=len)
print(f"Largest clique size: {len(largest_clique)}, nodes: {largest_clique}")

# Greedy graph coloring
coloring = nx.greedy_color(G, strategy="largest_first")
n_colors = max(coloring.values()) + 1
print(f"Chromatic number (greedy upper bound): {n_colors}")

Recipe: DAG and Topological Sort

Build a directed acyclic graph and find execution order.

import networkx as nx

# Task dependency DAG
D = nx.DiGraph()
D.add_edges_from([
    ("download_data", "preprocess"),
    ("download_data", "validate"),
    ("preprocess", "analyze"),
    ("validate", "analyze"),
    ("analyze", "visualize"),
    ("analyze", "report"),
    ("visualize", "report"),
])

print(f"Is DAG: {nx.is_directed_acyclic_graph(D)}")
order = list(nx.topological_sort(D))
print(f"Execution order: {order}")

# Find all paths from start to end
paths = list(nx.all_simple_paths(D, "download_data", "report"))
print(f"Paths to report: {len(paths)}")
for p in paths:
    print(f"  {' -> '.join(p)}")

Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| NetworkXError: Graph is not connected | Algorithm requires connected graph | Extract largest component: G.subgraph(max(nx.connected_components(G), key=len)).copy() |
| PowerIterationFailedConvergence | Eigenvector/PageRank did not converge | Increase max_iter (e.g., 1000) or check for disconnected components |
| Very slow centrality computation | O(n*m) complexity on large graphs | Use k parameter for sampling: betweenness_centrality(G, k=500) |
| nx.NetworkXNotImplemented | Algorithm not available for graph type | Convert graph type: G.to_undirected() or G.to_directed() |
| Memory error on large graphs | Dense adjacency matrix | Use to_scipy_sparse_array() instead of to_numpy_array() |
| Node IDs read as strings from file | read_edgelist defaults to str | Pass nodetype=int: nx.read_edgelist(f, nodetype=int) |
| Community detection returns frozen sets | Normal return type for communities | Convert: [list(c) for c in communities] |
| Self-loops in generated graphs | Configuration model allows self-loops | Remove: G.remove_edges_from(nx.selfloop_edges(G)) |
| Visualization too cluttered | Too many nodes/edges | Filter to subgraph, adjust alpha, increase figure size, or use interactive tools (Plotly, PyVis) |
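
Two of the fixes above, extracting the largest component and removing self-loops, can be sketched on a deliberately disconnected toy graph (the node numbers are arbitrary):

```python
import networkx as nx

# Two components: a 4-node path plus a separate edge, with one self-loop
G = nx.path_graph(4)
G.add_edge(10, 11)
G.add_edge(0, 0)

# Distance-based measures fail on a disconnected graph
try:
    nx.diameter(G)
except nx.NetworkXError as err:
    print(f"diameter failed: {err}")

# Keep the largest connected component as an independent graph
largest = max(nx.connected_components(G), key=len)
core = G.subgraph(largest).copy()

# Materialize the self-loop list before removing, to avoid mutating during iteration
core.remove_edges_from(list(nx.selfloop_edges(core)))

print(f"Core: {core.number_of_nodes()} nodes, diameter {nx.diameter(core)}")
```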

Bundled Resources

Migrated from original entry (STUB: 436-line main file + 2,014 lines across 5 reference files, main/total = 17.8%).

references/algorithms_generators.md

Covers: Detailed algorithm parameters for traversal (DFS/BFS), cycles, cliques, graph coloring, isomorphism, matching/covering, tree algorithms (MST variants). Full generator catalog: classic graphs, lattice/grid, tree, bipartite, degree sequence, graph operations (union, compose, complement, products). Relocated inline: Core algorithms (centrality, paths, connectivity, community, flow) -> Core API Modules 3-5. Core generators (ER, BA, WS, SBM) -> Module 8. Omitted: A* heuristic customization, Bellman-Ford negative weights -- consult official docs.

Original file disposition:

  • algorithms.md (383 lines): Top algorithms relocated to Core API Modules 3-5 + Recipes. Remaining (traversal, cliques, coloring, isomorphism, matching, cycles, trees) -> this reference.
  • generators.md (378 lines): Core generators relocated to Module 8. Full catalog (classic, lattice, tree, bipartite, degree sequence, operators) -> this reference.

references/io_visualization.md

Covers: All I/O formats (adjacency list, GEXF, Pajek, LEDA, Cytoscape JSON, DOT/Graphviz, Matrix Market, CSV, database/SQL, compressed gzip). Format selection guide. Advanced visualization: Plotly interactive, PyVis HTML, Graphviz layouts, 3D networks, bipartite layout, community coloring, subgraph highlighting, multi-panel figures, edge labels, directed arrows. Relocated inline: Core I/O (edge list, GraphML, JSON, pandas, NumPy/SciPy) -> Module 6. Basic matplotlib -> Module 7. Omitted: write_gpickle / read_gpickle (removed in NetworkX 3.0; use Python's pickle module), read_shp / write_shp (removed in NetworkX 3.0; use geopandas).

Original file disposition:

  • io.md (441 lines): Core formats relocated to Module 6. Remaining formats + format selection guide -> this reference.
  • visualization.md (529 lines): Basic matplotlib relocated to Module 7. Advanced techniques (Plotly, PyVis, 3D, bipartite, community coloring) -> this reference.

Fully consolidated original file

  • graph-basics.md (283 lines): Fully consolidated into main SKILL.md. Graph types -> Key Concepts. Node/edge operations, attributes, subgraphs -> Core API Modules 1-2. Diagnostics -> Common Workflows. Memory/floating-point considerations -> Best Practices + Troubleshooting. Omitted: nx.info() (removed in NetworkX 3.0).

Related Skills

  • torch-geometric-graph-neural-networks -- graph neural networks (GCN, GAT, GraphSAGE) for node/graph classification and link prediction on graph-structured data
  • matplotlib-scientific-plotting -- advanced figure customization beyond NetworkX's built-in nx.draw
  • plotly-interactive-visualization -- interactive network plots with hover, zoom, and pan
  • pandas (planned) -- DataFrame operations for preparing edge/node data before graph construction
  • scipy (planned) -- sparse matrix operations and numerical algorithms used by NetworkX internally

References