SciAgent-Skills torch-geometric-graph-neural-networks

PyTorch Geometric (PyG) for graph neural networks. Node classification, graph classification, link prediction with GCN, GAT, GraphSAGE, GIN layers. Message passing framework, mini-batch processing, heterogeneous graphs, neighbor sampling for large-scale learning, model explainability. Supports molecular property prediction (QM9, MoleculeNet), social networks, knowledge graphs, 3D point clouds. For non-graph deep learning use PyTorch directly; for traditional graph algorithms use NetworkX.

install
source · Clone the upstream repo
git clone https://github.com/jaechang-hits/SciAgent-Skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jaechang-hits/SciAgent-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/scientific-computing/torch-geometric-graph-neural-networks" ~/.claude/skills/jaechang-hits-sciagent-skills-torch-geometric-graph-neural-networks && rm -rf "$T"
manifest: skills/scientific-computing/torch-geometric-graph-neural-networks/SKILL.md
source content

PyTorch Geometric (PyG) — Graph Neural Networks

Overview

PyTorch Geometric is a library built on PyTorch for developing and training Graph Neural Networks (GNNs). It provides 40+ convolutional layers, mini-batch processing via block-diagonal adjacency matrices, neighbor sampling for large-scale graphs, and heterogeneous graph support for multi-type node/edge networks.

When to Use

  • Node classification on citation, social, or biological networks
  • Graph-level classification (molecular activity, protein function)
  • Link prediction (knowledge graphs, recommendation systems)
  • Molecular property prediction (drug discovery, quantum chemistry)
  • 3D point cloud processing and mesh analysis
  • Large-scale graph learning with neighbor sampling (>100K nodes)
  • Heterogeneous graphs with multiple node/edge types
  • For non-graph deep learning → use PyTorch directly
  • For traditional graph algorithms (shortest path, centrality) → use NetworkX

Prerequisites

pip install torch torch_geometric
# Optional sparse operations (recommended):
# pip install pyg_lib torch_scatter torch_sparse torch_cluster
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

Quick Start

from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv
import torch, torch.nn.functional as F

dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)
    def forward(self, data):
        x = F.relu(self.conv1(data.x, data.edge_index))
        return self.conv2(x, data.edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
for epoch in range(200):
    model.train(); optimizer.zero_grad()
    F.cross_entropy(model(data)[data.train_mask], data.y[data.train_mask]).backward()
    optimizer.step()

model.eval()
pred = model(data).argmax(dim=1)
acc = (pred[data.test_mask] == data.y[data.test_mask]).float().mean()
print(f'Test Accuracy: {acc:.4f}')  # ~0.81

Core API

1. Data Representation

import torch
from torch_geometric.data import Data

# Create a graph: 3 nodes, 2 undirected edges (stored as 4 directed entries)
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.randn(3, 16)  # Node features [num_nodes, features]
y = torch.tensor([0, 1, 0])  # Node labels

data = Data(x=x, edge_index=edge_index, y=y)
print(f'Nodes: {data.num_nodes}, Edges: {data.num_edges}')
print(f'Features: {data.num_node_features}')
print(f'Has self-loops: {data.has_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')

# Optional attributes
data.edge_attr = torch.randn(4, 8)   # Edge features [num_edges, features]
data.pos = torch.randn(3, 3)          # Node positions (3D)
data.train_mask = torch.tensor([True, True, False])  # Custom masks
# Mini-batch processing — graphs concatenated as block-diagonal
from torch_geometric.loader import DataLoader

loader = DataLoader(dataset, batch_size=32, shuffle=True)  # use a multi-graph dataset (e.g., TUDataset)
for batch in loader:
    print(f'Graphs: {batch.num_graphs}, Nodes: {batch.num_nodes}')
    # batch.batch maps each node → its source graph index
    # No padding needed — computationally efficient
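
The block-diagonal construction is easy to inspect with Batch.from_data_list; a minimal sketch with two toy graphs:

import torch
from torch_geometric.data import Batch, Data

g1 = Data(x=torch.randn(2, 16), edge_index=torch.tensor([[0, 1], [1, 0]]))
g2 = Data(x=torch.randn(3, 16), edge_index=torch.tensor([[0, 1, 2], [1, 2, 0]]))

batch = Batch.from_data_list([g1, g2])
print(batch.num_graphs)  # 2
print(batch.batch)       # tensor([0, 0, 1, 1, 1]): node-to-graph assignment
print(batch.edge_index)  # g2's node indices are offset by g1's node count (2)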

2. Convolutional Layers

from torch_geometric.nn import GCNConv, GATConv, SAGEConv, GINConv
import torch.nn as nn

# GCNConv — spectral graph convolution (baseline)
conv = GCNConv(in_channels=16, out_channels=32)
# Supports edge_weight, SparseTensor input, bipartite graphs, lazy initialization

# GATConv — attention-based neighbor weighting
conv = GATConv(16, 32, heads=8, dropout=0.6)
# Output: [N, heads * out_channels] (concat) or [N, out_channels] (concat=False)

# SAGEConv — inductive learning via sampling
conv = SAGEConv(16, 32, aggr='mean')  # 'mean', 'max', 'lstm'

# GINConv — maximally powerful for graph isomorphism
nn_module = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
conv = GINConv(nn_module)

# TransformerConv — graph transformer
from torch_geometric.nn import TransformerConv
conv = TransformerConv(16, 32, heads=8, beta=True)

# All conv layers share the call signature: x_out = conv(x, edge_index)
x_out = conv(x, edge_index)
print(f'Output shape: {x_out.shape}')  # [num_nodes, out_channels], or [num_nodes, heads * out_channels] for concatenating attention layers

3. Custom Message Passing

from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree

class CustomConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='add')  # 'add', 'mean', 'max'
        self.lin = torch.nn.Linear(in_channels, out_channels)

    def forward(self, x, edge_index):
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))
        x = self.lin(x)

        # Degree-based normalization
        row, col = edge_index
        deg = degree(col, x.size(0), dtype=x.dtype)
        norm = deg.pow(-0.5)
        norm[norm == float('inf')] = 0  # guard against isolated (degree-0) nodes
        norm = norm[row] * norm[col]

        return self.propagate(edge_index, x=x, norm=norm)

    def message(self, x_j, norm):
        # x_j: source node features (automatic via _j suffix)
        return norm.view(-1, 1) * x_j

# Key methods: forward(), message(), aggregate(), update()
# _i suffix → target node, _j suffix → source node
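
To illustrate the _i/_j convention, a minimal EdgeConv-style sketch (not the library's EdgeConv) whose messages use both endpoint features:

class EdgeDiffConv(MessagePassing):
    """Message = MLP([x_i, x_j - x_i]), aggregated with max."""
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='max')
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(2 * in_channels, out_channels),
            torch.nn.ReLU(),
        )

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x=x)

    def message(self, x_i, x_j):
        # x_i: features of the target node, x_j: features of the source node
        return self.mlp(torch.cat([x_i, x_j - x_i], dim=-1))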

4. Pooling & Graph-Level Readout

from torch_geometric.nn import (
    global_mean_pool, global_max_pool, global_add_pool,
    TopKPooling, SAGPooling
)

# Global pooling: node features → graph-level representation
x_graph = global_mean_pool(x, batch)  # [num_graphs, features]

# Hierarchical pooling: coarsen graph
pool = TopKPooling(64, ratio=0.8)  # Keep top 80% nodes
x, edge_index, _, batch, _, _ = pool(x, edge_index, None, batch)

# Graph classification model
class GraphClassifier(torch.nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.conv1 = GCNConv(num_features, 64)
        self.conv2 = GCNConv(64, 64)
        self.pool = TopKPooling(64, ratio=0.8)
        self.lin = torch.nn.Linear(64, num_classes)

    def forward(self, data):
        x, edge_index, batch = data.x, data.edge_index, data.batch
        x = F.relu(self.conv1(x, edge_index))
        x, edge_index, _, batch, _, _ = self.pool(x, edge_index, None, batch)
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)
        return self.lin(x)

5. Heterogeneous Graphs

from torch_geometric.data import HeteroData
from torch_geometric.nn import HeteroConv, GCNConv, SAGEConv, to_hetero

# Create heterogeneous graph
data = HeteroData()
data['paper'].x = torch.randn(100, 128)
data['author'].x = torch.randn(200, 64)
data['author', 'writes', 'paper'].edge_index = torch.stack([
    torch.randint(0, 200, (500,)),  # source: author indices (0..199)
    torch.randint(0, 100, (500,)),  # target: paper indices (0..99)
])
data['paper', 'cites', 'paper'].edge_index = torch.randint(0, 100, (2, 300))
print(data)  # Shows all node/edge types

# Method 1: Auto-convert a homogeneous model (its forward must accept (x, edge_index))
model = GCN(...)  # any homogeneous model whose forward takes (x, edge_index)
model = to_hetero(model, data.metadata(), aggr='sum')
out = model(data.x_dict, data.edge_index_dict)
# Method 2: Custom per-edge-type convolutions
class HeteroGNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = HeteroConv({
            ('paper', 'cites', 'paper'): GCNConv(-1, 64),
            ('author', 'writes', 'paper'): SAGEConv((-1, -1), 64),
        }, aggr='sum')

    def forward(self, x_dict, edge_index_dict):
        x_dict = self.conv1(x_dict, edge_index_dict)
        return {k: F.relu(v) for k, v in x_dict.items()}
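
Since the -1 channels are lazy, run one dummy forward pass before constructing the optimizer; a usage sketch with the HeteroGNN above:

model = HeteroGNN()
with torch.no_grad():  # dummy pass initializes the lazy (-1) dimensions
    out = model(data.x_dict, data.edge_index_dict)
print({k: v.shape for k, v in out.items()})  # only destination types ('paper') appear
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)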

6. Transforms & Preprocessing

from torch_geometric.transforms import (
    NormalizeFeatures, AddSelfLoops, ToUndirected,
    RandomNodeSplit, RandomLinkSplit, Compose,
    KNNGraph, RadiusGraph, AddLaplacianEigenvectorPE
)

# Single transform
dataset = Planetoid(root='/tmp/Cora', name='Cora', transform=NormalizeFeatures())

# Compose multiple transforms
transform = Compose([
    ToUndirected(),
    AddSelfLoops(),
    NormalizeFeatures(),
])

# Data splitting
node_split = RandomNodeSplit(num_val=0.1, num_test=0.2)
link_split = RandomLinkSplit(num_val=0.1, num_test=0.2, is_undirected=True)

# Point cloud → graph
pc_transform = Compose([KNNGraph(k=6), NormalizeFeatures()])

# Positional encodings (for Graph Transformers)
pe_transform = AddLaplacianEigenvectorPE(k=10)
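
RandomLinkSplit returns three Data objects rather than mutating in place; a short usage sketch with the splits defined above:

data = dataset[0]
train_data, val_data, test_data = link_split(data)  # three Data objects
print(train_data.edge_label_index.shape)            # supervision edges for link prediction

data = node_split(data)  # adds train_mask / val_mask / test_mask attributes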

Key Concepts

Layer Selection Guide

| Task | Layer | Key Feature |
|---|---|---|
| Baseline / general | GCNConv | Spectral, cached, edge_weight |
| Variable neighbor importance | GATConv / GATv2Conv | Multi-head attention |
| Large-scale inductive | SAGEConv | Sampling-friendly; mean/max/lstm aggr |
| Graph classification | GINConv | Maximally powerful (WL test) |
| Long-range dependencies | TransformerConv | Graph transformer |
| Spectral filtering | ChebConv | Chebyshev polynomials, K hops |
| Rich edge features | NNConv | Edge NN processes edge_attr |
| Molecular / 3D structures | SchNet, DimeNet | Continuous filters, angles |
| Heterogeneous / multi-relation | RGCNConv, HGTConv | Multiple edge types |
| Point clouds | EdgeConv, PointNetConv | Dynamic graphs, local features |
| Deep GNNs (avoid oversmoothing) | APPNP + PairNorm | Separated propagation |

Data Flow Architecture

  • edge_index: shape [2, num_edges], COO format; row 0 = source nodes, row 1 = target nodes (see the sketch after this list)
  • Mini-batch: block-diagonal adjacency plus a batch vector mapping each node to its graph; no padding needed
  • Neighbor sampling: NeighborLoader samples K-hop subgraphs per seed node; output subgraphs are directed, with nodes relabeled
  • Heterogeneous: x_dict (per-type features), edge_index_dict (per-relation edges), metadata() for the schema
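
A minimal sketch of the COO edge convention, using to_undirected from torch_geometric.utils:

import torch
from torch_geometric.utils import to_undirected

edge_index = torch.tensor([[0], [1]])  # one directed edge 0 -> 1
print(to_undirected(edge_index))
# tensor([[0, 1],
#         [1, 0]])  both directions stored explicitly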

Aggregation Options

| Aggregation | Class | Use Case |
|---|---|---|
| Sum | SumAggregation | Counting-sensitive tasks |
| Mean | MeanAggregation | Degree-invariant |
| Max | MaxAggregation | Salient feature detection |
| Softmax | SoftmaxAggregation(learn=True) | Learnable attention |
| Multi | MultiAggregation(['mean','max','std']) | Combined signals |
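
Aggregation objects plug directly into conv layers via the aggr argument; a minimal sketch, assuming a recent PyG release that accepts lists of aggregators:

from torch_geometric.nn import SAGEConv
from torch_geometric.nn.aggr import SoftmaxAggregation

# A list of aggregators is resolved into a MultiAggregation internally
conv = SAGEConv(16, 32, aggr=['mean', 'max', 'std'])

# Learnable softmax aggregation: the temperature is trained with the model
conv = SAGEConv(16, 32, aggr=SoftmaxAggregation(learn=True))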

Common Workflows

1. Node Classification (Full Graph)

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)
    def forward(self, data):
        x = F.dropout(F.relu(self.conv1(data.x, data.edge_index)), p=0.5, training=self.training)
        return self.conv2(x, data.edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# Training
for epoch in range(200):
    model.train(); optimizer.zero_grad()
    out = model(data)
    F.cross_entropy(out[data.train_mask], data.y[data.train_mask]).backward()
    optimizer.step()

# Evaluation
model.eval()
pred = model(data).argmax(dim=1)
acc = (pred[data.test_mask] == data.y[data.test_mask]).float().mean()
print(f'Test Accuracy: {acc:.4f}')

2. Graph Classification (Mini-Batch)

import torch
import torch.nn.functional as F
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool

dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES').shuffle()  # shuffle before splitting
train_dataset = dataset[:int(0.8 * len(dataset))]
test_dataset = dataset[int(0.8 * len(dataset)):]
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

class GraphNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, 64)
        self.conv2 = GCNConv(64, 64)
        self.lin = torch.nn.Linear(64, dataset.num_classes)
    def forward(self, data):
        x = F.relu(self.conv1(data.x, data.edge_index))
        x = F.relu(self.conv2(x, data.edge_index))
        x = global_mean_pool(x, data.batch)
        return self.lin(x)

model = GraphNet()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(100):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        F.cross_entropy(model(batch), batch.y).backward()
        optimizer.step()

3. Large-Scale with Neighbor Sampling

from torch_geometric.loader import NeighborLoader

# Per seed node, sample 25 neighbors in the first hop and 10 per node in the second
train_loader = NeighborLoader(
    data,
    num_neighbors=[25, 10],
    batch_size=128,
    input_nodes=data.train_mask,
)

model.train()
for batch in train_loader:
    optimizer.zero_grad()
    out = model(batch)
    # Only compute loss on seed nodes (first batch_size nodes)
    loss = F.cross_entropy(out[:batch.batch_size], batch.y[:batch.batch_size])
    loss.backward()
    optimizer.step()
# Note: output subgraphs are directed, indices relabeled 0..N-1
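
Evaluation can reuse the same sampling machinery; a sketch assuming the Planetoid-style val_mask and the model from Workflow 1:

val_loader = NeighborLoader(
    data,
    num_neighbors=[25, 10],
    batch_size=128,
    input_nodes=data.val_mask,
)

model.eval()
correct, total = 0, 0
with torch.no_grad():
    for batch in val_loader:
        pred = model(batch).argmax(dim=1)
        correct += int((pred[:batch.batch_size] == batch.y[:batch.batch_size]).sum())
        total += batch.batch_size
print(f'Val accuracy: {correct / total:.4f}')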

Key Parameters

| Parameter | Module | Default | Range | Effect |
|---|---|---|---|---|
| in_channels | All conv layers | required | int | Input feature dimension |
| out_channels | All conv layers | required | int | Output feature dimension |
| heads | GATConv | 1 | 1-16 | Number of attention heads |
| dropout | GATConv | 0.0 | 0-0.8 | Attention weight dropout |
| aggr | MessagePassing | 'add' | add/mean/max | Neighbor aggregation |
| K | ChebConv | required | 2-5 | Chebyshev polynomial order |
| num_neighbors | NeighborLoader | required | list[int] | Neighbors per hop (e.g., [25, 10]) |
| batch_size | DataLoader | 1 | 16-512 | Graphs or seed nodes per batch |
| ratio | TopKPooling | 0.5 | 0.1-0.9 | Fraction of nodes to keep |
| lr | Adam | 1e-3 | 1e-4 to 0.01 | Learning rate |
| weight_decay | Adam | 0 | 0 to 5e-3 | L2 regularization |

Best Practices

  1. Start with GCNConv: use a 2-layer GCN as a baseline before trying more complex architectures
  2. Use lazy initialization: pass -1 as in_channels to infer dimensions automatically: GCNConv(-1, 64)
  3. Normalize features: apply the NormalizeFeatures() transform for citation/social networks
  4. Anti-pattern (too many layers): GNNs typically need only 2-3 layers; going deeper causes oversmoothing. Use JumpingKnowledge or PairNorm if you need depth
  5. GPU transfer: move both the model AND the data to the GPU: model.to(device), data.to(device) (see the sketch after this list)
  6. Anti-pattern (ignoring the batch vector): in graph classification, always use global_mean_pool(x, batch); forgetting batch pools all graphs in the batch together
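
A short sketch combining practices 2 and 5 (device selection shown is the usual CUDA fallback pattern):

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = GCN().to(device)  # lazy layers such as GCNConv(-1, 64) initialize on first forward
data = data.to(device)    # reassign: Data.to returns the moved object
out = model(data)         # model and data now live on the same device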

Common Recipes

Recipe: Model Explainability (GNNExplainer)

from torch_geometric.explain import Explainer, GNNExplainer

explainer = Explainer(
    model=model,
    algorithm=GNNExplainer(epochs=200),
    explanation_type='model',
    node_mask_type='attributes',
    edge_mask_type='object',
    model_config=dict(mode='multiclass_classification', task_level='node', return_type='raw'),  # 'raw' because the GCN outputs unnormalized logits
)

explanation = explainer(data.x, data.edge_index, index=10)
print(f'Important edges: {explanation.edge_mask.topk(5).indices}')
print(f'Important features: {explanation.node_mask[10].topk(5).indices}')

Recipe: Custom InMemoryDataset

from torch_geometric.data import InMemoryDataset, Data

class MyDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super().__init__(root, transform, pre_transform)
        self.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return ['data.csv']

    @property
    def processed_file_names(self):
        return ['data.pt']

    def process(self):
        data_list = []
        # Build Data objects from raw files
        edge_index = torch.tensor([[0, 1], [1, 0]], dtype=torch.long)
        x = torch.randn(2, 16)
        data_list.append(Data(x=x, edge_index=edge_index, y=torch.tensor([0])))

        if self.pre_filter is not None:
            data_list = [d for d in data_list if self.pre_filter(d)]
        if self.pre_transform is not None:
            data_list = [self.pre_transform(d) for d in data_list]
        self.save(data_list, self.processed_paths[0])
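
Instantiating the dataset triggers process() once and caches the result under root/processed/; a usage sketch with an arbitrary root path:

dataset = MyDataset(root='/tmp/my_dataset')  # runs process() once, then loads the cache
print(len(dataset), dataset[0])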

Recipe: Deep GNN with JumpingKnowledge

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, JumpingKnowledge, LayerNorm, global_mean_pool

class DeepGNN(torch.nn.Module):
    def __init__(self, in_ch, hidden, num_layers, out_ch):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        self.norms = torch.nn.ModuleList()
        self.convs.append(GCNConv(in_ch, hidden))
        self.norms.append(LayerNorm(hidden))
        for _ in range(num_layers - 2):
            self.convs.append(GCNConv(hidden, hidden))
            self.norms.append(LayerNorm(hidden))
        self.convs.append(GCNConv(hidden, hidden))
        self.jk = JumpingKnowledge(mode='cat')
        self.lin = torch.nn.Linear(hidden * num_layers, out_ch)

    def forward(self, x, edge_index, batch):
        xs = []
        for conv, norm in zip(self.convs[:-1], self.norms):
            x = F.relu(norm(conv(x, edge_index)))
            xs.append(x)
        xs.append(self.convs[-1](x, edge_index))
        return self.lin(global_mean_pool(self.jk(xs), batch))

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| edge_index shape error | Wrong format (should be [2, E]) | Ensure COO format: torch.tensor([[src...], [dst...]], dtype=torch.long) |
| OOM on large graph | Full-graph forward pass | Use NeighborLoader for mini-batch training |
| Low accuracy | Oversmoothing (too many layers) | Reduce to 2-3 layers; add JumpingKnowledge or PairNorm |
| NaN in training | Exploding gradients | Add gradient clipping, reduce the learning rate, check feature scale |
| Wrong graph-level output | Missing batch in pooling | Pass the batch tensor to global_mean_pool(x, batch) |
| Heterogeneous type error | Mismatched node/edge types | Check that data.metadata() matches the model definition |
| Slow DataLoader | Large graph, no sampling | Use NeighborLoader with reasonable num_neighbors (e.g., [25, 10]) |
| x dimension mismatch | Multi-head attention output | For GATConv the output is heads * out_channels unless concat=False |
| Import error for sparse ops | Missing optional dependencies | Install torch_scatter, torch_sparse from the PyG wheels |
| Pre-transform not applied | Dataset already processed | Delete the processed/ directory and reload |
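
For the NaN row, gradient clipping is a one-line addition between backward() and step() in the training loop from Workflow 1; max_norm=1.0 is a common but arbitrary choice:

loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap the global gradient norm
optimizer.step()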

Bundled Resources

  • references/layers_transforms_reference.md — Complete catalog of 40+ convolutional layers (GCN, GAT, SAGE, GIN, molecular layers, hypergraph), aggregation operators, pooling (global and hierarchical), normalization layers, pre-built models, auto-encoders, knowledge graph embeddings, and utility layers, plus a transform catalog covering structure, feature, spatial, augmentation, mesh, and specialized transforms
  • references/datasets_catalog.md — Dataset catalog organized by domain: citation networks (Planetoid, Coauthor, Amazon), graph classification (TUDataset, 120+ benchmarks), molecular (QM9, ZINC, MoleculeNet), social (Reddit, Twitch), knowledge graphs (WordNet, FB15k), heterogeneous (OGB_MAG, MovieLens, DBLP), temporal (JODIE), 3D meshes (ShapeNet, ModelNet), and OGB integration

Related Skills

  • matplotlib-scientific-plotting — Visualize graph structures, training curves, attention weights
