Skillsbench gamma-phase-associator

An overview of the python package for running the GaMMA earthquake phase association algorithm. The algorithm expects phase picks data and station data as input and produces (through unsupervised clustering) earthquake events with source information like earthquake location, origin time and magnitude. The skill explains commonly used functions and the expected input/output format.

install
source · Clone the upstream repo
git clone https://github.com/benchflow-ai/skillsbench
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/benchflow-ai/skillsbench "$T" && mkdir -p ~/.claude/skills && cp -r "$T/tasks/earthquake-phase-association/environment/skills/gamma-phase-associator" ~/.claude/skills/benchflow-ai-skillsbench-gamma-phase-associator && rm -rf "$T"
manifest: tasks/earthquake-phase-association/environment/skills/gamma-phase-associator/SKILL.md
source content

GaMMA Associator Library

What is GaMMA?

GaMMA is an earthquake phase association algorithm that treats association as an unsupervised clustering problem. It uses multivariate Gaussian distribution to model the collection of phase picks of an event, and uses Expectation-Maximization to carry out pick assignment and estimate source parameters i.e., earthquake location, origin time, and magnitude.

GaMMA
is a python library implementing the algorithm. For the input earthquake traces, this library assumes P/S wave picks have already been extracted. We provide documentation of its core API.

Zhu, W., McBrearty, I. W., Mousavi, S. M., Ellsworth, W. L., & Beroza, G. C. (2022). Earthquake phase association using a Bayesian Gaussian mixture model. Journal of Geophysical Research: Solid Earth, 127(5).

The skill is a derivative of the repo https://github.com/AI4EPS/GaMMA

Installing GaMMA

pip install git+https://github.com/wayneweiqiang/GaMMA.git

GaMMA core API

association

Function Signature

def association(picks, stations, config, event_idx0=0, method="BGMM", **kwargs)

Purpose

Associates seismic phase picks (P and S waves) to earthquake events using Bayesian or standard Gaussian Mixture Models. It clusters picks based on arrival time and amplitude information, then fits GMMs to estimate earthquake locations, times, and magnitudes.

1. Input Parameters

ParameterTypeDefaultDescription
picks
DataFramerequiredSeismic phase pick data
stations
DataFramerequiredStation metadata with locations
config
dictrequiredConfiguration parameters
event_idx0
int
0
Starting event index for numbering
method
str
"BGMM"
"BGMM"
(Bayesian) or
"GMM"
(standard)

2. Required DataFrame Columns

picks
DataFrame
ColumnTypeDescriptionExample
id
strStation identifier (must match
stations
)
network.station.
or
network.station.location.channel
timestamp
datetime/strPick arrival time (ISO format or datetime)
"2019-07-04T22:00:06.084"
type
strPhase type:
"p"
or
"s"
(lowercase)
"p"
prob
floatPick probability/weight (0-1)
0.94
amp
floatAmplitude in m/s (required if
use_amplitude=True
)
0.000017

Notes:

  • Timestamps must be in UTC or converted to UTC
  • Phase types are forced to lowercase internally
  • Picks with
    amp == 0
    or
    amp == -1
    are filtered when
    use_amplitude=True
  • The DataFrame index is used to track pick identities in the output
stations
DataFrame
ColumnTypeDescriptionExample
id
strStation identifier
"CI.CCC..BH"
x(km)
floatX coordinate in km (projected)
-35.6
y(km)
floatY coordinate in km (projected)
45.2
z(km)
floatZ coordinate (elevation, typically negative)
-0.67

Notes:

  • Coordinates should be in a projected local coordinate system (e.g., you can use the
    pyproj
    package)
  • The
    id
    column must match the
    id
    values in the picks DataFrame (e.g.,
    network.station.
    or
    network.station.location.channel
    )
  • Group stations by unique
    id
    , identical attribute are collapsed to a single value and conflicting metadata are preseved as a sorted list.

3. Config Dictionary Keys

Required Keys
KeyTypeDescriptionExample
dims
list[str]Location dimensions to solve for
["x(km)", "y(km)", "z(km)"]
min_picks_per_eq
intMinimum picks required per earthquake
5
max_sigma11
floatMaximum allowed time residual in seconds
2.0
use_amplitude
boolWhether to use amplitude in clustering
True
bfgs_bounds
tupleBounds for BFGS optimization
((-35, 92), (-128, 78), (0, 21), (None, None))
oversample_factor
floatFactor for oversampling initial GMM components
5.0
for
BGMM
,
1.0
for
GMM

Notes on

dims
:

  • Options:
    ["x(km)", "y(km)", "z(km)"]
    ,
    ["x(km)", "y(km)"]
    , or
    ["x(km)"]

Notes on

bfgs_bounds
:

  • Format:
    ((x_min, x_max), (y_min, y_max), (z_min, z_max), (None, None))
  • The last tuple is for time (unbounded)
Velocity Model Keys
KeyTypeDefaultDescription
vel
dict
{"p": 6.0, "s": 3.47}
Uniform velocity model (km/s)
eikonal
dict/None
None
1D velocity model for travel times
DBSCAN Pre-clustering Keys (Optional)
KeyTypeDefaultDescription
use_dbscan
bool
True
Enable DBSCAN pre-clustering
dbscan_eps
float
25
Max time between picks (seconds)
dbscan_min_samples
int
3
Min samples in DBSCAN neighborhood
dbscan_min_cluster_size
int
500
Min cluster size for hierarchical splitting
dbscan_max_time_space_ratio
float
10
Max time/space ratio for splitting
  • dbscan_eps
    is obtained from
    estimate_eps
    Function
Filtering Keys (Optional)

| Key | Type | Default | Description | |-----|------|-------------| |

max_sigma22
| float |
1.0
| Max phase amplitude residual in log scale (required if
use_amplitude=True
) | |
max_sigma12
| float |
1.0
| Max covariance | |
max_sigma11
| float |
2.0
| Max phase time residual (s) | |
min_p_picks_per_eq
| int |
0
| Min P-phase picks per event | |
min_s_picks_per_eq
| int |
0
|Min S-phase picks per event | |
min_stations
| int |
5
|Min unique stations per event |

Other Optional Keys
KeyTypeDefaultDescription
covariance_prior
list[float]autoPrior for covariance
[time, amp]
ncpu
intautoNumber of CPUs for parallel processing

4. Return Values

Returns a tuple

(events, assignments)
:

events
(list[dict])

List of dictionaries, each representing an associated earthquake:

KeyTypeDescription
time
strOrigin time (ISO 8601 with milliseconds)
magnitude
floatEstimated magnitude (999 if
use_amplitude=False
)
sigma_time
floatTime uncertainty (seconds)
sigma_amp
floatAmplitude uncertainty (log10 scale)
cov_time_amp
floatTime-amplitude covariance
gamma_score
floatAssociation quality score
num_picks
intTotal picks assigned
num_p_picks
intP-phase picks assigned
num_s_picks
intS-phase picks assigned
event_index
intUnique event index
x(km)
floatX coordinate of hypocenter
y(km)
floatY coordinate of hypocenter
z(km)
floatZ coordinate (depth)
assignments
(list[tuple])

List of tuples

(pick_index, event_index, gamma_score)
:

  • pick_index
    : Index in the original
    picks
    DataFrame
  • event_index
    : Associated event index
  • gamma_score
    : Probability/confidence of assignment

estimate_eps
Function Documentation

Function Signature

def estimate_eps(stations, vp, sigma=2.0)

Purpose

Estimates an appropriate DBSCAN epsilon (eps) parameter for clustering seismic phase picks based on station spacing. The eps parameter controls the maximum time distance between picks that should be considered neighbors in the DBSCAN clustering algorithm.

1. Input Parameters

ParameterTypeDefaultDescription
stations
DataFramerequiredStation metadata with 3D coordinates
vp
floatrequiredP-wave velocity in km/s
sigma
float
2.0
Number of standard deviations above the mean

2. Required DataFrame Columns

stations
DataFrame
ColumnTypeDescriptionExample
x(km)
floatX coordinate in km
-35.6
y(km)
floatY coordinate in km
45.2
z(km)
floatZ coordinate in km
-0.67

3. Return Value

TypeDescription
floatEpsilon value in seconds for use with DBSCAN clustering

4. Example Usage

from gamma.utils import estimate_eps

# Assuming stations DataFrame is already prepared with x(km), y(km), z(km) columns
vp = 6.0  # P-wave velocity in km/s

# Estimate eps automatically based on station spacing
eps = estimate_eps(stations, vp, sigma=2.0)

# Use in config
config = {
    "use_dbscan": True,
    "dbscan_eps": eps,  # or use estimate_eps(stations, config["vel"]["p"])
    "dbscan_min_samples": 3,
    # ... other config options
}
Typical Usage Pattern
from gamma.utils import association, estimate_eps

# Automatic eps estimation
config["dbscan_eps"] = estimate_eps(stations, config["vel"]["p"])

# Or manual override (common in practice)
config["dbscan_eps"] = 15  # seconds

5. Practical Notes

  • In example notebooks, the function is often commented out in favor of hardcoded values (10-15 seconds)
  • Practitioners may prefer manual tuning for specific networks/regions
  • Typical output values range from 10-20 seconds depending on station density
  • Useful when optimal eps is unknown or when working with new networks

6. Related Configuration

The output is typically used with these config parameters:

config["dbscan_eps"] = estimate_eps(stations, config["vel"]["p"])
config["dbscan_min_samples"] = 3
config["dbscan_min_cluster_size"] = 500
config["dbscan_max_time_space_ratio"] = 10