Agent-almanac simulate-stochastic-process
git clone https://github.com/pjt222/agent-almanac
T=$(mktemp -d) && git clone --depth=1 https://github.com/pjt222/agent-almanac "$T" && mkdir -p ~/.claude/skills && cp -r "$T/i18n/es/skills/simulate-stochastic-process" ~/.claude/skills/pjt222-agent-almanac-simulate-stochastic-process-0ed808 && rm -rf "$T"
i18n/es/skills/simulate-stochastic-process/SKILL.mdSimulate Stochastic Process
Simulate sample paths from stochastic processes -- including discrete Markov chains, continuous-time processes, stochastic differential equations, and MCMC samplers -- with convergence diagnostics, variance reduction techniques, and trajectory visualization.
Cuándo Usar
- You need to generate sample paths from a stochastic process for estimation, prediction, or visualization
- Analytical solutions are intractable and simulation is the only feasible approach
- You are running Monte Carlo estimation and need convergence guarantees and uncertainty quantification
- You want to validate analytical results (stationary distributions, hitting times) against empirical simulation
- You need to sample from a complex posterior distribution using MCMC
- You are prototyping a stochastic model before committing to full analytical treatment
Entradas
Required
| Input | Type | Description |
|---|---|---|
| string | Type of process: , , , , , |
| dict | Process-specific parameters (transition matrix, drift/diffusion coefficients, target density, etc.) |
| integer | Number of independent sample paths to simulate |
| integer | Number of time steps per path (or total MCMC iterations) |
Optional
| Input | Type | Default | Description |
|---|---|---|---|
| scalar/vector | process-specific | Starting state or distribution for each path |
| float | 0.01 | Time step size for continuous-time discretization |
| integer | random | Random seed for reproducibility |
| integer | | Number of initial steps to discard (MCMC) |
| integer | 1 | Keep every k-th sample to reduce autocorrelation |
| string | | Method: , , , |
| callable | none | Function to evaluate along paths for Monte Carlo estimation |
Procedimiento
Paso 1: Define Process Model and Parameters
1.1. Identify the process type and gather all required parameters:
- DTMC: Transition matrix
and state space. ValidateP
is row-stochastic.P - CTMC: Rate matrix
. Validate rows sum to 0 and off-diagonal entries are non-negative.Q - Random walk: Step distribution (e.g.,
with equal probability), boundaries if any.{-1, +1} - Brownian motion: Drift
, volatilitymu
, dimensionsigma
.d - SDE (Ito): Drift function
, diffusion functiona(x,t)
.b(x,t) - MCMC: Target log-density, proposal mechanism (random walk Metropolis, Hamiltonian, Gibbs components).
1.2. Validate parameter consistency:
- Matrix dimensions match state space size.
- SDE coefficients satisfy growth and Lipschitz conditions (at least informally) for the chosen solver.
- MCMC proposal is well-defined for the support of the target distribution.
1.3. Set the random seed for reproducibility.
Esperado: A fully specified stochastic model with validated parameters and a reproducible random state.
En caso de fallo: If parameters are inconsistent (e.g., non-stochastic matrix), correct them before proceeding. If SDE coefficients are pathological, consider a different discretization scheme.
Paso 2: Select Simulation Method
2.1. Choose the appropriate algorithm based on process type:
| Process | Method | Key Property |
|---|---|---|
| DTMC | Direct sampling from transition row | Exact |
| CTMC | Gillespie algorithm (SSA) | Exact, event-driven |
| CTMC (approx.) | Tau-leaping | Approximate, faster for high rates |
| Random walk | Direct sampling of increments | Exact |
| Brownian motion | Cumulative sum of Gaussian increments | Exact for fixed |
| SDE (general) | Euler-Maruyama | Order 0.5 strong, order 1.0 weak |
| SDE (higher order) | Milstein | Order 1.0 strong (scalar noise) |
| SDE (stiff) | Implicit Euler-Maruyama | Stable for stiff drift |
| MCMC (general) | Metropolis-Hastings | Asymptotically exact |
| MCMC (gradient) | Hamiltonian Monte Carlo (HMC) | Better mixing for high dimensions |
| MCMC (conditional) | Gibbs sampler | Exact conditionals when available |
2.2. For SDE methods, choose
dt small enough for numerical stability. A useful heuristic: start with dt = 0.01 and halve it until results stabilize.
2.3. For MCMC, tune the proposal scale to achieve an acceptance rate of approximately:
- 23.4% for high-dimensional random walk Metropolis
- 57.4% for one-dimensional targets
- 65-90% for HMC (depends on trajectory length)
2.4. If variance reduction is requested, configure it:
- Antithetic variates: For each path with random increments
, also simulate withZ
.-Z - Stratified sampling: Partition the probability space and sample within each stratum.
- Control variates: Identify a correlated quantity with known expectation to reduce variance.
Esperado: A selected simulation algorithm matched to the process type with appropriate tuning parameters.
En caso de fallo: If the chosen method is unstable (e.g., Euler-Maruyama diverging), switch to an implicit method or reduce
dt.
Paso 3: Implement and Run Simulation
3.1. Allocate storage for
n_paths trajectories, each of length n_steps (or dynamically for event-driven methods like Gillespie).
3.2. For each path
i = 1, ..., n_paths:
DTMC / Random Walk:
- Set
x[0] = initial_state - For
: samplet = 1, ..., n_steps
from the transition distribution givenx[t]x[t-1]
CTMC (Gillespie):
- Set
,x[0] = initial_statetime = 0 - While
:time < T_max- Compute total rate
lambda = -Q[x, x] - Sample holding time
tau ~ Exp(lambda) - Sample next state from transition probabilities
forQ[x, j] / lambdaj != x - Update
, record transitiontime += tau
- Compute total rate
SDE (Euler-Maruyama):
- Set
x[0] = initial_state - For
:t = 1, ..., n_steps
(Wiener increment)dW = sqrt(dt) * N(0, I)x[t] = x[t-1] + a(x[t-1], t*dt) * dt + b(x[t-1], t*dt) * dW
MCMC (Metropolis-Hastings):
- Set
x[0] = initial_state - For
:t = 1, ..., n_steps- Propose
x' ~ q(x' | x[t-1]) - Compute acceptance ratio
alpha = min(1, p(x') * q(x[t-1]|x') / (p(x[t-1]) * q(x'|x[t-1]))) - Accept with probability
:alpha
if accepted, elsex[t] = x'x[t] = x[t-1] - Record acceptance decision
- Propose
3.3. If
target_function is provided, evaluate it at each state along each path and store the values.
3.4. Apply thinning: keep every
thinning-th sample.
3.5. Discard
burn_in samples from the beginning of each path (primarily for MCMC).
Esperado:
n_paths complete trajectories stored in memory, with optional function evaluations. MCMC acceptance rate is within the target range.
En caso de fallo: If simulation produces NaN or Inf values, reduce
dt for SDE methods or check parameter validity. If MCMC acceptance rate is near 0% or 100%, adjust proposal scale.
Paso 4: Apply Convergence Diagnostics
4.1. Trace plots: Plot the value of each component over time for a subset of paths. Visual inspection for stationarity (no trends, stable variance).
4.2. Gelman-Rubin diagnostic (R-hat): For MCMC with multiple chains:
- Compute within-chain variance
and between-chain varianceW
.B R_hat = sqrt((n-1)/n + B/(n*W))- Convergence indicated by
(strict) orR_hat < 1.01
(lenient).R_hat < 1.1
4.3. Effective sample size (ESS):
- Estimate autocorrelation at increasing lags.
ESS = n_samples / (1 + 2 * sum(autocorrelations))- Rule of thumb:
for reliable posterior summaries.ESS > 400
4.4. Geweke diagnostic: Compare the mean of the first 10% and last 50% of each chain. The z-score should be within [-2, 2] for convergence.
4.5. For non-MCMC processes: Verify that time-averaged statistics (mean, variance) stabilize as path length increases. Plot running averages.
4.6. Report a summary table:
| Diagnostic | Value | Threshold | Status |
|---|---|---|---|
| R-hat (max) | ... | < 1.01 | ... |
| ESS (min) | ... | > 400 | ... |
| Geweke z (max abs) | ... | < 2.0 | ... |
| Acceptance rate | ... | 0.15-0.50 | ... |
Esperado: All convergence diagnostics pass their thresholds. Trace plots show stable, well-mixing chains.
En caso de fallo: If R-hat > 1.1, run longer chains or improve the proposal. If ESS is very low, increase thinning or switch to a better sampler (e.g., HMC). If Geweke fails, extend burn-in.
Paso 5: Compute Summary Statistics with Confidence Intervals
5.1. For each quantity of interest (state occupancy, function expectation, hitting times):
- Compute the point estimate as the sample mean across paths (after burn-in and thinning).
- Compute the standard error using the effective sample size:
.SE = SD / sqrt(ESS)
5.2. Construct confidence intervals:
- Normal approximation:
estimate +/- z_{alpha/2} * SE - For skewed distributions, use percentile bootstrap or batch means.
5.3. If variance reduction was applied, compute the variance reduction factor:
VRF = Var(naive estimator) / Var(reduced estimator)- Report the effective speedup.
5.4. For Monte Carlo integration estimates:
- Report the estimate, standard error, 95% CI, ESS, and number of function evaluations.
5.5. For distribution estimates:
- Compute empirical quantiles (median, 2.5th, 97.5th percentiles).
- Kernel density estimates for continuous quantities.
5.6. Tabulate all summary statistics with their uncertainties.
Esperado: Point estimates with associated standard errors and confidence intervals. Variance reduction (if applied) yields a VRF > 1.
En caso de fallo: If confidence intervals are too wide, increase
n_paths or n_steps. If variance reduction worsens estimates (VRF < 1), disable it -- the control variate or antithetic scheme may not suit the problem.
Paso 6: Visualize Trajectories and Distributions
6.1. Trajectory plots: Plot a representative subset of sample paths (5-20 paths) over time. Use transparency for overlapping paths.
6.2. Ensemble statistics: Overlay the mean trajectory and pointwise 95% confidence bands across all paths.
6.3. Marginal distributions: At selected time points, plot histograms or density estimates of the state distribution across paths.
6.4. Stationary distribution comparison: If an analytical stationary distribution is available, overlay it on the empirical histogram from the final time slice.
6.5. Autocorrelation plots: For MCMC, plot the autocorrelation function (ACF) for each component up to a reasonable lag.
6.6. Diagnostic dashboard: Combine trace plots, ACF plots, running mean plots, and marginal densities into a single multi-panel figure for comprehensive assessment.
6.7. Save all figures in both vector (PDF/SVG) and raster (PNG) formats for documentation.
Esperado: Publication-quality figures showing trajectory behavior, distributional convergence, and diagnostic summaries. Analytical solutions (where available) match empirical results.
En caso de fallo: If visualizations reveal non-stationarity or multimodality not expected from the model, revisit Steps 1-2 for parameter or method errors. If plots are cluttered, reduce the number of displayed paths or increase figure size.
Validación
- All simulated trajectories remain in the valid state space (no out-of-bounds values, no NaN/Inf)
- For DTMC/CTMC: empirical stationary distribution converges to the analytical one (within expected Monte Carlo error)
- For SDE: halving
does not qualitatively change the results (convergence order check)dt - For MCMC: R-hat < 1.01, ESS > 400, Geweke z-scores within [-2, 2]
- Confidence interval widths decrease proportionally to
(central limit theorem)1/sqrt(n_paths) - Variance reduction techniques yield VRF > 1 (estimates improve, not worsen)
- Reproducibility: re-running with the same seed produces identical results
Errores Comunes
- Insufficient burn-in for MCMC: Starting from a poor initial state requires a long burn-in before samples represent the target distribution. Always inspect trace plots and use convergence diagnostics rather than guessing the burn-in length.
- Euler-Maruyama instability for stiff SDEs: If the drift term has large gradients, explicit Euler-Maruyama can diverge. Switch to implicit methods or use adaptive step sizing.
- Confusing strong and weak convergence for SDEs: Strong convergence measures pathwise error (important for individual trajectories); weak convergence measures distributional error (sufficient for expectations). Euler-Maruyama has weak order 1.0 but strong order 0.5.
- Pseudorandom number generator quality: For very long simulations, low-quality RNGs can produce correlated samples. Use a well-tested generator (Mersenne Twister, PCG, or Xoshiro) and verify independence.
- Ignoring autocorrelation in MCMC: Treating autocorrelated MCMC samples as independent underestimates uncertainty. Always use effective sample size, not raw sample count, for standard errors.
- Antithetic variates for non-monotone functions: Antithetic sampling reduces variance only when the estimand is a monotone function of the underlying uniforms. For non-monotone functions, it can increase variance.
- Memory for large simulations: Storing all time steps of many long paths can exhaust memory. Use online statistics (running mean, variance) when full trajectories are not needed for visualization.
Habilidades Relacionadas
- Model Markov Chain -- provides the transition matrices and analytical solutions that simulation validates
- Fit Hidden Markov Model -- simulation from fitted HMMs enables posterior predictive checking and synthetic data generation