---
name: instance-resource-design
description: Guide for designing Instance resources in OptAIC. Use when creating DatasetInstance, SignalInstance, ExperimentInstance, ModelInstance, PortfolioOptimizerInstance, or BacktestInstance. Covers definition references, config patterns, composition, flow execution pairing, and scheduling.
---
# Instance Resource Design Patterns
Guide for designing Instance resources that configure and execute Definition plugins.
## When to Use
Apply when:
- Creating configured dataset/signal/model instances
- Designing composition patterns (Pipeline + Store + Accessor)
- Implementing scheduling and freshness tracking
- Pairing Flow Execution Resources with Instances
- Building special cases like BacktestInstance (no definition)
## Core Concept: Configured Usage
Instances reference Definitions and provide runtime configuration:
```
Instance = Configured Usage
├── definition_resource_id    # Which Definition to use
├── definition_version_id     # Pinned version (optional)
├── config_json               # Runtime configuration
├── schedule_json             # Cron/refresh schedule
├── upstream_refs             # Connected upstream resources
└── flow_execution_handles    # Prefect deployments, MLflow experiments
```
## Instance ↔ Flow Pairing
Critical Concept: When an Instance is created, Flow Execution Resources are also created.
Flow Execution Resources are static Prefect deployments (or equivalent orchestration handles) that are:
- Created when Instance is created
- Paired 1:1 or 1:N with Instance (some Instances have multiple flows)
- Stored as handles in the Instance extension table
- The "execution capability", in contrast to Runs, which are the "execution activities"
```
DatasetInstance creation:
├── Create Resource record
├── Create extension table record
├── Create Prefect deployment for refresh flow
└── Store deployment_id in instance.prefect_deployment_id
```
See references/flow-pairing.md.
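The creation steps above can be sketched in Python. Everything here is illustrative: the record type and the `create_deployment` callable are stand-ins for the real extension-table model and the Prefect deployment API, not the actual OptAIC code.

```python
from dataclasses import dataclass
from typing import Callable, Optional
import uuid

@dataclass
class DatasetInstanceRecord:
    """Hypothetical extension-table record; field names follow this guide."""
    resource_id: str
    prefect_deployment_id: Optional[str] = None

def create_dataset_instance(
    create_deployment: Callable[[str], str],
) -> DatasetInstanceRecord:
    """Create the instance record, then its paired Flow Execution Resource.

    create_deployment is a stand-in for the Prefect deployment API call;
    it returns the deployment handle to store on the instance.
    """
    instance = DatasetInstanceRecord(resource_id=str(uuid.uuid4()))
    # Pairing happens at creation time: the deployment handle is stored
    # on the instance, not looked up per run.
    instance.prefect_deployment_id = create_deployment("refresh_flow")
    return instance
```

The key design point is that the deployment is a static handle created once, while Runs are created per execution against that handle.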
## Instance Types
| Type | Parent | Definition Ref | Flow Count | Notes |
|---|---|---|---|---|
| DatasetInstance | Project | PipelineDef + StoreDef + AccessorDef | 1 | refresh_flow |
| SignalInstance | Project | Inherits from DatasetInstance | 1 | Promoted dataset |
| ExperimentInstance | Project | OpDef/OpMacroDef | 1 | preview_flow |
| ModelInstance | Project | MLModuleDef | 3 | train/infer/monitor |
| PortfolioOptimizerInstance | Project | PortfolioOptimizerDef | 1 | optimize_flow |
| BacktestInstance | Project | None | 1 | Fixed procedure |
### Multi-Flow Instances
Some Instance types have multiple Flow Execution Resources:
```
ModelInstance:
├── training_flow   → TrainingRun activities
├── inference_flow  → InferenceRun activities
└── monitoring_flow → MonitoringRun activities

Instance Extension Table:
├── prefect_training_deployment_id
├── prefect_inference_deployment_id
├── prefect_monitoring_deployment_id
├── mlflow_experiment_id (training tracking)
├── mlflow_registered_model_name (after promotion)
└── evidently_project_id (monitoring dashboard)
```
## Lineage is Flow-to-Flow
Dependencies track flow statuses, not instance relationships:
```
DatasetInstance.refresh_flow
    ↓ depends on
UpstreamDataset.refresh_flow status = READY
```
Lineage checking uses `check_upstream_freshness()` to verify all upstream flow statuses before executing a downstream flow.
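A minimal sketch of such a check. Only the name `check_upstream_freshness` comes from this guide; the status-lookup callable and the return shape are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

def check_upstream_freshness(
    upstream_refs: List[Dict[str, str]],
    get_flow_status: Callable[[str], str],
) -> Tuple[bool, List[Dict[str, str]]]:
    """Return (ok, blocking), where blocking lists the upstream refs
    whose refresh flow is not READY.

    upstream_refs:   [{"resource_id": ..., "role": ...}, ...]
    get_flow_status: stand-in for the registry lookup of a flow's status
    """
    blocking = [
        ref for ref in upstream_refs
        if get_flow_status(ref["resource_id"]) != "READY"
    ]
    return (not blocking, blocking)
```

A downstream flow would execute only when `ok` is true; otherwise `blocking` identifies which upstream flows must refresh first.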
## Status Aggregation
Instance status aggregates from its Flow(s):
```python
# Single-flow Instance (DatasetInstance)
instance.status = flow.status

# Multi-flow Instance (ModelInstance)
instance.status = aggregate([
    training_flow.status,
    inference_flow.status,
    monitoring_flow.status,
])
# Uses min-severity: READY only if ALL flows are READY
```
The Definition specifies the `status_aggregation_contract`:
```json
{
  "status_aggregation_contract": {
    "aggregation_method": "min_severity",
    "status_priority": ["ERROR", "STALE", "RUNNING", "READY"]
  }
}
```
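The min-severity rule can be implemented directly from the contract. The helper name and the empty-list behavior are illustrative assumptions; the priority order is the one from the contract above.

```python
def aggregate_min_severity(
    flow_statuses,
    status_priority=("ERROR", "STALE", "RUNNING", "READY"),
):
    """Aggregate flow statuses per the contract: the instance takes the
    most severe status present, so READY only if ALL flows are READY.
    """
    if not flow_statuses:
        return "READY"  # assumption: an instance with no flows is trivially READY
    # status_priority is ordered worst -> best, so min() by index
    # selects the most severe status in the list.
    return min(flow_statuses, key=status_priority.index)
```

For a ModelInstance this would be called with the training, inference, and monitoring flow statuses.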
## Composition Pattern
DatasetInstance composes multiple definitions:
```
DatasetInstance
├── pipeline_instance_id → PipelineInstance → PipelineDef
├── store_instance_id    → StoreInstance    → StoreDef
└── accessor_instance_id → AccessorInstance → AccessorDef
```
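As a sketch, resolving the composed Definitions could look like the following. The record type and the lookup callable are illustrative, not the actual OptAIC API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class DatasetInstance:
    # Composition handles, mirroring the diagram above
    pipeline_instance_id: str
    store_instance_id: str
    accessor_instance_id: str

def resolve_definitions(
    instance: DatasetInstance,
    get_definition_id: Callable[[str], str],
) -> Dict[str, str]:
    """Walk each composed sub-instance back to its Definition.

    get_definition_id: stand-in for a registry lookup,
    instance_id -> definition_resource_id
    """
    return {
        "pipeline": get_definition_id(instance.pipeline_instance_id),
        "store": get_definition_id(instance.store_instance_id),
        "accessor": get_definition_id(instance.accessor_instance_id),
    }
```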
See references/composition.md.
## Config Structure
```python
instance_metadata = {
    "definition_resource_id": "uuid",
    "definition_version_id": "uuid (optional)",
    "config_json": {
        "symbols": ["AAPL", "MSFT", "GOOGL"],
        "start_date": "2020-01-01",
        "lookback_days": 252
    },
    "schedule_json": {
        "type": "cron",
        "expression": "0 6 * * 1-5",
        "timezone": "America/New_York"
    },
    "upstream_refs": [
        {"resource_id": "uuid", "role": "input"},
        {"resource_id": "uuid", "role": "covariance"}
    ]
}
```
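Since `config_json` must match the Definition's `parameters_schema`, a lightweight required-keys check can catch mismatches at instance creation. This is a sketch only; a real implementation would run a full JSON Schema validator against the schema.

```python
def validate_config(config_json: dict, parameters_schema: dict) -> None:
    """Raise if config_json is missing keys the schema marks as required.

    parameters_schema is assumed to be a JSON-Schema-like dict with an
    optional "required" list; type and format checks are left to a real
    validator.
    """
    missing = [
        key for key in parameters_schema.get("required", [])
        if key not in config_json
    ]
    if missing:
        raise ValueError(f"config_json missing required keys: {missing}")
```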
## Special Case: BacktestInstance
BacktestInstance has no Definition; the backtest procedure is fixed:
```python
backtest_instance = {
    "type": "BacktestInstance",
    "name": "Q1_2024_Backtest",
    "metadata_json": {
        # No definition_resource_id
        "assets_json": {
            "universe": ["SPY", "QQQ", "IWM"],
            "benchmark": "SPY"
        },
        "signals_json": {
            "primary": "uuid-of-signal-instance",
            "secondary": ["uuid-1", "uuid-2"]
        },
        "date_range_json": {
            "start": "2024-01-01",
            "end": "2024-03-31"
        },
        "config_json": {
            "rebalance_frequency": "daily",
            "transaction_costs": 0.001,
            "slippage_model": "linear"
        }
    }
}
```
## Implementation Checklist
- Reference parent Definition via `definition_resource_id`
- Pin version via `definition_version_id` if reproducibility is needed
- Design `config_json` to match the Definition's `parameters_schema`
- Track `upstream_refs` for lineage
- Add freshness tracking fields if scheduled
- Create extension table in `libs/db/models/`
- Create Flow Execution Resources on Instance creation
- Create Prefect deployment(s) for each flow type
- Store deployment IDs in extension table
- Register with external systems (MLflow, EvidentlyAI)
- Implement status aggregation for multi-flow Instances
- Set up real-time subscriptions via Centrifugo
## Reference Files
- Composition - Dataset composition pattern
- Examples - Complete Instance examples
- Scheduling - Schedule configuration
- Flow Pairing - Flow Execution Resource pairing