# mc-agent-toolkit: monte-carlo-monitoring-advisor
Analyze data coverage, create monitors for warehouse tables and AI agents. Covers coverage gaps, use-case analysis, data monitor creation, and agent observability.
```bash
git clone https://github.com/monte-carlo-data/mc-agent-toolkit
```

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/monte-carlo-data/mc-agent-toolkit "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/monitoring-advisor" ~/.claude/skills/monte-carlo-data-mc-agent-toolkit-monte-carlo-monitoring-advisor && rm -rf "$T"
```
`skills/monitoring-advisor/SKILL.md`

# Monte Carlo Monitoring Advisor Skill
This skill handles all monitoring requests -- coverage analysis, data monitor creation, and AI agent monitoring. It routes to the right reference file based on the user's intent.
Reference files live next to this skill file. Use the Read tool (not MCP resources) to access them:
- Data monitor creation procedure: `references/data-monitor-creation.md` (relative to this file)
- Agent monitor creation procedure: `references/agent-monitor-creation.md` (relative to this file)
- Per-type references: `references/data-*.md` and `references/agent-*.md` (relative to this file)
## When to activate this skill
Activate when the user:
- Asks about monitoring coverage, data coverage, or coverage gaps
- Wants to understand what's monitored vs. not in their warehouse
- Asks about use cases, use-case criticality, or use-case analysis
- Wants to explore their data estate and find what needs monitoring
- Says things like "what should I monitor?", "where are my coverage gaps?", "show me my use cases"
- Asks about unmonitored tables with anomalies or importance-based prioritization
- Asks to create, add, or set up a monitor (e.g. "add a monitor for...", "create a freshness check on...", "set up validation for...")
- Mentions monitoring a specific table, field, or metric
- Wants to check data quality rules or enforce data contracts
- Asks about monitoring options for a table or dataset
- Requests monitors-as-code YAML generation
- Wants to add monitoring after new transformation logic (when the prevent skill is not active)
- Asks about monitoring AI agents, agent latency, agent token usage, or agent quality
- Wants to set up alerts on agent behavior or execution patterns
- Asks about investigating agent traces or conversations
- Says things like "monitor my agent", "track agent latency", "alert on agent errors"
- Asks about agent evaluation monitors, trajectory monitors, or validation monitors
- Mentions agent observability or agent monitoring
## When NOT to activate this skill
Do not activate when the user is:
- Just querying data or exploring table contents
- Triaging or responding to active alerts (use the prevent skill's Workflow 3)
- Running impact assessments before code changes (use the prevent skill's Workflow 4)
- Asking about existing monitor configuration (use `get_monitors` directly)
- Editing or deleting existing monitors
## Prerequisites

- Required: Monte Carlo MCP server (`monte-carlo-mcp`) must be configured and authenticated
- Optional: A database MCP server (Snowflake, BigQuery, Redshift, Databricks) for SQL profiling of table usage patterns
## Available MCP tools

All tools are available via the `monte-carlo` MCP server.
### Coverage and discovery tools

| Tool | Purpose |
|---|---|
| `get_warehouses` | List accessible warehouses (needed first -- other tools require the selected warehouse) |
| `get_use_cases` | List use cases with criticality, descriptions, table counts, precomputed tag names |
| `get_use_case_table_summary` | Criticality distribution (HIGH/MEDIUM/LOW table counts) for a use case |
| `get_use_case_tables` | Paginated tables with criticality, golden-table status, MCONs |
| `get_monitors` | Check monitoring status on specific tables via `mcons` filter |
| `get_asset_lineage` | Upstream/downstream dependencies for tables (takes MCONs + direction) |
| `get_audiences` | List notification audiences |
| `get_unmonitored_tables_with_anomalies` | Tables with muted OOTB anomalies but no monitors (takes ISO 8601 time range) |
| `search` | Find tables by name; supports `is_monitored` filter |
| `get_table` | Table details, fields, stats, domain membership |
| `get_queries_for_table` | Query logs for a table (source/destination) |
|  | Available metrics per field type for a warehouse |
|  | List Monte Carlo domains |
|  | Available validation rule types |
### Data monitor creation tools

| Tool | Purpose |
|---|---|
| `create_table_monitor_mac` | Generate table monitor YAML (dry-run) |
|  | Generate metric monitor YAML (dry-run) |
|  | Generate validation monitor YAML (dry-run) |
|  | Generate custom SQL monitor YAML (dry-run) |
|  | Generate comparison monitor YAML (dry-run) |
### Agent monitoring tools

| Tool | Purpose |
|---|---|
|  | List AI agents -- returns agent names, trace table MCONs, source types |
|  | Retrieve recent LLM interactions/conversations for an agent |
|  | Inspect execution traces and span trees |
|  | Create monitors for quantitative span-level metrics |
|  | Create monitors for LLM-evaluated quality metrics |
|  | Create trajectory monitors for execution pattern alerts |
|  | Create validation monitors for logical assertions |
## Routing
When the user's request comes in, determine which workflow to follow:
| User intent | Workflow |
|---|---|
| Coverage analysis, use-case exploration, "what should I monitor?" | Coverage workflow (below) |
| Create a specific data monitor for a known table | Read `references/data-monitor-creation.md` and follow its procedure |
| Monitor AI agents, agent latency, agent quality, agent traces | Read `references/agent-monitor-creation.md` and follow its procedure |
| Coverage analysis leads to monitor creation | Complete coverage workflow, then read `references/data-monitor-creation.md` for creation |
When reading reference files, always use the Read tool with the path relative to this skill file.
## Coverage workflow
This is the primary flow when the user asks about monitoring coverage, coverage gaps, or what to monitor.
### Step 1: Discover warehouses

Call `get_warehouses` to list all accessible warehouses.
- If one warehouse: select it automatically, proceed to Step 2.
- If multiple warehouses: present warehouse names (never UUIDs) and ask the user which one to explore.
### Step 2: Discover use cases

Call `get_use_cases(warehouse_id=<selected>)` to discover use cases for the chosen warehouse (a call sketch follows the list below).
- If use cases exist --> proceed to the Use-case exploration (below).
- If no use cases --> proceed to the Importance-based fallback (below).
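For illustration, a minimal sketch of these first two calls in the MCP `tools/call` argument shape; the warehouse value is whatever the user selected from the `get_warehouses` result, and no other arguments are assumed here:

```json
[
  { "name": "get_warehouses", "arguments": {} },
  { "name": "get_use_cases", "arguments": { "warehouse_id": "<selected-warehouse-id>" } }
]
```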
### Step 3: Check for database MCP (optional)

Check if the user has a database MCP server available by looking for tools containing `snowflake`, `bigquery`, `redshift`, or `databricks` in the tool list. If found, note it for the SQL profiling step later. If not found, skip SQL profiling gracefully.
## Use-case exploration
This is the primary flow when use cases are defined.
### Present use cases
- Sort by criticality: HIGH before MEDIUM before LOW.
- For each use case, show the description and explain the reasoning for its criticality level so the user understands why it matters.
- Call `get_use_case_tables` with `golden_tables_only=true` and mention specific golden-table names as concrete examples (a call sketch follows this list). Golden tables are the last layer in the warehouse -- they feed ML models, dashboards, and reports. Explain this when relevant.
- Use `get_asset_lineage` to explain how tables in a use case are connected and why certain tables are important (e.g. a golden table with many upstream dependencies).
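A minimal sketch of the golden-table call, assuming the same `tools/call` shape; `golden_tables_only` comes from this skill file, while the name of the use-case argument shown here is an assumption to verify against the tool's schema:

```json
{
  "name": "get_use_case_tables",
  "arguments": {
    "use_case": "<use-case-from-get_use_cases>",
    "golden_tables_only": true
  }
}
```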
### Analyze coverage

- Call `get_use_case_table_summary` to show how many tables exist at each criticality level (HIGH / MEDIUM / LOW) for the use case.
- Call `get_use_case_tables` to obtain table MCONs, then call `get_monitors(mcons=[...])` to report how many are already monitored vs. not (a call sketch follows this list).
- Ask the user which criticality scope they prefer:
  - HIGH only -- monitor only the most critical tables
  - MEDIUM + HIGH -- broader coverage
  - ALL -- full coverage including LOW-criticality tables
- You may suggest covering multiple use cases in one session.
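A minimal sketch of the monitored-vs-not check; the `mcons` argument is documented above, while the use-case argument name is an assumption:

```json
[
  { "name": "get_use_case_tables", "arguments": { "use_case": "<use-case>" } },
  { "name": "get_monitors", "arguments": { "mcons": ["<mcon-1>", "<mcon-2>", "<mcon-3>"] } }
]
```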
### Identify coverage gaps with anomaly data

Use `get_unmonitored_tables_with_anomalies` to discover tables that are not monitored but already have muted out-of-the-box anomalies. This reveals real coverage gaps -- places where Monte Carlo detected data issues but no monitor was configured to alert anyone. A call sketch follows the list below.
- Call it with a recent time window (e.g. last 7-30 days) using ISO 8601 timestamps.
- Results are ranked by importance score -- the most critical gaps appear first.
- Each result includes a sample of anomaly events showing what types of issues were detected (freshness, volume, schema changes).
- Use this to prioritize which unmonitored tables to cover first -- a table with recent anomalies is a stronger candidate than one with no activity.
- Cross-reference with use-case data: if an unmonitored table with anomalies belongs to a critical use case, escalate its priority.
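A minimal sketch with a recent 30-day window; the tool takes an ISO 8601 time range, but the exact argument names (`start_time`/`end_time` here) are assumptions to check against the tool's schema:

```json
{
  "name": "get_unmonitored_tables_with_anomalies",
  "arguments": {
    "start_time": "2025-05-01T00:00:00Z",
    "end_time": "2025-05-31T00:00:00Z"
  }
}
```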
## Importance-based fallback
When no use cases are defined, fall back to importance-based table discovery.
- Find unmonitored tables: Use `search(query="", is_monitored=false)` to find unmonitored tables sorted by importance (a call sketch follows this list).
- Find tables with anomalies: Use `get_unmonitored_tables_with_anomalies` with a recent time window (last 14-30 days) to find tables with recent anomalies but no monitors.
- Inspect top candidates: Use `get_table` to check table details, fields, and stats for the most important unmonitored tables.
- Understand criticality via lineage: Use `get_asset_lineage` to understand which tables are most connected -- tables with many downstream dependencies are higher priority.
- Prioritize: Rank candidates by importance score and anomaly activity. Present the top candidates to the user with reasoning.
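A minimal sketch of the unmonitored-table search; both arguments appear in this skill file:

```json
{
  "name": "search",
  "arguments": {
    "query": "",
    "is_monitored": false
  }
}
```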
## SQL profiling (optional)
If a database MCP server was detected in Step 3 of the coverage workflow:
- Call `get_queries_for_table` to see recent query patterns on candidate tables.
- Use the database MCP tools (e.g. `snowflake_query`, `bigquery_query`) to profile table usage -- identify which tables are queried most frequently, which columns are used in JOINs and WHERE clauses (a sketch follows this list).
- Use this information to refine monitor suggestions -- heavily-queried tables with no monitors are high-priority gaps.
If no database MCP is available, skip this step entirely. Do not ask the user to configure one.
## Pre-creation context (coverage-driven)
When coverage analysis leads to monitor creation, gather this context before reading the creation reference file:
- Call `get_audiences` to list available notification audiences. Ask the user which audience they want notifications sent to (a call sketch follows this list).
- Ask whether the monitor should be created as a DRAFT or active.
- When passing `audiences` or `failure_audiences`, use the audience name/label (not UUID).
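A minimal sketch of the audience lookup and of how the chosen name (not a UUID) is later passed to a creation tool; only the `audiences` argument is shown for the second call, and the full argument set is covered in the next section:

```json
[
  { "name": "get_audiences", "arguments": {} },
  { "name": "create_table_monitor_mac", "arguments": { "audiences": ["<audience-name-chosen-by-user>"] } }
]
```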
## Use-case tag monitors

The most common output of coverage analysis is a table monitor scoped by use-case tags via `create_table_monitor_mac`. The `asset_selection` parameter uses this structure (a fuller call sketch follows the rules below):

```json
{
  "databases": ["<database_name>"],
  "schemas": ["<schema_name>"],
  "filters": [
    {
      "type": "TABLE_TAG",
      "tableTags": ["<tag_key>:<criticality>"],
      "tableTagsOperator": "HAS_ANY"
    }
  ]
}
```
Rules:
- Filter `type` is always `TABLE_TAG` for use-case monitors.
- `tableTagsOperator` should be `HAS_ANY`.
- Each entry in `tableTags` is `"<tag_key>:<value>"` where the tag key is the precomputed tag name from `get_use_cases` output and the value is the criticality level in lowercase (`high`, `medium`, `low`).
- To monitor only HIGH-criticality tables: `["tag_name:high"]`
- To monitor MEDIUM + HIGH: `["tag_name:high", "tag_name:medium"]`
- To monitor ALL: `["tag_name:high", "tag_name:medium", "tag_name:low"]`
## Monitor description guidelines

Write a clear, meaningful `description` that explains what the monitor covers and why. The backend auto-generates the monitor name -- you cannot control it, but the description is what users see.
- Bad: "Data Quality Monitoring - HIGH criticality table monitor"
- Good: "Monitor HIGH criticality tables in the Revenue Reporting use case to catch issues before they affect dashboards and financial reports."
The description should mention the criticality scope, the use case name, and a brief reason why this monitoring matters.
## Transient and truncate-and-reload tables
Some tables show 0 rows when queried directly but have recent write activity in Monte Carlo metadata. These are transient tables -- fully replaced on each pipeline run (truncate-and-reload pattern). Recognize this pattern early to avoid wasting time querying empty tables.
Signs of a transient table:

- `get_table` shows a recent `last_write` timestamp and high read/write activity
- Direct SQL query returns 0 rows or all-NULL timestamp columns
- Monte Carlo detected freshness anomalies (the table stayed empty longer than expected between loads)
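To confirm the pattern, compare Monte Carlo metadata against a direct query -- a sketch assuming a `snowflake_query`-style tool and that `get_table` takes an MCON (the argument names shown are assumptions). If the first call shows a recent `last_write` but the second returns 0 rows, treat the table as transient and skip further direct profiling:

```json
[
  { "name": "get_table", "arguments": { "mcon": "<table-mcon>" } },
  { "name": "snowflake_query", "arguments": { "query": "SELECT COUNT(*) FROM <database>.<schema>.<table>" } }
]
```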
## Graceful degradation
Handle missing or unavailable tools gracefully:
| Scenario | Behavior |
|---|---|
| No use cases defined | Fall back to importance-based discovery |
| No database MCP available | Skip SQL profiling, rely on MC tools only |
| `get_unmonitored_tables_with_anomalies` returns empty | Note that no recent anomalies were found; proceed with use-case or importance-based prioritization |
| `get_use_case_tables` returns no tables | Note the use case has no tables; suggest exploring other use cases |
| `get_audiences` returns empty | Inform user no audiences are configured; monitors can still be created without notification routing |
| User has no warehouses | Inform user that no warehouses are accessible; they may need to check their Monte Carlo permissions |
Never error out or stop the conversation because one tool returned empty results. Explain what happened and offer the next best path.
## Rules
- Never expose UUIDs, MCONs, or internal identifiers to the user -- always use human-readable names for warehouses, audiences, use cases, and tables. Keep internal identifiers for tool calls only.
- When the user asks about relationships between tables, use `get_asset_lineage` to fetch upstream/downstream connections and explain the data flow.
- Be concise but thorough. Use bullet points and tables for clarity.
- Always use ISO 8601 format for datetime values in tool calls.
- Never reformat YAML values returned by creation tools.
- When passing `audiences` or `failure_audiences` to monitor creation tools, use the audience name/label (not UUID). The API accepts audience names.