Claude-skill-registry filtering-event-datasets

Filter and search event datasets (logs) using OPAL. Use when you need to find specific log events by text search, regex patterns, or field values. Covers contains(), the tilde operator ~, field comparisons, boolean logic, and limit for sampling results. Does NOT cover aggregation (see the aggregating-event-datasets skill).

Install

Source · Clone the upstream repo:
git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/:
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/filtering-event-datasets" ~/.claude/skills/majiayu000-claude-skill-registry-filtering-event-datasets && rm -rf "$T"

Manifest: skills/data/filtering-event-datasets/SKILL.md
source content

Filtering Event Datasets

Event datasets (logs) represent point-in-time occurrences with a single timestamp. This skill teaches you how to filter and search log data to find specific events using OPAL.

When to Use This Skill

  • Searching logs for specific text patterns (errors, warnings, exceptions)
  • Finding logs matching a regex pattern (log levels, error codes, structured formats)
  • Filtering by field values (namespace, pod, container, stream, severity)
  • Combining multiple filter conditions with boolean logic
  • Sampling raw log events for investigation

Prerequisites

  • Access to Observe tenant via MCP
  • Understanding that Events have a single timestamp (not duration)
  • Dataset with `log` interface (or any Event dataset)

Key Concepts

Event Datasets

Event datasets represent discrete occurrences at specific points in time. Unlike Intervals (which have start/end times), Events are instantaneous observations.

Characteristics:

  • Single `timestamp` field (sometimes named `time`, `eventTime`, etc.)
  • Commonly used for logs
  • Interface type often `log`
  • High volume, text-heavy data

Common Field Structure

Most log datasets follow this pattern:

  • `body`, `message`, or `log` - The actual log message text
  • `timestamp`, `time`, `eventTime`, etc. - When the log occurred
  • `resource_attributes.*` or `labels` - Nested metadata about the source
  • Standard fields like `cluster`, `namespace`, `pod`, `container`, `stream`, `level`

Nested Field Access

Fields with dots in their names MUST be quoted:

resource_attributes."k8s.namespace.name"  ✓ Correct
resource_attributes.k8s.namespace.name    ✗ Wrong
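To see why the quoting matters, here is a minimal Python sketch (the payload and values are hypothetical): the dotted name is one literal key in the attribute map, not a three-level path, so treating the dots as path separators finds nothing.

```python
import json

# Hypothetical payload: the key "k8s.namespace.name" contains literal dots.
record = json.loads('{"resource_attributes": {"k8s.namespace.name": "production"}}')
attrs = record["resource_attributes"]

# Quoted interpretation (one literal key): works.
print(attrs["k8s.namespace.name"])  # → production

# Unquoted interpretation (a k8s -> namespace -> name path): the nested
# "k8s" object simply does not exist.
print("k8s" in attrs)  # → False
```

OPAL's `resource_attributes."k8s.namespace.name"` quoting serves the same purpose: it tells the parser the dots belong to the key itself.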

Discovery Workflow

Always start by finding and exploring your log dataset.

Step 1: Search for log datasets

discover_context("logs kubernetes")

Look for datasets with interface type `log` or category "Logs".

Step 2: Get detailed schema

discover_context(dataset_id="YOUR_DATASET_ID")

Note the exact field names (case-sensitive!) and nested field structure. Pay attention to:

  • The `body`, `message`, `log`, or similar field for the main log content
  • `resource_attributes.*`, `labels`, or other nested objects for dimensions
  • Fields like `namespace`, `pod`, `container`, `stream`, `level`

Basic Patterns

Pattern 1: Simple Text Search

Use case: Find logs containing specific text

filter contains(body, "error")
| limit 100

Explanation: Searches the `body` field for the substring "error" and returns the first 100 matching logs in their default order (typically reverse chronological, most recent first).

When to use: Quick text search to find recent examples of specific log messages.

Pattern 2: Case-Insensitive Regex Search

Use case: Find logs matching a pattern regardless of case

filter body ~ /error/i
| limit 100

Explanation: Uses the regex operator `~` with forward slashes `/pattern/` for POSIX ERE pattern matching. The `/i` flag makes matching case-insensitive (matches "error", "ERROR", "Error", etc.). Without `/i`, matching is case-sensitive.

Regex syntax:

  • `/pattern/` - Case-sensitive regex
  • `/pattern/i` - Case-insensitive regex
  • POSIX Extended Regular Expression (ERE) syntax
  • Special chars need escaping: `/\[ERROR\]/` to match literal "[ERROR]"

Pattern 3: Structured Pattern Matching

Use case: Find logs matching a specific format (log levels, codes, structured fields)

filter body ~ /level=warn/i
| limit 100

Explanation: Matches logs containing "level=warn" (case-insensitive). Useful for structured logging formats with key=value pairs.

More examples:

# Match HTTP status codes
filter body ~ /status=[45][0-9]{2}/

# Match IP addresses
filter body ~ /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/

# Match specific log levels
filter body ~ /\[WARN\]|\[ERROR\]/  # Note: May have issues with MCP tool due to pipe
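Python's `re` engine is not POSIX ERE, but these particular patterns use only constructs the two share, so they can be sanity-checked locally before running them in OPAL (the sample log line is invented):

```python
import re

status = re.compile(r"status=[45][0-9]{2}")                         # 4xx/5xx codes
ip = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")  # IPv4-ish
level = re.compile(r"\[WARN\]|\[ERROR\]")                           # bracketed levels

line = "10.0.3.7 GET /api/v1/items status=503 [ERROR] upstream timeout"
print(status.search(line).group())  # → status=503
print(ip.search(line).group())      # → 10.0.3.7
print(level.search(line).group())   # → [ERROR]
print(status.search("status=200"))  # → None (2xx is not matched)
```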

Pattern 4: Field-Based Filtering

Use case: Filter by specific field values (not text search)

filter stream = "stderr"
| limit 100

Explanation: Filters logs written to the stderr stream. Field comparisons use the `=`, `!=`, `>`, `<`, `>=`, `<=` operators.

More examples:

# Filter by namespace (nested field)
filter string(resource_attributes."k8s.namespace.name") = "production"
| limit 100

# Filter by pod
filter pod = "api-gateway-abc123"
| limit 100

Pattern 5: Multiple Conditions

Use case: Combine multiple filters with boolean logic

filter (contains(body, "error") or contains(body, "exception"))
| filter string(resource_attributes."k8s.namespace.name") = "production"
| limit 100

Explanation: Chains multiple filters together. First filters for text content (error OR exception), then filters for a specific namespace. Each `filter` verb can be chained with `|`.

Boolean operators:

  • `and` - Both conditions must be true
  • `or` - At least one condition must be true
  • `not` - Negates a condition

More examples:

# Multiple fields
filter namespace = "production" and stream = "stderr"
| limit 100

# Negation
filter not contains(body, "debug")
| limit 100

# Complex conditions
filter (level = "error" or level = "warn") and namespace != "test"
| limit 100

Complete Example

End-to-end workflow for finding recent errors in production.

Scenario: You need to investigate errors in the production namespace from the last hour.

Step 1: Discovery

discover_context("kubernetes logs")

Found: Dataset "Kubernetes Explorer/Kubernetes Logs" (ID: 42161740) with interface `log`.

Step 2: Get schema details

discover_context(dataset_id="42161740")

Key fields identified: `body`, `namespace`, `pod`, `container`, `stream`, `resource_attributes.*`

Step 3: Build query

filter (contains(body, "error") or contains(body, "ERROR"))
| filter string(resource_attributes."k8s.namespace.name") = "production"
| filter stream = "stderr"
| limit 100

Step 4: Execute

execute_opal_query(
    query="[query above]",
    primary_dataset_id="42161740",
    time_range="1h"
)

Step 5: Interpret results

Returns up to 100 most recent logs (reverse chronological) that:

  • Contain "error" or "ERROR" in the body
  • Are from the production namespace
  • Were written to stderr

You can now examine these logs to identify the issue.

Common Pitfalls

Pitfall 1: Not Quoting Nested Fields

Wrong:

filter resource_attributes.k8s.namespace.name = "production"

Correct:

filter string(resource_attributes."k8s.namespace.name") = "production"

Why: Field names containing dots must be quoted. Also wrap the value in `string()` for type safety.

Pitfall 2: Using String Quotes for Regex

Wrong:

filter body ~ "error[0-9]+"  # This is string matching, not regex

Correct:

filter body ~ /error[0-9]+/  # Forward slashes for regex

Why: Regex patterns must use forward slashes `/pattern/`. Double quotes are for string literals.

Pitfall 3: Forgetting Case Sensitivity

Wrong:

filter contains(body, "Error")  # Only finds "Error", not "error" or "ERROR"

Correct:

# Option 1: Multiple conditions
filter contains(body, "error") or contains(body, "Error") or contains(body, "ERROR")

# Option 2: Case-insensitive regex
filter body ~ /error/i

Why: `contains()` is case-sensitive. Use regex with the `/i` flag for case-insensitive matching.

Pitfall 4: Confusing limit with topk

Wrong (for filtered raw events):

filter contains(body, "error")
| topk 100, max(timestamp)  # topk is for aggregated results!

Correct:

filter contains(body, "error")
| limit 100  # limit is for raw events

Why: `limit` returns the first N results in their current order (for raw events). `topk` is for aggregated results (see the aggregating-event-datasets skill).

Tips and Best Practices

  • Start broad, then narrow: Begin with simple filters, then add specificity iteratively
  • Use `limit` for sampling: Always add `| limit N` when exploring to avoid overwhelming results
  • Default ordering: Events are typically ordered reverse chronologically (most recent first)
  • Check field names: Use `discover_context()` to get exact field names (case-sensitive!)
  • Quote nested fields: Any field with dots in its name must be quoted
  • Type conversion: Wrap nested fields in `string()`, `int64()`, etc. for type safety
  • Test patterns: Use small time ranges (1h) when developing queries
  • Regex testing: Test regex patterns against simple examples before adding them to complex queries

Regex Reference

Common POSIX ERE patterns:

  • `.` - Any character
  • `*` - Zero or more of the previous
  • `+` - One or more of the previous
  • `?` - Zero or one of the previous
  • `[abc]` - Character class (a, b, or c)
  • `[0-9]` - Digit
  • `[a-z]` - Lowercase letter
  • `^` - Start of line
  • `$` - End of line
  • `\` - Escape a special character

Flags:

  • `/i` - Case-insensitive
  • Default (no flag) - Case-sensitive
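As a quick sanity check, the constructs above can be exercised in Python, whose `re` module supports the same basics, with `re.IGNORECASE` playing the role of the `/i` flag (the sample strings are invented):

```python
import re

assert re.search(r"^disk", "disk full")                     # ^ anchors to line start
assert re.search(r"full$", "disk full")                     # $ anchors to line end
assert re.search(r"[0-9]+", "retry 3 of 5").group() == "3"  # class + repetition
assert re.search(r"\[ERROR\]", "[ERROR] boom")              # \ escapes literal brackets
assert re.search(r"error", "ERROR", re.IGNORECASE)          # like /error/i
assert re.search(r"error", "ERROR") is None                 # default: case-sensitive
print("all reference patterns behaved as described")
```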

Related Skills

  • [aggregating-event-datasets] - For summarizing events with statsby, make_col, group_by
  • [time-series-analysis] - For trending events over time with timechart
  • [working-with-nested-fields] - Deep dive on nested field access

Last Updated: November 14, 2025 · Version: 2.0 (Split from combined skill) · Tested With: Observe OPAL v2.x