Babysitter Stream Processing Windowing Designer

Designs optimal windowing strategies for stream processing

install
source · Clone the upstream repo
git clone https://github.com/a5c-ai/babysitter
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/a5c-ai/babysitter "$T" && mkdir -p ~/.claude/skills && cp -r "$T/library/specializations/data-engineering-analytics/skills/stream-processing-windowing-designer" ~/.claude/skills/a5c-ai-babysitter-stream-processing-windowing-designer && rm -rf "$T"
manifest: library/specializations/data-engineering-analytics/skills/stream-processing-windowing-designer/SKILL.md
source content

Stream Processing Windowing Designer

Overview

Designs optimal windowing strategies for stream processing. This skill provides expertise in window types, watermarks, and trigger strategies for streaming applications.

Capabilities

  • Window type selection (tumbling, sliding, session, global)
  • Watermark strategy design
  • Late data handling
  • Trigger configuration
  • Window aggregation optimization
  • State management recommendations
  • Exactly-once semantics configuration

Input Schema

{
  "useCase": "string",
  "eventTimeField": "string",
  "latencyRequirements": {
    "maxLatencyMs": "number",
    "allowedLateMs": "number"
  },
  "aggregations": ["object"]
}

Output Schema

{
  "windowConfig": {
    "type": "string",
    "size": "string",
    "slide": "string"
  },
  "watermarkConfig": "object",
  "triggerConfig": "object",
  "lateDataHandling": "object"
}

Target Processes

  • Streaming Pipeline
  • Feature Store Setup

Usage Guidelines

  1. Define use case and event time field
  2. Specify latency requirements
  3. List aggregation operations needed
  4. Consider late data arrival patterns

Best Practices

  • Choose window type based on business requirements
  • Configure watermarks based on expected lateness
  • Use appropriate triggers for latency vs completeness tradeoff
  • Plan state management for long windows
  • Test with realistic event time distributions