# aws-ecs-monitor
AWS ECS production health monitoring with CloudWatch log analysis — monitors ECS service health, ALB targets, SSL certificates, and provides deep CloudWatch log analysis for error categorization, restart detection, and production alerts.
## Install
Source · clone the upstream repo:

```bash
git clone https://github.com/openclaw/skills
```
Claude Code · install into `~/.claude/skills/`:

```bash
T=$(mktemp -d) && git clone --depth=1 https://github.com/openclaw/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/briancolinger/aws-ecs-monitor" ~/.claude/skills/clawdbot-skills-aws-ecs-monitor && rm -rf "$T"
```
Manifest: `skills/briancolinger/aws-ecs-monitor/SKILL.md`
## AWS ECS Monitor
Production health monitoring and log analysis for AWS ECS services.
### What It Does
- Health Checks: HTTP probes against your domain, ECS service status (desired vs running), ALB target group health, SSL certificate expiry
- Log Analysis: Pulls CloudWatch logs, categorizes errors (panics, fatals, OOM, timeouts, 5xx), detects container restarts, filters health check noise
- Auto-Diagnosis: Reads health status and automatically investigates failing services via log analysis
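Under the hood these checks map onto ordinary CLI calls. A rough sketch of the equivalents (standard `curl`, `aws`, and `openssl` invocations; the script's exact commands may differ):

```bash
# HTTP probe (prints status code only)
curl -fsS -o /dev/null -w '%{http_code}\n' "https://example.com/health"

# ECS desired vs running task counts
aws ecs describe-services --cluster my-cluster --services my-api \
  --query 'services[0].[desiredCount,runningCount]' --output text

# ALB target group health ($TG_ARN is a placeholder for your target group ARN)
aws elbv2 describe-target-health --target-group-arn "$TG_ARN"

# SSL certificate expiry date
echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null \
  | openssl x509 -noout -enddate
```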
### Prerequisites
- `aws` CLI configured with appropriate IAM permissions:
  - `ecs:ListServices`, `ecs:DescribeServices`
  - `elasticloadbalancing:DescribeTargetGroups`, `elasticloadbalancing:DescribeTargetHealth`
  - `logs:FilterLogEvents`, `logs:DescribeLogGroups`
- `curl` for HTTP health checks
- `python3` for JSON processing and log analysis
- `openssl` for SSL certificate checks (optional)
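If you want a dedicated monitoring principal, the permission list above translates into a minimal read-only policy. A sketch (the policy name is made up, and `"Resource": "*"` should be tightened to your ARNs):

```bash
cat > ecs-monitor-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ecs:ListServices",
      "ecs:DescribeServices",
      "elasticloadbalancing:DescribeTargetGroups",
      "elasticloadbalancing:DescribeTargetHealth",
      "logs:FilterLogEvents",
      "logs:DescribeLogGroups"
    ],
    "Resource": "*"
  }]
}
EOF

aws iam create-policy --policy-name ecs-monitor-readonly \
  --policy-document file://ecs-monitor-policy.json
```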
### Configuration
All configuration is via environment variables:
| Variable | Required | Default | Description |
|---|---|---|---|
| `ECS_CLUSTER` | Yes | — | ECS cluster name |
| `AWS_REGION` | No | | AWS region |
| `ECS_DOMAIN` | No | — | Domain for HTTP/SSL checks (skip if unset) |
| `ECS_SERVICES` | No | auto-detect | Comma-separated service names to monitor |
| `ECS_HEALTH_STATE` | No | `./data/ecs-health.json` | Path to write health state JSON |
| `ECS_HEALTH_OUTDIR` | No | `./data/` | Output directory for logs and alerts |
| `ECS_LOG_PATTERN` | No | `/ecs/{service}` | CloudWatch log group pattern (`{service}` is replaced) |
| | No | — | Comma-separated pairs for HTTP probes |
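A typical setup exports these up front (values are placeholders taken from the usage examples below):

```bash
export ECS_CLUSTER=my-cluster
export AWS_REGION=us-east-1          # example region
export ECS_DOMAIN=example.com
export ECS_SERVICES=my-api,my-frontend
export ECS_LOG_PATTERN="/ecs/{service}"
```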
### Directories Written
- `ECS_HEALTH_STATE` (default: `./data/ecs-health.json`) — Health state JSON file
- `ECS_HEALTH_OUTDIR` (default: `./data/`) — Output directory for logs, alerts, and analysis reports
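To spot-check the state file after a run, any JSON pretty-printer works; for example:

```bash
python3 -m json.tool ./data/ecs-health.json
```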
### Scripts
#### `scripts/ecs-health.sh` — Health Monitor

```bash
# Full check
ECS_CLUSTER=my-cluster ECS_DOMAIN=example.com ./scripts/ecs-health.sh

# JSON output only
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --json

# Quiet mode (no alerts, just status file)
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet
```
Exit codes: `0` = healthy, `1` = unhealthy/degraded, `2` = script error.
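Since the exit code separates "service is degraded" from "the check itself broke", it can gate follow-up actions. A sketch (the alert hook is hypothetical):

```bash
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet
case $? in
  0) ;;                                      # healthy, nothing to do
  1) ./notify-oncall.sh "ECS degraded" ;;    # hypothetical alerting script
  2) echo "health check script failed" >&2 ;;
esac
```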
#### `scripts/cloudwatch-logs.sh` — Log Analyzer

```bash
# Pull raw logs from a service
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh pull my-api --minutes 30

# Show errors across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh errors all --minutes 120

# Deep analysis with error categorization
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose --minutes 60

# Detect container restarts
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh restarts my-api

# Auto-diagnose from health state file
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh auto-diagnose

# Summary across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh summary --minutes 120
```
Options: `--minutes N` (default: 60), `--json`, `--limit N` (default: 200), `--verbose`.
### Auto-Detection
When `ECS_SERVICES` is not set, both scripts auto-detect services from the cluster:

```bash
aws ecs list-services --cluster $ECS_CLUSTER
```
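Note that `list-services` returns full service ARNs rather than bare names; the last path segment is the service name, which you can extract by hand like this:

```bash
# ECS service ARNs end in .../<cluster>/<service>; keep the final segment
aws ecs list-services --cluster "$ECS_CLUSTER" --query 'serviceArns[]' --output text \
  | tr '\t' '\n' | awk -F/ '{print $NF}'
```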
Log groups are resolved by pattern (default `/ecs/{service}`). Override with `ECS_LOG_PATTERN`:

```bash
# If your log groups are /ecs/prod/my-api, /ecs/prod/my-frontend, etc.
ECS_LOG_PATTERN="/ecs/prod/{service}" ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose
```
### Integration
The health monitor can trigger the log analyzer for auto-diagnosis when issues are detected. Set `ECS_HEALTH_OUTDIR` to a shared directory and run both scripts together:

```bash
export ECS_CLUSTER=my-cluster
export ECS_DOMAIN=example.com
export ECS_HEALTH_OUTDIR=./data

# Run health check (auto-triggers log analysis on failure)
./scripts/ecs-health.sh

# Or run log analysis independently
./scripts/cloudwatch-logs.sh auto-diagnose --minutes 30
```
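For continuous monitoring, wrap the health check in a scheduler of your choice; the simplest possible sketch is a polling loop (cron or a systemd timer would be the production choice):

```bash
# Re-check every 5 minutes; failures trigger the built-in auto-diagnosis
while true; do
  ./scripts/ecs-health.sh --quiet
  sleep 300
done
```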
### Error Categories
The log analyzer classifies errors into:
- `panic` — Go panics
- `fatal` — Fatal errors
- `oom` — Out of memory
- `timeout` — Connection/request timeouts
- `connection_error` — Connection refused/reset
- `http_5xx` — HTTP 500-level responses
- `python_traceback` — Python tracebacks
- `exception` — Generic exceptions
- `auth_error` — Permission/authorization failures
- `structured_error` — JSON-structured error logs
- `error` — Generic ERROR-level messages
Health check noise (`GET`/`HEAD` requests to `/health` from the ALB) is automatically filtered out of error counts and the HTTP status distribution.
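The script's real matching rules aren't documented here, but a categorizer of this shape can be as simple as ordered pattern checks. An illustrative sketch with assumed patterns:

```bash
# Illustrative only: these patterns are assumptions, not the script's actual regexes.
classify_line() {
  case "$1" in
    *"panic: "*)                                 echo panic ;;
    *FATAL*|*"fatal error"*)                     echo fatal ;;
    *"Out of memory"*|*OOM*)                     echo oom ;;
    *"timed out"*|*timeout*)                     echo timeout ;;
    *"connection refused"*|*"connection reset"*) echo connection_error ;;
    *"Traceback (most recent call last)"*)       echo python_traceback ;;
    *ERROR*)                                     echo error ;;
    *)                                           return 1 ;;   # uncategorized
  esac
}
```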