Claude-skill-registry aws-monitoring
Debug AWS resource issues, check Lambda logs, and monitor deployed services. Use when investigating production issues, checking CloudWatch logs, or debugging deployment failures.
Source: https://github.com/majiayu000/claude-skill-registry (skills/data/aws-monitoring/SKILL.md)
AWS Monitoring Skill
This skill helps you monitor and debug AWS resources for the SG Cars Trends platform.
When to Use This Skill
- Investigating production errors
- Checking Lambda function logs
- Monitoring API performance
- Debugging deployment failures
- Analyzing CloudWatch metrics
- Setting up alarms
- Troubleshooting resource issues
Monitoring Tools
SST Console
SST provides a built-in console for monitoring:
```bash
# Open SST console for specific stage
npx sst console --stage production
npx sst console --stage staging
npx sst console --stage dev
```
Features:
- Real-time Lambda logs
- Function invocations
- Error tracking
- Resource overview
- Environment variables
CloudWatch Logs
Access Lambda logs via CloudWatch:
```bash
# View recent logs for the API function
aws logs tail "/aws/lambda/sgcarstrends-api-production"

# Tail logs in real-time
aws logs tail "/aws/lambda/sgcarstrends-api-production" --follow

# Filter logs
aws logs tail "/aws/lambda/sgcarstrends-api-production" --filter-pattern "ERROR"

# Show logs from specific time
aws logs tail "/aws/lambda/sgcarstrends-api-production" --since 1h
aws logs tail "/aws/lambda/sgcarstrends-api-production" --since 2024-01-15T10:00:00Z
```
AWS CLI
Use AWS CLI for advanced log queries:
```bash
# List log groups
aws logs describe-log-groups \
  --log-group-name-prefix "/aws/lambda/sgcarstrends"

# Get recent log streams
aws logs describe-log-streams \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --order-by LastEventTime \
  --descending \
  --max-items 5

# Tail logs
aws logs tail "/aws/lambda/sgcarstrends-api-production" --follow

# Filter logs (--start-time is in epoch milliseconds)
aws logs filter-log-events \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --filter-pattern "ERROR" \
  --start-time "$(date -u -d '1 hour ago' +%s)000"

# Get logs for specific request
# (terms containing hyphens must be double-quoted inside the pattern)
aws logs filter-log-events \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --filter-pattern '"request-id-here"'
```
CloudWatch Metrics
Lambda Metrics
```bash
# Get Lambda invocations
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Invocations \
  --dimensions Name=FunctionName,Value=sgcarstrends-api-production \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 300 \
  --statistics Sum

# Get errors
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Errors \
  --dimensions Name=FunctionName,Value=sgcarstrends-api-production \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 300 \
  --statistics Sum

# Get duration (multiple statistics are space-separated)
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=sgcarstrends-api-production \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 300 \
  --statistics Average Maximum
```
API Gateway Metrics
```bash
# Get API requests
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name Count \
  --dimensions Name=ApiName,Value=sgcarstrends-api \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 300 \
  --statistics Sum

# Get 4XX errors
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name 4XXError \
  --dimensions Name=ApiName,Value=sgcarstrends-api \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 300 \
  --statistics Sum

# Get latency
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name Latency \
  --dimensions Name=ApiName,Value=sgcarstrends-api \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 300 \
  --statistics Average Maximum

# Get p99 latency (percentiles use --extended-statistics,
# which cannot be combined with --statistics in one call)
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApiGateway \
  --metric-name Latency \
  --dimensions Name=ApiName,Value=sgcarstrends-api \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 300 \
  --extended-statistics p99
```
CloudWatch Alarms
Creating Alarms
```typescript
// infra/alarms.ts
import { StackContext, use } from "sst/constructs";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
// SnsAction is exported from aws-cloudwatch-actions, not aws-cloudwatch
import * as actions from "aws-cdk-lib/aws-cloudwatch-actions";
import * as sns from "aws-cdk-lib/aws-sns";
import * as subscriptions from "aws-cdk-lib/aws-sns-subscriptions";
import { API } from "./api";

export function Alarms({ stack, app }: StackContext) {
  const { api } = use(API);

  // Only create alarms for production
  if (app.stage !== "production") {
    return;
  }

  // SNS topic for alarms
  const alarmTopic = new sns.Topic(stack, "AlarmTopic");

  // Add email subscription
  alarmTopic.addSubscription(
    new subscriptions.EmailSubscription("alerts@sgcarstrends.com")
  );

  // High error rate alarm
  new cloudwatch.Alarm(stack, "ApiHighErrorRate", {
    metric: api.metricErrors(),
    threshold: 10,
    evaluationPeriods: 2,
    datapointsToAlarm: 2,
    alarmDescription: "API has high error rate",
    treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  }).addAlarmAction(new actions.SnsAction(alarmTopic));

  // High duration alarm
  new cloudwatch.Alarm(stack, "ApiHighDuration", {
    metric: api.metricDuration(),
    threshold: 5000, // 5 seconds (Duration is reported in milliseconds)
    evaluationPeriods: 2,
    datapointsToAlarm: 2,
    alarmDescription: "API response time is high",
    treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  }).addAlarmAction(new actions.SnsAction(alarmTopic));

  // Throttle alarm
  new cloudwatch.Alarm(stack, "ApiThrottled", {
    metric: api.metricThrottles(),
    threshold: 1,
    evaluationPeriods: 1,
    alarmDescription: "API is being throttled",
    treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  }).addAlarmAction(new actions.SnsAction(alarmTopic));
}
```
Add to SST config:
```typescript
// infra/sst.config.ts
import { Alarms } from "./alarms";

export default {
  // config() and the existing stack imports are unchanged and omitted here
  stacks(app) {
    app
      .stack(DNS)
      .stack(API)
      .stack(Web)
      .stack(Alarms); // Add alarms stack
  },
} satisfies SSTConfig;
```
Managing Alarms via CLI
```bash
# List alarms
aws cloudwatch describe-alarms

# Get alarm state
aws cloudwatch describe-alarms \
  --alarm-names "sgcarstrends-ApiHighErrorRate"

# Disable alarm actions (the alarm keeps evaluating but stops notifying)
aws cloudwatch disable-alarm-actions \
  --alarm-names "sgcarstrends-ApiHighErrorRate"

# Enable alarm actions
aws cloudwatch enable-alarm-actions \
  --alarm-names "sgcarstrends-ApiHighErrorRate"

# Delete alarm
aws cloudwatch delete-alarms \
  --alarm-names "sgcarstrends-ApiHighErrorRate"
```
CloudWatch Insights
Querying Logs
```bash
# Start query (start and end times are in epoch seconds)
aws logs start-query \
  --log-group-name "/aws/lambda/sgcarstrends-api-production" \
  --start-time "$(date -u -d '1 hour ago' +%s)" \
  --end-time "$(date -u +%s)" \
  --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20'

# Get query results
aws logs get-query-results --query-id <query-id>
```
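`start-query` is asynchronous and only returns a query ID, so the results have to be fetched once the query has finished. A minimal sketch of a helper that starts a query and polls `get-query-results` until the status is `Complete`. The function name `run_insights_query` and the `INSIGHTS_MAX_POLLS` and `INSIGHTS_POLL_SECONDS` variables are hypothetical names chosen for this example, not part of the AWS CLI or of this project:

```bash
# Run a Logs Insights query and wait for the result (sketch).
# Usage: run_insights_query <log-group> <query-string> [lookback-seconds]
# INSIGHTS_MAX_POLLS and INSIGHTS_POLL_SECONDS are hypothetical tuning knobs.
run_insights_query() (
  log_group=$1
  query=$2
  lookback=${3:-3600}
  end_time=$(date +%s)
  start_time=$((end_time - lookback))

  # start-query returns immediately with a query ID
  query_id=$(aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --query-string "$query" \
    --query 'queryId' \
    --output text) || exit 1

  # Poll until the query leaves the Scheduled/Running states
  attempts=0
  while [ "$attempts" -lt "${INSIGHTS_MAX_POLLS:-30}" ]; do
    status=$(aws logs get-query-results \
      --query-id "$query_id" \
      --query 'status' \
      --output text) || exit 1
    case $status in
      Complete)
        # Print the full result document
        aws logs get-query-results --query-id "$query_id" --output json
        exit 0
        ;;
      Failed|Cancelled|Timeout|Unknown)
        echo "Query $query_id ended with status: $status" >&2
        exit 1
        ;;
    esac
    attempts=$((attempts + 1))
    sleep "${INSIGHTS_POLL_SECONDS:-2}"
  done

  echo "Query $query_id did not complete in time" >&2
  exit 1
)
```

Example: `run_insights_query "/aws/lambda/sgcarstrends-api-production" 'fields @timestamp, @message | filter @message like /ERROR/ | limit 20'`. The body runs in a subshell, so `exit` only ends the helper and its variables do not leak into the calling shell.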
Common Queries
Find errors:

```
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
```

API performance:

```
fields @timestamp, @duration
| stats avg(@duration), max(@duration), min(@duration)
```

Count errors by type:

```
fields @message
| filter @message like /ERROR/
| parse @message /(?<errorType>\w+Error)/
| stats count() by errorType
```

Slow requests:

```
fields @timestamp, @duration, @requestId
| filter @duration > 1000
| sort @duration desc
| limit 20
```

Request rate:

```
fields @timestamp
| stats count() by bin(5m)
```
X-Ray Tracing
Enable X-Ray
```typescript
// infra/api.ts
import { StackContext, Function } from "sst/constructs";

export function API({ stack }: StackContext) {
  const api = new Function(stack, "api", {
    handler: "apps/api/src/index.handler",
    tracing: "active", // Enable X-Ray (SST takes a lowercase string here)
  });

  return { api };
}
```
Instrument Code
```typescript
// apps/api/src/index.ts
import { captureAWSv3Client } from "aws-xray-sdk-core";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";

// Wrap AWS SDK clients
const client = captureAWSv3Client(new DynamoDBClient({}));
```
View Traces
```bash
# Get service graph
aws xray get-service-graph \
  --start-time "$(date -u -d '1 hour ago' +%s)" \
  --end-time "$(date -u +%s)"

# Get trace summaries
aws xray get-trace-summaries \
  --start-time "$(date -u -d '1 hour ago' +%s)" \
  --end-time "$(date -u +%s)"

# Get trace details
aws xray batch-get-traces --trace-ids <trace-id>
```
Resource Monitoring
Lambda Functions
```bash
# List functions
aws lambda list-functions \
  --query "Functions[?starts_with(FunctionName, 'sgcarstrends')].FunctionName"

# Get function config
aws lambda get-function-configuration \
  --function-name sgcarstrends-api-production

# Get function code location
aws lambda get-function \
  --function-name sgcarstrends-api-production

# Invoke function
# (AWS CLI v2 needs --cli-binary-format raw-in-base64-out for a raw JSON payload)
aws lambda invoke \
  --function-name sgcarstrends-api-production \
  --cli-binary-format raw-in-base64-out \
  --payload '{"path": "/health"}' \
  response.json

cat response.json
```
CloudFront Distributions
```bash
# List distributions
aws cloudfront list-distributions \
  --query 'DistributionList.Items[*].[Id,DomainName,Status]' \
  --output table

# Get distribution config
aws cloudfront get-distribution-config --id <distribution-id>

# Create invalidation (cache clear)
aws cloudfront create-invalidation \
  --distribution-id <distribution-id> \
  --paths "/*"

# List invalidations
aws cloudfront list-invalidations --distribution-id <distribution-id>
```
S3 Buckets
```bash
# List buckets
aws s3 ls

# Get bucket size
aws s3 ls s3://bucket-name --recursive --summarize | grep "Total Size"

# Monitor bucket metrics
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=bucket-name Name=StorageType,Value=StandardStorage \
  --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 86400 \
  --statistics Average
```
Cost Monitoring
Cost Explorer
```bash
# Get cost and usage (grouping by service uses Type=DIMENSION,Key=SERVICE)
aws ce get-cost-and-usage \
  --time-period "Start=$(date -u -d '1 month ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d)" \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Get cost by tag
aws ce get-cost-and-usage \
  --time-period "Start=$(date -u -d '1 month ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d)" \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=TAG,Key=Environment
```
Budget Alerts
Create budget in AWS Console or via CLI:
```bash
# Create budget
aws budgets create-budget \
  --account-id "$(aws sts get-caller-identity --query Account --output text)" \
  --budget file://budget.json \
  --notifications-with-subscribers file://notifications.json
```
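The command above reads `budget.json` and `notifications.json`, which are not shown. A minimal sketch of a helper that writes both files, assuming a monthly cost budget in USD and an email alert at 80% of actual spend. The function name `write_budget_files`, the budget name `sgcarstrends-monthly-cost`, and the 80% threshold are illustrative assumptions, not values taken from the project:

```bash
# Write the two input files used by `aws budgets create-budget` (sketch).
# Usage: write_budget_files <monthly-limit-usd> <alert-email> [output-dir]
# The budget name and the 80% threshold below are hypothetical examples.
write_budget_files() (
  limit=$1
  email=$2
  out_dir=${3:-.}

  # Budget definition: a monthly cost budget in USD
  cat > "$out_dir/budget.json" <<EOF
{
  "BudgetName": "sgcarstrends-monthly-cost",
  "BudgetType": "COST",
  "TimeUnit": "MONTHLY",
  "BudgetLimit": { "Amount": "$limit", "Unit": "USD" }
}
EOF

  # Notification: email once actual spend passes 80% of the limit
  cat > "$out_dir/notifications.json" <<EOF
[
  {
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      { "SubscriptionType": "EMAIL", "Address": "$email" }
    ]
  }
]
EOF
)
```

Example: `write_budget_files 50 alerts@sgcarstrends.com` writes both files to the current directory, ready for the `create-budget` call.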
Debugging Production Issues
1. Check Recent Deployments
```bash
# Get stack events
aws cloudformation describe-stack-events \
  --stack-name sgcarstrends-api-production \
  --max-items 50

# Get deployment status
aws cloudformation describe-stacks \
  --stack-name sgcarstrends-api-production \
  --query 'Stacks[0].StackStatus' \
  --output text
```
2. Check Logs for Errors
```bash
# Get recent errors
aws logs tail "/aws/lambda/sgcarstrends-api-production" \
  --since 1h \
  --filter-pattern "ERROR"

# Or follow new errors as they arrive
aws logs tail "/aws/lambda/sgcarstrends-api-production" \
  --follow \
  --filter-pattern "ERROR"
```
3. Check Metrics
```bash
# Check invocations and errors
for metric in Invocations Errors; do
  aws cloudwatch get-metric-statistics \
    --namespace AWS/Lambda \
    --metric-name "$metric" \
    --dimensions Name=FunctionName,Value=sgcarstrends-api-production \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
    --period 300 \
    --statistics Sum
done
```
4. Test Endpoint
```bash
# Test API directly
curl -I https://api.sgcarstrends.com/health

# Test with verbose output
curl -v https://api.sgcarstrends.com/health
```
5. Check Resource Limits
```bash
# Check Lambda quotas (L-B99A9384 = concurrent executions)
aws service-quotas get-service-quota \
  --service-code lambda \
  --quota-code L-B99A9384

# Check API Gateway quotas
aws service-quotas list-service-quotas \
  --service-code apigateway
```
Common Issues
High Latency
Investigation:
- Check Lambda duration metrics
- Review CloudWatch Insights for slow queries
- Check database connection pool
- Review API response times
Solutions:
- Increase Lambda memory
- Optimize database queries
- Add caching
- Use connection pooling
High Error Rate
Investigation:
- Check error logs
- Review error types
- Check external service status
- Verify environment variables
Solutions:
- Fix application bugs
- Add error handling
- Retry failed requests
- Check API rate limits
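Whether an error count is "high" depends on traffic, so it helps to express errors as a share of invocations. A rough sketch that sums the `Errors` and `Invocations` metrics over a lookback window and prints the error rate as a percentage. The function names `lambda_metric_sum` and `lambda_error_rate` are hypothetical, and the one-hour default window is an assumption:

```bash
# Sum a Lambda metric over the last N seconds (sketch).
# Usage: lambda_metric_sum <function-name> <metric-name> [lookback-seconds]
lambda_metric_sum() (
  function_name=$1
  metric=$2
  lookback=${3:-3600}
  end_time=$(date +%s)
  start_time=$((end_time - lookback))

  # sum() over an empty datapoint list yields 0
  aws cloudwatch get-metric-statistics \
    --namespace AWS/Lambda \
    --metric-name "$metric" \
    --dimensions "Name=FunctionName,Value=$function_name" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --period 300 \
    --statistics Sum \
    --query 'sum(Datapoints[].Sum)' \
    --output text
)

# Print errors as a percentage of invocations, to two decimals.
# Usage: lambda_error_rate <function-name> [lookback-seconds]
lambda_error_rate() (
  function_name=$1
  lookback=${2:-3600}
  errors=$(lambda_metric_sum "$function_name" Errors "$lookback") || exit 1
  invocations=$(lambda_metric_sum "$function_name" Invocations "$lookback") || exit 1

  awk -v e="$errors" -v i="$invocations" 'BEGIN {
    # No traffic in the window: report 0 rather than dividing by zero
    if (i + 0 == 0) { print "0.00"; exit }
    printf "%.2f\n", (e / i) * 100
  }'
)
```

Example: `lambda_error_rate sgcarstrends-api-production 3600` prints the error percentage for the last hour.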
Cold Starts
Investigation:
- Check init duration
- Review package size
- Check provisioned concurrency
Solutions:
- Enable provisioned concurrency
- Reduce bundle size
- Use ARM architecture
- Optimize imports
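Init duration can be read from the `REPORT` line Lambda writes at the end of each invocation: only cold starts include an `Init Duration` field. A minimal sketch that counts cold starts and averages their init time, assuming the default plain-text Lambda log format (not JSON logging). The function names `summarize_cold_starts` and `cold_start_report` are hypothetical:

```bash
# Summarise cold starts from Lambda REPORT lines read on stdin (sketch).
# Assumes the plain-text REPORT format with "Init Duration: <n> ms".
summarize_cold_starts() {
  awk '
    {
      line = $0
      # A line may hold more than one message, so scan for every match
      while (match(line, /Init Duration: [0-9.]+ ms/)) {
        # Skip the 15-character label and drop the trailing " ms"
        ms = substr(line, RSTART + 15, RLENGTH - 18) + 0
        count++
        total += ms
        if (ms > max) max = ms
        line = substr(line, RSTART + RLENGTH)
      }
    }
    END {
      if (count == 0) { print "cold_starts=0"; exit }
      printf "cold_starts=%d avg_ms=%.1f max_ms=%.1f\n", count, total / count, max
    }
  '
}

# Fetch REPORT lines with an init phase from the last N seconds.
# Usage: cold_start_report <log-group> [lookback-seconds]
cold_start_report() (
  log_group=$1
  lookback=${2:-3600}
  # filter-log-events expects epoch milliseconds
  start_ms=$(( ($(date +%s) - lookback) * 1000 ))

  messages=$(aws logs filter-log-events \
    --log-group-name "$log_group" \
    --filter-pattern '"Init Duration"' \
    --start-time "$start_ms" \
    --query 'events[].message' \
    --output text) || exit 1

  printf '%s\n' "$messages" | summarize_cold_starts
)
```

Example: `cold_start_report "/aws/lambda/sgcarstrends-api-production" 86400` summarises cold starts over the last day.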
Monitoring Scripts
Health Check Script
```bash
#!/bin/bash
# scripts/health-check.sh

STAGE=${1:-production}

# Production is served from the bare domain (api.sgcarstrends.com);
# other stages get a stage label in the hostname
if [ "$STAGE" = "production" ]; then
  LABEL=""
else
  LABEL="$STAGE"
fi

API_URL="https://api${LABEL:+.$LABEL}.sgcarstrends.com"
WEB_URL="https://${LABEL:+$LABEL.}sgcarstrends.com"

echo "Checking health of $STAGE environment..."

# Check API
API_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$API_URL/health")

if [ "$API_STATUS" -eq 200 ]; then
  echo "✓ API is healthy"
else
  echo "✗ API is down (status: $API_STATUS)"
  exit 1
fi

# Check Web
WEB_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$WEB_URL")

if [ "$WEB_STATUS" -eq 200 ]; then
  echo "✓ Web is healthy"
else
  echo "✗ Web is down (status: $WEB_STATUS)"
  exit 1
fi

echo "All services are healthy!"
```
Run:
```bash
chmod +x scripts/health-check.sh
./scripts/health-check.sh production
```
Log Analysis Script
```bash
#!/bin/bash
# scripts/analyze-logs.sh

STAGE=${1:-production}
LOG_GROUP="/aws/lambda/sgcarstrends-api-$STAGE"
START_TIME="$(date -u -d '1 hour ago' +%s)000" # epoch milliseconds

echo "Analyzing logs for $STAGE..."

# Count errors in last hour
# (JSON output so length() runs on the full result, not once per page;
# counting lines of text output would miscount multi-line messages)
ERROR_COUNT=$(aws logs filter-log-events \
  --log-group-name "$LOG_GROUP" \
  --filter-pattern "ERROR" \
  --start-time "$START_TIME" \
  --query 'length(events)' \
  --output json)

echo "Errors in last hour: $ERROR_COUNT"

# Get top errors
echo -e "\nTop error types:"
aws logs filter-log-events \
  --log-group-name "$LOG_GROUP" \
  --filter-pattern "ERROR" \
  --start-time "$START_TIME" \
  --query 'events[*].message' \
  --output text |
  grep -oE '\w+Error' |
  sort | uniq -c | sort -rn | head -5
```
References
- CloudWatch Documentation: https://docs.aws.amazon.com/cloudwatch
- Lambda Monitoring: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions.html
- X-Ray: https://docs.aws.amazon.com/xray
- Related files:
  - infra/ - Infrastructure with monitoring config
  - Root CLAUDE.md - Project documentation
Best Practices
- Log Levels: Use appropriate log levels (DEBUG, INFO, WARN, ERROR)
- Structured Logging: Use JSON format for easier parsing
- Correlation IDs: Track requests across services
- Alarms: Set up alarms for critical metrics
- Dashboards: Create CloudWatch dashboards for key metrics
- Cost Monitoring: Track AWS costs regularly
- Regular Reviews: Review logs and metrics weekly
- Retention: Set appropriate log retention (7-30 days)
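For the retention item, CloudWatch Logs accepts only a fixed set of retention values; within the 7-30 day range above, those are 7, 14, and 30. A small sketch that applies a retention policy and rejects other values up front. The function name `set_log_retention` is hypothetical, and restricting it to this range is a choice made for this example:

```bash
# Apply a retention policy to a log group (sketch).
# Usage: set_log_retention <log-group> <days>
# Only 7, 14 and 30 are accepted here: they are the values CloudWatch Logs
# supports inside the 7-30 day range.
set_log_retention() (
  log_group=$1
  days=$2

  case $days in
    7|14|30) ;;
    *)
      echo "Unsupported retention: $days (use 7, 14 or 30)" >&2
      exit 2
      ;;
  esac

  aws logs put-retention-policy \
    --log-group-name "$log_group" \
    --retention-in-days "$days"
)
```

Example: `set_log_retention "/aws/lambda/sgcarstrends-api-production" 14`.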