Skilllibrary aws
Provision and configure AWS services — write IAM policies, manage S3 buckets, build Lambda functions, deploy with CDK or CloudFormation, wire API Gateway, configure DynamoDB tables, and set up CloudWatch alarms. Use when tasks involve AWS console/CLI operations, CDK stack definitions, IAM permission debugging, or AWS service integration. Do not use for GCP, Azure, or generic cloud-agnostic architecture patterns.
git clone https://github.com/merceralex397-collab/skilllibrary
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/14-cloud-platform-devops/aws" ~/.claude/skills/merceralex397-collab-skilllibrary-aws && rm -rf "$T"
14-cloud-platform-devops/aws/SKILL.md
Purpose
Use this skill to provision, configure, and operate AWS services — define infrastructure with CDK or CloudFormation, write least-privilege IAM policies, build and deploy Lambda functions, configure S3, DynamoDB, SQS, API Gateway, and CloudWatch, and debug permission and deployment issues.
When to use this skill
Use this skill when:
- writing or reviewing IAM policies, roles, and permission boundaries
- creating or modifying S3 buckets (lifecycle rules, CORS, bucket policies, event notifications)
- building Lambda functions (runtime config, layers, environment variables, VPC attachment, memory/timeout tuning)
- defining CDK stacks (`cdk init`, constructs, `cdk synth`, `cdk deploy`) or CloudFormation templates
- configuring API Gateway (REST or HTTP API, routes, authorizers, stages, throttling)
- setting up DynamoDB tables (partition/sort keys, GSIs, capacity modes, TTL)
- wiring SQS queues or SNS topics (dead-letter queues, visibility timeout, fan-out patterns)
- creating CloudWatch alarms, dashboards, or log-based metric filters
- debugging AWS permission errors (`AccessDenied`, `is not authorized to perform`)
- running AWS CLI commands (`aws s3`, `aws lambda`, `aws iam`, `aws cloudformation`)
Do not use this skill when
- the task targets GCP, Azure, or Firebase services (use the respective skill)
- the task is cloud-agnostic architecture design without AWS-specific implementation
- the task is application business logic that does not interact with AWS services
- a narrower active skill (e.g., `docker-containers` for ECS/Fargate container config) already owns the problem
Operating procedure
- Identify the AWS services and region. Confirm which AWS services are involved, the target region, and the AWS account structure (single account, or multi-account with Organizations). Check for existing CDK/CloudFormation stacks in the repo.
- Write IAM policies with least privilege. Start with the minimum permissions needed. Use specific resource ARNs instead of `*`. Use IAM Access Analyzer (`aws accessanalyzer`) or CloudTrail logs to identify the permissions actually used. Add conditions (`aws:SourceArn`, `aws:PrincipalOrgID`) to restrict cross-service access. Never use `"Effect": "Allow", "Action": "*", "Resource": "*"`.
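As a minimal sketch of the least-privilege step, the snippet below builds a resource policy that lets a single SNS topic send to a single SQS queue, scoped with `aws:SourceArn`, and includes a small checker that flags `"*"` in `Action` or `Resource`. The function names, queue and topic names, and the account ID are hypothetical placeholders, and the checker assumes `Statement` is a list.

```python
import json


def sns_to_sqs_policy(queue_arn: str, topic_arn: str) -> dict:
    """Resource policy letting one SNS topic send messages to one SQS queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOneTopicToSend",
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only the named topic may use this permission.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


def _as_list(value) -> list:
    """IAM accepts either a bare string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def wildcard_findings(policy: dict) -> list:
    """Report Allow statements whose Action or Resource is exactly "*"."""
    findings = []
    for index, statement in enumerate(policy["Statement"]):
        if statement.get("Effect") != "Allow":
            continue
        for key in ("Action", "Resource"):
            if "*" in _as_list(statement.get(key, [])):
                findings.append(f"Statement {index}: {key} is '*'")
    return findings


# Placeholder ARNs in a placeholder account (hypothetical values).
policy = sns_to_sqs_policy(
    "arn:aws:sqs:us-east-1:123456789012:orders-queue",
    "arn:aws:sns:us-east-1:123456789012:orders-topic",
)
admin_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

print(json.dumps(policy, indent=2))
print(wildcard_findings(policy))        # the scoped policy has no findings
print(wildcard_findings(admin_policy))  # flags both Action and Resource
```

The checker only catches the bare `"*"` case; partial wildcards such as `s3:*` would need a stricter rule.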
- Define infrastructure as code.
  - CDK (preferred): Define constructs in TypeScript or Python. Use L2 constructs (e.g., `lambda.Function`, `s3.Bucket`) over L1 (`CfnResource`). Run `cdk synth` to generate the CloudFormation template and review it before deploying.
  - CloudFormation: Use YAML format. Define `Parameters` for environment-specific values. Use `Conditions` for multi-environment templates. Always include `DeletionPolicy: Retain` on stateful resources (databases, S3 buckets with data).
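To illustrate how `Parameters`, `Conditions`, and `DeletionPolicy: Retain` fit together, here is a rough sketch that builds a small CloudFormation template as a Python dictionary and checks that stateful resources are retained. It emits JSON only because the standard library has no YAML writer (CloudFormation accepts both). The logical ID `DataBucket`, the `Environment` parameter, and the list of "stateful" types are assumptions for illustration.

```python
import json

# Resource types treated as stateful in this sketch (an assumed, partial list).
STATEFUL_TYPES = ("AWS::S3::Bucket", "AWS::DynamoDB::Table", "AWS::RDS::DBInstance")

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        # Environment-specific value supplied at deploy time.
        "Environment": {
            "Type": "String",
            "AllowedValues": ["dev", "prod"],
            "Default": "dev",
        }
    },
    "Conditions": {
        "IsProd": {"Fn::Equals": [{"Ref": "Environment"}, "prod"]}
    },
    "Resources": {
        "DataBucket": {
            "Type": "AWS::S3::Bucket",
            # Keep the bucket and its data if the stack is deleted.
            "DeletionPolicy": "Retain",
            "Properties": {
                "VersioningConfiguration": {
                    "Status": {"Fn::If": ["IsProd", "Enabled", "Suspended"]}
                },
                "PublicAccessBlockConfiguration": {
                    "BlockPublicAcls": True,
                    "BlockPublicPolicy": True,
                    "IgnorePublicAcls": True,
                    "RestrictPublicBuckets": True,
                },
            },
        }
    },
}


def missing_retain(tpl: dict) -> list:
    """Return logical IDs of stateful resources without DeletionPolicy: Retain."""
    return [
        logical_id
        for logical_id, resource in tpl["Resources"].items()
        if resource["Type"] in STATEFUL_TYPES
        and resource.get("DeletionPolicy") != "Retain"
    ]


rendered = json.dumps(template, indent=2)
print(rendered)
print(missing_retain(template))  # empty when every stateful resource is retained
```

A check like `missing_retain` could run in CI against the synthesized template before any deploy.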
- Configure Lambda functions. Set `memorySize` based on profiling (128 MB minimum, 1024 MB for compute-heavy). Set `timeout` to 2x the expected p99 execution time. Use environment variables for non-secret configuration; never hardcode secrets. Wire a dead-letter queue for async invocations. Use Lambda layers for shared dependencies.
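The timeout rule can be worked through with a few lines of arithmetic. This sketch assumes you have a list of observed durations in milliseconds (for example, exported from CloudWatch Logs), uses the nearest-rank method for p99, and caps the result at Lambda's 900-second maximum. The function names are hypothetical.

```python
import math

LAMBDA_MAX_TIMEOUT_S = 900  # Lambda's maximum configurable timeout


def p99_ms(durations_ms: list) -> float:
    """Nearest-rank 99th percentile of the observed durations."""
    if not durations_ms:
        raise ValueError("need at least one duration sample")
    ordered = sorted(durations_ms)
    # Integer form of ceil(0.99 * n), avoiding floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def recommended_timeout_s(durations_ms: list) -> int:
    """Twice the p99 duration, rounded up to whole seconds and capped."""
    seconds = 2 * p99_ms(durations_ms) / 1000
    return min(LAMBDA_MAX_TIMEOUT_S, max(1, math.ceil(seconds)))


# 100 synthetic samples from 10 ms to 1000 ms: p99 is 990 ms.
samples = list(range(10, 1010, 10))
print(p99_ms(samples))                 # 990
print(recommended_timeout_s(samples))  # 2
```

If the recommendation lands near the 900-second cap, that is usually a sign the work belongs in a different compute model rather than a longer timeout.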
- Set up data stores.
  - DynamoDB: Choose a partition key for even distribution. Add sort keys for range queries. Use on-demand capacity for unpredictable workloads, provisioned for steady-state. Enable point-in-time recovery. Set TTL for expiring data.
  - S3: Enable versioning on buckets with important data. Add lifecycle rules to transition to Glacier or expire objects. Set CORS only when needed. Block public access by default.
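As an illustrative sketch of the DynamoDB settings above, the dictionaries below follow the request shapes of the `CreateTable`, `UpdateTimeToLive`, and `UpdateContinuousBackups` API operations. The table name `orders` and the attribute names are hypothetical, and nothing here calls AWS; the dictionaries would be passed to an SDK or CLI of your choice.

```python
# On-demand table with a partition key and a sort key (hypothetical names).
table_definition = {
    "TableName": "orders",
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "created_at", "KeyType": "RANGE"},   # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity
}

# TTL: DynamoDB expects a Number attribute holding epoch seconds.
ttl_settings = {
    "TableName": "orders",
    "TimeToLiveSpecification": {"Enabled": True, "AttributeName": "expires_at"},
}

# Point-in-time recovery.
pitr_settings = {
    "TableName": "orders",
    "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
}


def undeclared_key_attributes(definition: dict) -> list:
    """Key attributes that are missing from AttributeDefinitions."""
    declared = {a["AttributeName"] for a in definition["AttributeDefinitions"]}
    return [
        k["AttributeName"]
        for k in definition["KeySchema"]
        if k["AttributeName"] not in declared
    ]


def expires_at(now_epoch_s: int, keep_days: int) -> int:
    """Epoch-seconds value to store in the TTL attribute."""
    return now_epoch_s + keep_days * 86400


print(undeclared_key_attributes(table_definition))  # empty when keys are declared
print(expires_at(1_700_000_000, 30))                # 1702592000
```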
- Wire API Gateway. Use HTTP API (v2) for simple Lambda proxies because it is cheaper and faster. Use REST API (v1) when you need request validation, usage plans, or API keys. Attach a Cognito or Lambda authorizer. Set throttling limits per route. Enable access logging to CloudWatch.
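The HTTP-versus-REST choice reduces to a small lookup. The sketch below encodes the REST-only features named in this skill (request validation, usage plans, API keys, WAF) as a set; the feature labels and function name are made up for illustration and are not AWS identifiers.

```python
# REST-only features as listed in this skill (labels are hypothetical).
REST_ONLY_FEATURES = frozenset(
    {"request_validation", "usage_plans", "api_keys", "waf"}
)


def choose_api_type(required_features) -> str:
    """Return "REST" if any required feature is REST-only, else "HTTP"."""
    return "REST" if REST_ONLY_FEATURES & set(required_features) else "HTTP"


print(choose_api_type({"lambda_proxy", "jwt_authorizer"}))  # HTTP
print(choose_api_type({"lambda_proxy", "api_keys"}))        # REST
```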
- Configure monitoring and alarms. Create CloudWatch alarms for: Lambda errors (>1% error rate), Lambda duration (>80% of timeout), DynamoDB throttled reads/writes, SQS dead-letter queue depth (>0), API Gateway 5xx rate. Set up SNS notifications to alert the on-call channel. Create a CloudWatch dashboard grouping key metrics per service.
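Two of the alarms above can be sketched as plain dictionaries in the shape of CloudWatch's `PutMetricAlarm` request. The alarm names, the five-minute period, the single evaluation period, and the ARNs are assumptions; the error-rate alarm is left out because it needs metric math (errors divided by invocations).

```python
def duration_alarm(function_name: str, timeout_s: int, topic_arn: str) -> dict:
    """Alarm when the longest invocation exceeds 80% of the configured timeout."""
    return {
        "AlarmName": f"{function_name}-duration-near-timeout",
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",  # reported in milliseconds
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": timeout_s * 800,  # 80% of the timeout, in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


def dlq_depth_alarm(queue_name: str, topic_arn: str) -> dict:
    """Alarm as soon as any message is visible in the dead-letter queue."""
    return {
        "AlarmName": f"{queue_name}-not-empty",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


# Placeholder SNS topic for the on-call channel (hypothetical ARN).
ON_CALL_TOPIC = "arn:aws:sns:us-east-1:123456789012:on-call"

alarms = [
    duration_alarm("orders-fn", timeout_s=30, topic_arn=ON_CALL_TOPIC),
    dlq_depth_alarm("orders-dlq", topic_arn=ON_CALL_TOPIC),
]
for alarm in alarms:
    print(alarm["AlarmName"], alarm["Threshold"])
```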
- Deploy and verify. Run `cdk deploy --require-approval broadening` (or `aws cloudformation deploy`). After deployment, test each endpoint/function manually. Check CloudWatch logs for errors. Verify IAM permissions by running the expected operations with the deployed role. Run `cdk diff` before subsequent deploys to review changes.
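The diff-then-deploy order from the last step could be scripted roughly as follows. The stack name `OrdersStack` is hypothetical, and the runner defaults to a dry run that only prints the commands, so the sketch does not require the CDK CLI to be installed.

```python
import subprocess


def deploy_steps(stack_name: str) -> list:
    """Commands for one deploy: review the diff first, then deploy."""
    return [
        ["cdk", "diff", stack_name],
        ["cdk", "deploy", stack_name, "--require-approval", "broadening"],
    ]


def run_steps(steps: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print("$", " ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)


run_steps(deploy_steps("OrdersStack"))  # dry run: prints the two commands
```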
Decision rules
- Always use infrastructure as code (CDK or CloudFormation) — never provision via console clicks for anything beyond investigation.
- Default to on-demand DynamoDB capacity unless the workload is steady and predictable.
- Use HTTP API Gateway over REST API unless you need features only available in REST (WAF, request validation, API keys).
- Never put secrets in environment variables or CloudFormation parameters in plaintext; use Secrets Manager or SSM Parameter Store with `SecureString`.
- Prefer `RemovalPolicy.RETAIN` (CDK) / `DeletionPolicy: Retain` (CFN) on all stateful resources.
- Set Lambda concurrency limits (`ReservedConcurrentExecutions`) to prevent runaway costs from unexpected spikes.
Output requirements
- Infrastructure Code: CDK stack or CloudFormation template with all resources defined
- IAM Policy Documents: least-privilege policies for each role with specific resource ARNs
- Deployment Runbook: step-by-step commands to deploy, verify, and roll back
- Monitoring Configuration: CloudWatch alarms, dashboard definition, and alert routing
- Cost Estimate: expected monthly cost based on anticipated usage (use AWS Pricing Calculator)
References
Read these only when relevant:
- `references/iam-policy-patterns.md`
- `references/cdk-best-practices.md`
- `references/lambda-optimization.md`
Related skills
- `firebase`
- `gcp`
- `vercel`
- `docker-containers`
Anti-patterns
- Using `"Resource": "*"` in IAM policies: grants overly broad access and fails security reviews.
- Creating resources via the AWS console without corresponding IaC: leads to drift and unreproducible environments.
- Setting Lambda timeout to the maximum (900s) "just in case" — masks performance issues and increases cost on failures.
- Storing secrets in Lambda environment variables in plaintext — visible in the console and CloudFormation outputs.
- Deploying CloudFormation stacks without `--no-execute-changeset` review in production: risky updates may replace stateful resources.
- Skipping DynamoDB point-in-time recovery: makes accidental data deletion unrecoverable.
Failure handling
- If `cdk deploy` fails with a rollback, check the CloudFormation events in the console for the specific resource that failed and its error message.
- If a Lambda function returns `AccessDenied`, use CloudTrail to find the exact API call and missing permission, then add it to the function's execution role.
- If DynamoDB returns `ProvisionedThroughputExceededException`, switch to on-demand capacity or increase provisioned RCU/WCU, and add auto-scaling.
- If API Gateway returns 502, check the Lambda function's CloudWatch logs. A 502 usually means the function crashed, timed out, or returned a malformed response.
- If a CloudFormation stack is stuck in `UPDATE_ROLLBACK_FAILED`, identify the resource that cannot roll back, skip it with `continue-update-rollback --resources-to-skip`, then fix manually.
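For the `AccessDenied` case, a small parser can pull the principal, action, and resource out of the error text and draft the missing statement. This is a sketch under the assumption that the message follows the common "User: ... is not authorized to perform: ... on resource: ..." shape; wording varies by service, and the sample message, role name, and table name below are invented for illustration.

```python
import re
from typing import Optional

# Common shape of an IAM authorization error (assumed; varies by service).
DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>[\w-]+:\w+)"
    r"(?: on resource: (?P<resource>\S+))?"
)


def parse_denied(message: str) -> Optional[dict]:
    """Extract principal, action, and resource, or None if the shape differs."""
    match = DENIED.search(message)
    return match.groupdict() if match else None


def draft_statement(parsed: dict) -> dict:
    """Draft an Allow statement for the denied call, scoped to one resource."""
    if not parsed.get("resource"):
        raise ValueError("no resource in the error; look it up in CloudTrail")
    return {
        "Effect": "Allow",
        "Action": parsed["action"],
        "Resource": parsed["resource"],
    }


# Invented sample message for illustration.
sample = (
    "User: arn:aws:sts::123456789012:assumed-role/orders-fn-role/orders-fn "
    "is not authorized to perform: dynamodb:PutItem on resource: "
    "arn:aws:dynamodb:us-east-1:123456789012:table/orders "
    "because no identity-based policy allows the dynamodb:PutItem action"
)

parsed = parse_denied(sample)
print(parsed["action"])         # dynamodb:PutItem
print(draft_statement(parsed))
```

Treat the drafted statement as a starting point for review, not something to attach to the execution role automatically.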