Aws-agent-skills step-functions
AWS Step Functions workflow orchestration with state machines. Use when designing workflows, implementing error handling, configuring parallel execution, integrating with AWS services, or debugging executions.
install
source · Clone the upstream repo
git clone https://github.com/itsmostafa/aws-agent-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/itsmostafa/aws-agent-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/step-functions" ~/.claude/skills/itsmostafa-aws-agent-skills-step-functions && rm -rf "$T"
manifest:
skills/step-functions/SKILL.mdsource content
AWS Step Functions
AWS Step Functions is a serverless orchestration service that lets you build and run workflows using state machines. Coordinate multiple AWS services into business-critical applications.
Table of Contents
Core Concepts
Workflow Types
| Type | Description | Pricing |
|---|---|---|
| Standard | Long-running, durable, exactly-once | Per state transition |
| Express | High-volume, short-duration | Per execution (time + memory) |
State Types
| State | Description |
|---|---|
| Task | Execute work (Lambda, API call) |
| Choice | Conditional branching |
| Parallel | Execute branches concurrently |
| Map | Iterate over array |
| Wait | Delay execution |
| Pass | Pass input to output |
| Succeed | End successfully |
| Fail | End with failure |
Amazon States Language (ASL)
JSON-based language for defining state machines.
Common Patterns
Simple Lambda Workflow
{ "Comment": "Process order workflow", "StartAt": "ValidateOrder", "States": { "ValidateOrder": { "Type": "Task", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateOrder", "Next": "ProcessPayment" }, "ProcessPayment": { "Type": "Task", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessPayment", "Next": "FulfillOrder" }, "FulfillOrder": { "Type": "Task", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:FulfillOrder", "End": true } } }
Create State Machine
AWS CLI:
aws stepfunctions create-state-machine \ --name OrderWorkflow \ --definition file://workflow.json \ --role-arn arn:aws:iam::123456789012:role/StepFunctionsRole \ --type STANDARD
boto3:
import boto3 import json sfn = boto3.client('stepfunctions') definition = { "Comment": "Order workflow", "StartAt": "ProcessOrder", "States": { "ProcessOrder": { "Type": "Task", "Resource": "arn:aws:lambda:...", "End": True } } } response = sfn.create_state_machine( name='OrderWorkflow', definition=json.dumps(definition), roleArn='arn:aws:iam::123456789012:role/StepFunctionsRole', type='STANDARD' )
Start Execution
import boto3 import json sfn = boto3.client('stepfunctions') response = sfn.start_execution( stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:OrderWorkflow', name='order-12345', input=json.dumps({ 'order_id': '12345', 'customer_id': 'cust-789', 'items': [{'product_id': 'prod-1', 'quantity': 2}] }) ) execution_arn = response['executionArn']
Choice State (Conditional Logic)
{ "StartAt": "CheckOrderValue", "States": { "CheckOrderValue": { "Type": "Choice", "Choices": [ { "Variable": "$.total", "NumericGreaterThan": 1000, "Next": "HighValueOrder" }, { "Variable": "$.priority", "StringEquals": "rush", "Next": "RushOrder" } ], "Default": "StandardOrder" }, "HighValueOrder": { "Type": "Task", "Resource": "arn:aws:lambda:...:function:ProcessHighValue", "End": true }, "RushOrder": { "Type": "Task", "Resource": "arn:aws:lambda:...:function:ProcessRush", "End": true }, "StandardOrder": { "Type": "Task", "Resource": "arn:aws:lambda:...:function:ProcessStandard", "End": true } } }
Parallel Execution
{ "StartAt": "ProcessInParallel", "States": { "ProcessInParallel": { "Type": "Parallel", "Branches": [ { "StartAt": "UpdateInventory", "States": { "UpdateInventory": { "Type": "Task", "Resource": "arn:aws:lambda:...:function:UpdateInventory", "End": true } } }, { "StartAt": "SendNotification", "States": { "SendNotification": { "Type": "Task", "Resource": "arn:aws:lambda:...:function:SendNotification", "End": true } } }, { "StartAt": "UpdateAnalytics", "States": { "UpdateAnalytics": { "Type": "Task", "Resource": "arn:aws:lambda:...:function:UpdateAnalytics", "End": true } } } ], "Next": "Complete" }, "Complete": { "Type": "Succeed" } } }
Map State (Iteration)
{ "StartAt": "ProcessItems", "States": { "ProcessItems": { "Type": "Map", "ItemsPath": "$.items", "MaxConcurrency": 10, "Iterator": { "StartAt": "ProcessItem", "States": { "ProcessItem": { "Type": "Task", "Resource": "arn:aws:lambda:...:function:ProcessItem", "End": true } } }, "ResultPath": "$.processedItems", "End": true } } }
Error Handling
{ "StartAt": "ProcessWithRetry", "States": { "ProcessWithRetry": { "Type": "Task", "Resource": "arn:aws:lambda:...:function:Process", "Retry": [ { "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"], "IntervalSeconds": 2, "MaxAttempts": 6, "BackoffRate": 2 }, { "ErrorEquals": ["States.Timeout"], "IntervalSeconds": 5, "MaxAttempts": 3, "BackoffRate": 1.5 } ], "Catch": [ { "ErrorEquals": ["CustomError"], "ResultPath": "$.error", "Next": "HandleCustomError" }, { "ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "HandleAllErrors" } ], "End": true }, "HandleCustomError": { "Type": "Task", "Resource": "arn:aws:lambda:...:function:HandleCustom", "End": true }, "HandleAllErrors": { "Type": "Fail", "Error": "ProcessingFailed", "Cause": "An error occurred during processing" } } }
CLI Reference
State Machine Management
| Command | Description |
|---|---|
| Create state machine |
| Update definition |
| Delete state machine |
| List state machines |
| Get details |
Executions
| Command | Description |
|---|---|
| Start execution |
| Stop execution |
| Get execution details |
| List executions |
| Get execution history |
Best Practices
Design
- Keep states focused — one purpose per state
- Use meaningful state names
- Implement comprehensive error handling
- Use Parallel for independent tasks
- Use Map for batch processing
Performance
- Use Express workflows for high-volume, short tasks
- Set appropriate timeouts
- Limit Map concurrency to avoid throttling
- Use SDK integrations when possible (avoid Lambda wrapper)
Reliability
- Retry transient errors
- Catch and handle specific errors
- Use idempotent operations
- Enable X-Ray tracing
Cost Optimization
- Use Express for short workflows (< 5 minutes)
- Combine related operations to reduce transitions
- Use Wait states instead of Lambda delays
Troubleshooting
Execution Failed
# Get execution history aws stepfunctions get-execution-history \ --execution-arn arn:aws:states:us-east-1:123456789012:execution:MyWorkflow:exec-123 \ --query 'events[?type==`TaskFailed` || type==`ExecutionFailed`]'
Lambda Timeout
Causes:
- Lambda running too long
- Task timeout too short
Fix:
{ "Type": "Task", "Resource": "arn:aws:lambda:...", "TimeoutSeconds": 300, "HeartbeatSeconds": 60 }
State Stuck
Check:
- Task state waiting for callback
- Wait state not yet elapsed
- Activity worker not responding
Invalid State Machine
# Validate definition aws stepfunctions validate-state-machine-definition \ --definition file://workflow.json