AWS Ops
AWS Operations Toolkit - Unified incident investigation and daily operational visibility.
This extension provides workflows for investigating AWS outages and running daily infrastructure pulse checks. It gathers data from CloudWatch Logs, Metrics, Alarms, X-Ray Traces, resource inventory, networking, Cost Explorer, and GitHub.
Quick Start
# Install the extension (auto-resolves dependencies)
swamp extension pull @webframp/aws-ops
# Create model instances for your region
swamp model create @webframp/aws/logs aws-logs --global-arg region=us-east-1
swamp model create @webframp/aws/metrics aws-metrics --global-arg region=us-east-1
swamp model create @webframp/aws/alarms aws-alarms --global-arg region=us-east-1
swamp model create @webframp/aws/traces aws-traces --global-arg region=us-east-1
swamp model create @webframp/aws/inventory aws-inventory --global-arg region=us-east-1
swamp model create @webframp/aws/networking aws-networking --global-arg region=us-east-1
# Run the investigate-outage workflow
swamp workflow run @webframp/investigate-outage
Required IAM Permissions
- logs:DescribeLogGroups
- logs:StartQuery
- logs:GetQueryResults
- logs:FilterLogEvents
- cloudwatch:ListMetrics
- cloudwatch:GetMetricStatistics
- cloudwatch:GetMetricData
- cloudwatch:DescribeAlarms
- cloudwatch:DescribeAlarmHistory
- xray:GetServiceGraph
- xray:GetTraceSummaries
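As a minimal sketch, the actions listed above can be collected into a single IAM policy document. The file name, the `Sid`, and the policy name in the commented command are hypothetical, and `"Resource": "*"` is an assumption you may want to narrow. The policy covers only the actions this README lists; other models in the extension may need permissions that are not documented here.

```shell
#!/bin/sh
# Write a read-only policy containing exactly the actions listed in this README.
# File name and Sid are illustrative, not defined by the extension.
POLICY_FILE="aws-ops-readonly-policy.json"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AwsOpsReadOnly",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:StartQuery",
        "logs:GetQueryResults",
        "logs:FilterLogEvents",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:DescribeAlarmHistory",
        "xray:GetServiceGraph",
        "xray:GetTraceSummaries"
      ],
      "Resource": "*"
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"

# To register it with IAM (requires the AWS CLI and sufficient privileges):
# aws iam create-policy --policy-name aws-ops-readonly \
#   --policy-document "file://$POLICY_FILE"
```

Keeping the policy in a file makes it easy to review in a pull request before attaching it to the role or user that runs the workflows.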
Included Components
Workflows
- @webframp/investigate-outage - Unified incident investigation workflow that:
  - Gathers alarm summary and active alarms
  - Analyzes Lambda Duration/Errors and ELB 5XX/latency metrics for anomalies
  - Gets X-Ray service dependency graph
  - Finds error traces and analyzes error patterns
  - Lists CloudWatch log groups and searches for error patterns
  - Inventories EC2 instances and Lambda functions
  - Lists load balancers and NAT gateways with health status
  - Gets alarm state change history
  - Generates an incident report summarizing all findings
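For orientation, a few of the steps above correspond roughly to AWS CLI calls you could run by hand to spot-check the workflow's findings. This is a sketch of manual equivalents, not how the extension is implemented; the one-hour lookback window and the `spot_check` function name are assumptions. It prints the commands by default and only executes them when `RUN` is set to an empty string.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set RUN="" to execute them (requires the AWS CLI and valid credentials).
RUN="${RUN-echo}"
REGION="${REGION:-us-east-1}"

# Assumed lookback window: the last hour, in epoch seconds.
END=$(date +%s)
START=$((END - 3600))

spot_check() {
  # Alarms currently in the ALARM state (cloudwatch:DescribeAlarms)
  $RUN aws cloudwatch describe-alarms --state-value ALARM --region "$REGION"
  # Recent alarm state changes (cloudwatch:DescribeAlarmHistory)
  $RUN aws cloudwatch describe-alarm-history --history-item-type StateUpdate --region "$REGION"
  # X-Ray service dependency graph (xray:GetServiceGraph)
  $RUN aws xray get-service-graph --start-time "$START" --end-time "$END" --region "$REGION"
  # CloudWatch log groups (logs:DescribeLogGroups)
  $RUN aws logs describe-log-groups --region "$REGION"
}

spot_check
```

Each call maps to one of the IAM actions listed under Required IAM Permissions, so a failure here is also a quick way to find a missing permission.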
Reports
- @webframp/incident-report - Workflow-scope report that aggregates findings into:
  - Alarm status and recent state changes
  - Metric anomaly highlights (Lambda + ELB)
  - Trace error analysis with top faulty services
  - Infrastructure inventory (EC2, Lambda)
  - Networking status (load balancers, NAT gateways)
  - Actionable recommendations
Model Dependencies
The workflow expects these model instances (create them before running):
- aws-logs - @webframp/aws/logs
- aws-metrics - @webframp/aws/metrics
- aws-alarms - @webframp/aws/alarms
- aws-traces - @webframp/aws/traces
- aws-inventory - @webframp/aws/inventory
- aws-networking - @webframp/aws/networking
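Because the six instances follow one pattern (instance `aws-<name>` from model `@webframp/aws/<name>`), they can be created in a loop. The sketch below reuses only the `swamp model create` form shown in Quick Start; the `create_models` function and the `RUN` dry-run switch are illustrative additions, not part of the extension.

```shell
#!/bin/sh
# Dry-run by default: prints each command instead of executing it.
# Set RUN="" to actually create the instances.
RUN="${RUN-echo}"
REGION="${REGION:-us-east-1}"

create_models() {
  # $1: AWS region passed to every instance as a global argument
  for name in logs metrics alarms traces inventory networking; do
    $RUN swamp model create "@webframp/aws/${name}" "aws-${name}" \
      --global-arg "region=${1}"
  done
}

create_models "$REGION"
```

Running it once in dry-run mode lets you confirm the six instance names match the list above before anything is created.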
Unified AWS outage investigation workflow. Gathers data from CloudWatch Logs, Metrics, Alarms, X-Ray Traces, resource inventory, and networking to provide a comprehensive view of system health during an incident.
Daily morning infrastructure pulse check. Gathers alarm state, alarm health verdicts, cost trend, and open PRs across user-specified regions, then generates a concise morning-pulse report you can skim in two minutes. Region-flexible via forEach: pass any combination of regions as input. Model instances must follow the naming convention aws-alarms-{region} and alarm-investigation-{region}.
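To make the naming convention concrete, the following sketch expands a list of regions into the instance names the pulse check expects. The `expected_instances` helper is hypothetical; it only formats names and does not create anything or call swamp.

```shell
#!/bin/sh
# Print the model instance names required for each region given as an argument.
expected_instances() {
  for region in "$@"; do
    printf 'aws-alarms-%s\n' "$region"
    printf 'alarm-investigation-%s\n' "$region"
  done
}

expected_instances us-east-1 eu-west-1
# aws-alarms-us-east-1
# alarm-investigation-us-east-1
# aws-alarms-eu-west-1
# alarm-investigation-eu-west-1
```

Comparing this output with your existing instance names before the first run is a cheap way to catch a region whose instances are missing or misnamed.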
Summarizes findings from the investigate-outage workflow into an actionable incident report
Daily infrastructure pulse: alarms, alarm health, costs, and open PRs
Added 1 workflow. Added 1 report. Updated dependencies. Updated labels.
Modified 1 workflow. Updated dependencies. Updated platforms.
Added 1 report.
- Has README or module doc: 2/2 earned
- README has a code example: 1/1 earned
- README is substantive: 1/1 earned
- Most symbols documented: 1/1 earned
- No slow types: 1/1 earned
- Dependencies pass trust audit: 2/2 earned
- Has description: 1/1 earned
- Platform support declared (or universal): 2/2 earned
- License declared: 1/1 earned
- Verified public repository: 2/2 earned