
Aws Ops

@webframp/aws-ops · v2026.05.24.1 · 7d ago · WORKFLOWS · REPORTS
01 README

AWS Operations Toolkit - Unified incident investigation and daily operational visibility.

This extension provides workflows for investigating AWS outages and running daily infrastructure pulse checks. It gathers data from CloudWatch Logs, Metrics, Alarms, X-Ray Traces, resource inventory, networking, Cost Explorer, and GitHub.

Quick Start

# Install the extension (auto-resolves dependencies)
swamp extension pull @webframp/aws-ops

# Create model instances for your region
swamp model create @webframp/aws/logs aws-logs --global-arg region=us-east-1
swamp model create @webframp/aws/metrics aws-metrics --global-arg region=us-east-1
swamp model create @webframp/aws/alarms aws-alarms --global-arg region=us-east-1
swamp model create @webframp/aws/traces aws-traces --global-arg region=us-east-1
swamp model create @webframp/aws/inventory aws-inventory --global-arg region=us-east-1
swamp model create @webframp/aws/networking aws-networking --global-arg region=us-east-1

# Run the investigate-outage workflow
swamp workflow run @webframp/investigate-outage
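
The six `swamp model create` calls above differ only in the model kind, so they can be generated in a loop. The following is a minimal sketch, not part of the extension: the `create_models` function and the `REGION` and `DRY_RUN` variables are hypothetical helpers introduced here. By default it only prints the commands, so it can be reviewed before anything is created.

```shell
#!/bin/sh
# Region to target; us-east-1 matches the Quick Start above.
REGION="${REGION:-us-east-1}"
# Hypothetical switch: 1 prints the commands, 0 actually invokes swamp.
DRY_RUN="${DRY_RUN:-1}"

# Create (or print) one model instance per kind for the region given as $1.
create_models() {
  for kind in logs metrics alarms traces inventory networking; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "swamp model create @webframp/aws/$kind aws-$kind --global-arg region=$1"
    else
      swamp model create "@webframp/aws/$kind" "aws-$kind" --global-arg "region=$1"
    fi
  done
}

create_models "$REGION"
```

Running it with `DRY_RUN=0 REGION=eu-west-1` would execute the same six commands against a different region, assuming `swamp` is on the `PATH`.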

Required IAM Permissions

  • logs:DescribeLogGroups
  • logs:StartQuery
  • logs:GetQueryResults
  • logs:FilterLogEvents
  • cloudwatch:ListMetrics
  • cloudwatch:GetMetricStatistics
  • cloudwatch:GetMetricData
  • cloudwatch:DescribeAlarms
  • cloudwatch:DescribeAlarmHistory
  • xray:GetServiceGraph
  • xray:GetTraceSummaries
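
The actions listed above can be collected into a single IAM policy document. The sketch below writes one to a local file. The file name, the `Sid`, and the use of `"Resource": "*"` are assumptions made for illustration, so narrow the resource scope where your account allows it. The README lists only `logs`, `cloudwatch`, and `xray` actions, so this policy may not cover every step of the inventory, networking, or cost models.

```shell
#!/bin/sh
# Hypothetical output path; override with POLICY_FILE=... if needed.
POLICY_FILE="${POLICY_FILE:-aws-ops-readonly-policy.json}"

# Write a policy containing exactly the actions listed in the README.
cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AwsOpsReadOnly",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:StartQuery",
        "logs:GetQueryResults",
        "logs:FilterLogEvents",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:DescribeAlarmHistory",
        "xray:GetServiceGraph",
        "xray:GetTraceSummaries"
      ],
      "Resource": "*"
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"
```

The resulting file could then be registered with `aws iam create-policy --policy-name <name> --policy-document file://aws-ops-readonly-policy.json`, where the policy name is yours to choose.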

Included Components

Workflows

  • @webframp/investigate-outage - Unified incident investigation workflow that:
    • Gathers alarm summary and active alarms
    • Analyzes Lambda Duration/Errors and ELB 5XX/latency metrics for anomalies
    • Gets X-Ray service dependency graph
    • Finds error traces and analyzes error patterns
    • Lists CloudWatch log groups and searches for error patterns
    • Inventories EC2 instances and Lambda functions
    • Lists load balancers and NAT gateways with health status
    • Gets alarm state change history
    • Generates an incident report summarizing all findings

Reports

  • @webframp/incident-report - Workflow-scope report that aggregates findings into:
    • Alarm status and recent state changes
    • Metric anomaly highlights (Lambda + ELB)
    • Trace error analysis with top faulty services
    • Infrastructure inventory (EC2, Lambda)
    • Networking status (load balancers, NAT gateways)
    • Actionable recommendations

Model Dependencies

The investigate-outage workflow expects these model instances (create them before running):

  • aws-logs - @webframp/aws/logs
  • aws-metrics - @webframp/aws/metrics
  • aws-alarms - @webframp/aws/alarms
  • aws-traces - @webframp/aws/traces
  • aws-inventory - @webframp/aws/inventory
  • aws-networking - @webframp/aws/networking
02 Workflows (2)
@webframp/investigate-outage (c3866eb0-6190-4154-b8e1-304624aba93e)

Unified AWS outage investigation workflow. Gathers data from CloudWatch Logs, Metrics, Alarms, X-Ray Traces, resource inventory, and networking to provide a comprehensive view of system health during an incident.

gather-observability-data - Collect data from all observability sources in parallel
  1. check-alarms (aws-alarms.get_summary) - Get alarm summary and active alarms
  2. get-active-alarms (aws-alarms.get_active) - Get all currently active alarms
  3. analyze-metrics (aws-metrics.analyze) - Analyze Lambda Duration metrics for anomalies
  4. analyze-errors-metric (aws-metrics.analyze) - Analyze Lambda Errors metrics
  5. analyze-elb-5xx (aws-metrics.analyze) - Analyze ALB 5XX error count
  6. analyze-elb-latency (aws-metrics.analyze) - Analyze ALB target response time
  7. get-service-graph (aws-traces.get_service_graph) - Get X-Ray service dependency graph
  8. get-error-traces (aws-traces.get_errors) - Get traces with errors or faults
  9. analyze-trace-errors (aws-traces.analyze_errors) - Analyze error patterns in traces

gather-logs - Search logs for errors (runs in parallel with observability)
  1. list-log-groups (aws-logs.list_log_groups) - Discover log groups
  2. find-lambda-errors (aws-logs.find_errors) - Search Lambda log groups for error patterns

gather-infrastructure - Collect resource inventory and networking state (runs in parallel)
  1. list-ec2-instances (aws-inventory.list_ec2) - List EC2 instances across all states
  2. list-lambda-functions (aws-inventory.list_lambda) - List Lambda functions
  3. list-load-balancers (aws-networking.list_load_balancers) - List ALBs and NLBs with target health
  4. list-nat-gateways (aws-networking.list_nat_gateways) - List NAT gateway status

deep-dive - Perform deeper analysis based on initial findings
  1. get-alarm-history (aws-alarms.get_history) - Get alarm state change history

@webframp/morning-pulse (460c619c-c59a-44bd-a2ad-27c8b819e8f6)

Daily morning infrastructure pulse check. Gathers alarm state, alarm health verdicts, cost trend, and open PRs across user-specified regions, then generates a concise morning-pulse report you can skim in two minutes. The workflow is region-flexible via forEach: pass any combination of regions as input. Model instances must follow the naming convention aws-alarms-{region} and alarm-investigation-{region}.

alarms - Alarm summary and active alarms across all regions
  1. summary-${{ self.region }} (aws-alarms-${{ self.region }}.get_summary) - Get alarm state counts and recent changes
  2. active-${{ self.region }} (aws-alarms-${{ self.region }}.get_active) - Get alarms currently in ALARM state

alarm-triage - Triage active alarms with verdicts
  1. triage-${{ self.region }} (alarm-investigation-${{ self.region }}.triage) - Enrich alarms with health verdicts

costs - Cost trend for the last N days
  1. trend (aws-costs.get_cost_trend) - Daily cost trend and direction
  2. by-service (aws-costs.get_cost_by_service) - Cost breakdown by service

github - Check for open pull requests
  1. open-prs (github.list_prs) - List open PRs on the target repo

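
Because morning-pulse resolves model instances by name per region, it can help to list the names it will look for before running it. The sketch below is illustrative only: `expected_instances` and `REGIONS` are hypothetical helpers, and the `swamp model create` line is printed rather than executed. It reuses the `@webframp/aws/alarms` type from the Quick Start. The model type behind `alarm-investigation-{region}` is not stated on this page, so no creation command is shown for it. The steps above also reference instances named `aws-costs` and `github`, which are not region-suffixed.

```shell
#!/bin/sh
# Hypothetical region list; replace with the regions you pass to the workflow.
REGIONS="${REGIONS:-us-east-1 eu-west-1}"

# Print the per-region instance names required by the naming convention.
expected_instances() {
  echo "aws-alarms-$1"
  echo "alarm-investigation-$1"
}

for region in $REGIONS; do
  expected_instances "$region"
  # Printed only: creation command for the alarms instance in this region.
  echo "# swamp model create @webframp/aws/alarms aws-alarms-$region --global-arg region=$region"
done
```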
03 Reports (2)
@webframp/incident-report (workflow scope)
incident_report.ts

Summarizes findings from the investigate-outage workflow into an actionable incident report.

Labels: aws, incident-response, ops, observability

@webframp/morning-pulse-report (workflow scope)
morning_pulse_report.ts

Daily infrastructure pulse: alarms, alarm health, costs, and open PRs.

Labels: ops, daily, aws

04 Previous Versions (12)
  • 2026.05.21.1 - May 22, 2026
  • 2026.05.13.1 - May 13, 2026
  • 2026.05.08.1 - May 9, 2026
    Added 1 workflow. Added 1 report. Updated dependencies. Updated labels.
  • 2026.04.22.1 - Apr 22, 2026
    Modified 1 workflow. Updated dependencies. Updated platforms.
  • 2026.04.14.2 - Apr 14, 2026
  • 2026.04.14.1 - Apr 14, 2026
  • 2026.04.13.1 - Apr 13, 2026
  • 2026.03.31.1 - Mar 31, 2026
  • 2026.03.30.4 - Mar 31, 2026
  • 2026.03.30.3 - Mar 30, 2026
  • 2026.03.30.2 - Mar 30, 2026
    Added 1 report.
  • 2026.03.30.1 - Mar 30, 2026
05 Stats

A (100 / 100)
Downloads: 36
Archive size: 26.4 KB

  • Has README or module doc: 2/2 earned
  • README has a code example: 1/1 earned
  • README is substantive: 1/1 earned
  • Most symbols documented: 1/1 earned
  • No slow types: 1/1 earned
  • Dependencies pass trust audit: 2/2 earned
  • Has description: 1/1 earned
  • Platform support declared (or universal): 2/2 earned
  • License declared: 1/1 earned
  • Verified public repository: 2/2 earned