@webframp/aws-ops

v2026.04.14.2

AWS Operations Toolkit - Unified incident investigation and operational visibility.

This extension provides a complete workflow for investigating AWS outages by gathering data from CloudWatch Logs, Metrics, Alarms, and X-Ray Traces, plus an incident report that summarizes all findings.

Quick Start

# Install the extension (auto-resolves dependencies)
swamp extension pull @webframp/aws-ops

# Create model instances for your region
swamp model create @webframp/aws/logs aws-logs --global-arg region=us-east-1
swamp model create @webframp/aws/metrics aws-metrics --global-arg region=us-east-1
swamp model create @webframp/aws/alarms aws-alarms --global-arg region=us-east-1
swamp model create @webframp/aws/traces aws-traces --global-arg region=us-east-1

# Run the investigate-outage workflow
swamp workflow run @webframp/investigate-outage

Required IAM Permissions

  • logs:DescribeLogGroups
  • logs:StartQuery
  • logs:GetQueryResults
  • logs:FilterLogEvents
  • cloudwatch:ListMetrics
  • cloudwatch:GetMetricStatistics
  • cloudwatch:GetMetricData
  • cloudwatch:DescribeAlarms
  • cloudwatch:DescribeAlarmHistory
  • xray:GetServiceGraph
  • xray:GetTraceSummaries
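
The permissions above can be collected into a single read-only IAM policy document. The following is a minimal sketch, not a policy shipped with the extension: the file name `aws-ops-policy.json`, the `Sid` value, and `"Resource": "*"` are assumptions made for illustration, and the resource scope should be narrowed where your account's requirements and the individual actions allow it.

```shell
#!/bin/sh
# Write an IAM policy document covering the permissions listed above.
# Assumptions (not from the extension itself): the output file name,
# the Sid, and the wildcard Resource.
POLICY_FILE="${POLICY_FILE:-aws-ops-policy.json}"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AwsOpsReadOnly",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:StartQuery",
        "logs:GetQueryResults",
        "logs:FilterLogEvents",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:GetMetricData",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:DescribeAlarmHistory",
        "xray:GetServiceGraph",
        "xray:GetTraceSummaries"
      ],
      "Resource": "*"
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"
```

The resulting file can then be attached to the role or user that runs the workflow, for example through the IAM console or the AWS CLI.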

Included Components

Workflows

  • @webframp/investigate-outage - Unified incident investigation workflow that:
    • Gathers alarm summary and active alarms
    • Analyzes Lambda Duration and Errors metrics for anomalies
    • Gets X-Ray service dependency graph
    • Finds error traces and analyzes error patterns
    • Lists CloudWatch log groups
    • Gets alarm state change history
    • Generates an incident report summarizing all findings

Reports

  • @webframp/incident-report - Workflow-scope report that aggregates findings into:
    • Alarm status and recent state changes
    • Metric anomaly highlights
    • Trace error analysis with top faulty services
    • Actionable recommendations

Model Dependencies

The workflow expects these model instances (create them before running):

  • aws-logs - @webframp/aws/logs
  • aws-metrics - @webframp/aws/metrics
  • aws-alarms - @webframp/aws/alarms
  • aws-traces - @webframp/aws/traces
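
Because the four instance names follow one pattern (`aws-<kind>` backed by `@webframp/aws/<kind>`), the create commands from the Quick Start can be generated in a loop. This is a sketch under stated assumptions: the `REGION` variable and the `print_model_commands` helper are hypothetical names introduced here, and the script only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Region to create the model instances in. The default mirrors the
# Quick Start; the REGION variable itself is an assumption of this sketch.
REGION="${REGION:-us-east-1}"

# Print one `swamp model create` command per model instance the
# workflow expects (hypothetical helper, dry-run only).
print_model_commands() {
  for kind in logs metrics alarms traces; do
    printf 'swamp model create @webframp/aws/%s aws-%s --global-arg region=%s\n' \
      "$kind" "$kind" "$REGION"
  done
}

print_model_commands
```

Once the printed commands look right, they can be executed by piping the output to `sh`, or simply copied into a terminal.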

Repository

https://github.com/webframp/swamp-extensions

Labels

aws, cloudwatch, xray, observability, ops, incident-response, workflow

@webframp/investigate-outage (c3866eb0-6190-4154-b8e1-304624aba93e)

Unified AWS outage investigation workflow. Gathers data from CloudWatch Logs, Metrics, Alarms, and X-Ray Traces to provide a comprehensive view of system health during an incident.

gather-observability-data - Collect data from all observability sources in parallel
  1. check-alarms (aws-alarms.get_summary) - Get alarm summary and active alarms
  2. get-active-alarms (aws-alarms.get_active) - Get all currently active alarms
  3. analyze-metrics (aws-metrics.analyze) - Analyze Lambda Duration metrics for anomalies
  4. analyze-errors-metric (aws-metrics.analyze) - Analyze Lambda Errors metrics
  5. get-service-graph (aws-traces.get_service_graph) - Get X-Ray service dependency graph
  6. get-error-traces (aws-traces.get_errors) - Get traces with errors or faults
  7. analyze-trace-errors (aws-traces.analyze_errors) - Analyze error patterns in traces

gather-logs - Search logs for errors (runs in parallel with observability)
  1. list-log-groups (aws-logs.list_log_groups) - Discover log groups
  2. find-lambda-errors (aws-logs.find_errors) - Search Lambda log groups for error patterns

deep-dive - Perform deeper analysis based on initial findings
  1. get-alarm-history (aws-alarms.get_history) - Get alarm state change history

@webframp/incident-report (workflow scope)
incident_report.ts

Summarizes findings from the investigate-outage workflow into an actionable incident report

aws, incident-response, ops, observability