@john/debug-namespace-deep (66654a96-7f45-4b6a-acff-e4987d40e648)
Comprehensive namespace debugging workflow. Discovers and diagnoses every deployment, pod, service, configmap, PVC, secret, network policy, and event in a namespace, then iterates over all discovered services, deployments, and network policies for deep diagnostics: per-service selector and endpoint diagnosis, per-deployment rollout status and ReplicaSet history, and per-netpol rule inspection.
PREREQ: create the 9 per-namespace model instances first. Run this once per namespace before invoking
discover: Collect all resources in the namespace in parallel
1. list-deployments (${{ inputs.namespace + "-deployment" }}.list): List all deployments with replica counts, images, and conditions
2. list-pods (${{ inputs.namespace + "-pod" }}.list): List all pods with phase, container states, and restart counts
3. list-services (${{ inputs.namespace + "-service" }}.list): List all services with type, ports, and selectors
4. list-configmaps (${{ inputs.namespace + "-configmap" }}.list): List all configmaps with their data keys and values
5. list-pvcs (${{ inputs.namespace + "-pvc" }}.list): List all PVCs with binding status, storage class, and capacity
6. list-secrets (${{ inputs.namespace + "-secret" }}.list): List all secrets with type and data keys (not decoded values)
7. list-netpols (${{ inputs.namespace + "-netpol" }}.list): List all NetworkPolicies with pod selectors and rule counts
8. list-events (${{ inputs.namespace + "-event" }}.list): List all events sorted by timestamp
9. get-warnings (${{ inputs.namespace + "-event" }}.getWarnings): List only warning-type events for quick problem identification
diagnose-services: Automatically diagnose every discovered service (selector matching, endpoint health, port analysis)
1. diagnose-${{ self.svc.attributes.name }} (${{ inputs.namespace + "-service" }}.diagnoseService): Diagnose service selector and port matching
diagnose-deployments: Check rollout status and ReplicaSet history for every discovered deployment
1. rollout-${{ self.dep.attributes.name }} (${{ inputs.namespace + "-deployment" }}.getRolloutStatus): Get deployment rollout status and conditions
2. replicasets-${{ self.dep.attributes.name }} (${{ inputs.namespace + "-deployment" }}.getReplicaSets): Get deployment ReplicaSet history and revisions
inspect-netpols: Fetch full rule details for every discovered NetworkPolicy (selectors, ports, and CIDR blocks)
1. get-${{ self.pol.attributes.name }} (${{ inputs.namespace + "-netpol" }}.get): Inspect NetworkPolicy rules, selectors, and traffic configuration
summarize: Final health roll-up. Runs the @john/namespace.health aggregator after everything else. The resulting `namespaceHealth` data record carries a single `healthy: bool` plus a per-resource breakdown, so callers don't need to fan out across pods/deployments/services/events to verify state.
1. health (${{ inputs.namespace + "-namespace" }}.health): Aggregated namespace health, one record summarising the whole namespace
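The step references in the workflow above derive each model name by appending a suffix to `inputs.namespace`. As a rough sketch of that naming scheme (the helper `model_names` and the example namespace `staging` are hypothetical and not part of the workflow), the nine names the prerequisite refers to could be enumerated like this:

```python
# Suffixes copied from the step references in the workflow above.
MODEL_SUFFIXES = [
    "deployment", "pod", "service", "configmap", "pvc",
    "secret", "netpol", "event", "namespace",
]


def model_names(namespace: str) -> dict[str, str]:
    """Map each suffix to the model name the workflow expressions would produce."""
    # Mirrors the expression: inputs.namespace + "-<suffix>"
    return {suffix: f"{namespace}-{suffix}" for suffix in MODEL_SUFFIXES}


if __name__ == "__main__":
    # "staging" is an illustrative namespace, not one named in the source.
    for suffix, name in model_names("staging").items():
        print(f"{suffix:<11} -> {name}")
```

How those model instances are actually created is not shown in the listing, so the sketch only covers the names.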

@john/deployment-status (731aed2b-b876-42e5-84e4-327133e1dfaf)
Rollout health check for all deployments: replica counts, rollout conditions, ReplicaSet history, and warning events
check: Collect deployment specs, rollout conditions, and related events
1. list-deployments (${{ inputs.deploymentModel }}.list): Fetch all deployments with replica counts, strategy, containers, and rollout conditions
2. list-events (${{ inputs.eventModel }}.list): Fetch all events to correlate with deployment activity (scaling, rollouts, failures)
3. get-warnings (${{ inputs.eventModel }}.getWarnings): Fetch warning events to surface rollout failures, image pull errors, and crash loops

@john/service-connectivity (fc82018b-d717-49ac-9ecd-f0396d6a6f61)
Service connectivity overview: all services, all pods with labels, endpoints, and warning events for diagnosing routing and selector mismatches
collect: Gather services, pods, and events to diagnose connectivity issues
1. list-services (${{ inputs.serviceModel }}.list): Fetch all services with selectors, ports, and types for selector-vs-label comparison
2. list-pods (${{ inputs.podModel }}.list): Fetch all pods with labels, phase, and container ports for cross-referencing against service selectors
3. list-events (${{ inputs.eventModel }}.list): Fetch all events to identify service-related issues (endpoint changes, failures)
4. get-warnings (${{ inputs.eventModel }}.getWarnings): Fetch warning events for quick identification of connectivity-affecting problems
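The selector-vs-label comparison this workflow collects data for can be sketched as follows. This is a minimal illustration, assuming each service and pod record has been reduced to a plain dict with `name`, `selector`, and `labels` keys; that record shape is an assumption, since the listing does not show the models' output format.

```python
def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every key/value in the service selector appears in the pod labels."""
    if not selector:
        # A Service without a selector does not select any pods on its own.
        return False
    return all(labels.get(key) == value for key, value in selector.items())


def services_without_pods(services: list[dict], pods: list[dict]) -> list[str]:
    """Return the names of services whose selector matches no pod."""
    unmatched = []
    for svc in services:
        selector = svc.get("selector") or {}
        if not any(selector_matches(selector, pod.get("labels") or {}) for pod in pods):
            unmatched.append(svc["name"])
    return unmatched


if __name__ == "__main__":
    # Illustrative data only.
    services = [
        {"name": "web", "selector": {"app": "web"}},
        {"name": "api", "selector": {"app": "api", "tier": "backend"}},
    ]
    pods = [
        {"name": "web-1", "labels": {"app": "web", "pod-template-hash": "abc"}},
        {"name": "api-1", "labels": {"app": "api"}},  # missing tier=backend
    ]
    print(services_without_pods(services, pods))
```

Selector-less services (for example, ones with manually managed endpoints) would also be reported by this sketch, so a real check would probably want to treat them separately.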

@john/cluster-health (07dac9da-b595-43a8-b3a9-db6844b3b177)
Cluster-wide health overview: node conditions, resource capacity, CPU/memory utilization, pod distribution, and storage status
assess: Collect node status, metrics, pod health, and storage data
1. list-nodes (${{ inputs.nodeModel }}.list): Fetch all nodes with conditions (Ready, MemoryPressure, DiskPressure), capacity, taints, and schedulability
2. get-node-metrics (${{ inputs.nodeModel }}.getMetrics): Fetch CPU and memory usage for all nodes from metrics-server
3. list-pods (${{ inputs.podModel }}.list): Fetch all pods to assess distribution, phases, and restart counts
4. get-pod-metrics (${{ inputs.podModel }}.getMetrics): Fetch per-pod CPU and memory usage from metrics-server
5. list-pvcs (${{ inputs.pvcModel }}.list): Fetch PVC binding status and capacity to check for unbound or full volumes
6. list-pvs (${{ inputs.pvcModel }}.listVolumes): Fetch cluster PersistentVolumes to check reclaim policies and available capacity

@john/security-audit (14f9de97-d94e-4e62-9a55-78edc575ae49)
Audit namespace security posture: deployment security contexts, volumes, secrets, configmaps, ingress TLS, NetworkPolicy coverage, RBAC roles/bindings, and ServiceAccount permissions
collect: Gather all security-relevant resources from the namespace
1. list-deployments (${{ inputs.deploymentModel }}.list): Fetch all deployments to inspect container security contexts, volume mounts, and image sources
2. list-secrets (${{ inputs.secretModel }}.list): Inventory all secrets by type and key names (no content pulled)
3. list-configmaps (${{ inputs.configmapModel }}.list): Fetch all configmaps to check for accidentally embedded credentials or sensitive config
4. list-ingresses (${{ inputs.ingressModel }}.list): Fetch all ingresses to verify TLS termination and annotation-based security policies
5. list-netpols (${{ inputs.netpolModel }}.list): Fetch all NetworkPolicies to verify pod selector coverage, ingress/egress restrictions, and CIDR blocks
6. list-roles (${{ inputs.rbacModel }}.listRoles): Fetch namespace Roles to audit permission rules
7. list-role-bindings (${{ inputs.rbacModel }}.listRoleBindings): Fetch namespace RoleBindings to map subject-to-role assignments
8. list-service-accounts (${{ inputs.rbacModel }}.listServiceAccounts): Fetch ServiceAccounts to check auto-mount token settings

@john/rbac-audit (856cc29a-27c5-4b8e-98bb-69ff7ea30cd7)
RBAC security audit: Roles, ClusterRoles, RoleBindings, ClusterRoleBindings, ServiceAccounts, and permission analysis for identifying overly permissive access, wildcard rules, and cluster-admin bindings
collect: Gather all RBAC resources for permission analysis
1. list-roles (${{ inputs.rbacModel }}.listRoles): Fetch all namespace-scoped Roles to audit permission rules (apiGroups, resources, verbs)
2. list-cluster-roles (${{ inputs.rbacModel }}.listClusterRoles): Fetch all ClusterRoles to identify overly permissive cluster-wide permissions and wildcard rules
3. list-role-bindings (${{ inputs.rbacModel }}.listRoleBindings): Fetch all namespace RoleBindings to map which subjects (users, groups, SAs) have which role assignments
4. list-cluster-role-bindings (${{ inputs.rbacModel }}.listClusterRoleBindings): Fetch all ClusterRoleBindings to identify cluster-admin access and broad cluster-wide permissions
5. list-service-accounts (${{ inputs.rbacModel }}.listServiceAccounts): Fetch all ServiceAccounts to check auto-mount token settings and secret associations
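The wildcard and cluster-admin checks mentioned in this audit could look roughly like the sketch below. It assumes the collected Roles and bindings keep the standard Kubernetes RBAC field names (`rules[].apiGroups`, `rules[].resources`, `rules[].verbs`, `roleRef`, `subjects`); whether the model output preserves that shape is an assumption, and both helper functions are hypothetical.

```python
RULE_FIELDS = ("apiGroups", "resources", "verbs")


def wildcard_fields(rule: dict) -> list[str]:
    """Return the rule fields that contain the "*" wildcard."""
    return [field for field in RULE_FIELDS if "*" in (rule.get(field) or [])]


def cluster_admin_subjects(bindings: list[dict]) -> list[dict]:
    """Collect subjects from bindings that reference the cluster-admin ClusterRole."""
    subjects = []
    for binding in bindings:
        ref = binding.get("roleRef") or {}
        if ref.get("kind") == "ClusterRole" and ref.get("name") == "cluster-admin":
            subjects.extend(binding.get("subjects") or [])
    return subjects


if __name__ == "__main__":
    # Illustrative data only.
    rule = {"apiGroups": [""], "resources": ["*"], "verbs": ["get", "*"]}
    print(wildcard_fields(rule))

    bindings = [
        {
            "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
            "subjects": [{"kind": "ServiceAccount", "name": "ci", "namespace": "build"}],
        },
        {
            "roleRef": {"kind": "ClusterRole", "name": "view"},
            "subjects": [{"kind": "Group", "name": "devs"}],
        },
    ]
    print(cluster_admin_subjects(bindings))
```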

@john/storage-health (b1e038a4-04a9-4763-befa-02c8c5ffcd16)
Storage health check: PVC binding status, capacity usage, PersistentVolume inventory, and storage-related events
collect: Gather PVC status, PV inventory, and storage events
1. list-pvcs (${{ inputs.pvcModel }}.list): Fetch all PVCs to check binding status, storage classes, requested vs actual capacity, and access modes
2. list-pvs (${{ inputs.pvcModel }}.listVolumes): Fetch cluster-wide PersistentVolumes to check phases, reclaim policies, and volume sources
3. list-events (${{ inputs.eventModel }}.list): Fetch namespace events to surface FailedMount, FailedAttachVolume, and ProvisioningFailed warnings
4. get-warnings (${{ inputs.eventModel }}.getWarnings): Fetch only warning events to highlight storage provisioning failures and mount errors

@john/autoscaling-status (5935cf85-a935-4e1f-b65b-0c4eb62afa08)
Autoscaling status report: HPA current vs target metrics, replica counts, scale conditions, and related deployment state
collect: Gather HPA metrics, deployment state, and scaling events
1. list-hpas (${{ inputs.hpaModel }}.list): Fetch all HPAs to compare current vs target metrics, replica ranges, scale conditions, and last scale times
2. list-deployments (${{ inputs.deploymentModel }}.list): Fetch all deployments to cross-reference HPA targets with actual replica counts and rollout status
3. list-events (${{ inputs.eventModel }}.list): Fetch namespace events to surface ScalingReplicaSet, SuccessfulRescale, and FailedGetResourceMetric events
4. get-warnings (${{ inputs.eventModel }}.getWarnings): Fetch only warning events to highlight scaling failures and metric collection errors

@john/batch-jobs-status (85f5b09a-2aad-4682-8a11-2fc5e0bbfcc1)
Batch workload status: Job completion rates, CronJob schedules, failure counts, and batch-related warnings
collect: Gather Job/CronJob status and batch events
1. list-jobs (${{ inputs.jobModel }}.listJobs): Fetch all Jobs to check completion counts, failure rates, durations, and active/succeeded/failed status
2. list-cronjobs (${{ inputs.jobModel }}.listCronJobs): Fetch all CronJobs to check schedules, suspend status, concurrency policies, and last run times
3. list-events (${{ inputs.eventModel }}.list): Fetch namespace events to surface job completion, failure, and scheduling events
4. get-warnings (${{ inputs.eventModel }}.getWarnings): Fetch only warning events to highlight BackoffLimitExceeded, DeadlineExceeded, and failed scheduling

@john/network-audit (153573f2-bce7-4657-879f-5d6317c9d7ca)
Network policy audit: NetworkPolicy inventory, pod selector coverage, service endpoints, and traffic rule analysis
collect: Gather NetworkPolicies, services, pods, and events for network analysis
1. list-netpols (${{ inputs.netpolModel }}.list): Fetch all NetworkPolicies to audit pod selectors, ingress/egress rules, CIDR blocks, and policy types
2. list-services (${{ inputs.serviceModel }}.list): Fetch all services to cross-reference with NetworkPolicy selectors and check endpoint exposure
3. list-pods (${{ inputs.podModel }}.list): Fetch all pods to identify which pods are covered by NetworkPolicies and which are unprotected
4. get-warnings (${{ inputs.eventModel }}.getWarnings): Fetch warning events to surface any network-related failures
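Identifying pods that no NetworkPolicy covers might be sketched as below. The record shapes (`podSelector`, `labels`, `name` as plain dict keys) are assumptions about the model output, and the sketch handles `matchLabels` only; it deliberately refuses `matchExpressions` rather than guessing at coverage.

```python
def policy_selects(pod_selector: dict, labels: dict[str, str]) -> bool:
    """True if a NetworkPolicy podSelector (matchLabels only) selects the pod."""
    if pod_selector.get("matchExpressions"):
        # Not handled in this sketch; fail loudly instead of misreporting coverage.
        raise NotImplementedError("matchExpressions are not supported here")
    match_labels = pod_selector.get("matchLabels") or {}
    # An empty podSelector selects every pod in the policy's namespace,
    # which all() over an empty dict reproduces.
    return all(labels.get(key) == value for key, value in match_labels.items())


def unprotected_pods(pods: list[dict], policies: list[dict]) -> list[str]:
    """Return names of pods that no policy in the same namespace selects."""
    return [
        pod["name"]
        for pod in pods
        if not any(
            policy_selects(policy.get("podSelector") or {}, pod.get("labels") or {})
            for policy in policies
        )
    ]


if __name__ == "__main__":
    # Illustrative data only.
    policies = [{"name": "db-only", "podSelector": {"matchLabels": {"app": "db"}}}]
    pods = [
        {"name": "db-0", "labels": {"app": "db"}},
        {"name": "web-0", "labels": {"app": "web"}},
    ]
    print(unprotected_pods(pods, policies))
```

The sketch assumes all pods and policies come from one namespace, which matches how the workflow's models appear to be scoped.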

@john/pod-inventory (8dbc9cbf-6a48-4065-bfe2-2d0b3559f65b)
List all pods via the cluster-pods model and collect per-pod CPU/memory metrics from the metrics-server in a single job
collect: List pods from the K8s API and query the metrics-server for resource usage
1. list-pods (cluster-pods.list): Fetch all pods in the configured namespace via the K8s API
2. collect-metrics (cluster-pods.getMetrics): Query the metrics-server for CPU and memory usage of all pods in the namespace

@john/pod-health-check (be9dd9b0-fc50-471e-abb5-2f02beea0c96)
Discover all pods via cluster-pods, then iterate over each to fetch detailed status and the last 50 lines of container logs
discover: List all pods in the configured namespace to populate the pod data set
1. list-all (cluster-pods.list): Fetch all pods from the K8s API
inspect: Iterate over each discovered pod to fetch its full status and recent logs
1. get-status (cluster-pods.get): Read detailed pod status including container states, conditions, and restart counts
2. get-logs (cluster-pods.getLogs): Fetch the last 50 lines of stdout/stderr logs from each pod's containers

@john/cluster-summary (eaea8dd8-54f0-4d86-ab75-68bd3dd5ec1b)
Collect pod inventory and metrics from cluster-pods, then aggregate into a summary with counts by phase, node, restart totals, and health status
collect: Fetch all pods from the K8s API and query the metrics-server for CPU/memory usage
1. list-pods (cluster-pods.list): Fetch all pods in the configured namespace via the K8s API
2. get-metrics (cluster-pods.getMetrics): Query the metrics-server for per-pod CPU and memory usage
summarize: Read collected pod data and compute aggregated statistics
1. build-summary (pod-summary.summarize): Aggregate pods by phase, node, restart count, and health status into a single summary resource
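The aggregation that the summarize step describes could be sketched as follows. The pod record fields (`name`, `phase`, `node`, `restarts`), the output keys, and the health rule (every pod is Running or Succeeded) are all assumptions made for illustration; the listing does not show how `pod-summary.summarize` defines them.

```python
from collections import Counter

# Assumed health rule: pods in these phases count as healthy.
HEALTHY_PHASES = {"Running", "Succeeded"}


def summarize(pods: list[dict]) -> dict:
    """Aggregate pod records into counts by phase and node plus restart totals."""
    by_phase = Counter(pod["phase"] for pod in pods)
    by_node = Counter(pod.get("node") or "<unscheduled>" for pod in pods)
    unhealthy = [pod["name"] for pod in pods if pod["phase"] not in HEALTHY_PHASES]
    return {
        "total": len(pods),
        "byPhase": dict(by_phase),
        "byNode": dict(by_node),
        "totalRestarts": sum(pod.get("restarts", 0) for pod in pods),
        "unhealthyPods": unhealthy,
        "healthy": not unhealthy,
    }


if __name__ == "__main__":
    # Illustrative data only.
    pods = [
        {"name": "web-1", "phase": "Running", "node": "node-a", "restarts": 0},
        {"name": "web-2", "phase": "Running", "node": "node-b", "restarts": 3},
        {"name": "job-1", "phase": "Pending", "node": None, "restarts": 0},
    ]
    print(summarize(pods))
```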