
Impact Analysis Guide

Complete guide for AI-powered dependency and blast radius analysis with the DevOps AI Toolkit.

Note: The examples below use MCP, but this tool is also available via the CLI, Web Dashboard, and Headlamp Plugin.

Prerequisites

Before using this guide, complete the Deployment Guide.

Optional but recommended:

Overview

What it does:

  • Blast radius assessment — Maps all resources affected by a proposed operation before you execute it
  • Multi-level dependency tracing — Follows dependency chains iteratively (e.g., PVC → Pod → Service → Ingress), not just first-level relationships
  • Free-text input — Accepts kubectl commands, YAML manifests, or plain-English descriptions
  • Confidence-level reporting — Categorizes each discovered dependency as Definite, Likely, or Uncertain
  • GitOps-aware — When you reference a git repo and file change, clones the repo and reads the actual manifest to verify the change
  • Safe/not-safe verdict — Returns a clear signal on whether the operation is safe to proceed

Use when:

  • You are unsure how an operation will affect your cluster and want to check before executing it

How It Works

The AI combines three sources to discover dependencies:

  1. Built-in AI knowledge — Standard Kubernetes relationships (Deployment → ReplicaSet → Pod, Service → Endpoints) and ecosystem tools (CNPG Cluster → Pod/PVC/Service, Crossplane XR → MR, Istio VirtualService → Service, etc.)
  2. Cluster inspection — Reads the actual cluster state: ownerReferences, resource specs that reference other resources by name, events, labels, and annotations
  3. Capabilities and resources — If available, searches the Vector DB for semantic understanding of your cluster's resource types and their relationships
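
To illustrate the cluster-inspection source, the sketch below derives owner-to-dependent edges from ownerReferences on resources already fetched as plain dictionaries (for example, items from `kubectl get ... -o json`). The function name `owner_edges` and the sample resources are hypothetical. Only the `metadata.ownerReferences` field layout comes from Kubernetes, and this is not the toolkit's actual implementation.

```python
from collections import defaultdict


def owner_edges(resources):
    """Map each owner ("Kind/name") to the resources that list it in ownerReferences."""
    edges = defaultdict(list)
    for res in resources:
        meta = res.get("metadata", {})
        child = f'{res["kind"]}/{meta["name"]}'
        for ref in meta.get("ownerReferences", []):
            edges[f'{ref["kind"]}/{ref["name"]}'].append(child)
    return dict(edges)


# Hypothetical sample data shaped like `kubectl get ... -o json` items.
sample = [
    {"kind": "Cluster", "metadata": {"name": "test-pg"}},
    {
        "kind": "Pod",
        "metadata": {
            "name": "test-pg-1",
            "ownerReferences": [{"kind": "Cluster", "name": "test-pg"}],
        },
    },
    {
        "kind": "Service",
        "metadata": {
            "name": "test-pg-rw",
            "ownerReferences": [{"kind": "Cluster", "name": "test-pg"}],
        },
    },
]

edges = owner_edges(sample)
print(edges)
```

Edges confirmed this way correspond to what the guide calls Definite dependencies, since they are read directly from cluster data.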

The AI follows dependency chains iteratively across these sources, much as a human expert would investigate. When it cannot confirm a relationship, it reports the dependency as Uncertain instead of presenting it as fact.
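
The iterative tracing described above can be pictured as a breadth-first walk over a dependency graph. This is a minimal sketch under stated assumptions: the graph is a plain dictionary built beforehand, and `trace_dependents` and the resource names are hypothetical. The toolkit's AI-driven investigation is not necessarily implemented this way.

```python
from collections import deque


def trace_dependents(graph, start):
    """Breadth-first walk returning every resource reachable from `start`, with its depth."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dependent in graph.get(current, []):
            if dependent not in depth:  # skip already-visited nodes, so cycles terminate
                depth[dependent] = depth[current] + 1
                queue.append(dependent)
    del depth[start]  # report only the affected resources, not the target itself
    return depth


# Hypothetical graph following the PVC -> Pod -> Service -> Ingress chain.
graph = {
    "PVC/data": ["Pod/app-0"],
    "Pod/app-0": ["Service/app"],
    "Service/app": ["Ingress/app"],
}

affected = trace_dependents(graph, "PVC/data")
for resource, level in affected.items():
    print(f"level {level}: {resource}")
```

A first-level-only check would stop at `Pod/app-0`. Following the chain until no new resources appear is what surfaces the Service and Ingress.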

Confidence Levels

Each dependency in the analysis is tagged with a confidence level:

| Level | Meaning | Example |
| --- | --- | --- |
| Definite | Confirmed from cluster data or git source | ownerReference links Pod to its parent Cluster |
| Likely | Based on ecosystem knowledge, not verified in cluster | "CNPG typically creates a Service for each Cluster" |
| Uncertain | Might be affected but cannot confirm | Unknown CRD without documentation |
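
If you post-process results yourself, the three levels map naturally onto an enumeration. The sketch below groups findings by level, similar to how the example reports further down are laid out. The `Dependency` type, its fields, and the sample findings are assumptions made for illustration, not the tool's actual response schema.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    DEFINITE = "Definite"    # confirmed from cluster data or git source
    LIKELY = "Likely"        # ecosystem knowledge, not verified in cluster
    UNCERTAIN = "Uncertain"  # might be affected but cannot be confirmed


@dataclass
class Dependency:
    resource: str
    reason: str
    confidence: Confidence


def group_by_confidence(deps):
    """Return {Confidence: [resource, ...]} with every level present, even if empty."""
    grouped = {level: [] for level in Confidence}
    for dep in deps:
        grouped[dep.confidence].append(dep.resource)
    return grouped


# Hypothetical findings, loosely modelled on the examples in this guide.
findings = [
    Dependency("Service/test-pg-rw", "clients lose write access", Confidence.DEFINITE),
    Dependency("Service/test-pg-ro", "read-only routing breaks", Confidence.DEFINITE),
    Dependency("Monitoring targets", "scrape endpoints disappear", Confidence.LIKELY),
]

grouped = group_by_confidence(findings)
for level, resources in grouped.items():
    print(f"{level.value}: {resources}")
```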

Complete Workflow Examples

Plain-English Input

User: What happens if I delete the postgres service in the production namespace?
Agent: I'll analyze the blast radius of deleting the postgres service.

[Uses impact_analysis tool with input: "what happens if I delete the postgres service in the production namespace?"]

The operation is NOT safe. Here's the analysis:

There's no service literally named "postgres" — the CNPG Cluster "test-pg" manages
three services: test-pg-rw, test-pg-r, and test-pg-ro.

Deleting any of them would cause:

DEFINITE impact:
- test-pg-rw — clients using it for write access to the primary (test-pg-1)
lose connectivity immediately
- test-pg-r — clients reading from any instance lose connectivity
- test-pg-ro — clients using read-only routing lose connectivity

LIKELY impact:
- The CNPG operator will automatically recreate deleted services, but there's
a disruption window
- Applications in other namespaces with hardcoded DNS references will get
connection failures
- Monitoring targets become unavailable

Risk summary:
- Data loss risk: LOW — services are networking constructs, not data
- Service disruption risk: HIGH — all database connectivity breaks immediately
- Automatic recovery: YES — the operator recreates services within minutes,
but there's a gap
- Cascading failures: MEDIUM — anything currently connected will fail

Recommendation: Don't delete these services. If you need to reconfigure,
modify them instead.

What happened behind the scenes:

  • AI parsed the plain-English description and identified the target resources
  • No service named "postgres" existed — AI discovered the actual CNPG-managed services
  • Investigation traced dependencies: Services → Pods → client connections → monitoring
  • Confidence levels distinguished between confirmed cluster state (Definite) and ecosystem knowledge (Likely)
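
For readers curious what the tool invocation looks like on the wire, the sketch below builds an MCP `tools/call` request as a JSON-RPC 2.0 message. The method name and the `name`/`arguments` envelope follow the MCP specification, and the tool name `impact_analysis` comes from the transcript above. The argument key `input` is an assumption inferred from that transcript, and `build_tool_call` is a hypothetical helper. Your MCP client normally constructs this message for you.

```python
import json


def build_tool_call(request_id, description):
    """Build a JSON-RPC 2.0 `tools/call` request for the impact_analysis tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "impact_analysis",
            # Assumption: the free-text description is passed under the key "input".
            "arguments": {"input": description},
        },
    }


request = build_tool_call(
    1,
    "what happens if I delete the postgres service in the production namespace?",
)
print(json.dumps(request, indent=2))
```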

GitOps Manifest Change

When you reference a git repository and file change, the AI clones the repo and reads the actual manifest to verify what's changing.

User: In repo https://github.com/vfarcic/dot-ai.git, the file
tests/integration/fixtures/gitops/cnpg-cluster/cluster.yaml will be
changed to set spec.instances from 2 to 1. What is the impact?
Agent: I'll analyze the impact of this GitOps manifest change.

[Uses impact_analysis tool]

The impact analysis confirms this change is safe. Here's the breakdown:

Affected Resources

Definite:
- Pod/gitops-pg-2 (replica) — will be terminated cleanly
- PVC/gitops-pg-2 — deleted after scale-down (contains only replica data)
- Pod/gitops-pg-1 (primary) — remains unchanged, keeps all data
- Services (gitops-pg-rw, gitops-pg-r, gitops-pg-ro) — auto-adjust routing,
no disruption

Likely:
- PodDisruptionBudget — updates automatically, becomes more protective
(0 allowed disruptions)
- ArgoCD Application/impact-analysis-pg — will briefly show OutOfSync,
then auto-reconcile (<1 min)

Uncertain:
- No integration test dependencies found on this fixture

Key Takeaways:
- Data safety: Primary pod and PVC retain all data
- Service continuity: All connection strings and services remain valid
- HA degraded: Loses failover capability
- Rollback: Simple — revert to instances: 2, CNPG auto-creates a new
replica in ~2-5 min

Safe to proceed.

What happened behind the scenes:

  • AI cloned the git repository and read the actual manifest file
  • Compared the proposed change (instances: 2 → 1) against the live cluster state
  • Detected the Argo CD Application managing this resource and assessed sync impact
  • Traced dependencies through Pods, PVCs, Services, and PodDisruptionBudgets
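
The comparison step can be sketched as applying the proposed field change to a parsed copy of the manifest and inspecting the difference. The manifest here is a hand-written dictionary standing in for the parsed cluster.yaml (its metadata name is a guess based on the pod names above), and `apply_change` is a hypothetical helper. The sketch covers only the before/after comparison, not the lookup against live cluster state.

```python
import copy


def apply_change(manifest, dotted_path, new_value):
    """Return (old_value, updated_copy) after setting `dotted_path` to `new_value`."""
    updated = copy.deepcopy(manifest)  # leave the original manifest untouched
    *parents, leaf = dotted_path.split(".")
    node = updated
    for key in parents:
        node = node[key]
    old_value = node.get(leaf)
    node[leaf] = new_value
    return old_value, updated


# Hypothetical stand-in for the parsed manifest file.
manifest = {
    "apiVersion": "postgresql.cnpg.io/v1",
    "kind": "Cluster",
    "metadata": {"name": "gitops-pg"},
    "spec": {"instances": 2},
}

old, proposed = apply_change(manifest, "spec.instances", 1)
removed_replicas = old - proposed["spec"]["instances"]
print(f"spec.instances: {old} -> {proposed['spec']['instances']} "
      f"({removed_replicas} instance(s) removed)")
```

Working on a deep copy keeps the original manifest available for a rollback comparison, which mirrors the "revert to instances: 2" path mentioned in the report.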

Integration with Other Tools

Impact analysis works alongside the other DevOps AI Toolkit tools. You don't need to remember to call it — the other tools will suggest it when relevant:

  • Operate — After generating an operational proposal (scale, update, delete), the tool suggests running impact_analysis before executing to understand the blast radius
  • Query — When you ask dependency or impact questions ("what depends on this database?"), the tool suggests impact_analysis for detailed blast radius analysis
  • Remediate — When remediation involves destructive actions, the tool suggests running impact_analysis first to understand downstream effects

These suggestions are opt-in: the other tools only recommend impact_analysis and do not run it themselves, so their response times are unchanged. The AI agent decides whether to follow a suggestion based on the context.