Capability Scan Guide

This guide covers the CapabilityScanConfig CRD for autonomous capability discovery and scanning in your Kubernetes cluster.

Overview

The CapabilityScanConfig enables:

Autonomous Discovery: Automatically detects CRD changes (create, update, delete)
Event-Driven Scanning: Triggers capability scans when new CRDs are installed
Startup Reconciliation: Syncs cluster state with MCP on controller restart
Debounced Batching: Groups rapid CRD changes into efficient batch requests

This feature works with the DevOps AI Toolkit MCP to keep your cluster's capability data up to date for AI-powered recommendations.

Stack Installation

If you installed via the DevOps AI Toolkit Stack, CapabilityScanConfig is already configured. You can verify with:

kubectl get capabilityscanconfig -n dot-ai

Continue below only if you need to customize the configuration or installed the controller individually.

Prerequisites

Controller installed (see Setup Guide)
DevOps AI Toolkit MCP installed and running

Quick Start

Create a secret with your MCP API key (if authentication is required):

kubectl create secret generic dot-ai-secrets \
  --namespace dot-ai \
  --from-literal=auth-token=your-auth-token-here

Create a CapabilityScanConfig to start scanning:

apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: CapabilityScanConfig
metadata:
  name: default-scan
  namespace: dot-ai
spec:
  mcp:
    endpoint: http://dot-ai.dot-ai.svc.cluster.local:3456/api/v1/tools/manageOrgData
    authSecretRef:
      name: dot-ai-secrets
      key: auth-token

Apply it:

kubectl apply -f capabilityscanconfig.yaml

The controller will perform an initial scan of all cluster resources and then watch for CRD changes.

How It Works

Startup Reconciliation

When the controller starts (or restarts), it performs a diff-and-sync:

List Cluster Resources: Uses Discovery API to get all resources (core + CRDs) matching include/exclude filters
List MCP Capabilities: Queries MCP for existing capability IDs
Compute Diff:
- Resources in cluster but not in MCP → trigger targeted scan
- Capabilities in MCP but not in cluster → delete orphaned
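
The diff step above amounts to two set differences. The following is a minimal sketch, not the controller's actual code: the function name `compute_diff` and the sample resource IDs are hypothetical, and it assumes resources and capabilities can be compared by a shared string ID.

```python
def compute_diff(cluster_resources, mcp_capabilities):
    """Return (to_scan, to_delete) as sorted lists of resource IDs."""
    cluster = set(cluster_resources)
    mcp = set(mcp_capabilities)
    to_scan = sorted(cluster - mcp)    # in cluster but not in MCP: targeted scan
    to_delete = sorted(mcp - cluster)  # in MCP but not in cluster: orphaned
    return to_scan, to_delete


# Hypothetical sample data for illustration only.
cluster = ["Deployment.apps", "Service", "Bucket.s3.aws.crossplane.io"]
mcp = ["Deployment.apps", "Service", "Queue.sqs.aws.crossplane.io"]

to_scan, to_delete = compute_diff(cluster, mcp)
print(to_scan)    # ['Bucket.s3.aws.crossplane.io']
print(to_delete)  # ['Queue.sqs.aws.crossplane.io']
```

Resources present on both sides need no action, which is why a restart with no missed changes should produce two empty lists.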

This ensures the controller recovers gracefully from restarts without missing any changes.

Event-Driven Scanning

After startup, the controller watches for CRD events:

CRD Created/Updated: Queue for capability scan
CRD Deleted: Queue for capability deletion
Debounce: Wait for debounceWindowSeconds to collect more events
Batch Request: Send all queued scans in a single request

Debouncing

When operators are installed, many CRDs may be created in rapid succession. Debouncing prevents overwhelming MCP with individual requests:

Time 0s:   CRD-A created → add to buffer
Time 1s:   CRD-B created → add to buffer
Time 2s:   CRD-C created → add to buffer
...
Time 10s:  Flush buffer → single request: "CRD-A,CRD-B,CRD-C"

Configure the window based on your needs:

Lower values (1-5s): Faster scanning, more HTTP requests
Higher values (30-60s): Fewer requests, delayed scanning
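
To make the timeline concrete, here is a minimal sketch of a debounce buffer. It assumes the window starts at the first buffered event, which matches the timeline above but is not confirmed controller behavior. The class name `DebounceBuffer` and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class DebounceBuffer:
    """Collects CRD names and flushes them as one batch once the window elapses."""

    def __init__(self, window_seconds=10):
        self.window_seconds = window_seconds
        self._pending = []         # CRD names in arrival order
        self._window_start = None  # time of the first buffered event

    def add(self, crd_name, now):
        # Assumption: the window opens with the first event in an empty buffer.
        if self._window_start is None:
            self._window_start = now
        if crd_name not in self._pending:
            self._pending.append(crd_name)

    def flush_if_due(self, now):
        """Return the batch as a comma-separated string, or None if not due."""
        if self._window_start is None:
            return None
        if now - self._window_start < self.window_seconds:
            return None
        batch = ",".join(self._pending)
        self._pending = []
        self._window_start = None
        return batch


buf = DebounceBuffer(window_seconds=10)
buf.add("CRD-A", now=0)
buf.add("CRD-B", now=1)
buf.add("CRD-C", now=2)
print(buf.flush_if_due(now=5))   # None
print(buf.flush_if_due(now=10))  # CRD-A,CRD-B,CRD-C
```

A shorter window would flush the same three names sooner, at the cost of more requests when CRDs keep arriving.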

Fire-and-Forget Model

The controller uses a fire-and-forget pattern:

Scans are triggered asynchronously (controller doesn't wait for completion)
MCP performs the actual capability analysis in the background
Failed scans are automatically retried on next controller restart
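
The asynchronous trigger can be pictured with a small sketch in which the caller hands work to a background worker and returns without waiting for the result. This is an illustration only: `trigger_scan` and `run_scan` are hypothetical names, and the worker thread stands in for MCP, which in the real system is reached over HTTP.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

completed = []            # batches the background worker has processed
done = threading.Event()  # lets the example wait for the worker before exiting


def run_scan(batch):
    # Stand-in for MCP's background capability analysis.
    completed.append(batch)
    done.set()


executor = ThreadPoolExecutor(max_workers=1)


def trigger_scan(batch):
    # submit() returns immediately; the result is never awaited by the caller.
    executor.submit(run_scan, batch)
    return "triggered"


status = trigger_scan("CRD-A,CRD-B")
print(status)  # triggered

# Only for the demo: wait so the example exits cleanly.
done.wait(timeout=5)
executor.shutdown(wait=True)
print(completed)  # ['CRD-A,CRD-B']
```

The trade-off is that the caller never learns whether the analysis itself succeeded, which is why the guide relies on startup reconciliation to catch anything that was missed.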

Configuration

Spec Fields

Field	Type	Required	Default	Description
`mcp.endpoint`	string	Yes	-	Full URL of the MCP manageOrgData endpoint
`mcp.collection`	string	No	capabilities	Qdrant collection name for storing capabilities
`mcp.authSecretRef`	SecretReference	Yes	-	Secret containing API key for MCP authentication
`includeResources`	[]string	No	all	Patterns for resources to include in scanning
`excludeResources`	[]string	No	-	Patterns for resources to exclude from scanning
`retry.maxAttempts`	int	No	3	Maximum retry attempts for MCP API calls
`retry.backoffSeconds`	int	No	5	Initial backoff duration in seconds
`retry.maxBackoffSeconds`	int	No	300	Maximum backoff duration in seconds
`debounceWindowSeconds`	int	No	10	Time window to batch CRD events before syncing
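
The three retry fields interact as sketched below. The sketch assumes the backoff doubles after each failed attempt and is capped at `retry.maxBackoffSeconds`. The table describes only an initial and a maximum backoff, so the doubling factor is an assumption, and `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(max_attempts=3, backoff_seconds=5, max_backoff_seconds=300):
    """Delays in seconds between attempts, assuming the delay doubles each time."""
    delays = []
    delay = backoff_seconds
    # There is one wait between each pair of consecutive attempts.
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_backoff_seconds))
        delay *= 2  # assumption: exponential growth with factor 2
    return delays


print(backoff_schedule())            # [5, 10]
print(backoff_schedule(5, 10, 300))  # [10, 20, 40, 80]
print(backoff_schedule(8, 10, 300))  # [10, 20, 40, 80, 160, 300, 300]
```

Under this assumption the defaults give up after about 15 seconds of waiting, so a higher `retry.maxAttempts` is what extends tolerance for a slow MCP startup.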

Resource Filtering

Use includeResources and excludeResources to control which resources are scanned. Filters apply to:

Initial scan: All resources discovered via Discovery API (core + CRDs)
Event-driven scanning: CRD create/update/delete events

Pattern Format:

Kind.group for grouped resources (e.g., Deployment.apps, RDSInstance.database.aws.crossplane.io)
Kind for core resources (e.g., Service, ConfigMap)
Wildcards supported: *.crossplane.io, *.apps, *

Example: Allowlist - Scan Only Crossplane Resources:

spec:
  includeResources:
    - "*.crossplane.io"

Example: Blocklist - Scan Everything Except High-Volume Resources:

spec:
  excludeResources:
    - "Event"
    - "Lease.coordination.k8s.io"
    - "EndpointSlice.discovery.k8s.io"

Example: Combined - Crossplane Resources Except Provider Configs:

spec:
  includeResources:
    - "*.crossplane.io"
  excludeResources:
    - "ProviderConfig.*"

Processing Order:

If includeResources is specified, only those patterns are scanned
excludeResources is applied as a blocklist after includes
If neither is specified, all resources are scanned
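
The processing order can be sketched as a two-stage check. This example assumes shell-style glob semantics in which `*` matches any run of characters, including dots. The controller's actual matcher may differ, and `should_scan` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def should_scan(resource, include=None, exclude=None):
    """Apply include patterns first, then exclude patterns as a blocklist."""
    # If includes are given, the resource must match at least one of them.
    if include and not any(fnmatchcase(resource, p) for p in include):
        return False
    # Excludes are applied after includes.
    if exclude and any(fnmatchcase(resource, p) for p in exclude):
        return False
    # With no filters at all, everything is scanned.
    return True


include = ["*.crossplane.io"]
exclude = ["ProviderConfig.*"]

print(should_scan("RDSInstance.database.aws.crossplane.io", include, exclude))  # True
print(should_scan("ProviderConfig.aws.crossplane.io", include, exclude))        # False
print(should_scan("Deployment.apps", include, exclude))                         # False
print(should_scan("Service"))                                                   # True
```

Because excludes run last, a resource that matches both lists is skipped.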

Status

Check the status to verify scanning is working:

kubectl get capabilityscanconfig default-scan -n dot-ai -o yaml

Status Fields

Field	Description
`initialScanComplete`	Whether startup reconciliation has completed
`lastScanTime`	Timestamp of last successful scan trigger
`lastError`	Last error message if any
`conditions`	Standard Kubernetes conditions

Conditions

Type	Description
`Ready`	True when controller is watching CRDs and connected to MCP
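
For scripted health checks, the status fields can be reduced to a short summary. The sketch below assumes the status has been exported as JSON and that conditions carry the standard Kubernetes `type` and `status` keys. `summarize_status` is a hypothetical helper, and the sample payload is invented for illustration.

```python
import json


def summarize_status(status):
    """Pick out the fields worth checking from a CapabilityScanConfig status."""
    ready = any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in status.get("conditions", [])
    )
    return {
        "ready": ready,
        "initialScanComplete": bool(status.get("initialScanComplete", False)),
        "lastError": status.get("lastError") or None,
    }


# Illustrative status payload; the values are invented for this example.
sample = json.loads("""
{
  "initialScanComplete": true,
  "lastScanTime": "2025-01-01T00:00:00Z",
  "conditions": [{"type": "Ready", "status": "True"}]
}
""")

print(summarize_status(sample))
# {'ready': True, 'initialScanComplete': True, 'lastError': None}
```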

Example: Full Configuration

apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: CapabilityScanConfig
metadata:
  name: production-scan
  namespace: dot-ai
spec:
  # MCP configuration
  mcp:
    endpoint: http://dot-ai.dot-ai.svc.cluster.local:3456/api/v1/tools/manageOrgData
    collection: capabilities
    authSecretRef:
      name: dot-ai-secrets
      key: auth-token

  # Only scan Crossplane and Argo CD resources
  includeResources:
    - "*.crossplane.io"
    - "*.aws.crossplane.io"
    - "*.gcp.crossplane.io"
    - "*.azure.crossplane.io"
    - "Application.argoproj.io"
    - "ApplicationSet.argoproj.io"

  # Exclude internal resources
  excludeResources:
    - "*.internal.company.com"

  # Retry configuration for MCP API calls
  retry:
    maxAttempts: 5
    backoffSeconds: 10
    maxBackoffSeconds: 300

  # Batch CRD events for 15 seconds before sending
  debounceWindowSeconds: 15

Use Cases

Crossplane Provider Installation

When you install a Crossplane provider:

kubectl apply -f provider-aws.yaml

The controller:

Detects new CRDs (RDSInstance.database.aws.crossplane.io, Bucket.s3.aws.crossplane.io, etc.)
Waits for debounce window (batches all CRDs)
Sends single scan request to MCP
MCP analyzes and stores capabilities

MCP can now provide AI recommendations that include the newly available AWS resources.

Operator Removal

When you remove an operator:

kubectl delete -f provider-aws.yaml

The controller:

Detects CRD deletions
Sends delete requests to MCP for each capability
MCP removes stale capability data

MCP recommendations no longer suggest the removed resources.

Controller Restart Recovery

If the controller pod restarts:

Controller performs startup reconciliation
Compares cluster CRDs with MCP capabilities
Syncs any differences (missed events during downtime)
Resumes event watching

No manual intervention required.

Troubleshooting

Controller Not Starting

Check the Ready condition:

kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.status.conditions}'

Common issues:

Invalid mcp.endpoint URL
MCP service not reachable
Missing RBAC permissions

Scans Not Triggering

Check whether the CRD matches the include/exclude filters:

# View configured filters
kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.spec.includeResources}'
kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.spec.excludeResources}'

Check controller logs:

kubectl logs -l app.kubernetes.io/name=dot-ai-controller -n dot-ai --tail=50

Look for messages about CRD events and filtering decisions.

MCP Connection Errors

Check lastError in status:

kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.status.lastError}'

Common issues:

MCP endpoint unreachable (check service/DNS)
Authentication failure (check secret exists and has correct key)
MCP server overloaded (check MCP logs)

Initial Scan Not Completing

Check whether the initial scan is marked complete:

kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.status.initialScanComplete}'

If false, check controller logs for errors during startup reconciliation.

Debounce Window Too Long/Short

Adjust debounceWindowSeconds based on your operator installation patterns:

spec:
  # For frequent small changes
  debounceWindowSeconds: 5

  # For large operator installations, use a longer window instead
  # (set the key only once; duplicate keys are invalid YAML):
  # debounceWindowSeconds: 30

Cleanup

Delete the CapabilityScanConfig to stop scanning:

kubectl delete capabilityscanconfig default-scan -n dot-ai

This stops the CRD watcher but does not delete capability data from MCP. To remove capability data, use the MCP manageOrgData tool with operation: deleteAll. See the Capability Management Guide for details.

Next Steps

Learn about Resource Sync for semantic search of cluster resources
Explore Remediation Policies for event-driven remediation
Check the Troubleshooting Guide for common issues
