Capability Scan Guide
This guide covers the CapabilityScanConfig CRD for autonomous capability discovery and scanning in your Kubernetes cluster.
Overview
The CapabilityScanConfig enables:
- Autonomous Discovery: Automatically detects CRD changes (create, update, delete)
- Event-Driven Scanning: Triggers capability scans when new CRDs are installed
- Startup Reconciliation: Syncs cluster state with MCP on controller restart
- Debounced Batching: Groups rapid CRD changes into efficient batch requests
This feature works with the DevOps AI Toolkit MCP to keep your cluster's capability data up to date for AI-powered recommendations.
Stack Installation
If you installed via the DevOps AI Toolkit Stack, CapabilityScanConfig is already configured. You can verify with:
kubectl get capabilityscanconfig -n dot-ai
Continue below only if you need to customize the configuration or installed the controller individually.
Prerequisites
- Controller installed (see Setup Guide)
- DevOps AI Toolkit MCP installed and running
Quick Start
- Create a secret with your MCP API key (if authentication is required):
kubectl create secret generic dot-ai-secrets \
  --namespace dot-ai \
  --from-literal=auth-token=your-auth-token-here
- Create a CapabilityScanConfig to start scanning:
apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: CapabilityScanConfig
metadata:
  name: default-scan
  namespace: dot-ai
spec:
  mcp:
    endpoint: http://dot-ai.dot-ai.svc.cluster.local:3456/api/v1/tools/manageOrgData
    authSecretRef:
      name: dot-ai-secrets
      key: auth-token
- Apply it:
kubectl apply -f capabilityscanconfig.yaml
The controller will perform an initial scan of all cluster resources and then watch for CRD changes.
How It Works
Startup Reconciliation
When the controller starts (or restarts), it performs a diff-and-sync:
- List Cluster Resources: Uses Discovery API to get all resources (core + CRDs) matching include/exclude filters
- List MCP Capabilities: Queries MCP for existing capability IDs
- Compute Diff:
  - Resources in the cluster but not in MCP → trigger a targeted scan
  - Capabilities in MCP but not in the cluster → delete the orphaned capability
This ensures the controller recovers gracefully from restarts without missing any changes.
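The diff step above amounts to two set differences. The following is a minimal sketch, not the controller's actual code: the function name `compute_sync_plan` is hypothetical, and it assumes MCP capability IDs use the same Kind.group strings as cluster resources.

```python
def compute_sync_plan(cluster_resources, mcp_capabilities):
    """Return (to_scan, to_delete) for a startup diff-and-sync.

    Assumption: both inputs identify resources with the same
    Kind.group strings, so they can be compared directly.
    """
    cluster = set(cluster_resources)
    known = set(mcp_capabilities)
    to_scan = sorted(cluster - known)    # in the cluster, missing from MCP
    to_delete = sorted(known - cluster)  # in MCP, no longer in the cluster
    return to_scan, to_delete


cluster = ["Deployment.apps", "Service", "Bucket.s3.aws.crossplane.io"]
mcp = ["Deployment.apps", "OldThing.example.com"]
to_scan, to_delete = compute_sync_plan(cluster, mcp)
print("scan:", to_scan)
print("delete:", to_delete)
```

Resources present on both sides are left alone, which is why a restart with no missed events produces no scan or delete requests in this model.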
Event-Driven Scanning
After startup, the controller watches for CRD events:
- CRD Created/Updated: Queue for capability scan
- CRD Deleted: Queue for capability deletion
- Debounce: Wait for debounceWindowSeconds to collect more events
- Batch Request: Send all queued scans in a single request
Debouncing
When operators are installed, many CRDs may be created in rapid succession. Debouncing prevents overwhelming MCP with individual requests:
Time 0s: CRD-A created → add to buffer
Time 1s: CRD-B created → add to buffer
Time 2s: CRD-C created → add to buffer
...
Time 10s: Flush buffer → single request: "CRD-A,CRD-B,CRD-C"
Configure the window based on your needs:
- Lower values (1-5s): Faster scanning, more HTTP requests
- Higher values (30-60s): Fewer requests, delayed scanning
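The timeline above can be modeled with a small buffer that opens a window on the first event and flushes once the window has elapsed. This is an illustrative sketch only: the class `DebounceBuffer` and its methods are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and it assumes the window is measured from the first buffered event (as the timeline suggests) rather than restarted on every event.

```python
class DebounceBuffer:
    """Collects resource names and releases them as one batch per window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.pending = []         # insertion-ordered, de-duplicated names
        self.window_start = None  # time of the first buffered event

    def add(self, resource, now):
        # The first event in an empty buffer opens the window.
        if self.window_start is None:
            self.window_start = now
        if resource not in self.pending:
            self.pending.append(resource)

    def flush_if_due(self, now):
        """Return a comma-separated batch once the window has elapsed, else None."""
        if self.window_start is None:
            return None
        if now - self.window_start < self.window_seconds:
            return None
        batch = ",".join(self.pending)
        self.pending = []
        self.window_start = None
        return batch


buf = DebounceBuffer(window_seconds=10)
buf.add("CRD-A", now=0)
buf.add("CRD-B", now=1)
buf.add("CRD-C", now=2)
early = buf.flush_if_due(now=5)   # window still open
batch = buf.flush_if_due(now=10)  # window elapsed, one batch
print(early, batch)
```

With a shorter window the same three events could be split across several batches, which is the trade-off between request count and scan latency described above.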
Fire-and-Forget Model
The controller uses a fire-and-forget pattern:
- Scans are triggered asynchronously (controller doesn't wait for completion)
- MCP performs the actual capability analysis in the background
- Failed scans are automatically retried on next controller restart
Configuration
Spec Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| mcp.endpoint | string | Yes | - | Full URL of the MCP manageOrgData endpoint |
| mcp.collection | string | No | capabilities | Qdrant collection name for storing capabilities |
| mcp.authSecretRef | SecretReference | Yes | - | Secret containing API key for MCP authentication |
| includeResources | []string | No | all | Patterns for resources to include in scanning |
| excludeResources | []string | No | - | Patterns for resources to exclude from scanning |
| retry.maxAttempts | int | No | 3 | Maximum retry attempts for MCP API calls |
| retry.backoffSeconds | int | No | 5 | Initial backoff duration in seconds |
| retry.maxBackoffSeconds | int | No | 300 | Maximum backoff duration in seconds |
| debounceWindowSeconds | int | No | 10 | Time window to batch CRD events before syncing |
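To get a feel for how the three retry fields interact, the sketch below computes the waits between attempts. It rests on assumptions the table does not confirm: that the backoff doubles after each failed attempt, and that there is no wait after the final attempt. The function name `backoff_schedule` is hypothetical.

```python
def backoff_schedule(max_attempts=3, backoff_seconds=5, max_backoff_seconds=300):
    """Return the delays (in seconds) between consecutive attempts.

    Assumption: the delay doubles after each failure and is capped at
    max_backoff_seconds. The real controller's growth factor may differ.
    """
    delays = []
    delay = backoff_seconds
    for _ in range(max_attempts - 1):  # no wait after the last attempt
        delays.append(min(delay, max_backoff_seconds))
        delay *= 2
    return delays


print(backoff_schedule())             # defaults from the table
print(backoff_schedule(5, 10, 300))   # values from a tuned configuration
print(backoff_schedule(8, 10, 300))   # long enough to hit the cap
```

Under these assumptions, maxBackoffSeconds only matters when maxAttempts is high enough for the doubled delay to reach it.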
Resource Filtering
Use includeResources and excludeResources to control which resources are scanned. Filters apply to:
- Initial scan: All resources discovered via Discovery API (core + CRDs)
- Event-driven scanning: CRD create/update/delete events
Pattern Format:
- Kind.group for grouped resources (e.g., Deployment.apps, RDSInstance.database.aws.crossplane.io)
- Kind for core resources (e.g., Service, ConfigMap)
- Wildcards supported: *.crossplane.io, *.apps, *
Example: Allowlist - Scan Only Crossplane Resources:
spec:
  includeResources:
    - "*.crossplane.io"
Example: Blocklist - Scan Everything Except High-Volume Resources:
spec:
  excludeResources:
    - "Event"
    - "Lease.coordination.k8s.io"
    - "EndpointSlice.discovery.k8s.io"
Example: Combined - Crossplane Resources Except Provider Configs:
spec:
  includeResources:
    - "*.crossplane.io"
  excludeResources:
    - "ProviderConfig.*"
Processing Order:
- If includeResources is specified, only those patterns are scanned
- excludeResources is applied as a blocklist after includes
- If neither is specified, all resources are scanned
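The processing order can be checked against your own patterns with a few lines of Python. This is a rough sketch under stated assumptions: the function `should_scan` is hypothetical, and it uses shell-style globbing from Python's `fnmatch` module, where `*` also matches across dots. The controller's real matching rules may differ in edge cases such as case sensitivity.

```python
from fnmatch import fnmatchcase


def should_scan(resource, include=None, exclude=None):
    """Apply include patterns first, then exclude patterns as a blocklist."""
    # An empty or missing include list means "all resources".
    if include and not any(fnmatchcase(resource, p) for p in include):
        return False
    # Excludes are evaluated after includes and always win.
    if exclude and any(fnmatchcase(resource, p) for p in exclude):
        return False
    return True


include = ["*.crossplane.io"]
exclude = ["ProviderConfig.*"]
for name in [
    "RDSInstance.database.aws.crossplane.io",
    "ProviderConfig.aws.crossplane.io",
    "Deployment.apps",
]:
    print(name, should_scan(name, include, exclude))
```

In this model, a resource that matches both lists is skipped, which matches the "Combined" example above.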
Status
Check the status to verify scanning is working:
kubectl get capabilityscanconfig default-scan -n dot-ai -o yaml
Status Fields
| Field | Description |
|---|---|
| initialScanComplete | Whether startup reconciliation has completed |
| lastScanTime | Timestamp of last successful scan trigger |
| lastError | Last error message if any |
| conditions | Standard Kubernetes conditions |
Conditions
| Type | Description |
|---|---|
| Ready | True when controller is watching CRDs and connected to MCP |
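If you want to check these fields from a script rather than by eye, a helper along the following lines may be useful. It is a sketch only: the function names are hypothetical, the sample object is made up for illustration, and it assumes the Ready condition follows the standard Kubernetes shape with a string status of "True", "False", or "Unknown".

```python
def is_ready(config):
    """True when the object carries a Ready condition with status "True"."""
    conditions = config.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def summarize(config):
    """Collect the status fields most relevant for a quick health check."""
    status = config.get("status", {})
    return {
        "ready": is_ready(config),
        "initialScanComplete": bool(status.get("initialScanComplete", False)),
        "lastError": status.get("lastError") or None,  # empty string -> None
    }


# Illustrative object; real input would be the parsed JSON output of kubectl.
sample = {
    "status": {
        "initialScanComplete": True,
        "lastError": "",
        "conditions": [{"type": "Ready", "status": "True"}],
    }
}
summary = summarize(sample)
print(summary)
```

An object with no status yet (for example, right after creation) is reported as not ready, with the initial scan incomplete.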
Example: Full Configuration
apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: CapabilityScanConfig
metadata:
  name: production-scan
  namespace: dot-ai
spec:
  # MCP configuration
  mcp:
    endpoint: http://dot-ai.dot-ai.svc.cluster.local:3456/api/v1/tools/manageOrgData
    collection: capabilities
    authSecretRef:
      name: dot-ai-secrets
      key: auth-token
  # Only scan Crossplane and ArgoCD resources
  includeResources:
    - "*.crossplane.io"
    - "*.aws.crossplane.io"
    - "*.gcp.crossplane.io"
    - "*.azure.crossplane.io"
    - "applications.argoproj.io"
    - "applicationsets.argoproj.io"
  # Exclude internal resources
  excludeResources:
    - "*.internal.company.com"
  # Retry configuration for MCP API calls
  retry:
    maxAttempts: 5
    backoffSeconds: 10
    maxBackoffSeconds: 300
  # Batch CRD events for 15 seconds before sending
  debounceWindowSeconds: 15
Use Cases
Crossplane Provider Installation
When you install a Crossplane provider:
kubectl apply -f provider-aws.yaml
The controller:
- Detects new CRDs (RDSInstance.database.aws.crossplane.io, Bucket.s3.aws.crossplane.io, etc.)
- Waits for debounce window (batches all CRDs)
- Sends single scan request to MCP
- MCP analyzes and stores capabilities
MCP can now provide AI recommendations that include the newly available AWS resources.
Operator Removal
When you remove an operator:
kubectl delete -f provider-aws.yaml
The controller:
- Detects CRD deletions
- Sends delete requests to MCP for each capability
- MCP removes stale capability data
MCP recommendations no longer suggest the removed resources.
Controller Restart Recovery
If the controller pod restarts:
- Controller performs startup reconciliation
- Compares cluster resources (core + CRDs) with MCP capabilities
- Syncs any differences (missed events during downtime)
- Resumes event watching
No manual intervention required.
Troubleshooting
Controller Not Starting
Check the Ready condition:
kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.status.conditions}'
Common issues:
- Invalid mcp.endpoint URL
- MCP service not reachable
- Missing RBAC permissions
Scans Not Triggering
- Check if CRD matches include/exclude filters:
# View configured filters
kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.spec.includeResources}'
kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.spec.excludeResources}'
- Check controller logs:
kubectl logs -l app.kubernetes.io/name=dot-ai-controller -n dot-ai --tail=50
Look for messages about CRD events and filtering decisions.
MCP Connection Errors
Check lastError in status:
kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.status.lastError}'
Common issues:
- MCP endpoint unreachable (check service/DNS)
- Authentication failure (check secret exists and has correct key)
- MCP server overloaded (check MCP logs)
Initial Scan Not Completing
Check if initial scan is marked complete:
kubectl get capabilityscanconfig default-scan -n dot-ai -o jsonpath='{.status.initialScanComplete}'
If false, check controller logs for errors during startup reconciliation.
Debounce Window Too Long/Short
Adjust debounceWindowSeconds based on your operator installation patterns:
spec:
  # For frequent small changes
  debounceWindowSeconds: 5
  # For large operator installations, use a longer window instead:
  # debounceWindowSeconds: 30
Cleanup
Delete the CapabilityScanConfig to stop scanning:
kubectl delete capabilityscanconfig default-scan -n dot-ai
This stops the CRD watcher but does not delete capability data from MCP. To remove capability data, use the MCP manageOrgData tool with operation: deleteAll. See the Capability Management Guide for details.
Next Steps
- Learn about Resource Sync for semantic search of cluster resources
- Explore Remediation Policies for event-driven remediation
- Check the Troubleshooting Guide for common issues