AI Engine Deployment
Deploy the DevOps AI Toolkit Engine to Kubernetes using the Helm chart for a production-ready deployment.
For the easiest setup, we recommend installing the complete dot-ai stack which includes all components pre-configured. See the Stack Installation Guide.
Continue below if you want to install components individually (for granular control over configuration).
Overview
The DevOps AI Toolkit Engine provides:
- Kubernetes Deployment Recommendations — AI-powered application deployment assistance with enhanced semantic understanding
- Cluster Query — Natural language interface for querying cluster resources, status, and health
- Capability Management — Discover and store semantic resource capabilities for intelligent recommendation matching
- Pattern Management — Organizational deployment patterns that enhance AI recommendations
- Policy Management — Governance policies that guide users toward compliant configurations with optional Kyverno enforcement
- Kubernetes Issue Remediation — AI-powered root cause analysis and automated remediation
- Shared Prompts Library — Centralized prompt sharing via native slash commands
- REST API Gateway — HTTP endpoints for all toolkit capabilities
Access these tools through MCP clients or the CLI.
What You Get
- Production Kubernetes Deployment — Scalable deployment with proper resource management
- Integrated Qdrant Database — Vector database for capability and pattern management
- External Access — Ingress configuration for team collaboration
- Resource Management — Proper CPU/memory limits and requests
- Authentication — Static token (default) and OAuth (opt-in, requires HTTPS)
- Security — RBAC and ServiceAccount configuration
Prerequisites
- Kubernetes cluster (1.19+) with kubectl access
- Helm 3.x installed
- AI model API key (default: Anthropic). See AI Model Configuration for available model options.
- Ingress controller (any standard controller)
Quick Start (5 Minutes)
Step 1: Set Environment Variables
Export your API key and configure authentication:
# Required
export ANTHROPIC_API_KEY="sk-ant-api03-..."
# Required — static token for REST API, CI/CD, and MCP clients without OAuth
export DOT_AI_AUTH_TOKEN=$(openssl rand -base64 32)
# Ingress class - change to match your ingress controller (traefik, haproxy, etc.)
export INGRESS_CLASS_NAME="nginx"
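If you want to confirm the token was generated correctly, note that base64-encoding 32 random bytes always yields exactly 44 characters:

```shell
# 32 random bytes base64-encode to exactly 44 characters (including padding)
DOT_AI_AUTH_TOKEN=$(openssl rand -base64 32)
echo "${#DOT_AI_AUTH_TOKEN}"
```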
Step 2: Install the Controller
Install the dot-ai-controller to enable autonomous cluster operations:
# Set the controller version from https://github.com/vfarcic/dot-ai-controller/pkgs/container/dot-ai-controller%2Fcharts%2Fdot-ai-controller
export DOT_AI_CONTROLLER_VERSION="..."
# Install controller (includes CRDs for Solution and RemediationPolicy)
helm install dot-ai-controller \
oci://ghcr.io/vfarcic/dot-ai-controller/charts/dot-ai-controller:$DOT_AI_CONTROLLER_VERSION \
--namespace dot-ai \
--create-namespace \
--wait
The controller provides CRDs for autonomous cluster operations. Create Custom Resources like CapabilityScanConfig, Solution, RemediationPolicy, or ResourceSyncConfig to enable features such as capability scanning, solution tracking, and more. See the Controller Setup Guide for complete details.
Step 3: Install the Server
Install the server using the published Helm chart:
# Set the version from https://github.com/vfarcic/dot-ai/pkgs/container/dot-ai%2Fcharts%2Fdot-ai
export DOT_AI_VERSION="..."
helm install dot-ai-mcp oci://ghcr.io/vfarcic/dot-ai/charts/dot-ai:$DOT_AI_VERSION \
--set secrets.anthropic.apiKey="$ANTHROPIC_API_KEY" \
--set secrets.auth.token="$DOT_AI_AUTH_TOKEN" \
--set localEmbeddings.enabled=true \
--set ingress.enabled=true \
--set ingress.className="$INGRESS_CLASS_NAME" \
--set ingress.host="dot-ai.127.0.0.1.nip.io" \
--set controller.enabled=true \
--namespace dot-ai \
--wait
Notes:
- Authentication: `DOT_AI_AUTH_TOKEN` is required and provides shared token auth for REST API, CI/CD, and MCP clients without OAuth. To enable OAuth with individual user identity, set `dex.enabled: true` (requires HTTPS). See Authentication for details.
- Local embeddings: `localEmbeddings.enabled=true` deploys an in-cluster embedding service (HuggingFace TEI) so semantic search works without any embedding API keys. See Local Embeddings for details.
- Replace `dot-ai.127.0.0.1.nip.io` with your desired hostname for external access.
- For enhanced security, create a secret named `dot-ai-secrets` with keys `anthropic-api-key` and `auth-token` instead of using `--set` arguments.
- For all available configuration options, see the Helm values file.
- Global annotations: Add annotations to all Kubernetes resources using `annotations` in your values file (e.g., for Reloader integration: `reloader.stakater.com/auto: "true"`).
- Custom endpoints (OpenRouter, self-hosted): See Custom Endpoint Configuration for environment variables, then use `--set` or a values file with `ai.customEndpoint.enabled=true` and `ai.customEndpoint.baseURL`.
- Observability/Tracing: Add tracing environment variables via `extraEnv` in your values file. See Observability Guide for complete configuration.
- User-Defined Prompts: Load custom prompts from your git repository via `extraEnv`. See User-Defined Prompts for configuration.
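For the secret-based alternative mentioned above, the manifest might look like the sketch below. The key names come from the note; the values are placeholders you must fill in yourself:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dot-ai-secrets
  namespace: dot-ai
type: Opaque
stringData:
  anthropic-api-key: <your-anthropic-api-key>  # placeholder
  auth-token: <your-generated-token>           # placeholder
```

With this secret in place, omit the corresponding `--set secrets.*` arguments from the install command.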
Step 4: Connect a Client
With the server running, connect using your preferred access method:
- MCP Client Setup — Connect via MCP protocol from Claude Code, Cursor, or other MCP clients. MCP clients with OAuth support authenticate via browser automatically; others use the static token.
- CLI — Use the command-line interface for terminal and CI/CD pipelines
Capability Scanning for AI Recommendations
Many MCP tools depend on capability data to function:
- recommend: Uses capabilities to find resources matching your deployment intent
- manageOrgData (patterns): References capabilities when applying organizational patterns
- manageOrgData (policies): Validates resources against stored capability metadata
Without capability data, these tools may fail or produce poor results.
Enabling Capability Scanning
Create a CapabilityScanConfig CR to enable autonomous capability discovery. The controller watches for CRD changes and automatically scans new resources. See the Capability Scan Guide for setup instructions.
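As a rough sketch only — the actual schema is defined by the controller's CRD and documented in the Capability Scan Guide — a minimal CR might look like this (the apiVersion group/version and spec fields below are hypothetical):

```yaml
# Hypothetical sketch: group/version and spec fields are illustrative,
# not the real schema -- consult the Capability Scan Guide.
apiVersion: devopstoolkit.live/v1alpha1  # hypothetical group/version
kind: CapabilityScanConfig
metadata:
  name: capability-scan
  namespace: dot-ai
spec: {}  # real spec fields are documented in the Capability Scan Guide
```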
AI Model Configuration
The DevOps AI Toolkit supports multiple AI models. Choose your model by setting the AI_PROVIDER environment variable.
Model Requirements
All AI models must meet these minimum requirements:
- Context window: 200K+ tokens (some tools like capability scanning use large context)
- Output tokens: 8K+ tokens (for YAML generation and policy creation)
- Function calling: Required for MCP tool interactions
Available Models
| Provider | Model | AI_PROVIDER | API Key Required | Recommended |
|---|---|---|---|---|
| Anthropic | Claude Haiku 4.5 | anthropic_haiku | ANTHROPIC_API_KEY | Yes |
| Anthropic | Claude Opus 4.6 | anthropic_opus | ANTHROPIC_API_KEY | Yes |
| Anthropic | Claude Sonnet 4.6 | anthropic | ANTHROPIC_API_KEY | Yes |
| AWS | Amazon Bedrock | amazon_bedrock | AWS credentials (see setup) | Yes |
| Google | Gemini 3.1 Pro | google | GOOGLE_GENERATIVE_AI_API_KEY | Yes (might be slow) |
| Google | Gemini 3 Flash | google_flash | GOOGLE_GENERATIVE_AI_API_KEY | Yes (preview) |
| Host | Host Environment LLM | host | None (uses host's AI) | Yes (if supported) |
| Moonshot AI | Kimi K2.5 | kimi | MOONSHOT_API_KEY | Yes |
| Alibaba | Qwen 3.5 Plus | alibaba | ALIBABA_API_KEY | Yes |
| OpenAI | GPT-5.4 | openai | OPENAI_API_KEY | No * |
| xAI | Grok-4 | xai | XAI_API_KEY | No * |
Migration note:
`AI_PROVIDER=kimi_thinking` was removed. If you were using that value, switch to `AI_PROVIDER=kimi` — Kimi K2.5 includes thinking mode by default.
* Note: These models may not perform as well as other providers for complex DevOps reasoning tasks.
Models Not Supported
| Provider | Model | Reason |
|---|---|---|
| DeepSeek | DeepSeek V3.2 (deepseek-chat) | 128K context limit insufficient for heavy workflows |
| DeepSeek | DeepSeek R1 (deepseek-reasoner) | 64K context limit insufficient for most workflows |
Why DeepSeek is not supported: Integration testing revealed that DeepSeek's context window limitations (128K for V3.2, 64K for R1) cause failures in context-heavy operations like Kyverno policy generation, which can exceed 130K tokens. The toolkit requires 200K+ context for reliable operation across all features.
Helm Configuration
Set AI provider in your Helm values:
ai:
provider: anthropic_haiku # or anthropic, anthropic_opus, google, etc.
secrets:
anthropic:
apiKey: "your-api-key"
Or via --set:
helm install dot-ai-mcp oci://ghcr.io/vfarcic/dot-ai/charts/dot-ai:$DOT_AI_VERSION \
--set ai.provider=anthropic_haiku \
--set secrets.anthropic.apiKey="$ANTHROPIC_API_KEY" \
# ... other settings
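The same settings can live in a values file instead of `--set` flags. A sketch using only keys already shown in this guide:

```yaml
ai:
  provider: anthropic_haiku
secrets:
  anthropic:
    apiKey: "your-api-key"
  auth:
    token: "your-auth-token"
localEmbeddings:
  enabled: true
ingress:
  enabled: true
  className: "nginx"
  host: "dot-ai.127.0.0.1.nip.io"
controller:
  enabled: true
```

Install with `helm install dot-ai-mcp oci://ghcr.io/vfarcic/dot-ai/charts/dot-ai:$DOT_AI_VERSION --values values.yaml --namespace dot-ai --wait`.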
AI Keys Are Optional: The MCP server starts successfully without AI API keys. Tools like Shared Prompts Library and REST API Gateway work without AI. AI-powered tools (deployment recommendations, remediation, pattern/policy management, capability scanning) require AI keys (unless using the host provider) and will show helpful error messages when accessed without configuration.
Embedding Provider Configuration
The DevOps AI Toolkit supports multiple embedding providers for semantic search in pattern management, capability discovery, and policy matching.
Local Embeddings (Zero-Config)
The recommended setup for new deployments. An in-cluster HuggingFace Text Embeddings Inference (TEI) service provides embeddings without any API keys.
localEmbeddings:
enabled: true # Deploys TEI with all-MiniLM-L6-v2 (384 dimensions)
This is already included in the Quick Start above. No additional configuration needed.
| Property | Value |
|---|---|
| Model | all-MiniLM-L6-v2 (384 dimensions) |
| Resource footprint | ~256 MB RAM, 250m CPU (request) |
| GPU | Not required |
| Architecture | amd64 only (no ARM64/Apple Silicon — see TEI issue #769) |
To customize the model or resources:
localEmbeddings:
enabled: true
model: "sentence-transformers/all-MiniLM-L6-v2" # Any TEI-compatible model
dimensions: 384 # Must match model output dimensions
resources:
requests:
cpu: "250m"
memory: "256Mi"
limits:
cpu: "1"
memory: "512Mi"
To disable local embeddings (e.g., if using a cloud provider instead):
localEmbeddings:
enabled: false
Cloud Embedding Providers
Use a cloud provider if you need higher-quality embeddings or are already paying for an API key.
| Provider | EMBEDDINGS_PROVIDER | Model | Dimensions | API Key Required |
|---|---|---|---|---|
| Amazon Bedrock | amazon_bedrock | amazon.titan-embed-text-v2:0 | 1024 | AWS credentials |
| Google | google | gemini-embedding-001 | 768 | GOOGLE_API_KEY |
| OpenAI | openai | text-embedding-3-small | 1536 | OPENAI_API_KEY |
Set the cloud embedding provider via extraEnv in your values file:
localEmbeddings:
enabled: false # Disable local embeddings when using a cloud provider
secrets:
openai:
apiKey: "your-openai-key"
# Only needed if using a non-OpenAI embedding provider:
extraEnv:
- name: EMBEDDINGS_PROVIDER
value: "google"
- name: GOOGLE_API_KEY
valueFrom:
secretKeyRef:
name: dot-ai-secrets
key: google-api-key
Notes:
- Same Provider: If using the same provider for both AI models and embeddings (e.g., `AI_PROVIDER=google` and `EMBEDDINGS_PROVIDER=google`), you only need to set one API key.
- Mixed Providers: You can use different providers for AI models and embeddings (e.g., `AI_PROVIDER=anthropic` with `EMBEDDINGS_PROVIDER=google`).
- Embedding Support: Not all AI model providers support embeddings. Anthropic does not provide embeddings; use OpenAI, Google, Amazon Bedrock, or local embeddings.
Switching Embedding Providers (Migration)
Switching between embedding providers (e.g., from OpenAI 1536-dim to local 384-dim) requires re-embedding all stored data because vector dimensions differ between models. A REST API endpoint handles this automatically.
Migrate all collections:
curl -X POST https://your-dot-ai-host/api/v1/embeddings/migrate
Migrate a single collection:
curl -X POST https://your-dot-ai-host/api/v1/embeddings/migrate \
-H "Content-Type: application/json" \
-d '{"collection": "patterns"}'
The endpoint re-embeds all points using the currently configured provider. Collections where vector dimensions already match the target are skipped. The response reports per-collection results:
{
"success": true,
"data": {
"collections": [
{
"collection": "patterns",
"status": "migrated",
"previousDimensions": 1536,
"newDimensions": 384,
"total": 42,
"processed": 42,
"failed": 0
}
],
"summary": {
"totalCollections": 1,
"migrated": 1,
"skipped": 0,
"failed": 0
}
}
}
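To script against this response — for example in CI after switching providers — you can assert on the summary fields. The snippet below uses the sample payload above; in practice, pipe the `curl` output into the same Python one-liner:

```shell
# Fail if any collection failed to migrate (sample payload from above;
# replace the echo with the real curl call when running against a server).
response='{"success": true, "data": {"summary": {"totalCollections": 1, "migrated": 1, "skipped": 0, "failed": 0}}}'
echo "$response" | python3 -c '
import json, sys
summary = json.load(sys.stdin)["data"]["summary"]
assert summary["failed"] == 0, "embedding migration reported failures"
print("migrated %d, skipped %d of %d collections"
      % (summary["migrated"], summary["skipped"], summary["totalCollections"]))
'
```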
The migration endpoint is also available via the auto-generated CLI (dot-ai embeddings migrate).
Custom Endpoint Configuration
You can configure custom OpenAI-compatible endpoints for AI models. This enables using alternative providers like OpenRouter, self-hosted models, or air-gapped deployments.
In-Cluster Ollama Example
Deploy with a self-hosted Ollama service running in the same Kubernetes cluster:
Create a values.yaml file:
ai:
provider: openai
model: "llama3.3:70b" # Your self-hosted model
customEndpoint:
enabled: true
baseURL: "http://ollama-service.default.svc.cluster.local:11434/v1"
localEmbeddings:
enabled: true # Use local embeddings instead of OpenAI
secrets:
customLlm:
apiKey: "ollama" # Ollama doesn't require authentication
Install with custom values:
helm install dot-ai-mcp oci://ghcr.io/vfarcic/dot-ai/charts/dot-ai:$DOT_AI_VERSION \
--values values.yaml \
--create-namespace \
--namespace dot-ai \
--wait
Other Self-Hosted Options
vLLM (Self-Hosted):
ai:
provider: openai
model: "meta-llama/Llama-3.1-70B-Instruct"
customEndpoint:
enabled: true
baseURL: "http://vllm-service:8000/v1"
localEmbeddings:
enabled: true
secrets:
customLlm:
apiKey: "dummy" # vLLM may not require authentication
LocalAI (Self-Hosted):
ai:
provider: openai
model: "your-model-name"
customEndpoint:
enabled: true
baseURL: "http://localai-service:8080/v1"
localEmbeddings:
enabled: true
secrets:
customLlm:
apiKey: "dummy"
OpenRouter Example
OpenRouter provides access to 100+ LLM models from multiple providers:
ai:
provider: openai
model: "anthropic/claude-3.5-sonnet"
customEndpoint:
enabled: true
baseURL: "https://openrouter.ai/api/v1"
localEmbeddings:
enabled: true # OpenRouter doesn't support embeddings; use local instead
secrets:
customLlm:
apiKey: "sk-or-v1-your-key-here"
Get your OpenRouter API key at https://openrouter.ai/
Custom Headers
Pass custom HTTP headers to AI provider APIs using ai.customEndpoint.headers. Headers are specified as a JSON string and sent with every LLM request. Custom headers are merged with provider-specific defaults (e.g., the Anthropic beta header), with your custom headers taking precedence on conflicts.
This is useful for enterprise LLM gateways and proxies that require authentication, versioning, or routing headers.
Example — corporate proxy with authentication headers:
ai:
provider: anthropic
customEndpoint:
enabled: true
baseURL: "https://proxy.corp.example.com/anthropic"
headers: '{"x-api-version": "2026-02-20", "x-proxy-auth": "token123"}'
secrets:
anthropic:
apiKey: "your-anthropic-key"
Notes:
- `headers` must be valid JSON (e.g., `'{"key": "value"}'`). Invalid JSON is ignored with a warning.
- Headers apply to LLM requests only, not embeddings.
- Custom headers override provider defaults when the same header key is used.
Native Provider with Custom Base URL
By default, setting a custom base URL without specifying a provider routes requests through an OpenAI-compatible endpoint. If your proxy fronts a non-OpenAI provider (e.g., Anthropic), you can preserve native provider features — such as cache control, extended context, and native tool calling — by explicitly setting the provider.
Example — Anthropic proxy with native features preserved:
ai:
provider: anthropic # Preserves Anthropic-specific features (cache control, extended context)
customEndpoint:
enabled: true
baseURL: "https://proxy.corp.example.com/anthropic"
secrets:
anthropic:
apiKey: "your-anthropic-key"
Without provider: anthropic, this would fall back to OpenAI-compatible mode and lose Anthropic-specific capabilities. The same pattern works for other providers (openai, google, xai).
Important Notes
- Context window: 200K+ tokens recommended
- Output tokens: 8K+ tokens minimum
- Function calling: Must support OpenAI-compatible function calling
Testing Status:
- Validated with OpenRouter (alternative SaaS provider)
- Not yet tested with self-hosted Ollama, vLLM, or LocalAI
- We need your help testing! Report results in issue #193
Notes:
- For embeddings, use `localEmbeddings.enabled=true` (recommended) or set an OpenAI/Google/Bedrock API key. See Embedding Provider Configuration.
- If model requirements are too high for your setup, please open an issue.
- Configuration examples are based on common patterns but not yet validated.
MCP Server Integration
Extend dot-ai tools with capabilities from external MCP servers running in your cluster. Instead of building custom integrations for each observability or infrastructure platform, dot-ai connects as an MCP client to discover and use tools from any compatible MCP server.
Any MCP server that supports HTTP transport can be connected — Prometheus, Jaeger, Grafana, Datadog, or any other server from the MCP ecosystem. The examples below use Prometheus, but the configuration pattern is the same for any server.
How It Works
┌──────────────────────────────────────────────────────────────┐
│ Your Cluster │
│ │
│ ┌─────────────┐ MCP Protocol ┌──────────────────┐ │
│ │ dot-ai │◄────────────────────►│ Prometheus MCP │ │
│ │ (MCP Client)│ │ Server │ │
│ │ │ └──────────────────┘ │
│ │ remediate │ MCP Protocol ┌──────────────────┐ │
│ │ operate │◄────────────────────►│ Jaeger MCP │ │
│ │ query │ │ Server │ │
│ └─────────────┘ └──────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────┘
- You deploy MCP servers in your cluster (dot-ai does not manage their lifecycle)
- You configure `mcpServers` in Helm values with each server's endpoint
- dot-ai connects to each server at startup and discovers available tools
- The `attachTo` field controls which dot-ai tools can use each server's tools
- During AI analysis, tools from MCP servers are used alongside existing tools automatically
Configuration
Add MCP servers to your Helm values:
mcpServers:
prometheus:
enabled: true
endpoint: "http://prometheus-mcp.monitoring.svc:8080/mcp"
attachTo:
- remediate
- query
jaeger:
enabled: true
endpoint: "http://jaeger-mcp.tracing.svc:3000/mcp"
attachTo:
- remediate
Configuration Reference
Each entry under mcpServers has the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| enabled | boolean | Yes | Whether to connect to this MCP server |
| endpoint | string | Yes | Full URL of the MCP server endpoint (must be reachable from the dot-ai pod) |
| attachTo | string[] | Yes | Which dot-ai tools can use this server's tools. Valid values: remediate, operate, query |
Tool Namespacing
Tools from MCP servers are automatically namespaced as {server}__{tool} to avoid collisions. For example, a Prometheus MCP server configured as prometheus with a tool named execute_query becomes prometheus__execute_query in dot-ai.
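The naming scheme is a plain string join — the server key from `mcpServers`, a double underscore, then the tool name — so collisions are avoided as long as server keys are unique:

```shell
# {server}__{tool}: server key from mcpServers, double underscore, tool name
server="prometheus"
tool="execute_query"
echo "${server}__${tool}"   # prometheus__execute_query
```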
Prometheus Example
This example shows how to integrate a Prometheus MCP server so that remediate and query can use Prometheus metrics during analysis.
Prerequisites:
- Prometheus running in your cluster (e.g., via the `prometheus-community/prometheus` Helm chart)
- A Prometheus MCP server deployed and accessible (e.g., pab1it0/prometheus-mcp-server)
Deploy the Prometheus MCP server (example using a Kubernetes Deployment):
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus-mcp
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: prometheus-mcp
template:
metadata:
labels:
app: prometheus-mcp
spec:
containers:
- name: prometheus-mcp
image: ghcr.io/pab1it0/prometheus-mcp-server:latest
ports:
- containerPort: 8080
env:
- name: PROMETHEUS_URL
value: "http://prometheus-server.monitoring.svc:80"
- name: MCP_TRANSPORT
value: "http"
- name: PORT
value: "8080"
---
apiVersion: v1
kind: Service
metadata:
name: prometheus-mcp
namespace: monitoring
spec:
selector:
app: prometheus-mcp
ports:
- port: 8080
targetPort: 8080
Configure dot-ai to connect:
mcpServers:
prometheus:
enabled: true
endpoint: "http://prometheus-mcp.monitoring.svc:8080/mcp"
attachTo:
- remediate
- query
Once configured, the remediate tool can correlate cluster events with Prometheus metrics (CPU/memory trends, error rates) for more accurate root cause analysis. The query tool can answer questions like "what's the memory usage trend for my-api?" using live Prometheus data.
Verifying MCP Server Connections
After deployment, verify MCP server connections using the version/status tool:
Show dot-ai status
The response includes an mcpServers section showing connected servers, their endpoints, attached operations, and discovered tool count. Use this to confirm servers connected successfully.
Startup Behavior
- No MCP servers configured (default): dot-ai starts normally without MCP server augmentation.
- MCP servers configured: dot-ai connects to each enabled server at startup, discovers tools, and makes them available to the configured operations.
- MCP server unreachable: Startup fails fast with a clear error message. Configured MCP servers must be reachable — there is no background retry. Fix the endpoint or disable the server entry to proceed.
TLS Configuration
HTTPS is required for OAuth authentication (dex.enabled: true). If you only use static token authentication, HTTPS is optional.
Ingress-Level TLS (cert-manager)
To terminate TLS at the Ingress (requires cert-manager with a ClusterIssuer):
ingress:
tls:
enabled: true
clusterIssuer: letsencrypt # Your ClusterIssuer name
Then update your .mcp.json URL to use https://.
Reverse Proxy / External TLS Termination
If TLS terminates upstream (Traefik, nginx, cloud load balancer), leave ingress.tls.enabled: false and set the external URLs explicitly:
externalUrl: "https://dot-ai.example.com"
dex:
enabled: true
externalUrl: "https://dex.dot-ai.example.com"
Web UI Visualization
Enable rich visualizations of query results by connecting to a DevOps AI Web UI instance.
When configured, the query tool includes a visualizationUrl field in responses that opens interactive visualizations (resource topology, relationships, health status) in your browser.
Configuration
Add the Web UI base URL to your Helm values:
webUI:
baseUrl: "https://dot-ai-ui.example.com" # Your Web UI instance URL
Or via --set:
helm install dot-ai-mcp oci://ghcr.io/vfarcic/dot-ai/charts/dot-ai:$DOT_AI_VERSION \
--set webUI.baseUrl="https://dot-ai-ui.example.com" \
# ... other settings
Feature Toggle Behavior
- Not configured (default): Query responses contain only text summaries. No `visualizationUrl` field is included.
- Configured: Query responses include a `visualizationUrl` field (format: `{baseUrl}/v/{sessionId}`) that opens the visualization in the Web UI.
Example Query Response
When webUI.baseUrl is configured, query responses include:
**View visualization**: https://dot-ai-ui.example.com/v/abc123-session-id
This URL opens an interactive visualization of the query results in the Web UI.
Gateway API (Alternative to Ingress)
For Kubernetes 1.26+, you can use Gateway API v1 for advanced traffic management with role-oriented design (platform teams manage Gateways, app teams create routes).
When to Use
Use Gateway API when:
- Running Kubernetes 1.26+ with Gateway API support
- Need advanced routing (weighted traffic, header-based routing)
- Prefer separation of infrastructure and application concerns
Use Ingress when:
- Running Kubernetes < 1.26
- Simpler requirements met by Ingress features
Prerequisites
- Kubernetes 1.26+ cluster
- Gateway API CRDs installed: `kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.1/standard-install.yaml`
- Gateway controller running (Istio, Envoy Gateway, Kong, etc.)
- Existing Gateway resource created by platform team (reference pattern)
Quick Start (Reference Pattern - RECOMMENDED)
Reference an existing platform-managed Gateway:
helm install dot-ai-mcp oci://ghcr.io/vfarcic/dot-ai/charts/dot-ai:$DOT_AI_VERSION \
--set secrets.anthropic.apiKey="$ANTHROPIC_API_KEY" \
--set secrets.auth.token="$DOT_AI_AUTH_TOKEN" \
--set localEmbeddings.enabled=true \
--set ingress.enabled=false \
--set gateway.name="cluster-gateway" \
--set gateway.namespace="gateway-system" \
--namespace dot-ai \
--wait
Configuration Reference
# Reference pattern (RECOMMENDED)
gateway:
name: "cluster-gateway" # Existing Gateway name
namespace: "gateway-system" # Gateway namespace (optional)
timeouts:
request: "3600s" # SSE streaming timeout
backendRequest: "3600s"
# Creation pattern (development/testing only)
gateway:
create: true # Create Gateway (NOT for production)
className: "istio" # GatewayClass name
Complete Guide
See Gateway API Deployment Guide for:
- Platform team Gateway setup (HTTP and HTTPS)
- Application team deployment steps
- Cross-namespace access (ReferenceGrant)
- Development/testing creation pattern
- Troubleshooting and verification
- Migration from Ingress
Next Steps
Once the server is running:
1. Configure Authentication & Authorization
- Authentication — Understand static token vs OAuth, enable Dex for per-user identity, or connect your identity provider
- Authorization (RBAC) — Control what each user can do with per-user and per-group permissions (requires OAuth)
2. Explore Tools
- Tools Overview — Complete guide to all available tools, how they work together, and recommended usage flow
3. Enable Observability (Optional)
- Observability Guide — Distributed tracing with OpenTelemetry for debugging workflows, measuring AI performance, and monitoring Kubernetes operations
4. Connect MCP Servers (Optional)
- MCP Server Integration — Augment dot-ai tools with capabilities from external MCP servers (Prometheus, Jaeger, etc.)
5. Production Considerations
- Consider backup strategies for vector database content (organizational patterns and capabilities)
- Review TLS Configuration for HTTPS
Support
- Bug Reports: GitHub Issues