Knowledge Base Guide

Complete guide for ingesting and searching organizational documentation with the DevOps AI Toolkit.

Overview

Knowledge Base provides semantic search over organizational documentation. Documents are ingested, chunked, and stored in a vector database. Users can then search this knowledge base using natural language queries through the manageKnowledge MCP tool.

What is Knowledge Base?

Knowledge Base allows you to:

Ingest documents into a searchable vector store
Search semantically using natural language to find relevant content by meaning, not just keywords
Maintain provenance with full source URLs so users can verify and explore original documents

How It Works

Ingestion: Documents are split into chunks (~1000 characters), embedded, and stored in Qdrant
Search: Queries are embedded and matched against stored chunks by semantic similarity
Results: Matching chunks are returned with source URI, relevance score, and metadata
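The three steps above can be sketched end to end in a few lines. This is a minimal illustration, not the toolkit's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model (so it matches shared words rather than meaning), the in-memory list stands in for Qdrant, and the `example.com` URIs and function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, store: list[dict], limit: int = 3) -> list[dict]:
    """Rank stored chunks by similarity to the query, best first."""
    q = embed(query)
    scored = [
        {"uri": c["uri"], "text": c["text"], "score": cosine(q, c["vector"])}
        for c in store
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:limit]


# Ingestion: each chunk is stored alongside its vector and source URI.
chunks = [
    ("https://example.com/docs/deploy.md", "How to deploy a PostgreSQL database on Kubernetes"),
    ("https://example.com/docs/lunch.md", "The cafeteria menu changes every Monday"),
]
store = [{"uri": uri, "text": text, "vector": embed(text)} for uri, text in chunks]

# Search: the query is embedded the same way and compared to every stored chunk.
results = search("deploy database", store)
print(results[0]["uri"], round(results[0]["score"], 2))
```

With a real embedding model the vectors are dense and capture meaning, but the ranking step keeps the same shape: embed the query, score each chunk, and return the best matches with their URIs.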

Key Concepts

| Concept         | Description |
|-----------------|-------------|
| URI             | Full URL identifying the source document (e.g., `https://github.com/org/repo/blob/main/docs/guide.md`) |
| Chunk           | A segment of a document stored with its embedding for semantic search |
| Semantic Search | Finding content by meaning similarity, not just keyword matching |

Prerequisites

Before using Knowledge Base, ensure you have completed one of the following setups:

Stack Installation (recommended) - Installs all components including the controller, MCP server, and Qdrant
MCP Setup Guide - Individual component installation with integrated Qdrant

Both setups include the required Vector DB (Qdrant). You also need an embedding provider API key - see Embedding Provider Configuration.

Knowledge Base Operations

The manageKnowledge MCP tool supports three operations: ingest, search, and deleteByUri.

Ingesting Documents

There are two ways to ingest documents into the knowledge base:

Option 1: Controller (Recommended for Git Repositories)

For documentation stored in Git repositories, use the dot-ai-controller. Create a KnowledgeSource CR pointing to your repo, and the controller automatically:

Syncs markdown files from specified paths
Keeps the knowledge base up to date on a schedule
Handles updates and deletions automatically

This is "set and forget" - once configured, your Git documentation stays in sync.

Option 2: Via AI Agent (For Ad-Hoc Ingestion)

For documents outside Git, or when you need to quickly add something, use your MCP-enabled AI assistant (Claude Code, Cursor, etc.). Simply ask it to ingest content in natural language.

What you can ingest:

Local files: Point to files in your project or filesystem
Pasted text: Copy-paste content directly into the chat
Multiple documents: Ask the agent to ingest several files at once
Any text format: Not limited to markdown - any text content works

Example:

Ingest docs/guides/mcp-recommendation-guide.md

Response:

Successfully ingested docs/guides/mcp-recommendation-guide.md into the knowledge base.
The document was split into 33 chunks and stored with URI
https://github.com/vfarcic/dot-ai/blob/main/docs/guides/mcp-recommendation-guide.md.

What happened behind the scenes:

AI assistant read the file content
Constructed the full URI from the file path
Called manageKnowledge with operation: "ingest"
Document was chunked (~1000 characters per chunk) and stored with embeddings
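The URI construction and tool call in these steps might look roughly like the following. Only the tool name and `operation: "ingest"` come from this guide; the `uri` and `content` argument names, the helper functions, and the placeholder content are assumptions made for illustration.

```python
import json
from pathlib import PurePosixPath


def build_uri(repo_url: str, branch: str, relative_path: str) -> str:
    """Join a repository URL, branch, and repo-relative path into a blob URL."""
    path = PurePosixPath(relative_path)
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"expected a repo-relative path, got {relative_path!r}")
    return f"{repo_url.rstrip('/')}/blob/{branch}/{path.as_posix()}"


def build_ingest_request(uri: str, content: str) -> dict:
    """Assemble hypothetical tool arguments; only `operation` comes from the guide."""
    return {"operation": "ingest", "uri": uri, "content": content}


uri = build_uri(
    "https://github.com/vfarcic/dot-ai",
    "main",
    "docs/guides/mcp-recommendation-guide.md",
)
# Placeholder content; in practice this would be the text read from the file.
request = build_ingest_request(uri, "# Recommendation Guide\n...")
print(json.dumps(request, indent=2))
```

Building the URI from the repository URL, branch, and relative path gives every chunk a link that resolves to the original document, which is what makes the provenance in search results verifiable.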

Searching the Knowledge Base

Search for relevant information using natural language queries. The system retrieves the chunks most semantically similar to your query, and the AI assistant synthesizes an answer from them.

Example:

Search the knowledge base to explain how dot-ai recommendations work.

Response:

How dot-ai Recommendations Work

The dot-ai recommendation system provides AI-powered Kubernetes deployment
recommendations through a multi-stage workflow:

Core Mechanism

1. Intent Refinement - When a user provides a vague intent (e.g., "deploy my app"),
   the system provides guidance for the client agent (Claude Code, Cursor, etc.)
   to analyze the project and gather details before proceeding.
2. Semantic Capability Matching - Instead of treating CRDs like
   sqls.devopstoolkit.live as meaningless names, the system understands what
   capabilities each resource provides through semantic search.
3. Capability-Enhanced Discovery - When you say "I need a PostgreSQL database":
   - Semantic search finds resources with matching capability context
   - AI gets pre-filtered, semantically relevant resources
   - Optimal matches appear as top suggestions with rich context
4. Organizational Pattern Integration - Recommendations automatically include
   your team's best practices and standards stored as patterns.
5. Helm Fallback - When no cluster capability matches, the system searches
   ArtifactHub for Helm charts and presents installation options.

Workflow Stages

The recommendation flow uses these stages:
- recommend - Initial intent analysis and solution generation
- chooseSolution - Select from generated alternatives
- answerQuestion:required/basic/advanced/open - Guided configuration collection
- generateManifests - Creates ready-to-deploy Kubernetes YAML
- deployManifests - Applies resources to the cluster

Sources:
- https://github.com/vfarcic/dot-ai/blob/main/docs/guides/mcp-recommendation-guide.md

What happened behind the scenes:

Query was embedded and matched against stored chunks by semantic similarity
Relevant chunks were retrieved from the knowledge base
AI synthesized a coherent answer from the chunks
Source URIs were included for verification
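Because several retrieved chunks often come from the same document, the source list in a response is shorter than the list of chunks. One way to derive it, sketched here with a hypothetical result shape (`uri` and `score` keys) and `example.com` URIs, is to keep each URI once in ranked order:

```python
def collect_sources(results: list[dict]) -> list[str]:
    """Return unique source URIs in the order they first appear in ranked results."""
    return list(dict.fromkeys(r["uri"] for r in results))


ranked = [
    {"uri": "https://example.com/docs/guide.md", "score": 0.91},
    {"uri": "https://example.com/docs/faq.md", "score": 0.84},
    {"uri": "https://example.com/docs/guide.md", "score": 0.80},
]
print(collect_sources(ranked))
```

`dict.fromkeys` preserves insertion order, so the most relevant document is listed first.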

Deleting Documents

Remove all chunks for a document by its URI. This is useful when a document is removed from the source. An updated document does not need a manual delete first, because re-ingesting it replaces the existing chunks for its URI.

Note: For Git repositories, the dot-ai-controller handles document lifecycle automatically - it detects changes, deletes outdated chunks, and re-ingests updated content.

Example (manual deletion):

Delete the recommendation guide from the knowledge base
(URI: https://github.com/vfarcic/dot-ai/blob/main/docs/guides/mcp-recommendation-guide.md)

Response:

Done. Deleted 33 chunks for the recommendation guide from the knowledge base.

What happened behind the scenes:

All chunks matching the URI were found in the knowledge base
Each chunk was deleted from the vector database
The document is no longer searchable
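Deletion by URI amounts to filtering the store on the source field. The sketch below uses an in-memory list in place of the vector database; the function name, chunk shape, and `example.com` URIs are hypothetical.

```python
def delete_by_uri(store: list[dict], uri: str) -> int:
    """Remove every chunk whose source URI matches; return how many were removed."""
    before = len(store)
    # Slice assignment mutates the caller's list instead of rebinding a local name.
    store[:] = [chunk for chunk in store if chunk["uri"] != uri]
    return before - len(store)


store = [
    {"uri": "https://example.com/docs/a.md", "text": "first chunk of a"},
    {"uri": "https://example.com/docs/a.md", "text": "second chunk of a"},
    {"uri": "https://example.com/docs/b.md", "text": "only chunk of b"},
]

deleted = delete_by_uri(store, "https://example.com/docs/a.md")
print(deleted, len(store))
```

Returning the number of removed chunks is what lets the assistant report a count, as in the response above.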

Troubleshooting

Check Knowledge Base Status

To verify the knowledge base is working and see how many documents are stored:

Show dot-ai status

Look for the knowledgeBase collection in the response:

Collections

| Collection    | Status   | Documents |
|---------------|----------|-----------|
| knowledgeBase | ✅ Ready | 156       |

If the collection shows ⚠️ Not created, no documents have been ingested yet. If it shows an error, check your Vector DB connection and embedding provider configuration.

FAQ

Q: How do I update a document? A: Simply re-ingest it. The system automatically deletes existing chunks for the URI before storing new ones.

Q: How are documents chunked? A: Documents are split into chunks of approximately 1000 characters, with a 200-character overlap between consecutive chunks to preserve context across chunk boundaries.
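A simple way to picture overlapping chunks is a sliding character window. The sketch below assumes plain fixed-size windows; the toolkit's actual splitter may prefer paragraph or sentence boundaries, which would explain the "approximately" above. The function name and defaults are illustrative.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


document = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(document)
print([len(p) for p in pieces])
```

Each window starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.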

Q: Can I search across multiple repositories? A: Yes. All ingested documents are stored in the same knowledge base regardless of source. Search returns results from all sources, with URIs indicating the origin.
