Knowledge Source Guide

This guide covers the GitKnowledgeSource CRD for automatically syncing documentation from Git repositories into the DevOps AI Toolkit knowledge base.

Overview

The GitKnowledgeSource enables:

  • Document Ingestion: Automatically syncs Markdown and other files to the knowledge base
  • Change Detection: Only processes files changed since the last sync
  • Scheduled Sync: Periodically re-syncs to capture updates
  • Automatic Cleanup: Removes documents from the knowledge base when the resource is deleted (unless deletionPolicy is set to Retain)

Once documents are synced, they become searchable through the DevOps AI Toolkit's semantic search capabilities.

Stack Installation

If you installed via the DevOps AI Toolkit Stack, you can create GitKnowledgeSource resources immediately. Verify the CRD is available:

kubectl get crds gitknowledgesources.dot-ai.devopstoolkit.live

Continue below to configure a GitKnowledgeSource for your documentation.

Prerequisites

Quick Start

  1. Ensure the MCP authentication secret exists:

kubectl get secret dot-ai-secrets -n dot-ai

If not, create it:

kubectl create secret generic dot-ai-secrets \
  --namespace dot-ai \
  --from-literal=auth-token=your-auth-token-here

  2. Create a GitKnowledgeSource to sync documentation from a Git repository. Save the manifest as gitknowledgesource.yaml:

apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: GitKnowledgeSource
metadata:
  name: my-docs
  namespace: dot-ai
spec:
  repository:
    url: https://github.com/your-org/your-repo.git
    branch: main
  paths:
    - "docs/**/*.md"
    - "README.md"
  mcpServer:
    url: http://dot-ai.dot-ai.svc:3456
    authSecretRef:
      name: dot-ai-secrets
      key: auth-token

  3. Apply it:

kubectl apply -f gitknowledgesource.yaml

  4. Check the sync status:

kubectl get gitknowledgesource my-docs -n dot-ai

Expected output:

NAME      ACTIVE   DOCUMENTS   LAST SYNC              NEXT SYNC
my-docs   true     9           2026-02-05T16:40:14Z   2026-02-06T16:40:14Z

How It Works

Sync Process

  1. Clone: Controller performs a shallow clone of the repository
  2. Pattern Match: Finds files matching paths patterns, excluding exclude patterns
  3. Change Detection: Compares current commit with lastSyncedCommit to find changed files
  4. Ingest: Sends changed documents to MCP knowledge base with sourceIdentifier
  5. Cleanup: Deletes the local clone (no persistent storage required)
  6. Schedule: Queues next sync based on schedule field

First Sync vs Incremental Sync

  • First sync: Processes all matching files (full sync)
  • Subsequent syncs: Only processes files changed since lastSyncedCommit
  • Spec changes: Modifying paths or other spec fields triggers a full sync
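
The choice between a full and an incremental sync can be sketched as a small predicate. This is an illustration only, not the controller's code: the function name needs_full_sync is hypothetical, and it assumes a spec change is detected by comparing metadata.generation with status.observedGeneration.

```python
from typing import Optional


def needs_full_sync(last_synced_commit: Optional[str],
                    generation: int,
                    observed_generation: Optional[int]) -> bool:
    """Return True when every matching file should be processed."""
    # First sync: no commit has been recorded yet.
    if not last_synced_commit:
        return True
    # Spec changed since the last processed generation (assumed mechanism).
    if observed_generation != generation:
        return True
    # Otherwise only files changed since last_synced_commit are processed.
    return False
```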

What Gets Synced

Each document is ingested to MCP with:

  • Content: The file contents
  • URI: https://github.com/{org}/{repo}/blob/{branch}/{path}
  • Source Identifier: {namespace}/{name} for bulk operations
  • Custom Metadata: Values from spec.metadata field
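
As a rough sketch of how the URI and source identifier above fit together, the helpers below derive both from values in the resource. The function names are hypothetical, and deriving the URI by stripping the .git suffix from repository.url is an assumption based on the URI pattern shown above.

```python
def document_uri(repo_url: str, branch: str, path: str) -> str:
    """Build the browsable URI for a synced file (GitHub-style layout)."""
    base = repo_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]
    return f"{base}/blob/{branch}/{path.lstrip('/')}"


def source_identifier(namespace: str, name: str) -> str:
    """Identifier shared by all documents from one GitKnowledgeSource."""
    return f"{namespace}/{name}"


print(document_uri("https://github.com/your-org/your-repo.git", "main", "docs/setup.md"))
# https://github.com/your-org/your-repo/blob/main/docs/setup.md
print(source_identifier("dot-ai", "my-docs"))
# dot-ai/my-docs
```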

Cleanup on Deletion

When a GitKnowledgeSource is deleted:

  1. Controller detects deletion via finalizer
  2. Checks deletionPolicy (Delete or Retain)
  3. If Delete: Calls MCP to remove all documents with matching sourceIdentifier
  4. Removes finalizer, allowing CR deletion to complete

Configuration

Spec Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| repository.url | string | Yes | - | Git repository URL (HTTPS only) |
| repository.branch | string | No | main | Branch to sync |
| repository.depth | int | No | 1 | Shallow clone depth |
| repository.secretRef | SecretReference | No | - | Secret with token for private repos |
| paths | []string | Yes | - | Glob patterns for files to sync (e.g., docs/**/*.md) |
| exclude | []string | No | - | Glob patterns to exclude |
| schedule | string | No | @every 24h | Sync schedule (cron or interval) |
| mcpServer.url | string | Yes | - | MCP server endpoint URL |
| mcpServer.authSecretRef | SecretReference | Yes | - | Secret with MCP auth token |
| metadata | map[string]string | No | - | Custom metadata attached to all documents |
| maxFileSizeBytes | int | No | - | Skip files larger than this size |
| deletionPolicy | string | No | Delete | Delete or Retain documents on CR deletion |
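
The required fields and constraints in this table can be checked before applying a manifest. The following is a minimal sketch of such a pre-flight check on a spec loaded as a Python dict. The function check_spec and its messages are hypothetical, and the CRD's own validation remains authoritative.

```python
from typing import Any, Dict, List


def check_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a GitKnowledgeSource spec dict."""
    problems: List[str] = []

    repo = spec.get("repository") or {}
    url = repo.get("url", "")
    if not url:
        problems.append("repository.url is required")
    elif not url.startswith("https://"):
        problems.append("repository.url must use HTTPS")

    if not spec.get("paths"):
        problems.append("paths is required")

    mcp = spec.get("mcpServer") or {}
    if not mcp.get("url"):
        problems.append("mcpServer.url is required")
    if not mcp.get("authSecretRef"):
        problems.append("mcpServer.authSecretRef is required")

    # deletionPolicy is optional and defaults to Delete.
    if spec.get("deletionPolicy", "Delete") not in ("Delete", "Retain"):
        problems.append("deletionPolicy must be Delete or Retain")

    return problems
```

An empty list means the spec passes these basic checks; it does not guarantee that the repository or the MCP server is reachable.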

Repository Authentication

For private repositories, create a secret with a personal access token:

kubectl create secret generic github-token \
  --namespace dot-ai \
  --from-literal=token=ghp_xxxxxxxxxxxx

Reference it in the GitKnowledgeSource:

spec:
  repository:
    url: https://github.com/your-org/private-repo.git
    secretRef:
      name: github-token
      key: token

Path Patterns

The paths field uses glob patterns to match files:

| Pattern | Matches |
|---|---|
| docs/**/*.md | All markdown files under docs/ recursively |
| README.md | Only the root README |
| **/*.md | All markdown files in the repository |
| docs/*.md | Markdown files directly in docs/ (not subdirectories) |

Use exclude to skip specific paths:

spec:
  paths:
    - "docs/**/*.md"
  exclude:
    - "docs/internal/**"
    - "docs/drafts/**"
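
To preview which files a set of paths and exclude patterns would select, the matching rules from the table above can be approximated in a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, only *, ** and ? are supported, and the controller's actual glob library may differ in edge cases.

```python
import re
from typing import Iterable, List, Sequence


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex where * stays within one path segment."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")     # anything except a slash
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")      # one character except a slash
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def select_files(files: Iterable[str],
                 paths: Sequence[str],
                 exclude: Sequence[str] = ()) -> List[str]:
    """Keep files matching any paths pattern and no exclude pattern."""
    include_res = [glob_to_regex(p) for p in paths]
    exclude_res = [glob_to_regex(p) for p in exclude]
    return [
        f for f in files
        if any(r.fullmatch(f) for r in include_res)
        and not any(r.fullmatch(f) for r in exclude_res)
    ]


repo_files = ["README.md", "docs/intro.md", "docs/guides/setup.md",
              "docs/internal/notes.md", "src/main.go"]
print(select_files(repo_files,
                   paths=["docs/**/*.md", "README.md"],
                   exclude=["docs/internal/**", "docs/drafts/**"]))
# ['README.md', 'docs/intro.md', 'docs/guides/setup.md']
```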

Schedule Configuration

The schedule field accepts cron expressions or interval syntax:

| Format | Example | Description |
|---|---|---|
| Interval | @every 24h | Sync every 24 hours (default) |
| Interval | @every 6h | Sync every 6 hours |
| Interval | @every 30m | Sync every 30 minutes |
| Cron | 0 3 * * * | Daily at 3:00 AM |
| Cron | 0 */6 * * * | Every 6 hours |

The default @every 24h means each GitKnowledgeSource syncs 24 hours after its last sync, naturally staggering syncs based on creation time.

Invalid schedules: If you specify an invalid schedule expression, the controller will sync once, then set a ScheduleError condition and stop scheduling. Fix the schedule to resume.
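
The interval form can be illustrated with a short calculation of the next sync time. This sketch handles only @every schedules with h, m and s units; cron expressions are out of scope here. The function names are hypothetical, and accepting combined values such as 1h30m is an assumption rather than documented behavior.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_interval(schedule: str) -> timedelta:
    """Parse an interval schedule such as '@every 24h' into a timedelta."""
    prefix = "@every "
    if not schedule.startswith(prefix):
        raise ValueError(f"not an interval schedule: {schedule!r}")
    spec = schedule[len(prefix):].strip()
    parts = re.findall(r"(\d+)([hms])", spec)
    # Reject input that the number+unit grammar did not fully consume.
    if not parts or "".join(n + u for n, u in parts) != spec:
        raise ValueError(f"invalid interval: {spec!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(number)})
    return total


def next_sync(last_sync: datetime, schedule: str) -> datetime:
    """Next sync time, measured from the previous sync."""
    return last_sync + parse_interval(schedule)


last = datetime(2026, 2, 5, 16, 40, 14, tzinfo=timezone.utc)
print(next_sync(last, "@every 24h").isoformat())
# 2026-02-06T16:40:14+00:00
```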

File Size Limits

Use maxFileSizeBytes to skip large files:

spec:
  maxFileSizeBytes: 1048576 # 1 MiB (1024 * 1024 bytes)

Skipped files appear in the status:

kubectl get gitknowledgesource my-docs -n dot-ai -o jsonpath='{.status.skippedFiles}' | jq
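
The size rule can be sketched as a simple partition. It assumes that a file exactly at the limit is kept, reading "larger than" strictly, and that an unset limit means no files are skipped; the helper name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def split_by_size(sizes: Dict[str, int],
                  max_file_size_bytes: Optional[int]) -> Tuple[List[str], List[str]]:
    """Partition files (path -> size in bytes) into (kept, skipped)."""
    kept: List[str] = []
    skipped: List[str] = []
    for path, size in sizes.items():
        too_large = max_file_size_bytes is not None and size > max_file_size_bytes
        (skipped if too_large else kept).append(path)
    return kept, skipped
```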

Deletion Policy

The deletionPolicy controls what happens when the GitKnowledgeSource is deleted:

| Value | Behavior |
|---|---|
| Delete (default) | Remove all synced documents from MCP knowledge base |
| Retain | Keep documents in MCP (useful for migrations) |

spec:
  deletionPolicy: Retain # Keep docs when CR is deleted

Status

Check the status to monitor sync progress:

kubectl get gitknowledgesource my-docs -n dot-ai -o yaml

Status Fields

| Field | Description |
|---|---|
| active | Whether the source is actively syncing |
| documentCount | Total documents synced to MCP |
| lastSyncTime | Timestamp of last successful sync |
| lastSyncedCommit | Git commit SHA of last sync |
| nextScheduledSync | When the next sync will occur |
| skippedDocuments | Count of files skipped (e.g., size limit) |
| skippedFiles | Details of skipped files with reasons |
| syncErrors | Count of sync errors |
| lastError | Most recent error message |
| observedGeneration | Last processed spec generation |
| conditions | Standard Kubernetes conditions |

Conditions

| Type | Description |
|---|---|
| Ready | True when source is active and configured correctly |
| Synced | True when last sync completed successfully |
| Scheduled | True when next sync is scheduled |

Example Status

status:
  active: true
  documentCount: 9
  lastSyncTime: "2026-02-05T16:40:14Z"
  lastSyncedCommit: "c32655af7f70361835a533e57533caaf4e8b750a"
  nextScheduledSync: "2026-02-06T16:40:14Z"
  conditions:
    - type: Ready
      status: "True"
      reason: Active
      message: "GitKnowledgeSource is active and syncing"
    - type: Synced
      status: "True"
      reason: SyncComplete
      message: "Successfully synced 9 documents"
    - type: Scheduled
      status: "True"
      reason: Scheduled
      message: "Next sync scheduled for 2026-02-06T16:40:14Z"
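
For scripted health checks, the conditions in a status like the one above can be reduced to the entries that need attention. This is a minimal sketch: the function name is hypothetical, and it expects the resource as a dict, for example the result of json.loads on the output of kubectl get gitknowledgesource my-docs -n dot-ai -o json.

```python
from typing import Any, Dict, List


def failing_conditions(resource: Dict[str, Any]) -> List[str]:
    """List conditions whose status is not "True" as 'Type: reason - message'."""
    conditions = resource.get("status", {}).get("conditions", [])
    return [
        f"{c.get('type')}: {c.get('reason', '')} - {c.get('message', '')}"
        for c in conditions
        if c.get("status") != "True"
    ]
```

An empty result means Ready, Synced and Scheduled all report True; any other result names the condition to investigate first.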

Troubleshooting

Sync Not Starting

Check the Ready condition:

kubectl get gitknowledgesource my-docs -n dot-ai -o jsonpath='{.status.conditions}' | jq

Common issues:

  • CloneError: Invalid repository URL or authentication failure
  • MCP unreachable: Check MCP server URL and network connectivity
  • Missing secret: Verify auth secret exists and has correct keys

Clone Errors

If you see "read-only file system" errors:

  • Ensure the controller deployment has a writable /tmp volume mount

If you see authentication errors for private repos:

  • Verify the secret exists: kubectl get secret <name> -n dot-ai
  • Check the token has read access to the repository
  • Ensure secretRef.key matches the key in the secret

If documents are missing from the knowledge base after a sync:

  1. Check that the sync completed successfully:

kubectl get gitknowledgesource my-docs -n dot-ai -o jsonpath='{.status.documentCount}'

  2. Verify MCP is running:

kubectl get pods -n dot-ai -l app=dot-ai

  3. Check for sync errors:

kubectl get gitknowledgesource my-docs -n dot-ai -o jsonpath='{.status.lastError}'

Schedule Not Working

Check the Scheduled condition:

kubectl get gitknowledgesource my-docs -n dot-ai -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Scheduled")'

If the output reports ScheduleError, the schedule expression is invalid. Fix the spec.schedule field to resume scheduling.

Git Provider Compatibility

GitKnowledgeSource uses the standard Git HTTPS protocol and should work with any Git provider:

  • GitHub
  • GitLab
  • Bitbucket
  • Gitea
  • Self-hosted Git servers

Testing has been performed primarily with GitHub. If you encounter issues with other providers, please report them on GitHub.

Next Steps