
Solution CRD Guide

Video: Kubernetes ownerReferences and Resource Grouping with the Solution CRD

This video explains the problem of understanding which resources compose an application in Kubernetes, shows how Kubernetes ownership and ownerReferences drive garbage collection, and demonstrates how the Solution CRD improves on both by wrapping related resources into logical groups with status, context, and relationships.

Overview

The Solution CRD is a Kubernetes Custom Resource Definition that tracks deployed solutions and their constituent resources. It acts as a parent resource that groups all Kubernetes resources (Deployments, Services, ConfigMaps, etc.) that compose a logical solution, preserving metadata and context not available in individual resources.

Purpose

When deploying applications to Kubernetes, the Solution CRD provides:

  1. Resource Grouping: Links related Kubernetes resources into a logical solution
  2. Intent Preservation: Stores the original user intent that led to the deployment
  3. Metadata Storage: Captures information not available in individual resources:
    • Deployment rationale and decision-making context
    • Configuration trade-offs and choices
    • Documentation links
    • Patterns and policies applied
  4. Health Monitoring: Aggregates health status across all tracked resources
  5. Automatic Cleanup: Deleting a Solution CR automatically deletes all child resources

Key Design Principles

  • Kubernetes-Native: Uses standard Kubernetes patterns and conventions
  • Metadata Store: Captures information NOT in individual resources
  • Automatic Ownership: Controller dynamically manages ownerReferences
  • Lifecycle Management: Tracks solution state from deployment through operation

Prerequisites

The Solution CRD is automatically installed with the controller. Verify it's available:

# Verify Solution CRD is installed
kubectl get crds solutions.dot-ai.devopstoolkit.live

# Check controller is running
kubectl get pods --selector app.kubernetes.io/name=dot-ai-controller --namespace dot-ai
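
If the controller was installed only moments ago, you can optionally wait for the CRD to be accepted by the API server before proceeding (a standard kubectl check, not specific to this project):

# Optional: wait until the CRD is established
kubectl wait --for condition=established crd/solutions.dot-ai.devopstoolkit.live --timeout=60s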

Solution CRD Schema

Spec Fields

spec:
  # Original user intent that led to this deployment (required)
  intent: string

  # Solution metadata (information not in individual resources)
  context:
    createdBy: string   # Tool or user that created this solution
    rationale: string   # Why this solution was deployed this way
    patterns: []string  # Organizational patterns applied
    policies: []string  # Policies applied to this solution

  # List of Kubernetes resources that compose this solution (required)
  resources:
    - apiVersion: string  # e.g., "apps/v1"
      kind: string        # e.g., "Deployment"
      name: string        # Resource name
      namespace: string   # Optional; defaults to the Solution's namespace

  # Documentation URL (optional)
  documentationURL: string  # Link to deployment documentation
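
Only intent and resources are marked required above. Based on that schema, a minimal valid manifest might look like this (the metadata and resource names are illustrative):

apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: Solution
metadata:
  name: minimal-example
  namespace: my-app
spec:
  intent: "Track an existing Deployment"
  resources:
    - apiVersion: apps/v1
      kind: Deployment
      name: my-deployment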

Status Fields

status:
  # Overall state of the solution
  state: string  # pending, deployed, degraded, failed

  # Generation tracking
  observedGeneration: int64

  # Resource health summary
  resources:
    total: int   # Total resources tracked
    ready: int   # Resources that are ready
    failed: int  # Resources that have failed

  # Standard Kubernetes conditions
  conditions:
    - type: Ready
      status: "True" | "False" | "Unknown"
      reason: string
      message: string
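
To pull a single status field from the command line, jsonpath works as usual; for example, reading a Solution's Ready condition message (replace <name> and <namespace> with your Solution's values):

# Read a Solution's Ready condition message
kubectl get solution <name> --namespace <namespace> \
  --output jsonpath='{.status.conditions[?(@.type=="Ready")].message}'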

Quick Start: Your First Solution

Let's create a simple web application with a PostgreSQL database and track it with a Solution CR.

Step 1: Create a Namespace

kubectl create namespace my-app

Step 2: Deploy Application Resources

Deploy your application components (Deployment, Service, etc.):

kubectl apply --filename - <<'EOF'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: httpd
          image: httpd:2.4-alpine
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-app-service
  namespace: my-app
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql
  namespace: my-app
spec:
  serviceName: postgresql
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
        - name: postgresql
          image: postgres:13-alpine
          env:
            - name: POSTGRES_PASSWORD
              value: secretpassword
            - name: POSTGRES_DB
              value: appdb
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgresql
  namespace: my-app
spec:
  clusterIP: None
  selector:
    app: postgresql
  ports:
    - port: 5432
      targetPort: 5432
EOF
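
Before wrapping these in a Solution, you may want to confirm they deployed cleanly (plain kubectl, nothing Solution-specific):

# Optional: confirm the application components are up
kubectl get deployments,services,statefulsets --namespace my-app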

Step 3: Create a Solution CR

Now create a Solution CR that tracks all these resources:

kubectl apply --filename - <<'EOF'
apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: Solution
metadata:
  name: my-web-app
  namespace: my-app
spec:
  # Describe what this solution is for
  intent: "Deploy a web application with PostgreSQL database for production workloads"

  # Add context about this deployment
  context:
    createdBy: "platform-team"
    rationale: "Deployed to meet Q1 customer requirements. PostgreSQL chosen for ACID compliance."
    patterns:
      - "stateful-workload"
      - "multi-tier-application"
    policies:
      - "production-sla"
      - "data-retention-90d"

  # List all resources that compose this solution
  resources:
    - apiVersion: apps/v1
      kind: Deployment
      name: web-app
      namespace: my-app
    - apiVersion: v1
      kind: Service
      name: web-app-service
      namespace: my-app
    - apiVersion: apps/v1
      kind: StatefulSet
      name: postgresql
      namespace: my-app
    - apiVersion: v1
      kind: Service
      name: postgresql
      namespace: my-app
EOF

Step 4: Verify the Solution

Check that the Solution was created and the controller has processed it:

# View the Solution
kubectl get solutions --namespace my-app

# Get detailed status
kubectl get solution my-web-app --namespace my-app --output yaml

# View controller logs
kubectl logs --selector app.kubernetes.io/name=dot-ai-controller --namespace dot-ai --tail 30

Expected output:

NAME         INTENT                                         STATE      RESOURCES   AGE
my-web-app   Deploy a web application with PostgreSQL...    deployed   4/4         2m

Step 5: Verify ownerReferences Were Added

The controller automatically adds ownerReferences to all tracked resources:

# Check ownerReference on Deployment
kubectl get deployment web-app --namespace my-app --output jsonpath='{.metadata.ownerReferences}' | jq

# Check ownerReference on Service
kubectl get service web-app-service --namespace my-app --output jsonpath='{.metadata.ownerReferences}' | jq

You should see ownerReferences pointing to the Solution CR:

[
  {
    "apiVersion": "dot-ai.devopstoolkit.live/v1alpha1",
    "kind": "Solution",
    "name": "my-web-app",
    "uid": "...",
    "controller": true,
    "blockOwnerDeletion": true
  }
]

Understanding Solution Status

The Solution controller monitors all tracked resources and updates the status:

Status States

  • pending: Initial state or resources not yet ready
  • deployed: All resources are healthy and ready
  • degraded: Some resources are unhealthy or missing
  • failed: Critical failure in resource deployment

Health Checking

The controller uses multiple strategies to determine resource health:

  1. Conditions (highest priority): Checks for Ready, Available, Healthy, or Synced conditions
  2. Replica Counts: For Deployments, StatefulSets, and DaemonSets, compares readyReplicas to the desired replica count (see the spot check after this list)
  3. Existence (fallback): Resource exists in the cluster
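
As a rough manual version of the replica-count strategy, you can compare ready and desired replicas yourself (a spot check only; the controller's actual logic may differ):

# Spot-check ready vs desired replicas, similar to the controller's replica-count strategy
kubectl get deployment web-app --namespace my-app \
  --output jsonpath='ready={.status.readyReplicas} desired={.spec.replicas}{"\n"}'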

Status Updates

The controller reconciles every 30 seconds to keep status current:

# Watch status updates
kubectl get solutions --namespace my-app --watch

# Get detailed status with conditions
kubectl get solution my-web-app --namespace my-app --output jsonpath='{.status}' | jq

Testing Health Monitoring

Let's test how the controller detects unhealthy resources:

Simulate a Failed Deployment

Scale a deployment to an impossible replica count:

# Scale to more replicas than cluster can handle
kubectl scale deployment web-app --namespace my-app --replicas=100

# Watch Solution status change to degraded
kubectl get solution my-web-app --namespace my-app --watch

The Solution status will show:

status:
  state: degraded
  resources:
    total: 4
    ready: 3
    failed: 1
  conditions:
    - type: Ready
      status: "False"
      reason: ResourcesNotReady
      message: "Ready: 3/4, Failed: 1"
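
To see why the extra replicas never become ready, standard Kubernetes diagnostics apply (these are generic kubectl commands, independent of the Solution controller):

# Inspect scheduling failures behind the degraded state
kubectl describe deployment web-app --namespace my-app
kubectl get events --namespace my-app --sort-by=.lastTimestamp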

Restore Health

# Scale back to normal
kubectl scale deployment web-app --namespace my-app --replicas=2

# Watch status return to deployed
kubectl get solution my-web-app --namespace my-app --watch

Garbage Collection

One of the most powerful features is automatic cleanup via ownerReferences:

# Delete the Solution CR
kubectl delete solution my-web-app --namespace my-app

# All tracked resources are automatically deleted
kubectl get all --namespace my-app
# (should show: No resources found)

Important: Deleting a Solution CR deletes ALL child resources. This is by design for clean solution removal.
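
Since cleanup rides on standard ownerReferences, kubectl's usual deletion-propagation options should apply here too (generic Kubernetes garbage-collection flags, not Solution-specific features):

# Block until the Solution and everything it owns are gone
kubectl delete solution my-web-app --namespace my-app --cascade=foreground

# Remove only the Solution CR and leave the tracked resources in place
kubectl delete solution my-web-app --namespace my-app --cascade=orphan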

Advanced Usage

Tracking Existing Resources

You can create a Solution CR for resources that already exist. The controller will add ownerReferences dynamically:

# Create resources first
kubectl create deployment nginx --image=nginx --namespace my-app

# Then create Solution referencing existing resources
kubectl apply --filename - <<'EOF'
apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: Solution
metadata:
  name: nginx-solution
  namespace: my-app
spec:
  intent: "Simple nginx web server"
  resources:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
      namespace: my-app
EOF

# ownerReference is added after Solution creation
kubectl get deployment nginx --namespace my-app --output jsonpath='{.metadata.ownerReferences}'

Cross-Namespace Resources

Currently, Solutions track resources in their own namespace. All resource references without an explicit namespace default to the Solution's namespace.
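
As a sketch of that defaulting, these two entries in a Solution created in my-app refer to the same Service:

spec:
  resources:
    - apiVersion: v1
      kind: Service
      name: web-app-service     # namespace omitted: defaults to my-app
    - apiVersion: v1
      kind: Service
      name: web-app-service
      namespace: my-app         # explicit; equivalent under the current same-namespace scoping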

Documentation Links

You can link to external documentation:

spec:
  intent: "Production web application"
  documentationURL: "https://docs.example.com/apps/web-app"
  resources:
    - ...

This field helps teams track where deployment documentation is stored.

Kubectl Tips

List All Solutions

# All namespaces
kubectl get solutions --all-namespaces

# Specific namespace
kubectl get solutions --namespace my-app

# Watch for changes
kubectl get solutions --all-namespaces --watch

Filter by State

# Find degraded solutions
kubectl get solutions --all-namespaces --output json | \
jq -r '.items[] | select(.status.state=="degraded") | "\(.metadata.namespace)/\(.metadata.name)"'

# Find solutions with failed resources
kubectl get solutions --all-namespaces --output json | \
jq -r '.items[] | select(.status.resources.failed > 0) | "\(.metadata.namespace)/\(.metadata.name): \(.status.resources.failed) failed"'
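
If you'd rather avoid jq, kubectl's custom-columns output can produce a similar overview (the column paths assume the status schema shown earlier):

# Tabular health overview across all namespaces
kubectl get solutions --all-namespaces --output custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATE:.status.state,READY:.status.resources.ready,FAILED:.status.resources.failed'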

Inspect Resource Health

# Get detailed status
kubectl get solution my-web-app --namespace my-app --output yaml

# Just the state
kubectl get solution my-web-app --namespace my-app --output jsonpath='{.status.state}'

# Resource counts
kubectl get solution my-web-app --namespace my-app --output jsonpath='{.status.resources}' | jq

Common Patterns

Pattern 1: Multi-Tier Application

apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: Solution
metadata:
  name: ecommerce-app
  namespace: production
spec:
  intent: "E-commerce platform with web frontend, API backend, and PostgreSQL database"
  context:
    createdBy: "ecommerce-team"
    patterns:
      - "three-tier-architecture"
      - "stateful-backend"
  resources:
    - apiVersion: apps/v1
      kind: Deployment
      name: frontend
    - apiVersion: v1
      kind: Service
      name: frontend
    - apiVersion: apps/v1
      kind: Deployment
      name: api-backend
    - apiVersion: v1
      kind: Service
      name: api-backend
    - apiVersion: apps/v1
      kind: StatefulSet
      name: postgresql
    - apiVersion: v1
      kind: Service
      name: postgresql
    - apiVersion: v1
      kind: ConfigMap
      name: app-config

Pattern 2: Microservice with Dependencies

apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: Solution
metadata:
  name: order-service
  namespace: microservices
spec:
  intent: "Order processing microservice with Redis cache and message queue"
  context:
    createdBy: "platform-team"
    rationale: "Separated from monolith for scalability"
    policies:
      - "auto-scaling-enabled"
      - "circuit-breaker-required"
  resources:
    - apiVersion: apps/v1
      kind: Deployment
      name: order-service
    - apiVersion: v1
      kind: Service
      name: order-service
    - apiVersion: apps/v1
      kind: Deployment
      name: redis
    - apiVersion: v1
      kind: Service
      name: redis
    - apiVersion: v1
      kind: ConfigMap
      name: order-config
    - apiVersion: v1
      kind: Secret
      name: order-secrets

Pattern 3: Data Pipeline

apiVersion: dot-ai.devopstoolkit.live/v1alpha1
kind: Solution
metadata:
  name: analytics-pipeline
  namespace: data
spec:
  intent: "Analytics pipeline for processing customer data"
  context:
    createdBy: "data-team"
    patterns:
      - "batch-processing"
      - "data-pipeline"
    policies:
      - "pii-encryption-required"
      - "gdpr-compliant"
  resources:
    - apiVersion: batch/v1
      kind: CronJob
      name: data-ingestion
    - apiVersion: apps/v1
      kind: StatefulSet
      name: kafka
    - apiVersion: v1
      kind: Service
      name: kafka
    - apiVersion: apps/v1
      kind: Deployment
      name: data-processor
    - apiVersion: v1
      kind: PersistentVolumeClaim
      name: processed-data

Troubleshooting

Solution Shows "degraded" State

  1. Check which resources are failing:
kubectl get solution <name> --namespace <namespace> --output jsonpath='{.status.resources}' | jq
  2. Inspect individual resources:
kubectl get deployment <name> --namespace <namespace>
kubectl describe deployment <name> --namespace <namespace>
  3. Check controller logs:
kubectl logs --selector app.kubernetes.io/name=dot-ai-controller --namespace dot-ai --tail 50

ownerReferences Not Added

  1. Verify the resource exists and is in the correct namespace:
kubectl get <kind> <name> --namespace <namespace>
  2. Check that the Solution's references match exactly:
kubectl get solution <name> --namespace <namespace> --output yaml | grep -A 10 resources:
  3. Wait for the controller to reconcile (30 seconds) or check the logs:
kubectl logs --selector app.kubernetes.io/name=dot-ai-controller --namespace dot-ai | grep ownerReference

Solution Status Not Updating

  1. Verify controller is running:
kubectl get pods --selector app.kubernetes.io/name=dot-ai-controller --namespace dot-ai
  2. Check for controller errors:
kubectl logs --selector app.kubernetes.io/name=dot-ai-controller --namespace dot-ai --tail 100
  3. Verify controller has RBAC permissions:
kubectl get clusterrole dot-ai-controller-manager-role --output yaml

Resources Not Deleted with Solution

  1. Check if ownerReferences were added:
kubectl get <kind> <name> --namespace <namespace> --output jsonpath='{.metadata.ownerReferences}'
  2. If missing, the controller may not have permission. Check RBAC:
kubectl logs --selector app.kubernetes.io/name=dot-ai-controller --namespace dot-ai | grep -i "forbidden\|permission"

Current Limitations

  • Namespace Scoped: Solutions only track resources in the same namespace
  • Namespaced Resources Only: Cannot currently track cluster-scoped resources (ClusterRoles, PVs, etc.)
  • No Configuration Drift Detection: Controller only tracks resource health, not configuration changes (see the manual diff sketch below)
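
Until drift detection exists, comparing live objects against your source manifests with plain kubectl is a workable manual substitute (manifests/ is a placeholder for wherever your YAML lives):

# Manual drift check: diff live state against source manifests
kubectl diff --filename manifests/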

Future Enhancements

Planned features for future releases:

  • Solution Updates: Support updating deployed solutions via Solution CR changes
  • Rollback Support: Track solution versions and enable rollback
  • Advanced Health Checks: Custom health checks beyond basic resource status
  • Cost Tracking: Integration with cloud cost APIs
  • Cross-Namespace Solutions: Support for solutions spanning multiple namespaces
  • Template System: Solution templates for common patterns

Next Steps