KServe
KServe is a standard Model Inference Platform on Kubernetes for Machine Learning and Generative AI. It provides a standardized way to deploy, serve, and manage ML models across different frameworks.
Features
- Multi-Framework Support: TensorFlow, PyTorch, scikit-learn, XGBoost, Hugging Face, Triton, and more
- Deployment Modes:
  - RawDeployment (Standard): Uses native Kubernetes Deployments without Knative
  - Serverless (Knative): Auto-scaling with scale-to-zero capability
- Model Storage: Support for S3, GCS, Azure Blob, PVC, and more
- Inference Protocols: REST and gRPC
- Advanced Features: Canary deployments, traffic splitting, explainability, outlier detection
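Canary rollouts and traffic splitting are configured per component via canaryTrafficPercent; a minimal sketch (names and storage paths are placeholders, and behaviour depends on the deployment mode and KServe version):
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    # Route 10% of traffic to the newly rolled-out model revision
    canaryTrafficPercent: 10
    sklearn:
      storageUri: s3://models/sklearn/iris-v2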
Prerequisites
- Kubernetes cluster (installed via just k8s::install)
- Longhorn storage (installed via just longhorn::install)
- cert-manager (required, installed via just cert-manager::install)
- MinIO (optional, for S3-compatible model storage via just minio::install)
- Prometheus (optional, for monitoring via just prometheus::install)
Installation
Basic Installation
# Install cert-manager (required)
just cert-manager::install
# Install KServe with default settings (RawDeployment mode)
just kserve::install
During installation, you will be prompted for:
- Prometheus Monitoring: Whether to enable ServiceMonitor (if Prometheus is installed)
The domain for inference endpoints is configured via the KSERVE_DOMAIN environment variable (default: cluster.local).
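Once the installation completes, the controller pods should become Ready; a quick check (assuming the default kserve namespace):
# Verify the KServe controller is running
kubectl get pods -n kserve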
Environment Variables
Key environment variables (set via .env.local or environment):
KSERVE_NAMESPACE=kserve # Namespace for KServe
KSERVE_CHART_VERSION=v0.15.0 # KServe Helm chart version
KSERVE_DEPLOYMENT_MODE=RawDeployment # Deployment mode (RawDeployment or Serverless)
KSERVE_DOMAIN=cluster.local # Base domain for inference endpoints
MONITORING_ENABLED=true # Enable Prometheus monitoring
MINIO_NAMESPACE=minio # MinIO namespace (if using MinIO)
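For example, a .env.local that switches to Serverless mode with an external domain might look like the following sketch (variable names as listed above; values are placeholders):
# .env.local - overrides picked up by the just recipes
KSERVE_DEPLOYMENT_MODE=Serverless
KSERVE_DOMAIN=example.com
MONITORING_ENABLED=true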
Domain Configuration
KServe uses the KSERVE_DOMAIN value to construct URLs for inference endpoints.
Internal Access Only (Default):
KSERVE_DOMAIN=cluster.local
- InferenceServices are accessible only within the cluster
- URLs: http://<service-name>.<namespace>.svc.cluster.local
- No external Ingress configuration needed
- Recommended for development and testing
External Access:
KSERVE_DOMAIN=example.com
- InferenceServices are accessible from outside the cluster
- URLs: https://<service-name>.<namespace>.example.com
- Requires Traefik Ingress configuration
- DNS records must point to your cluster
- Recommended for production deployments
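To confirm which domain the controller picked up, you can inspect the ingress section of KServe's inferenceservice-config ConfigMap (a quick check, assuming the default kserve namespace):
# Print the ingress configuration, including the configured domain
kubectl get configmap inferenceservice-config -n kserve -o jsonpath='{.data.ingress}'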
Usage
Check Status
# View status of KServe components
just kserve::status
# View controller logs
just kserve::logs
Deploy a Model
Create an InferenceService resource:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    sklearn:
      storageUri: s3://models/sklearn/iris
Apply the resource:
kubectl apply -f inferenceservice.yaml
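The InferenceService exposes a Ready condition once the predictor is up; one way to wait for it (the timeout is arbitrary):
# Block until the InferenceService is ready to serve traffic
kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n default --timeout=300s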
Access Inference Endpoint
# Get inference service URL
kubectl get inferenceservice sklearn-iris
For cluster.local (internal access):
# From within the cluster
curl -X POST http://sklearn-iris.default.svc.cluster.local/v1/models/sklearn-iris:predict \
-H "Content-Type: application/json" \
-d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'
For external domain:
# From anywhere (requires DNS and Ingress configuration)
curl -X POST https://sklearn-iris.default.example.com/v1/models/sklearn-iris:predict \
-H "Content-Type: application/json" \
-d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'
Storage Configuration
Using MinIO (S3-compatible)
If MinIO is installed, KServe will automatically configure S3 credentials:
# Storage secret is created automatically during installation
kubectl get secret kserve-s3-credentials -n kserve
External Secrets Integration:
- When External Secrets Operator is available:
  - Credentials are retrieved directly from Vault at minio/admin
  - ExternalSecret resource syncs credentials to Kubernetes Secret
  - Secret includes KServe-specific annotations for S3 endpoint configuration
  - No duplicate storage needed - references existing MinIO credentials
- When External Secrets Operator is not available:
  - Credentials are retrieved from MinIO Secret
  - Kubernetes Secret is created directly with annotations
  - Credentials are also backed up to Vault at kserve/storage if available
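For reference, the resulting Secret uses KServe's S3 annotations to point model downloads at MinIO; a rough sketch (the endpoint assumes an in-cluster MinIO service and may differ in your setup):
apiVersion: v1
kind: Secret
metadata:
  name: kserve-s3-credentials
  namespace: kserve
  annotations:
    serving.kserve.io/s3-endpoint: minio.minio.svc.cluster.local:9000  # assumed MinIO endpoint
    serving.kserve.io/s3-usehttps: "0"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <minio-access-key>
  AWS_SECRET_ACCESS_KEY: <minio-secret-key>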
Models can be stored in MinIO buckets:
# Create a bucket for models
just minio::create-bucket models
# Upload model files to MinIO
# Then reference in InferenceService: s3://models/path/to/model
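A common way to make these credentials available to a model (a sketch, assuming the Secret is present or synced into the model's namespace) is to attach the Secret to a ServiceAccount and reference it from the InferenceService:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: models-sa
  namespace: default
secrets:
  - name: kserve-s3-credentials
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    # The storage initializer uses this ServiceAccount's S3 credentials
    serviceAccountName: models-sa
    sklearn:
      storageUri: s3://models/sklearn/iris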
Using Other Storage
KServe supports various storage backends:
- S3: AWS S3 or compatible services
- GCS: Google Cloud Storage
- Azure: Azure Blob Storage
- PVC: Kubernetes Persistent Volume Claims
- HTTP/HTTPS: Direct URLs
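The backend is selected by the storageUri scheme; a few illustrative values (bucket, claim, and path names are placeholders):
storageUri: s3://models/sklearn/iris                       # S3 or MinIO
storageUri: gs://my-bucket/models/iris                     # Google Cloud Storage
storageUri: pvc://models-pvc/sklearn/iris                  # PVC: pvc://<claim-name>/<path>
storageUri: https://example.com/models/iris/model.joblib   # Direct HTTPS URL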
Supported Frameworks
The following serving runtimes are enabled by default:
- scikit-learn: sklearn models
- XGBoost: XGBoost models
- MLServer: Multi-framework server (sklearn, XGBoost, etc.)
- Triton: NVIDIA Triton Inference Server
- TensorFlow: TensorFlow models
- PyTorch: PyTorch models via TorchServe
- Hugging Face: Transformer models
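To see which runtimes are actually installed on the cluster:
# List the cluster-scoped serving runtimes created by the KServe chart
kubectl get clusterservingruntimes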
Advanced Configuration
Custom Serving Runtimes
You can create custom ClusterServingRuntime or ServingRuntime resources for specialized model servers.
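A minimal sketch of a custom cluster-wide runtime (the image, arguments, port, and model format are placeholders for your own model server):
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: my-custom-runtime
spec:
  supportedModelFormats:
    - name: custom-format
      version: "1"
  containers:
    - name: kserve-container
      image: registry.example.com/my-model-server:latest  # placeholder image
      args:
        - --model_dir=/mnt/models
      ports:
        - containerPort: 8080
          protocol: TCP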
Prometheus Monitoring
When monitoring is enabled, KServe controller metrics are exposed and scraped by Prometheus:
# View metrics in Grafana
# Metrics include: inference request rates, latencies, error rates
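If the ServiceMonitor option was enabled at install time, it should be present in the KServe namespace (resource name may vary by chart version):
# Confirm Prometheus is set up to scrape the KServe controller
kubectl get servicemonitor -n kserve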
Deployment Modes
RawDeployment (Standard)
- Uses standard Kubernetes Deployments, Services, and Ingress
- No Knative dependency
- Simpler setup, more control over resources
- Manual scaling configuration required
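For example, replica bounds can be set directly on the predictor spec; a sketch (values are placeholders):
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    # In RawDeployment mode these bounds are applied to the underlying Deployment/HPA
    minReplicas: 1
    maxReplicas: 3
    sklearn:
      storageUri: s3://models/sklearn/iris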
Serverless (Knative)
- Requires Knative Serving installation
- Auto-scaling with scale-to-zero
- Advanced traffic management
- Better resource utilization for sporadic workloads
Examples
Iris Classification with MLflow
A complete end-to-end example demonstrating model serving with KServe:
- Train an Iris classification model in JupyterHub
- Register the model to MLflow Model Registry
- Deploy the registered model with KServe InferenceService
- Test inference using v2 protocol from JupyterHub notebooks and Kubernetes Jobs
This example demonstrates:
- Converting MLflow artifact paths to KServe storageUri
- Using MLflow format runtime (with automatic dependency installation)
- Testing with both single and batch predictions
- Using v2 Open Inference Protocol
See: examples/kserve-mlflow-iris
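For reference, a v2 Open Inference Protocol request uses named, typed tensors rather than the v1 instances format shown earlier; a sketch against a hypothetical iris-model service in the default namespace:
curl -X POST http://iris-model.default.svc.cluster.local/v2/models/iris-model/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "input-0", "shape": [1, 4], "datatype": "FP64", "data": [[6.8, 2.8, 4.8, 1.4]]}]}'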
Uninstallation
# Remove KServe and its resources
just kserve::uninstall
This will:
- Uninstall the KServe resources Helm chart
- Uninstall the KServe CRDs
- Delete storage secrets
- Delete namespace
Warning: Uninstalling will remove all InferenceService resources.
Troubleshooting
Check Controller Logs
just kserve::logs
View InferenceService Status
kubectl get inferenceservice -A
kubectl describe inferenceservice <name> -n <namespace>
Check Predictor Pods
kubectl get pods -l serving.kserve.io/inferenceservice=<name>
kubectl logs <pod-name>
Storage Issues
If models fail to download:
# Check storage initializer logs
kubectl logs <pod-name> -c storage-initializer
# Verify S3 credentials
kubectl get secret kserve-s3-credentials -n kserve -o yaml