KServe
KServe is a standard Model Inference Platform on Kubernetes for Machine Learning and Generative AI. It provides a standardized way to deploy, serve, and manage ML models across different frameworks.
Features
- Multi-Framework Support: TensorFlow, PyTorch, scikit-learn, XGBoost, Hugging Face, Triton, and more
- Deployment Modes:
  - RawDeployment (Standard): Uses native Kubernetes Deployments without Knative
  - Serverless (Knative): Auto-scaling with scale-to-zero capability
- Model Storage: Support for S3, GCS, Azure Blob, PVC, and more
- Inference Protocols: REST and gRPC
- Advanced Features: Canary deployments, traffic splitting, explainability, outlier detection
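Canary rollouts and traffic splitting are configured per component via canaryTrafficPercent; a minimal sketch (names and storage paths are placeholders, and behaviour depends on the deployment mode and KServe version):
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    # Route 10% of traffic to the newly rolled-out model revision
    canaryTrafficPercent: 10
    sklearn:
      storageUri: s3://models/sklearn/iris-v2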
Prerequisites
- Kubernetes cluster (installed via just k8s::install)
- Longhorn storage (installed via just longhorn::install)
- cert-manager (required, installed via just cert-manager::install)
- MinIO (optional, for S3-compatible model storage via just minio::install)
- Prometheus (optional, for monitoring via just prometheus::install)
Installation
Basic Installation
# Install cert-manager (required)
just cert-manager::install
# Install KServe with default settings (RawDeployment mode)
just kserve::install
During installation, you will be prompted for:
- Prometheus Monitoring: Whether to enable ServiceMonitor (if Prometheus is installed)
The domain for inference endpoints is configured via the KSERVE_DOMAIN environment variable (default: cluster.local).
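Once the installation completes, the controller pods should become Ready; a quick check (assuming the default kserve namespace):
# Verify the KServe controller is running
kubectl get pods -n kserve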
Environment Variables
Key environment variables (set via .env.local or environment):
KSERVE_NAMESPACE=kserve # Namespace for KServe
KSERVE_CHART_VERSION=v0.15.0 # KServe Helm chart version
KSERVE_DEPLOYMENT_MODE=RawDeployment # Deployment mode (RawDeployment or Serverless)
KSERVE_DOMAIN=cluster.local # Base domain for inference endpoints
MONITORING_ENABLED=true # Enable Prometheus monitoring
MINIO_NAMESPACE=minio # MinIO namespace (if using MinIO)
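For example, a .env.local that switches to Serverless mode with an external domain might look like the following sketch (variable names as listed above; values are placeholders):
# .env.local - overrides picked up by the just recipes
KSERVE_DEPLOYMENT_MODE=Serverless
KSERVE_DOMAIN=example.com
MONITORING_ENABLED=true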
Domain Configuration
KServe uses the KSERVE_DOMAIN value to construct URLs for inference endpoints.
Internal Access Only (Default):
KSERVE_DOMAIN=cluster.local
- InferenceServices are accessible only within the cluster
- URLs: http://<service-name>.<namespace>.svc.cluster.local
- No external Ingress configuration needed
- Recommended for development and testing
External Access:
KSERVE_DOMAIN=example.com
- InferenceServices are accessible from outside the cluster
- URLs: https://<service-name>.<namespace>.example.com
- Requires Traefik Ingress configuration
- DNS records must point to your cluster
- Recommended for production deployments
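To confirm which domain the controller picked up, you can inspect the ingress section of KServe's inferenceservice-config ConfigMap (a quick check, assuming the default kserve namespace):
# Print the ingress configuration, including the configured domain
kubectl get configmap inferenceservice-config -n kserve -o jsonpath='{.data.ingress}'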
Usage
Check Status
# View status of KServe components
just kserve::status
# View controller logs
just kserve::logs
Deploy a Model
Create an InferenceService resource:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    sklearn:
      storageUri: s3://models/sklearn/iris
Apply the resource:
kubectl apply -f inferenceservice.yaml
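The InferenceService exposes a Ready condition once the predictor is up; one way to wait for it (the timeout is arbitrary):
# Block until the InferenceService is ready to serve traffic
kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n default --timeout=300s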
Access Inference Endpoint
# Get inference service URL
kubectl get inferenceservice sklearn-iris
For cluster.local (internal access):
# From within the cluster
curl -X POST http://sklearn-iris.default.svc.cluster.local/v1/models/sklearn-iris:predict \
-H "Content-Type: application/json" \
-d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'
For external domain:
# From anywhere (requires DNS and Ingress configuration)
curl -X POST https://sklearn-iris.default.example.com/v1/models/sklearn-iris:predict \
-H "Content-Type: application/json" \
-d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'
Storage Configuration
Using MinIO (S3-compatible)
If MinIO is installed, KServe will automatically configure S3 credentials:
# Storage secret is created automatically during installation
kubectl get secret kserve-s3-credentials -n kserve
External Secrets Integration:
- When External Secrets Operator is available:
  - Credentials are retrieved directly from Vault at minio/admin
  - ExternalSecret resource syncs credentials to Kubernetes Secret
  - Secret includes KServe-specific annotations for S3 endpoint configuration
  - No duplicate storage needed - references existing MinIO credentials
- When External Secrets Operator is not available:
  - Credentials are retrieved from MinIO Secret
  - Kubernetes Secret is created directly with annotations
  - Credentials are also backed up to Vault at kserve/storage if available
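For reference, the resulting Secret uses KServe's S3 annotations to point model downloads at MinIO; a rough sketch (the endpoint assumes an in-cluster MinIO service and may differ in your setup):
apiVersion: v1
kind: Secret
metadata:
  name: kserve-s3-credentials
  namespace: kserve
  annotations:
    serving.kserve.io/s3-endpoint: minio.minio.svc.cluster.local:9000  # assumed MinIO endpoint
    serving.kserve.io/s3-usehttps: "0"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <minio-access-key>
  AWS_SECRET_ACCESS_KEY: <minio-secret-key>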
Models can be stored in MinIO buckets:
# Create a bucket for models
just minio::create-bucket models
# Upload model files to MinIO
# Then reference in InferenceService: s3://models/path/to/model
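A common way to make these credentials available to a model (a sketch, assuming the Secret is present or synced into the model's namespace) is to attach the Secret to a ServiceAccount and reference it from the InferenceService:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: models-sa
  namespace: default
secrets:
  - name: kserve-s3-credentials
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    # The storage initializer uses this ServiceAccount's S3 credentials
    serviceAccountName: models-sa
    sklearn:
      storageUri: s3://models/sklearn/iris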
Using Other Storage
KServe supports various storage backends:
- S3: AWS S3 or compatible services
- GCS: Google Cloud Storage
- Azure: Azure Blob Storage
- PVC: Kubernetes Persistent Volume Claims
- HTTP/HTTPS: Direct URLs
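The backend is selected by the storageUri scheme; a few illustrative values (bucket, claim, and path names are placeholders):
storageUri: s3://models/sklearn/iris                       # S3 or MinIO
storageUri: gs://my-bucket/models/iris                     # Google Cloud Storage
storageUri: pvc://models-pvc/sklearn/iris                  # PVC: pvc://<claim-name>/<path>
storageUri: https://example.com/models/iris/model.joblib   # Direct HTTPS URL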
Supported Frameworks
The following serving runtimes are enabled by default:
- scikit-learn: sklearn models
- XGBoost: XGBoost models
- MLServer: Multi-framework server (sklearn, XGBoost, etc.)
- Triton: NVIDIA Triton Inference Server
- TensorFlow: TensorFlow models
- PyTorch: PyTorch models via TorchServe
- Hugging Face: Transformer models
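To see which runtimes are actually installed on the cluster:
# List the cluster-scoped serving runtimes created by the KServe chart
kubectl get clusterservingruntimes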
Advanced Configuration
Custom Serving Runtimes
You can create custom ClusterServingRuntime or ServingRuntime resources for specialized model servers.
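A minimal sketch of a custom cluster-wide runtime (the image, arguments, port, and model format are placeholders for your own model server):
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: my-custom-runtime
spec:
  supportedModelFormats:
    - name: custom-format
      version: "1"
  containers:
    - name: kserve-container
      image: registry.example.com/my-model-server:latest  # placeholder image
      args:
        - --model_dir=/mnt/models
      ports:
        - containerPort: 8080
          protocol: TCP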
Prometheus Monitoring
When monitoring is enabled, KServe controller metrics are exposed and scraped by Prometheus:
# View metrics in Grafana
# Metrics include: inference request rates, latencies, error rates
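If the ServiceMonitor option was enabled at install time, it should be present in the KServe namespace (resource name may vary by chart version):
# Confirm Prometheus is set up to scrape the KServe controller
kubectl get servicemonitor -n kserve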
Deployment Modes
RawDeployment (Standard)
- Uses standard Kubernetes Deployments, Services, and Ingress
- No Knative dependency
- Simpler setup, more control over resources
- Manual scaling configuration required
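For example, replica bounds can be set directly on the predictor spec; a sketch (values are placeholders):
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    # In RawDeployment mode these bounds are applied to the underlying Deployment/HPA
    minReplicas: 1
    maxReplicas: 3
    sklearn:
      storageUri: s3://models/sklearn/iris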
Serverless (Knative)
- Requires Knative Serving installation
- Auto-scaling with scale-to-zero
- Advanced traffic management
- Better resource utilization for sporadic workloads
Examples
Iris Classification with MLflow
A complete end-to-end example demonstrating model serving with KServe:
- Train an Iris classification model in JupyterHub
- Register the model to MLflow Model Registry
- Deploy the registered model with KServe InferenceService
- Test inference using v2 protocol from JupyterHub notebooks and Kubernetes Jobs
This example demonstrates:
- Converting MLflow artifact paths to KServe storageUri
- Using MLflow format runtime (with automatic dependency installation)
- Testing with both single and batch predictions
- Using v2 Open Inference Protocol
See: examples/kserve-mlflow-iris
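For reference, a v2 Open Inference Protocol request uses named, typed tensors rather than the v1 instances format shown earlier; a sketch against a hypothetical iris-model service in the default namespace:
curl -X POST http://iris-model.default.svc.cluster.local/v2/models/iris-model/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "input-0", "shape": [1, 4], "datatype": "FP64", "data": [[6.8, 2.8, 4.8, 1.4]]}]}'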
Uninstallation
# Remove KServe and its resources
just kserve::uninstall
This will:
- Uninstall the KServe resources Helm chart
- Uninstall the KServe CRDs
- Delete storage secrets
- Delete namespace
Warning: Uninstalling will remove all InferenceService resources.
Troubleshooting
Check Controller Logs
just kserve::logs
View InferenceService Status
kubectl get inferenceservice -A
kubectl describe inferenceservice <name> -n <namespace>
Check Predictor Pods
kubectl get pods -l serving.kserve.io/inferenceservice=<name>
kubectl logs <pod-name>
Storage Issues
If models fail to download:
# Check storage initializer logs
kubectl logs <pod-name> -c storage-initializer
# Verify S3 credentials
kubectl get secret kserve-s3-credentials -n kserve -o yaml