KServe

KServe is a standard Model Inference Platform on Kubernetes for Machine Learning and Generative AI. It provides a standardized way to deploy, serve, and manage ML models across different frameworks.

Features

  • Multi-Framework Support: TensorFlow, PyTorch, scikit-learn, XGBoost, Hugging Face, Triton, and more
  • Deployment Modes:
    • RawDeployment (Standard): Uses native Kubernetes Deployments without Knative
    • Serverless (Knative): Auto-scaling with scale-to-zero capability
  • Model Storage: Support for S3, GCS, Azure Blob, PVC, and more
  • Inference Protocols: REST and gRPC
  • Advanced Features: Canary deployments, traffic splitting, explainability, outlier detection

Prerequisites

  • Kubernetes cluster (installed via just k8s::install)
  • Longhorn storage (installed via just longhorn::install)
  • cert-manager (required, installed via just cert-manager::install)
  • MinIO (optional, for S3-compatible model storage via just minio::install)
  • Prometheus (optional, for monitoring via just prometheus::install)

Installation

Basic Installation

# Install cert-manager (required)
just cert-manager::install

# Install KServe with default settings (RawDeployment mode)
just kserve::install

During installation, you will be prompted for:

  • Prometheus Monitoring: Whether to enable ServiceMonitor (if Prometheus is installed)

The domain for inference endpoints is configured via the KSERVE_DOMAIN environment variable (default: cluster.local).

Environment Variables

Key environment variables (set via .env.local or environment):

KSERVE_NAMESPACE=kserve                    # Namespace for KServe
KSERVE_CHART_VERSION=v0.15.0               # KServe Helm chart version
KSERVE_DEPLOYMENT_MODE=RawDeployment       # Deployment mode (RawDeployment or Knative)
KSERVE_DOMAIN=cluster.local                # Base domain for inference endpoints
MONITORING_ENABLED=true                    # Enable Prometheus monitoring
MINIO_NAMESPACE=minio                      # MinIO namespace (if using MinIO)

Domain Configuration

KServe uses the KSERVE_DOMAIN value to construct URLs for inference endpoints.

Internal Access Only (Default):

KSERVE_DOMAIN=cluster.local
  • InferenceServices are accessible only within the cluster
  • URLs: http://<service-name>.<namespace>.svc.cluster.local
  • No external Ingress configuration needed
  • Recommended for development and testing

External Access:

KSERVE_DOMAIN=example.com
  • InferenceServices are accessible from outside the cluster
  • URLs: https://<service-name>.<namespace>.example.com
  • Requires Traefik Ingress configuration (a sketch follows this list)
  • DNS records must point to your cluster
  • Recommended for production deployments
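
If nothing in your setup creates Ingress objects for you, a minimal Traefik IngressRoute can route the external hostname to the predictor Service. This is a sketch rather than the stack's generated configuration: the entrypoint name (websecure), the predictor Service name (sklearn-iris-predictor), and port 80 are assumptions.

# Hypothetical Traefik route for an InferenceService named sklearn-iris
kubectl apply -f - <<'EOF'
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: sklearn-iris
  namespace: default
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`sklearn-iris.default.example.com`)
      kind: Rule
      services:
        - name: sklearn-iris-predictor
          port: 80
EOF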

Usage

Check Status

# View status of KServe components
just kserve::status

# View controller logs
just kserve::logs

Deploy a Model

Create an InferenceService resource:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    sklearn:
      storageUri: s3://models/sklearn/iris

Apply the resource:

kubectl apply -f inferenceservice.yaml
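
Deployment is asynchronous, so it can help to wait for the service to become Ready before sending requests:

# Block until the InferenceService reports the Ready condition (or time out)
kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n default --timeout=300s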

Access Inference Endpoint

# Get inference service URL
kubectl get inferenceservice sklearn-iris

For cluster.local (internal access):

# From within the cluster
curl -X POST http://sklearn-iris.default.svc.cluster.local/v1/models/sklearn-iris:predict \
    -H "Content-Type: application/json" \
    -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'

For external domain:

# From anywhere (requires DNS and Ingress configuration)
curl -X POST https://sklearn-iris.default.example.com/v1/models/sklearn-iris:predict \
    -H "Content-Type: application/json" \
    -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'

Storage Configuration

Using MinIO (S3-compatible)

If MinIO is installed, KServe will automatically configure S3 credentials:

# Storage secret is created automatically during installation
kubectl get secret kserve-s3-credentials -n kserve

External Secrets Integration:

  • When External Secrets Operator is available:
    • Credentials are retrieved directly from Vault at minio/admin
    • ExternalSecret resource syncs credentials to Kubernetes Secret
    • Secret includes KServe-specific annotations for S3 endpoint configuration
    • No duplicate storage is needed; the ExternalSecret references the existing MinIO credentials
  • When External Secrets Operator is not available:
    • Credentials are retrieved from MinIO Secret
    • Kubernetes Secret is created directly with annotations (see the sketch after this list)
    • Credentials are also backed up to Vault at kserve/storage if available
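
In either case, the resulting Secret carries KServe's S3 annotations so the storage initializer knows how to reach MinIO. A minimal sketch of the directly created Secret, assuming the in-cluster endpoint minio.minio.svc.cluster.local:9000 over plain HTTP (the credential values are placeholders):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: kserve-s3-credentials
  namespace: kserve
  annotations:
    serving.kserve.io/s3-endpoint: minio.minio.svc.cluster.local:9000
    serving.kserve.io/s3-usehttps: "0"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <minio-access-key>       # placeholder
  AWS_SECRET_ACCESS_KEY: <minio-secret-key>   # placeholder
EOF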

Models can be stored in MinIO buckets:

# Create a bucket for models
just minio::create-bucket models

# Upload model files to MinIO
# Then reference in InferenceService: s3://models/path/to/model
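
As an illustration, an upload with the MinIO client might look like the following; the alias name, endpoint, and credentials are placeholders, not values the installer sets:

# Register the MinIO endpoint under a local alias (hypothetical values)
mc alias set local http://minio.minio.svc.cluster.local:9000 <access-key> <secret-key>

# Copy a trained model directory into the models bucket
mc cp --recursive ./model/ local/models/sklearn/iris/

# The model is then addressable as storageUri: s3://models/sklearn/iris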

Using Other Storage

KServe supports various storage backends:

  • S3: AWS S3 or compatible services
  • GCS: Google Cloud Storage
  • Azure: Azure Blob Storage
  • PVC: Kubernetes Persistent Volume Claims (example after this list)
  • HTTP/HTTPS: Direct URLs
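
For example, KServe's pvc:// scheme points a predictor at a Persistent Volume Claim, using the claim name followed by a path inside the volume. The claim name and path here are hypothetical:

# Serve a model from a PVC instead of object storage
kubectl apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris-pvc
  namespace: default
spec:
  predictor:
    sklearn:
      storageUri: pvc://model-store/sklearn/iris
EOF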

Supported Frameworks

The following serving runtimes are enabled by default:

  • scikit-learn: sklearn models
  • XGBoost: XGBoost models
  • MLServer: Multi-framework server (sklearn, XGBoost, etc.)
  • Triton: NVIDIA Triton Inference Server
  • TensorFlow: TensorFlow models
  • PyTorch: PyTorch models via TorchServe
  • Hugging Face: Transformer models

Advanced Configuration

Custom Serving Runtimes

You can create custom ClusterServingRuntime or ServingRuntime resources for specialized model servers.
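
A minimal ServingRuntime sketch is shown below; the image, arguments, model format, and port are placeholders for whatever your model server expects:

kubectl apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: custom-runtime
  namespace: default
spec:
  supportedModelFormats:
    - name: custom-format        # placeholder model format
      version: "1"
  containers:
    - name: kserve-container     # KServe expects this container name
      image: example.com/custom-model-server:latest   # placeholder image
      args:
        - --model_dir=/mnt/models
      ports:
        - containerPort: 8080
          protocol: TCP
EOF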

Prometheus Monitoring

When monitoring is enabled, KServe controller metrics are exposed and scraped by Prometheus:

# View metrics in Grafana
# Metrics include: inference request rates, latencies, error rates
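
To verify that scraping is wired up, check that a ServiceMonitor exists in the KServe namespace (the exact resource name depends on the installation):

# List ServiceMonitors in the KServe namespace
kubectl get servicemonitor -n kserve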

Deployment Modes

RawDeployment (Standard)

  • Uses standard Kubernetes Deployments, Services, and Ingress
  • No Knative dependency
  • Simpler setup, more control over resources
  • Manual scaling configuration required

Serverless (Knative)

  • Requires Knative Serving installation
  • Auto-scaling with scale-to-zero
  • Advanced traffic management
  • Better resource utilization for sporadic workloads
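
The mode is selected at install time through the KSERVE_DEPLOYMENT_MODE variable described under Environment Variables, for example:

# Install with the Knative-backed serverless mode instead of the default
KSERVE_DEPLOYMENT_MODE=Knative just kserve::install

# KServe also honors a per-service override via the
# serving.kserve.io/deploymentMode annotation on an InferenceService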

Examples

Iris Classification with MLflow

A complete end-to-end example demonstrating model serving with KServe:

  • Train an Iris classification model in JupyterHub
  • Register the model to MLflow Model Registry
  • Deploy the registered model with KServe InferenceService
  • Test inference using v2 protocol from JupyterHub notebooks and Kubernetes Jobs

This example demonstrates:

  • Converting MLflow artifact paths to KServe storageUri
  • Using MLflow format runtime (with automatic dependency installation)
  • Testing with both single and batch predictions
  • Using v2 Open Inference Protocol

See: examples/kserve-mlflow-iris
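
The storageUri conversion in that example boils down to mapping MLflow's artifact location onto the backing S3 path. The bucket name and layout below are illustrative, not the example's exact values:

# Hypothetical mapping from an MLflow run to a KServe storageUri:
#   MLflow artifact URI:  mlflow-artifacts:/1/<run-id>/artifacts/model
#   Backing S3 location:  s3://mlflow/1/<run-id>/artifacts/model
# The InferenceService then references:
#   storageUri: s3://mlflow/1/<run-id>/artifacts/model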

Uninstallation

# Remove KServe from the cluster
just kserve::uninstall

This will:

  • Uninstall the KServe resources Helm chart
  • Uninstall the KServe CRDs
  • Delete storage secrets
  • Delete namespace

Warning: Uninstalling will remove all InferenceService resources.

Troubleshooting

Check Controller Logs

just kserve::logs

View InferenceService Status

kubectl get inferenceservice -A
kubectl describe inferenceservice <name> -n <namespace>

Check Predictor Pods

kubectl get pods -l serving.kserve.io/inferenceservice=<name>
kubectl logs <pod-name>

Storage Issues

If models fail to download:

# Check storage initializer logs
kubectl logs <pod-name> -c storage-initializer

# Verify S3 credentials
kubectl get secret kserve-s3-credentials -n kserve -o yaml
