# KServe

KServe is a standard Model Inference Platform on Kubernetes for Machine Learning and Generative AI. It provides a standardized way to deploy, serve, and manage ML models across different frameworks.

## Features

- **Multi-Framework Support**: TensorFlow, PyTorch, scikit-learn, XGBoost, Hugging Face, Triton, and more
- **Deployment Modes**:
  - **RawDeployment (Standard)**: Uses native Kubernetes Deployments without Knative
  - **Serverless (Knative)**: Auto-scaling with scale-to-zero capability
- **Model Storage**: Support for S3, GCS, Azure Blob, PVC, and more
- **Inference Protocols**: REST and gRPC
- **Advanced Features**: Canary deployments, traffic splitting, explainability, outlier detection

## Prerequisites

- Kubernetes cluster (installed via `just k8s::install`)
- Longhorn storage (installed via `just longhorn::install`)
- **cert-manager** (required, installed via `just cert-manager::install`)
- MinIO (optional, for S3-compatible model storage via `just minio::install`)
- Prometheus (optional, for monitoring via `just prometheus::install`)

## Installation

### Basic Installation

```bash
# Install cert-manager (required)
just cert-manager::install

# Install KServe with default settings (RawDeployment mode)
just kserve::install
```
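
To verify the installation, check that the controller is up (assuming the default `kserve` namespace from `KSERVE_NAMESPACE`):

```bash
# The controller-manager pod should be Running and Ready
kubectl get pods -n kserve
```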

During installation, you will be prompted for:

- **Prometheus Monitoring**: Whether to enable ServiceMonitor (if Prometheus is installed)

The domain for inference endpoints is configured via the `KSERVE_DOMAIN` environment variable (default: `cluster.local`).

### Environment Variables

Key environment variables (set via `.env.local` or the environment):

```bash
KSERVE_NAMESPACE=kserve               # Namespace for KServe
KSERVE_CHART_VERSION=v0.15.0          # KServe Helm chart version
KSERVE_DEPLOYMENT_MODE=RawDeployment  # Deployment mode (RawDeployment or Serverless)
KSERVE_DOMAIN=cluster.local           # Base domain for inference endpoints
MONITORING_ENABLED=true               # Enable Prometheus monitoring
MINIO_NAMESPACE=minio                 # MinIO namespace (if using MinIO)
```
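
Overrides can be appended to `.env.local` before installing; a minimal sketch (the values are illustrative):

```bash
# Hypothetical overrides; adjust to your environment
cat >> .env.local <<'EOF'
KSERVE_DOMAIN=example.com
MONITORING_ENABLED=true
EOF

just kserve::install
```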

### Domain Configuration

KServe uses the `KSERVE_DOMAIN` value to construct URLs for inference endpoints.

**Internal Access Only (Default):**

```bash
KSERVE_DOMAIN=cluster.local
```

- InferenceServices are accessible only within the cluster
- URLs: `http://<service-name>.<namespace>.svc.cluster.local`
- No external Ingress configuration needed
- Recommended for development and testing

**External Access:**

```bash
KSERVE_DOMAIN=example.com
```

- InferenceServices are accessible from outside the cluster
- URLs: `https://<service-name>.<namespace>.example.com`
- Requires Traefik Ingress configuration
- DNS records must point to your cluster
- Recommended for production deployments

## Usage

### Check Status

```bash
# View status of KServe components
just kserve::status

# View controller logs
just kserve::logs
```

### Deploy a Model

Create an `InferenceService` resource:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    sklearn:
      storageUri: s3://models/sklearn/iris
```

Apply the resource:

```bash
kubectl apply -f inferenceservice.yaml
```
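
Once applied, you can block until the service reports ready before sending traffic (names match the example above):

```bash
# Wait up to 5 minutes for the InferenceService to become Ready
kubectl wait --for=condition=Ready \
  inferenceservice/sklearn-iris -n default --timeout=300s
```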

### Access Inference Endpoint

```bash
# Get the inference service URL
kubectl get inferenceservice sklearn-iris
```

**For cluster.local (internal access):**

```bash
# From within the cluster
curl -X POST http://sklearn-iris.default.svc.cluster.local/v1/models/sklearn-iris:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'
```
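
If you need an in-cluster shell to test from, a throwaway pod works; a sketch, with the image choice being an assumption:

```bash
# One-off pod, removed automatically when the command exits
kubectl run curl-test -i --rm --restart=Never \
  --image=curlimages/curl --command -- \
  curl -s -X POST http://sklearn-iris.default.svc.cluster.local/v1/models/sklearn-iris:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'
```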

**For external domain:**

```bash
# From anywhere (requires DNS and Ingress configuration)
curl -X POST https://sklearn-iris.default.example.com/v1/models/sklearn-iris:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'
```

## Storage Configuration

### Using MinIO (S3-compatible)

If MinIO is installed, KServe will automatically configure S3 credentials:

```bash
# The storage secret is created automatically during installation
kubectl get secret kserve-s3-credentials -n kserve
```

**External Secrets Integration:**

- When External Secrets Operator is available:
  - Credentials are retrieved directly from Vault at `minio/admin`
  - An ExternalSecret resource syncs credentials to a Kubernetes Secret
  - The Secret includes KServe-specific annotations for S3 endpoint configuration
  - No duplicate storage is needed; the existing MinIO credentials are referenced
- When External Secrets Operator is not available:
  - Credentials are retrieved from the MinIO Secret
  - A Kubernetes Secret is created directly with the annotations
  - Credentials are also backed up to Vault at `kserve/storage` if available

Models can be stored in MinIO buckets:

```bash
# Create a bucket for models
just minio::create-bucket models

# Upload model files to MinIO,
# then reference them in an InferenceService: s3://models/path/to/model
```
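
One way to upload model files is the MinIO client; a sketch, assuming an `mc` alias named `local` already points at your MinIO endpoint:

```bash
# Copy a serialized model into the bucket created above
mc cp model.joblib local/models/sklearn/iris/model.joblib

# Confirm the object sits where the storageUri will look for it
mc ls local/models/sklearn/iris/
```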

### Using Other Storage

KServe supports various storage backends:

- **S3**: AWS S3 or compatible services
- **GCS**: Google Cloud Storage
- **Azure**: Azure Blob Storage
- **PVC**: Kubernetes Persistent Volume Claims (see the example below)
- **HTTP/HTTPS**: Direct URLs
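
For example, a model already present on a PVC can be referenced with the `pvc://` scheme; a sketch, with the claim name and path being illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris-pvc
spec:
  predictor:
    sklearn:
      # pvc://<claim-name>/<path-inside-volume>
      storageUri: pvc://model-store/sklearn/iris
```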

## Supported Frameworks

The following serving runtimes are enabled by default:

- **scikit-learn**: sklearn models
- **XGBoost**: XGBoost models
- **MLServer**: Multi-framework server (sklearn, XGBoost, etc.)
- **Triton**: NVIDIA Triton Inference Server
- **TensorFlow**: TensorFlow models
- **PyTorch**: PyTorch models via TorchServe
- **Hugging Face**: Transformer models

## Advanced Configuration

### Custom Serving Runtimes

You can create custom `ClusterServingRuntime` or `ServingRuntime` resources for specialized model servers.
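
A minimal sketch of a custom runtime; the model format name, image, and args are illustrative, not a tested configuration:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: custom-model-server
spec:
  supportedModelFormats:
    - name: custom-format   # hypothetical format matched by InferenceServices
      version: "1"
      autoSelect: true
  containers:
    - name: kserve-container
      image: registry.example.com/custom-server:latest  # illustrative image
      args:
        - --model_dir=/mnt/models
        - --http_port=8080
```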

### Prometheus Monitoring

When monitoring is enabled, KServe controller metrics are exposed and scraped by Prometheus. The metrics can be viewed in Grafana and include inference request rates, latencies, and error rates.

## Deployment Modes

### RawDeployment (Standard)

- Uses standard Kubernetes Deployments, Services, and Ingress
- No Knative dependency
- Simpler setup, more control over resources
- Manual scaling configuration required (see the sketch below)
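
Scaling bounds are set directly on the InferenceService; a sketch with illustrative values (in RawDeployment mode these map to a Deployment plus an HPA):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    minReplicas: 1    # keep one replica; scale-to-zero is a Serverless feature
    maxReplicas: 3
    scaleMetric: cpu  # autoscale on CPU utilization
    scaleTarget: 80   # target 80% utilization
    sklearn:
      storageUri: s3://models/sklearn/iris
```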

### Serverless (Knative)

- Requires Knative Serving installation
- Auto-scaling with scale-to-zero
- Advanced traffic management
- Better resource utilization for sporadic workloads

## Examples

### Iris Classification with MLflow

A complete end-to-end example demonstrating model serving with KServe:

- Train an Iris classification model in JupyterHub
- Register the model to the MLflow Model Registry
- Deploy the registered model with a KServe InferenceService
- Test inference using the v2 protocol from JupyterHub notebooks and Kubernetes Jobs

This example demonstrates:

- Converting MLflow artifact paths to a KServe storageUri
- Using the MLflow format runtime (with automatic dependency installation)
- Testing with both single and batch predictions
- Using the v2 Open Inference Protocol (see the request sketch below)
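
For reference, a v2 request uses `/v2/models/<name>/infer` rather than the v1 `:predict` form; a sketch against the internal URL, with the input tensor name being illustrative:

```bash
curl -X POST http://sklearn-iris.default.svc.cluster.local/v2/models/sklearn-iris/infer \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": [
          {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [6.8, 2.8, 4.8, 1.4]
          }
        ]
      }'
```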

See: [`examples/kserve-mlflow-iris`](../examples/kserve-mlflow-iris/README.md)

## Uninstallation

```bash
# Remove KServe, its CRDs, and storage secrets
just kserve::uninstall
```

This will:

- Uninstall the KServe resources Helm chart
- Uninstall the KServe CRDs
- Delete the storage secrets
- Delete the namespace

**Warning**: Uninstalling will remove all InferenceService resources.

## Troubleshooting

### Check Controller Logs

```bash
just kserve::logs
```

### View InferenceService Status

```bash
kubectl get inferenceservice -A
kubectl describe inferenceservice <name> -n <namespace>
```

### Check Predictor Pods

```bash
kubectl get pods -l serving.kserve.io/inferenceservice=<name>
kubectl logs <pod-name>
```

### Storage Issues

If models fail to download:

```bash
# Check storage initializer logs
kubectl logs <pod-name> -c storage-initializer

# Verify S3 credentials
kubectl get secret kserve-s3-credentials -n kserve -o yaml
```
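
Secret values are base64-encoded; to inspect a single field, a sketch (the key name `AWS_ACCESS_KEY_ID` is an assumption about the secret's layout):

```bash
kubectl get secret kserve-s3-credentials -n kserve \
  -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d
```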

## References

- [KServe Documentation](https://kserve.github.io/website/)
- [KServe GitHub](https://github.com/kserve/kserve)
- [KServe Examples](https://github.com/kserve/kserve/tree/master/docs/samples)
- [Supported ML Frameworks](https://kserve.github.io/website/latest/modelserving/v1beta1/serving_runtime/)