feat(mlflow): enable authn
2
mlflow/.gitignore
vendored
@@ -1,3 +1,5 @@
values.yaml
mlflow-db-external-secret.yaml
mlflow-s3-external-secret.yaml
mlflow-oidc-config.yaml
image/.buildx-cache
484
mlflow/README.md
Normal file
@@ -0,0 +1,484 @@
# MLflow

Open source platform for managing the end-to-end machine learning lifecycle, deployed here with Keycloak OIDC authentication.

## Overview

This module deploys MLflow using the Community Charts Helm chart with:

- **Keycloak OIDC authentication** for user login
- **Custom Docker image** with the `mlflow-oidc-auth` plugin
- **PostgreSQL backend** for tracking server and auth databases
- **MinIO/S3 artifact storage** with proxied access
- **FastAPI/ASGI server** with Uvicorn for production
- **HTTPS reverse proxy support** via Traefik
- **Group-based access control** via Keycloak groups
- **Prometheus metrics** for monitoring

## Prerequisites

- Kubernetes cluster (k3s)
- Keycloak installed and configured
- PostgreSQL cluster (CloudNativePG)
- MinIO object storage
- External Secrets Operator (optional, for Vault integration)
- Docker registry (local or remote)

## Installation

### Basic Installation

1. **Build and Push Custom MLflow Image**:

   Set `DOCKER_HOST` to your remote Docker host (where k3s is running):

   ```bash
   export DOCKER_HOST=ssh://yourhost.com
   just mlflow::build-and-push-image
   ```

   This builds a custom MLflow image with the OIDC auth plugin and pushes it to your k3s registry.

2. **Install MLflow**:

   ```bash
   just mlflow::install
   ```

   You will be prompted for:

   - **MLflow host (FQDN)**: e.g., `mlflow.example.com`
   - **Prometheus monitoring**: whether to enable the ServiceMonitor (asked only when kube-prometheus-stack is installed and `MONITORING_ENABLED` is unset)

### What Gets Installed

- MLflow tracking server (FastAPI with OIDC)
- PostgreSQL databases:
  - `mlflow` - Experiment tracking, models, and runs
  - `mlflow_auth` - User authentication and permissions
- PostgreSQL user `mlflow` with access to both databases
- MinIO bucket `mlflow` for artifact storage
- Custom MLflow Docker image with OIDC auth plugin
- Keycloak OAuth client (confidential client)
- Keycloak groups:
  - `mlflow-admins` - Full administrative access
  - `mlflow-users` - Basic user access

## Configuration

### Docker Build Environment

For building and pushing the custom MLflow image:

```bash
DOCKER_HOST=ssh://yourhost.com  # Remote Docker host (where k3s is running)
IMAGE_REGISTRY=localhost:30500  # k3s local registry
```

### Deployment Configuration

Environment variables (set in `.env.local` or override):

```bash
MLFLOW_NAMESPACE=mlflow                # Kubernetes namespace
MLFLOW_CHART_VERSION=1.8.0             # Helm chart version
MLFLOW_HOST=mlflow.example.com         # External hostname
MLFLOW_IMAGE_TAG=3.6.0-oidc            # Custom image tag
MLFLOW_IMAGE_PULL_POLICY=IfNotPresent  # Image pull policy
KEYCLOAK_HOST=auth.example.com         # Keycloak hostname
KEYCLOAK_REALM=buunstack               # Keycloak realm name
```

### Architecture Notes

**MLflow 3.6.0 with OIDC**:

- Uses `mlflow-oidc-auth[full]==5.6.1` plugin
- FastAPI/ASGI server with Uvicorn (not Gunicorn)
- Server type: `oidc-auth-fastapi` for ASGI compatibility
- Session management: `cachelib` with filesystem backend
- Custom Docker image built from `burakince/mlflow:3.6.0`

**Authentication Flow**:

- OIDC Discovery: `/.well-known/openid-configuration`
- Redirect URI: `/callback` (not `/oidc/callback`)
- Required scopes: `openid profile email groups`
- Group attribute: `groups` from UserInfo
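
The values in this list are all derived from three deployment inputs. The sketch below shows that derivation in plain Python; it mirrors the `OIDC_DISCOVERY_URL`, `OIDC_REDIRECT_URI`, and `OIDC_SCOPE` entries in `values.gomplate.yaml`, but the helper name `oidc_settings` is hypothetical and the hostnames are this README's example values, not a real deployment.

```python
from urllib.parse import urlsplit


def oidc_settings(keycloak_host: str, realm: str, mlflow_host: str) -> dict:
    """Derive the OIDC values listed above from the three deployment inputs."""
    return {
        # Keycloak publishes its discovery document under the realm path.
        "OIDC_DISCOVERY_URL": (
            f"https://{keycloak_host}/realms/{realm}"
            "/.well-known/openid-configuration"
        ),
        # The callback path is /callback, not /oidc/callback.
        "OIDC_REDIRECT_URI": f"https://{mlflow_host}/callback",
        "OIDC_SCOPE": "openid profile email groups",
    }


# Example hostnames and realm taken from this README (assumed, not real).
settings = oidc_settings("auth.example.com", "buunstack", "mlflow.example.com")
redirect_path = urlsplit(settings["OIDC_REDIRECT_URI"]).path
for name, value in settings.items():
    print(f"{name}={value}")
```

Comparing `redirect_path` with the redirect URI registered on the Keycloak client is a quick way to rule out the most common cause of a login redirect loop.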

**Database Structure**:

- `mlflow` database: Experiment tracking, models, parameters, metrics
- `mlflow_auth` database: User accounts, groups, permissions
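
Both databases live on the same PostgreSQL cluster and are reached with the same `mlflow` user, so their connection URIs differ only in the database name. The following is a minimal sketch of how such URIs can be assembled, assuming the `postgres-cluster-rw.<namespace>.svc.cluster.local:5432` service address used in the justfile. The helper `postgres_uri` is hypothetical, and the percent-encoding step is an added precaution for passwords containing reserved characters; the justfile itself interpolates the password as-is.

```python
from urllib.parse import quote


def postgres_uri(user: str, password: str, database: str,
                 namespace: str = "postgres") -> str:
    """Build a PostgreSQL URI for the in-cluster read-write service."""
    host = f"postgres-cluster-rw.{namespace}.svc.cluster.local"
    # Percent-encode credentials so characters such as '@' or '/' cannot
    # be mistaken for URI delimiters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"postgresql://{credentials}@{host}:5432/{database}"


# "p@ss/word" is a made-up password chosen to show the encoding.
tracking_uri = postgres_uri("mlflow", "p@ss/word", "mlflow")
auth_uri = postgres_uri("mlflow", "p@ss/word", "mlflow_auth")
print(tracking_uri)
print(auth_uri)
```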

## Usage

### Access MLflow

1. Navigate to `https://your-mlflow-host/`
2. Click the "Keycloak" button to authenticate
3. After successful login:
   - First redirect: Permissions Management UI (`/oidc/ui/`)
   - Click the "MLflow" button: Main MLflow UI

### Grant Admin Access

Add users to the `mlflow-admins` group:

```bash
just keycloak::add-user-to-group <username> mlflow-admins
```

Admin users have full privileges including:

- Experiment and model management
- User and permission management
- Access to all experiments and models

### Log Experiments

#### Using Python Client

```python
import mlflow

# Set tracking URI
mlflow.set_tracking_uri("https://mlflow.example.com")

# Start experiment
mlflow.set_experiment("my-experiment")

# Log parameters, metrics, and artifacts
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_artifact("model.pkl")
```

#### Authentication for API Access

For programmatic access, create an access token:

1. Log in to MLflow UI
2. Navigate to Permissions UI → Create access token
3. Use token in your code:

```python
import os
os.environ["MLFLOW_TRACKING_TOKEN"] = "your-token"
```
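
When `MLFLOW_TRACKING_TOKEN` is set, the MLflow client sends it as an `Authorization: Bearer <token>` header on every request. The sketch below reproduces that header with the standard library only, which can help when calling the REST API from a tool that does not use the MLflow client. The helper `build_request` is hypothetical, the host is this README's example value, and no request is actually sent; whether the server accepts a given token depends on the plugin configuration.

```python
import os
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    """Create a request carrying the tracking token as a Bearer header."""
    token = os.environ.get("MLFLOW_TRACKING_TOKEN", "")
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


# Placeholder token, matching the snippet above.
os.environ["MLFLOW_TRACKING_TOKEN"] = "your-token"
request = build_request(
    "https://mlflow.example.com/api/2.0/mlflow/experiments/search"
)
print(request.get_header("Authorization"))
```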

### Model Registry

Register and manage models:

```python
import mlflow
from mlflow.tracking import MlflowClient

# Register model
mlflow.register_model(
    model_uri="runs:/<run-id>/model",
    name="my-model"
)

# Transition model stage
# Note: model stages are deprecated since MLflow 2.9 in favor of aliases
client = MlflowClient()
client.transition_model_version_stage(
    name="my-model",
    version=1,
    stage="Production"
)
```

## Features

- **Experiment Tracking**: Log parameters, metrics, and artifacts
- **Model Registry**: Version and manage ML models
- **Model Serving**: Deploy models as REST APIs
- **Project Reproducibility**: Package code, data, and environment
- **Remote Execution**: Run experiments on remote platforms
- **UI Dashboard**: Visual experiment comparison and analysis
- **LLM Tracking**: Track LLM applications with traces
- **Prompt Registry**: Manage and version prompts

## Architecture

```plain
External Users
      ↓
Cloudflare Tunnel (HTTPS)
      ↓
Traefik Ingress (HTTPS)
      ↓
MLflow Server (HTTP inside cluster)
  ├─ FastAPI/ASGI (Uvicorn)
  ├─ mlflow-oidc-auth plugin
  │   ├─ OAuth → Keycloak (authentication)
  │   └─ Session → FileSystemCache
  ├─ PostgreSQL (metadata)
  │   ├─ mlflow (tracking)
  │   └─ mlflow_auth (users/groups)
  └─ MinIO (artifacts via proxied access)
```

**Key Components**:

- **Server Type**: `oidc-auth-fastapi` for FastAPI/ASGI compatibility
- **Allowed Hosts**: Validates `Host` header for security
- **Session Backend**: Cachelib with filesystem storage
- **Artifact Storage**: Proxied through MLflow server (no direct S3 access needed)
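
To illustrate the **Allowed Hosts** check, here is a minimal sketch of `Host` header validation against a comma-separated allow-list shaped like the `allowedHosts` value in `values.gomplate.yaml` (the hostname with and without `:443`). This is an illustrative exact-match version, not MLflow's actual middleware, and `is_allowed_host` is a hypothetical name.

```python
def is_allowed_host(host_header: str, allowed_hosts: str) -> bool:
    """Return True when the Host header exactly matches an allowed entry."""
    allowed = {
        entry.strip().lower()
        for entry in allowed_hosts.split(",")
        if entry.strip()
    }
    # Host names are case-insensitive, so compare in lower case.
    return host_header.strip().lower() in allowed


# Same shape as the allowedHosts value: host with and without the port.
ALLOWED_HOSTS = "mlflow.example.com,mlflow.example.com:443"

print(is_allowed_host("mlflow.example.com", ALLOWED_HOSTS))
print(is_allowed_host("evil.example.com", ALLOWED_HOSTS))
```

Listing the `:443` variant matters because some proxies forward the `Host` header with the port attached; a request whose header matches neither form is rejected.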

## Authentication

### User Login (OIDC)

- Users authenticate via Keycloak
- Standard OIDC flow with Authorization Code grant
- Group membership retrieved from `groups` claim in UserInfo
- Users automatically created on first login

### Access Control

**Group-based Permissions**:

```python
OIDC_ADMIN_GROUP_NAME = "mlflow-admins"
OIDC_GROUP_NAME = "mlflow-admins,mlflow-users"
```

**Default Permissions**:

- New resources: `MANAGE` permission for creator
- Admins: Full access to all resources
- Users: Access based on explicit permissions
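
The two group settings can be read as a pair of simple rules: a user may log in if they belong to any group in `OIDC_GROUP_NAME`, and is an administrator if they belong to `OIDC_ADMIN_GROUP_NAME`. The sketch below expresses those rules in plain Python under that assumption; it is not the plugin's implementation, and `resolve_access` is a hypothetical helper.

```python
OIDC_ADMIN_GROUP_NAME = "mlflow-admins"
OIDC_GROUP_NAME = "mlflow-admins,mlflow-users"


def resolve_access(user_groups, allowed_groups, admin_group):
    """Map a user's `groups` claim to login and admin decisions."""
    allowed = {g.strip() for g in allowed_groups.split(",") if g.strip()}
    return {
        # Login requires membership in at least one allowed group.
        "can_login": bool(allowed & set(user_groups)),
        # Admin rights come only from the dedicated admin group.
        "is_admin": admin_group in user_groups,
    }


admin = resolve_access(["mlflow-admins"], OIDC_GROUP_NAME, OIDC_ADMIN_GROUP_NAME)
user = resolve_access(["mlflow-users"], OIDC_GROUP_NAME, OIDC_ADMIN_GROUP_NAME)
outsider = resolve_access(["finance"], OIDC_GROUP_NAME, OIDC_ADMIN_GROUP_NAME)
print(admin, user, outsider)
```

The `outsider` case corresponds to the `Authorization error: User is not allowed to login` log message: the user authenticated with Keycloak but is in neither group.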

### Permission Management

Access the Permissions UI at `/oidc/ui/`:

- View and manage user permissions
- Assign permissions to experiments, models, and prompts
- Create and manage groups
- View audit logs

## Management

### Rebuild Custom Image

If you need to update the custom MLflow image:

```bash
export DOCKER_HOST=ssh://yourhost.com
just mlflow::build-and-push-image
```

After rebuilding, restart MLflow to use the new image:

```bash
kubectl rollout restart deployment/mlflow -n mlflow
```

### Upgrade MLflow

```bash
just mlflow::upgrade
```

Updates the Helm deployment with current configuration.

### Uninstall

```bash
# Keep PostgreSQL databases
just mlflow::uninstall false

# Delete PostgreSQL databases and user
just mlflow::uninstall true
```

### Clean Up All Resources

```bash
just mlflow::cleanup
```

Deletes databases, users, secrets, and Keycloak client (with confirmation).

## Troubleshooting

### Check Pod Status

```bash
kubectl get pods -n mlflow
```

Expected pods:

- `mlflow-*` - Main application (1 replica)
- `mlflow-db-migration-*` - Database migration (Completed)
- `mlflow-dbchecker-*` - Database connection check (Completed)

### OAuth Login Fails

#### Redirect Loop (Returns to Login Page)

**Symptoms**: User authenticates with Keycloak but returns to login page

**Common Causes**:

1. **Redirect URI Mismatch**:
   - Check Keycloak client redirect URI matches `/callback`
   - Verify `OIDC_REDIRECT_URI` is `https://{host}/callback`

2. **Missing Groups Scope**:
   - Ensure `groups` scope is added to Keycloak client
   - Check groups mapper is configured in Keycloak

3. **Group Membership**:
   - User must be in `mlflow-admins` or `mlflow-users` group
   - Add user to group: `just keycloak::add-user-to-group <user> mlflow-admins`

#### Session Errors

**Error**: `Session module for filesystem could not be imported`

**Solution**: Ensure session configuration is correct:

```yaml
SESSION_TYPE: "cachelib"
SESSION_CACHE_DIR: "/tmp/session"
```

#### Group Detection Errors

**Error**: `Group detection error: No module named 'oidc'`

**Solution**: Remove `OIDC_GROUP_DETECTION_PLUGIN` setting (should be unset or removed)

### Server Type Errors

**Error**: `TypeError: Flask.__call__() missing 1 required positional argument: 'start_response'`

**Cause**: Using Flask server type with Uvicorn (ASGI)

**Solution**: Ensure `appName: "oidc-auth-fastapi"` in values

### Database Connection Issues

Check database credentials:

```bash
kubectl get secret mlflow-db-secret -n mlflow -o yaml
```

Test database connectivity:

```bash
kubectl exec -n mlflow deployment/mlflow -- \
  psql -h postgres-cluster-rw.postgres -U mlflow -d mlflow -c "SELECT 1"
```

### Artifact Storage Issues

Check MinIO credentials:

```bash
kubectl get secret mlflow-s3-secret -n mlflow -o yaml
```

Test MinIO connectivity:

```bash
kubectl exec -n mlflow deployment/mlflow -- \
  python -c "import boto3; import os; \
    client = boto3.client('s3', \
      endpoint_url=os.getenv('MLFLOW_S3_ENDPOINT_URL'), \
      aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'), \
      aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY')); \
    print(client.list_buckets())"
```

### Check Logs

```bash
# Application logs
kubectl logs -n mlflow deployment/mlflow --tail=100

# Database migration logs
kubectl logs -n mlflow job/mlflow-db-migration

# Real-time logs
kubectl logs -n mlflow deployment/mlflow -f
```

### Common Log Messages

**Normal**:

- `Successfully created FastAPI app with OIDC integration`
- `OIDC routes, authentication, and UI should now be available`
- `Session module for cachelib imported`
- `Redirect URI for OIDC login: https://{host}/callback`

**Issues**:

- `Group detection error` - Check OIDC configuration
- `Authorization error: User is not allowed to login` - User not in required group
- `Session error` - Session configuration issue

### Image Build Issues

If custom image build fails:

```bash
# Set Docker host
export DOCKER_HOST=ssh://yourhost.com

# Rebuild image manually
cd /path/to/buun-stack/mlflow
just mlflow::build-and-push-image

# Check image exists on remote host
docker images localhost:30500/mlflow:3.6.0-oidc

# Test image on remote host
docker run --rm localhost:30500/mlflow:3.6.0-oidc mlflow --version
```

**Note**: All Docker commands run on the remote host specified by `DOCKER_HOST`.

## Custom Image

### Dockerfile

Located at `mlflow/image/Dockerfile`:

```dockerfile
FROM burakince/mlflow:3.6.0

# Install mlflow-oidc-auth plugin with filesystem session support
RUN pip install --no-cache-dir \
    mlflow-oidc-auth[full]==5.6.1 \
    cachelib[filesystem]
```

### Building Custom Image

**Important**: Set `DOCKER_HOST` to build on the remote k3s host:

```bash
export DOCKER_HOST=ssh://yourhost.com

just mlflow::build-image           # Build only
just mlflow::push-image            # Push only (requires prior build)
just mlflow::build-and-push-image  # Build and push
```

The image is built on the remote Docker host and pushed to the k3s local registry (`localhost:30500`).

## References

- [MLflow Documentation](https://mlflow.org/docs/latest/index.html)
- [MLflow GitHub](https://github.com/mlflow/mlflow)
- [mlflow-oidc-auth Plugin](https://github.com/mlflow-oidc/mlflow-oidc-auth)
- [mlflow-oidc-auth Documentation](https://mlflow-oidc.github.io/mlflow-oidc-auth/)
- [Community Charts MLflow](https://github.com/community-charts/helm-charts/tree/main/charts/mlflow)
- [Keycloak OIDC](https://www.keycloak.org/docs/latest/securing_apps/#_oidc)
8
mlflow/image/Dockerfile
Normal file
@@ -0,0 +1,8 @@
FROM burakince/mlflow:3.6.0

# Install mlflow-oidc-auth plugin with filesystem session support
RUN pip install --no-cache-dir \
    mlflow-oidc-auth[full]==5.6.1 \
    cachelib[filesystem]

# Keep the original entrypoint
184
mlflow/justfile
@@ -3,11 +3,18 @@ set fallback := true
export MLFLOW_NAMESPACE := env("MLFLOW_NAMESPACE", "mlflow")
export MLFLOW_CHART_VERSION := env("MLFLOW_CHART_VERSION", "1.8.0")
export MLFLOW_HOST := env("MLFLOW_HOST", "")
export IMAGE_REGISTRY := env("IMAGE_REGISTRY", "localhost:30500")
export MLFLOW_IMAGE_TAG := env("MLFLOW_IMAGE_TAG", "3.6.0-oidc")
export MLFLOW_IMAGE_PULL_POLICY := env("MLFLOW_IMAGE_PULL_POLICY", "IfNotPresent")
export MLFLOW_OIDC_ENABLED := env("MLFLOW_OIDC_ENABLED", "true")
export POSTGRES_NAMESPACE := env("POSTGRES_NAMESPACE", "postgres")
export MINIO_NAMESPACE := env("MINIO_NAMESPACE", "minio")
export EXTERNAL_SECRETS_NAMESPACE := env("EXTERNAL_SECRETS_NAMESPACE", "external-secrets")
export K8S_VAULT_NAMESPACE := env("K8S_VAULT_NAMESPACE", "vault")
export MONITORING_ENABLED := env("MONITORING_ENABLED", "")
export PROMETHEUS_NAMESPACE := env("PROMETHEUS_NAMESPACE", "monitoring")
export KEYCLOAK_REALM := env("KEYCLOAK_REALM", "buunstack")
export KEYCLOAK_HOST := env("KEYCLOAK_HOST", "")

[private]
default:
@@ -22,6 +29,26 @@ add-helm-repo:
remove-helm-repo:
    helm repo remove community-charts

# Build custom MLflow image with OIDC auth plugin
build-image:
    #!/bin/bash
    set -euo pipefail
    echo "Building MLflow image with OIDC auth plugin..."
    cd image
    docker build -t ${IMAGE_REGISTRY}/mlflow:${MLFLOW_IMAGE_TAG} .
    echo "Image built: ${IMAGE_REGISTRY}/mlflow:${MLFLOW_IMAGE_TAG}"

# Push custom MLflow image to registry
push-image:
    #!/bin/bash
    set -euo pipefail
    echo "Pushing MLflow image to registry..."
    docker push ${IMAGE_REGISTRY}/mlflow:${MLFLOW_IMAGE_TAG}
    echo "Image pushed: ${IMAGE_REGISTRY}/mlflow:${MLFLOW_IMAGE_TAG}"

# Build and push custom MLflow image
build-and-push-image: build-image push-image

# Create namespace
create-namespace:
    @kubectl get namespace ${MLFLOW_NAMESPACE} &>/dev/null || \
@@ -68,6 +95,16 @@ setup-postgres-db:
    echo "Ensuring database permissions..."
    just postgres::grant mlflow mlflow

    # Create mlflow_auth database for OIDC user management
    if just postgres::db-exists mlflow_auth &>/dev/null; then
        echo "Database 'mlflow_auth' already exists."
    else
        echo "Creating new database 'mlflow_auth' for OIDC authentication..."
        just postgres::create-db mlflow_auth
    fi
    echo "Granting permissions on mlflow_auth to mlflow user..."
    just postgres::grant mlflow_auth mlflow

    if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then
        echo "External Secrets available. Storing credentials in Vault..."
        just vault::put mlflow/postgres username=mlflow password="${db_password}"
@@ -173,7 +210,7 @@ delete-s3-secret:
    @kubectl delete externalsecret mlflow-s3-external-secret -n ${MLFLOW_NAMESPACE} --ignore-not-found

# Install MLflow
-install: check-env
+install:
    #!/bin/bash
    set -euo pipefail
    echo "Installing MLflow..."
@@ -191,11 +228,30 @@ install: check-env
        exit 1
    fi

    if [ -z "${MLFLOW_HOST}" ]; then
        while [ -z "${MLFLOW_HOST}" ]; do
            MLFLOW_HOST=$(
                gum input --prompt="MLflow host (FQDN): " --width=100 \
                    --placeholder="e.g., mlflow.example.com"
            )
        done
    fi
    if helm status kube-prometheus-stack -n ${PROMETHEUS_NAMESPACE} &>/dev/null; then
        if [ -z "${MONITORING_ENABLED}" ]; then
            if gum confirm "Enable Prometheus monitoring (ServiceMonitor)?"; then
                MONITORING_ENABLED="true"
            else
                MONITORING_ENABLED="false"
            fi
        fi
    else
        MONITORING_ENABLED="false"
    fi

    just setup-postgres-db
    just create-db-secret
    just create-s3-secret

    # Create mlflow bucket in MinIO if it doesn't exist
    if ! just minio::bucket-exists mlflow; then
        echo "Creating 'mlflow' bucket in MinIO..."
        just minio::create-bucket mlflow
@@ -205,10 +261,81 @@ install: check-env

    just add-helm-repo

-    echo "Generating Helm values..."
    just keycloak::delete-client "${KEYCLOAK_REALM}" "mlflow" || true
    oidc_client_secret=$(just utils::random-password)
    redirect_urls="https://${MLFLOW_HOST}/callback"
    just keycloak::create-client \
        realm="${KEYCLOAK_REALM}" \
        client_id="mlflow" \
        redirect_url="${redirect_urls}" \
        client_secret="${oidc_client_secret}"
    echo "✓ Keycloak client 'mlflow' created"

    if ! just keycloak::get-client-scope "${KEYCLOAK_REALM}" groups &>/dev/null; then
        just keycloak::create-client-scope "${KEYCLOAK_REALM}" groups "User group memberships"
        just keycloak::add-groups-mapper-to-scope "${KEYCLOAK_REALM}" groups
        echo "✓ Groups client scope created"
    else
        echo "✓ Groups client scope already exists"
    fi
    just keycloak::add-scope-to-client "${KEYCLOAK_REALM}" mlflow groups
    echo "✓ Groups scope added to mlflow client"

    echo "Setting up MLflow groups..."
    just keycloak::create-group mlflow-admins "" "MLflow administrators with full access" || true
    just keycloak::create-group mlflow-users "" "MLflow users with basic access" || true
    echo "✓ MLflow groups configured"

    if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then
        echo "External Secrets Operator detected. Storing OIDC config in Vault..."

        # Get PostgreSQL credentials for auth database
        db_username=$(just vault::get mlflow/postgres username)
        db_password=$(just vault::get mlflow/postgres password)
        auth_db_uri="postgresql://${db_username}:${db_password}@postgres-cluster-rw.${POSTGRES_NAMESPACE}.svc.cluster.local:5432/mlflow_auth"

        just vault::put "mlflow/oidc" \
            client_id="mlflow" \
            client_secret="${oidc_client_secret}" \
            auth_db_uri="${auth_db_uri}"

        kubectl delete secret mlflow-oidc-config -n ${MLFLOW_NAMESPACE} --ignore-not-found
        kubectl delete externalsecret mlflow-oidc-external-secret -n ${MLFLOW_NAMESPACE} --ignore-not-found

        export OIDC_CLIENT_SECRET="${oidc_client_secret}"
        gomplate -f mlflow-oidc-external-secret.gomplate.yaml | kubectl apply -f -

        echo "Waiting for ExternalSecret to sync..."
        kubectl wait --for=condition=Ready externalsecret/mlflow-oidc-external-secret \
            -n ${MLFLOW_NAMESPACE} --timeout=60s
    else
        echo "Creating Kubernetes secret directly..."

        # Get PostgreSQL credentials for auth database
        db_username=$(just vault::get mlflow/postgres username 2>/dev/null || echo "mlflow")
        db_password=$(just vault::get mlflow/postgres password)
        auth_db_uri="postgresql://${db_username}:${db_password}@postgres-cluster-rw.${POSTGRES_NAMESPACE}.svc.cluster.local:5432/mlflow_auth"

        kubectl delete secret mlflow-oidc-config -n ${MLFLOW_NAMESPACE} --ignore-not-found
        kubectl create secret generic mlflow-oidc-config -n ${MLFLOW_NAMESPACE} \
            --from-literal=OIDC_CLIENT_ID="mlflow" \
            --from-literal=OIDC_CLIENT_SECRET="${oidc_client_secret}" \
            --from-literal=OIDC_USERS_DB_URI="${auth_db_uri}"

        # Store in Vault for backup if available
        if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then
            just vault::put "mlflow/oidc" \
                client_id="mlflow" \
                client_secret="${oidc_client_secret}" \
                auth_db_uri="${auth_db_uri}"
        fi
    fi

    export MLFLOW_OIDC_ENABLED="true"
+    echo "Generating Helm values with OIDC enabled..."
    gomplate -f values.gomplate.yaml -o values.yaml

-    echo "Installing MLflow Helm chart from Community Charts..."
+    echo "Installing MLflow Helm chart from Community Charts with OIDC..."
    helm upgrade --cleanup-on-fail --install mlflow \
        community-charts/mlflow \
        --version ${MLFLOW_CHART_VERSION} \
@@ -218,18 +345,35 @@ install: check-env
        -f values.yaml

    echo ""
-    echo "=== MLflow installed ==="
+    echo "=== MLflow installed with OIDC authentication ==="
    echo "MLflow URL: https://${MLFLOW_HOST}"
    echo ""
-    echo "Next steps:"
-    echo " 1. Configure OAuth2 Proxy for authentication (recommended)"
-    echo " 2. Access MLflow UI at https://${MLFLOW_HOST}"
+    echo "OIDC authentication is enabled using Keycloak"
+    echo "Users can sign in with their Keycloak credentials"

# Upgrade MLflow
-upgrade: check-env
+upgrade:
    #!/bin/bash
    set -euo pipefail
    echo "Upgrading MLflow..."
    if [ -z "${MLFLOW_HOST}" ]; then
        while [ -z "${MLFLOW_HOST}" ]; do
            MLFLOW_HOST=$(
                gum input --prompt="MLflow host (FQDN): " --width=100 \
                    --placeholder="e.g., mlflow.example.com"
            )
        done
    fi
    if helm status kube-prometheus-stack -n ${PROMETHEUS_NAMESPACE} &>/dev/null; then
        if [ -z "${MONITORING_ENABLED}" ]; then
            if gum confirm "Enable Prometheus monitoring (ServiceMonitor)?"; then
                MONITORING_ENABLED="true"
            else
                MONITORING_ENABLED="false"
            fi
        fi
    else
        MONITORING_ENABLED="false"
    fi

    echo "Generating Helm values..."
    gomplate -f values.gomplate.yaml -o values.yaml
@@ -254,11 +398,14 @@ uninstall delete-db='true':
    helm uninstall mlflow -n ${MLFLOW_NAMESPACE} --ignore-not-found
    just delete-db-secret
    just delete-s3-secret
    kubectl delete secret mlflow-oidc-config -n ${MLFLOW_NAMESPACE} --ignore-not-found
    kubectl delete externalsecret mlflow-oidc-external-secret -n ${MLFLOW_NAMESPACE} --ignore-not-found
    just delete-namespace
    if [ "{{ delete-db }}" = "true" ]; then
        just postgres::delete-db mlflow || true
        just postgres::delete-user mlflow || true
    fi
    just keycloak::delete-client "${KEYCLOAK_REALM}" "mlflow" || true
    echo "MLflow uninstalled"

# Clean up all MLflow resources
@@ -272,22 +419,9 @@ cleanup:
        just postgres::delete-user mlflow || true
        just vault::delete mlflow/postgres || true
        just vault::delete mlflow/s3 || true
        just vault::delete mlflow/oidc || true
        just keycloak::delete-client "${KEYCLOAK_REALM}" "mlflow" || true
        echo "Cleanup completed"
    else
        echo "Cleanup cancelled"
    fi
-
-# Check the environment
-[private]
-check-env:
-    #!/bin/bash
-    set -euo pipefail
-    if [ -z "${MLFLOW_HOST}" ]; then
-        while [ -z "${MLFLOW_HOST}" ]; do
-            MLFLOW_HOST=$(
-                gum input --prompt="MLflow host (FQDN): " --width=100 \
-                    --placeholder="e.g., mlflow.example.com"
-            )
-        done
-        just env::set MLFLOW_HOST="${MLFLOW_HOST}"
-    fi
27
mlflow/mlflow-oidc-external-secret.gomplate.yaml
Normal file
@@ -0,0 +1,27 @@
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: mlflow-oidc-external-secret
  namespace: {{ .Env.MLFLOW_NAMESPACE }}
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-secret-store
    kind: ClusterSecretStore
  target:
    name: mlflow-oidc-config
    creationPolicy: Owner
  data:
    - secretKey: OIDC_CLIENT_ID
      remoteRef:
        key: mlflow/oidc
        property: client_id
    - secretKey: OIDC_CLIENT_SECRET
      remoteRef:
        key: mlflow/oidc
        property: client_secret
    - secretKey: OIDC_USERS_DB_URI
      remoteRef:
        key: mlflow/oidc
        property: auth_db_uri
@@ -2,11 +2,18 @@
# Replica count
replicaCount: 1

-# Image configuration (Community Charts uses burakince/mlflow)
+# Image configuration
{{- if eq (.Env.MLFLOW_OIDC_ENABLED | default "false") "true" }}
image:
  repository: {{ .Env.IMAGE_REGISTRY }}/mlflow
  pullPolicy: {{ .Env.MLFLOW_IMAGE_PULL_POLICY }}
  tag: "{{ .Env.MLFLOW_IMAGE_TAG }}"  # Custom MLflow with OIDC
{{- else }}
image:
  repository: burakince/mlflow
-  pullPolicy: IfNotPresent
+  pullPolicy: {{ .Env.MLFLOW_IMAGE_PULL_POLICY }}
  tag: "3.6.0"  # MLflow 3.6.0
{{- end }}

# Backend store configuration (PostgreSQL)
backendStore:
@@ -44,12 +51,49 @@ artifactRoot:
      keyOfAccessKeyId: "AWS_ACCESS_KEY_ID"
      keyOfSecretAccessKey: "AWS_SECRET_ACCESS_KEY"

{{- if eq (.Env.MLFLOW_OIDC_ENABLED | default "false") "true" }}
# Disable MLflow logging to prevent gunicornOpts auto-injection
log:
  enabled: false

# A map of arguments to pass to the `mlflow server` command (OIDC enabled)
# Use oidc-auth-fastapi for FastAPI/ASGI compatibility with Uvicorn
extraArgs:
  appName: "oidc-auth-fastapi"
  # Allow connections from external hostname (with and without port)
  allowedHosts: "{{ .Env.MLFLOW_HOST }},{{ .Env.MLFLOW_HOST }}:443"

# Extra secrets for OIDC configuration
extraSecretNamesForEnvFrom:
  - mlflow-oidc-config

# Extra environment variables for OIDC and S3/MinIO configuration
extraEnvVars:
  MLFLOW_S3_ENDPOINT_URL: "http://minio.{{ .Env.MINIO_NAMESPACE }}.svc.cluster.local:9000"
  MLFLOW_S3_IGNORE_TLS: "true"
  # OIDC Configuration - mlflow-oidc-auth uses OIDC Discovery
  OIDC_DISCOVERY_URL: "https://{{ .Env.KEYCLOAK_HOST }}/realms/{{ .Env.KEYCLOAK_REALM }}/.well-known/openid-configuration"
  OIDC_REDIRECT_URI: "https://{{ .Env.MLFLOW_HOST }}/callback"
  OIDC_SCOPE: "openid profile email groups"
  OIDC_PROVIDER_DISPLAY_NAME: "Keycloak"
  # OIDC attribute mapping
  OIDC_GROUPS_ATTRIBUTE: "groups"
  # Group configuration - required for access control
  OIDC_ADMIN_GROUP_NAME: "mlflow-admins"
  OIDC_GROUP_NAME: "mlflow-admins,mlflow-users"
  # Default permission for new resources
  DEFAULT_MLFLOW_PERMISSION: "MANAGE"
  # Session configuration - use cachelib with filesystem backend
  SESSION_TYPE: "cachelib"
  SESSION_CACHE_DIR: "/tmp/session"
{{- else }}
# Extra environment variables for S3/MinIO configuration
extraEnvVars:
  MLFLOW_S3_ENDPOINT_URL: "http://minio.{{ .Env.MINIO_NAMESPACE }}.svc.cluster.local:9000"
  MLFLOW_S3_IGNORE_TLS: "true"
  # Disable security middleware when using Gunicorn (env var approach)
  MLFLOW_SERVER_DISABLE_SECURITY_MIDDLEWARE: "true"
{{- end }}

# Service configuration
service:
@@ -73,7 +117,7 @@ ingress:

# ServiceMonitor for Prometheus
serviceMonitor:
-  enabled: true
+  enabled: {{ .Env.MONITORING_ENABLED }}
  useServicePort: false
  namespace: "{{ .Env.PROMETHEUS_NAMESPACE }}"
  interval: 30s