MLflow
Open source platform for managing the end-to-end machine learning lifecycle with Keycloak OIDC authentication.
Overview
This module deploys MLflow using the Community Charts Helm chart with:
- Keycloak OIDC authentication for user login
- Custom Docker image with mlflow-oidc-auth plugin
- PostgreSQL backend for tracking server and auth databases
- MinIO/S3 artifact storage with proxied access
- FastAPI/ASGI server with Uvicorn for production
- HTTPS reverse proxy support via Traefik
- Group-based access control via Keycloak groups
- Prometheus metrics for monitoring
Prerequisites
- Kubernetes cluster (k3s)
- Keycloak installed and configured
- PostgreSQL cluster (CloudNativePG)
- MinIO object storage
- External Secrets Operator (optional, for Vault integration)
- Docker registry (local or remote)
Installation
Basic Installation
-
Build and Push Custom MLflow Image:
Set
DOCKER_HOSTto your remote Docker host (where k3s is running):export DOCKER_HOST=ssh://yourhost.com just mlflow::build-and-push-imageThis builds a custom MLflow image with OIDC auth plugin and pushes it to your k3s registry.
-
Install MLflow:
just mlflow::installYou will be prompted for:
- MLflow host (FQDN): e.g.,
mlflow.example.com
- MLflow host (FQDN): e.g.,
What Gets Installed
- MLflow tracking server (FastAPI with OIDC)
- PostgreSQL databases:
mlflow- Experiment tracking, models, and runsmlflow_auth- User authentication and permissions
- PostgreSQL user
mlflowwith access to both databases - MinIO bucket
mlflowfor artifact storage - Custom MLflow Docker image with OIDC auth plugin
- Keycloak OAuth client (confidential client)
- Keycloak groups:
mlflow-admins- Full administrative accessmlflow-users- Basic user access
Configuration
Docker Build Environment
For building and pushing the custom MLflow image:
DOCKER_HOST=ssh://yourhost.com # Remote Docker host (where k3s is running)
IMAGE_REGISTRY=localhost:30500 # k3s local registry
Deployment Configuration
Environment variables (set in .env.local or override):
MLFLOW_NAMESPACE=mlflow # Kubernetes namespace
MLFLOW_CHART_VERSION=1.8.0 # Helm chart version
MLFLOW_HOST=mlflow.example.com # External hostname
MLFLOW_IMAGE_TAG=3.6.0-oidc # Custom image tag
MLFLOW_IMAGE_PULL_POLICY=IfNotPresent # Image pull policy
KEYCLOAK_HOST=auth.example.com # Keycloak hostname
KEYCLOAK_REALM=buunstack # Keycloak realm name
Architecture Notes
MLflow 3.6.0 with OIDC:
- Uses
mlflow-oidc-auth[full]==5.6.1plugin - FastAPI/ASGI server with Uvicorn (not Gunicorn)
- Server type:
oidc-auth-fastapifor ASGI compatibility - Session management:
cachelibwith filesystem backend - Custom Docker image built from
burakince/mlflow:3.6.0
Authentication Flow:
- OIDC Discovery:
/.well-known/openid-configuration - Redirect URI:
/callback(not/oidc/callback) - Required scopes:
openid profile email groups - Group attribute:
groupsfrom UserInfo
Database Structure:
mlflowdatabase: Experiment tracking, models, parameters, metricsmlflow_authdatabase: User accounts, groups, permissions
Usage
Access MLflow
- Navigate to
https://your-mlflow-host/ - Click "Keycloak" button to authenticate
- After successful login:
- First redirect: Permissions Management UI (
/oidc/ui/) - Click "MLflow" button: Main MLflow UI
- First redirect: Permissions Management UI (
Grant Admin Access
Add users to the mlflow-admins group:
just keycloak::add-user-to-group <username> mlflow-admins
Admin users have full privileges including:
- Experiment and model management
- User and permission management
- Access to all experiments and models
Log Experiments
Using Python Client
import mlflow
# Set tracking URI
mlflow.set_tracking_uri("https://mlflow.example.com")
# Start experiment
mlflow.set_experiment("my-experiment")
# Log parameters, metrics, and artifacts
with mlflow.start_run():
mlflow.log_param("learning_rate", 0.01)
mlflow.log_metric("accuracy", 0.95)
mlflow.log_artifact("model.pkl")
Authentication for API Access
For programmatic access, create an access token:
- Log in to MLflow UI
- Navigate to Permissions UI → Create access token
- Use token in your code:
import os
os.environ["MLFLOW_TRACKING_TOKEN"] = "your-token"
Model Registry
Register and manage models:
# Register model
mlflow.register_model(
model_uri="runs:/<run-id>/model",
name="my-model"
)
# Transition model stage
from mlflow.tracking import MlflowClient
client = MlflowClient()
client.transition_model_version_stage(
name="my-model",
version=1,
stage="Production"
)
Features
- Experiment Tracking: Log parameters, metrics, and artifacts
- Model Registry: Version and manage ML models
- Model Serving: Deploy models as REST APIs
- Project Reproducibility: Package code, data, and environment
- Remote Execution: Run experiments on remote platforms
- UI Dashboard: Visual experiment comparison and analysis
- LLM Tracking: Track LLM applications with traces
- Prompt Registry: Manage and version prompts
Architecture
External Users
↓
Cloudflare Tunnel (HTTPS)
↓
Traefik Ingress (HTTPS)
↓
MLflow Server (HTTP inside cluster)
├─ FastAPI/ASGI (Uvicorn)
├─ mlflow-oidc-auth plugin
│ ├─ OAuth → Keycloak (authentication)
│ └─ Session → FileSystemCache
├─ PostgreSQL (metadata)
│ ├─ mlflow (tracking)
│ └─ mlflow_auth (users/groups)
└─ MinIO (artifacts via proxied access)
Key Components:
- Server Type:
oidc-auth-fastapifor FastAPI/ASGI compatibility - Allowed Hosts: Validates
Hostheader for security - Session Backend: Cachelib with filesystem storage
- Artifact Storage: Proxied through MLflow server (no direct S3 access needed)
Authentication
User Login (OIDC)
- Users authenticate via Keycloak
- Standard OIDC flow with Authorization Code grant
- Group membership retrieved from
groupsclaim in UserInfo - Users automatically created on first login
Access Control
Group-based Permissions:
OIDC_ADMIN_GROUP_NAME = "mlflow-admins"
OIDC_GROUP_NAME = "mlflow-admins,mlflow-users"
Default Permissions:
- New resources:
MANAGEpermission for creator - Admins: Full access to all resources
- Users: Access based on explicit permissions
Permission Management
Access the Permissions UI at /oidc/ui/:
- View and manage user permissions
- Assign permissions to experiments, models, and prompts
- Create and manage groups
- View audit logs
Management
Rebuild Custom Image
If you need to update the custom MLflow image:
export DOCKER_HOST=ssh://yourhost.com
just mlflow::build-and-push-image
After rebuilding, restart MLflow to use the new image:
kubectl rollout restart deployment/mlflow -n mlflow
Upgrade MLflow
just mlflow::upgrade
Updates the Helm deployment with current configuration.
Uninstall
# Keep PostgreSQL databases
just mlflow::uninstall false
# Delete PostgreSQL databases and user
just mlflow::uninstall true
Clean Up All Resources
just mlflow::cleanup
Deletes databases, users, secrets, and Keycloak client (with confirmation).
Troubleshooting
Check Pod Status
kubectl get pods -n mlflow
Expected pods:
mlflow-*- Main application (1 replica)mlflow-db-migration-*- Database migration (Completed)mlflow-dbchecker-*- Database connection check (Completed)
OAuth Login Fails
Redirect Loop (Returns to Login Page)
Symptoms: User authenticates with Keycloak but returns to login page
Common Causes:
-
Redirect URI Mismatch:
- Check Keycloak client redirect URI matches
/callback - Verify
OIDC_REDIRECT_URIishttps://{host}/callback
- Check Keycloak client redirect URI matches
-
Missing Groups Scope:
- Ensure
groupsscope is added to Keycloak client - Check groups mapper is configured in Keycloak
- Ensure
-
Group Membership:
- User must be in
mlflow-adminsormlflow-usersgroup - Add user to group:
just keycloak::add-user-to-group <user> mlflow-admins
- User must be in
Session Errors
Error: Session module for filesystem could not be imported
Solution: Ensure session configuration is correct:
SESSION_TYPE: "cachelib"
SESSION_CACHE_DIR: "/tmp/session"
Group Detection Errors
Error: Group detection error: No module named 'oidc'
Solution: Remove OIDC_GROUP_DETECTION_PLUGIN setting (should be unset or removed)
Server Type Errors
Error: TypeError: Flask.__call__() missing 1 required positional argument: 'start_response'
Cause: Using Flask server type with Uvicorn (ASGI)
Solution: Ensure appName: "oidc-auth-fastapi" in values
Database Connection Issues
Check database credentials:
kubectl get secret mlflow-db-secret -n mlflow -o yaml
Test database connectivity:
kubectl exec -n mlflow deployment/mlflow -- \
psql -h postgres-cluster-rw.postgres -U mlflow -d mlflow -c "SELECT 1"
Artifact Storage Issues
Check MinIO credentials:
kubectl get secret mlflow-s3-secret -n mlflow -o yaml
Test MinIO connectivity:
kubectl exec -n mlflow deployment/mlflow -- \
python -c "import boto3; import os; \
client = boto3.client('s3', \
endpoint_url=os.getenv('MLFLOW_S3_ENDPOINT_URL'), \
aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'), \
aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY')); \
print(client.list_buckets())"
Check Logs
# Application logs
kubectl logs -n mlflow deployment/mlflow --tail=100
# Database migration logs
kubectl logs -n mlflow job/mlflow-db-migration
# Real-time logs
kubectl logs -n mlflow deployment/mlflow -f
Common Log Messages
Normal:
Successfully created FastAPI app with OIDC integrationOIDC routes, authentication, and UI should now be availableSession module for cachelib importedRedirect URI for OIDC login: https://{host}/callback
Issues:
Group detection error- Check OIDC configurationAuthorization error: User is not allowed to login- User not in required groupSession error- Session configuration issue
Image Build Issues
If custom image build fails:
# Set Docker host
export DOCKER_HOST=ssh://yourhost.com
# Rebuild image manually
cd /path/to/buun-stack/mlflow
just mlflow::build-and-push-image
# Check image exists on remote host
docker images localhost:30500/mlflow:3.6.0-oidc
# Test image on remote host
docker run --rm localhost:30500/mlflow:3.6.0-oidc mlflow --version
Note: All Docker commands run on the remote host specified by DOCKER_HOST.
Custom Image
Dockerfile
Located at mlflow/image/Dockerfile:
FROM burakince/mlflow:3.6.0
# Install mlflow-oidc-auth plugin with filesystem session support
RUN pip install --no-cache-dir \
mlflow-oidc-auth[full]==5.6.1 \
cachelib[filesystem]
Building Custom Image
Important: Set DOCKER_HOST to build on the remote k3s host:
export DOCKER_HOST=ssh://yourhost.com
just mlflow::build-image # Build only
just mlflow::push-image # Push only (requires prior build)
just mlflow::build-and-push-image # Build and push
The image is built on the remote Docker host and pushed to the k3s local registry (localhost:30500).