
MLflow

Open-source platform for managing the end-to-end machine learning lifecycle, deployed here with Keycloak OIDC authentication.

Overview

This module deploys MLflow using the Community Charts Helm chart with:

  • Keycloak OIDC authentication for user login
  • Custom Docker image with mlflow-oidc-auth plugin
  • PostgreSQL backend for tracking server and auth databases
  • MinIO/S3 artifact storage with proxied access
  • FastAPI/ASGI server with Uvicorn for production
  • HTTPS reverse proxy support via Traefik
  • Group-based access control via Keycloak groups
  • Prometheus metrics for monitoring

Prerequisites

  • Kubernetes cluster (k3s)
  • Keycloak installed and configured
  • PostgreSQL cluster (CloudNativePG)
  • MinIO object storage
  • External Secrets Operator (optional, for Vault integration)
  • Docker registry (local or remote)

Installation

Basic Installation

  1. Build and Push Custom MLflow Image:

    Set DOCKER_HOST to your remote Docker host (where k3s is running):

    export DOCKER_HOST=ssh://yourhost.com
    just mlflow::build-and-push-image
    

    This builds a custom MLflow image with the OIDC auth plugin and pushes it to the k3s local registry.

  2. Install MLflow:

    just mlflow::install
    

    You will be prompted for:

    • MLflow host (FQDN): e.g., mlflow.example.com

What Gets Installed

  • MLflow tracking server (FastAPI with OIDC)
  • PostgreSQL databases:
    • mlflow - Experiment tracking, models, and runs
    • mlflow_auth - User authentication and permissions
  • PostgreSQL user mlflow with access to both databases
  • MinIO bucket mlflow for artifact storage
  • Custom MLflow Docker image with OIDC auth plugin
  • Keycloak OAuth client (confidential client)
  • Keycloak groups:
    • mlflow-admins - Full administrative access
    • mlflow-users - Basic user access

Configuration

Docker Build Environment

For building and pushing the custom MLflow image:

DOCKER_HOST=ssh://yourhost.com             # Remote Docker host (where k3s is running)
IMAGE_REGISTRY=localhost:30500             # k3s local registry

Deployment Configuration

Environment variables (set them in .env.local or override them in your environment):

MLFLOW_NAMESPACE=mlflow                    # Kubernetes namespace
MLFLOW_CHART_VERSION=1.8.0                 # Helm chart version
MLFLOW_HOST=mlflow.example.com             # External hostname
MLFLOW_IMAGE_TAG=3.6.0-oidc                # Custom image tag
MLFLOW_IMAGE_PULL_POLICY=IfNotPresent      # Image pull policy
KEYCLOAK_HOST=auth.example.com             # Keycloak hostname
KEYCLOAK_REALM=buunstack                   # Keycloak realm name
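A minimal Python sketch of how these settings combine into the image the deployment runs. It assumes that unset variables fall back to the defaults listed above and that the image repository is named mlflow (as in the docker images localhost:30500/mlflow:3.6.0-oidc check under Image Build Issues). The justfile's actual resolution logic may differ.

```python
import os

# Documented defaults; values present in the environment take precedence.
# IMAGE_REGISTRY comes from the Docker build settings.
DEFAULTS = {
    "MLFLOW_NAMESPACE": "mlflow",
    "MLFLOW_CHART_VERSION": "1.8.0",
    "MLFLOW_HOST": "mlflow.example.com",
    "MLFLOW_IMAGE_TAG": "3.6.0-oidc",
    "MLFLOW_IMAGE_PULL_POLICY": "IfNotPresent",
    "KEYCLOAK_HOST": "auth.example.com",
    "KEYCLOAK_REALM": "buunstack",
    "IMAGE_REGISTRY": "localhost:30500",
}


def resolve_config(env=None):
    """Merge the defaults with any overrides found in env (os.environ by default)."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}


def image_reference(config):
    """Image reference; the repository name 'mlflow' is an assumption."""
    return f"{config['IMAGE_REGISTRY']}/mlflow:{config['MLFLOW_IMAGE_TAG']}"


if __name__ == "__main__":
    cfg = resolve_config()
    for key, value in cfg.items():
        print(f"{key}={value}")
    print("image:", image_reference(cfg))
```

Changing only MLFLOW_IMAGE_TAG changes the image reference, which is what lets a rebuilt image be rolled out without relying on the pull policy.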

Architecture Notes

MLflow 3.6.0 with OIDC:

  • Uses mlflow-oidc-auth[full]==5.6.1 plugin
  • FastAPI/ASGI server with Uvicorn (not Gunicorn)
  • Server type: oidc-auth-fastapi for ASGI compatibility
  • Session management: cachelib with filesystem backend
  • Custom Docker image built from burakince/mlflow:3.6.0

Authentication Flow:

  • OIDC Discovery: /.well-known/openid-configuration
  • Redirect URI: /callback (not /oidc/callback)
  • Required scopes: openid profile email groups
  • Group attribute: groups from UserInfo
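The URLs and scopes above can be derived from the host settings. This sketch assumes a current Keycloak release (17 or later), which serves realm discovery documents under /realms/<realm>/ without the legacy /auth prefix; older installations need that prefix added.

```python
# Scopes the plugin needs, as listed above.
REQUIRED_SCOPES = ("openid", "profile", "email", "groups")


def discovery_url(keycloak_host, realm):
    """Per-realm OIDC discovery document (no /auth prefix: Keycloak 17+ assumed)."""
    return f"https://{keycloak_host}/realms/{realm}/.well-known/openid-configuration"


def redirect_uri(mlflow_host):
    """Callback registered on the Keycloak client: /callback, not /oidc/callback."""
    return f"https://{mlflow_host}/callback"


def missing_scopes(scope_string):
    """Return required scopes absent from a space-separated scope string."""
    granted = set(scope_string.split())
    return [scope for scope in REQUIRED_SCOPES if scope not in granted]
```

Fetching discovery_url("auth.example.com", "buunstack") with curl is a quick way to confirm the realm is reachable before debugging the login flow.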

Database Structure:

  • mlflow database: Experiment tracking, models, parameters, metrics
  • mlflow_auth database: User accounts, groups, permissions

Usage

Access MLflow

  1. Navigate to https://your-mlflow-host/
  2. Click the "Keycloak" button to authenticate
  3. After successful login:
    • You are first redirected to the Permissions Management UI (/oidc/ui/)
    • Click the "MLflow" button to open the main MLflow UI

Grant Admin Access

Add users to the mlflow-admins group:

just keycloak::add-user-to-group <username> mlflow-admins

Admin users have full privileges including:

  • Experiment and model management
  • User and permission management
  • Access to all experiments and models

Log Experiments

Using Python Client

import mlflow

# Set tracking URI
mlflow.set_tracking_uri("https://mlflow.example.com")

# Start experiment
mlflow.set_experiment("my-experiment")

# Log parameters, metrics, and artifacts
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_artifact("model.pkl")

Authentication for API Access

For programmatic access, create an access token:

  1. Log in to MLflow UI
  2. Navigate to Permissions UI → Create access token
  3. Use token in your code:
import os
os.environ["MLFLOW_TRACKING_TOKEN"] = "your-token"
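When MLFLOW_TRACKING_TOKEN is set, the MLflow Python client sends it as an "Authorization: Bearer <token>" header. The stdlib sketch below makes the same kind of request against MLflow's experiments/get-by-name REST endpoint, which is handy for checking a token without installing mlflow. It assumes the OIDC plugin accepts the token presented this way.

```python
import json
import os
import urllib.parse
import urllib.request


def build_request(tracking_uri, experiment_name, token):
    """Prepare a GET for MLflow's experiments/get-by-name REST endpoint."""
    query = urllib.parse.urlencode({"experiment_name": experiment_name})
    url = f"{tracking_uri.rstrip('/')}/api/2.0/mlflow/experiments/get-by-name?{query}"
    # Same header shape the MLflow client uses for MLFLOW_TRACKING_TOKEN.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def get_experiment(tracking_uri, experiment_name, token=None):
    """Fetch experiment metadata; raises urllib.error.HTTPError on 401/403/404."""
    token = token or os.environ["MLFLOW_TRACKING_TOKEN"]
    request = build_request(tracking_uri, experiment_name, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["experiment"]
```

For example, get_experiment("https://mlflow.example.com", "my-experiment") returns the experiment's metadata if the token is valid and the user has at least read permission on it.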

Model Registry

Register and manage models:

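import mlflow
from mlflow.tracking import MlflowClient

# Register the model logged under "model" in a given run
mlflow.register_model(
    model_uri="runs:/<run-id>/model",
    name="my-model"
)

# Transition model stage (stages are deprecated since MLflow 2.9;
# newer code can use client.set_registered_model_alias instead)
client = MlflowClient()
client.transition_model_version_stage(
    name="my-model",
    version=1,
    stage="Production"
)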

Features

  • Experiment Tracking: Log parameters, metrics, and artifacts
  • Model Registry: Version and manage ML models
  • Model Serving: Deploy models as REST APIs
  • Project Reproducibility: Package code, data, and environment
  • Remote Execution: Run experiments on remote platforms
  • UI Dashboard: Visual experiment comparison and analysis
  • LLM Tracking: Track LLM applications with traces
  • Prompt Registry: Manage and version prompts

Architecture

External Users
      ↓
Cloudflare Tunnel (HTTPS)
      ↓
Traefik Ingress (HTTPS)
      ↓
MLflow Server (HTTP inside cluster)
  ├─ FastAPI/ASGI (Uvicorn)
  ├─ mlflow-oidc-auth plugin
  │   ├─ OAuth → Keycloak (authentication)
  │   └─ Session → FileSystemCache
  ├─ PostgreSQL (metadata)
  │   ├─ mlflow (tracking)
  │   └─ mlflow_auth (users/groups)
  └─ MinIO (artifacts via proxied access)

Key Components:

  • Server Type: oidc-auth-fastapi for FastAPI/ASGI compatibility
  • Allowed Hosts: Validates Host header for security
  • Session Backend: Cachelib with filesystem storage
  • Artifact Storage: Proxied through MLflow server (no direct S3 access needed)
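The Allowed Hosts check rejects requests whose Host header does not name this server, which protects against DNS rebinding and host-header tricks. The function below illustrates the idea only; the port stripping and the "*.domain" wildcard rule are assumptions for illustration, not MLflow's actual matching code.

```python
def host_allowed(host_header, allowed_hosts):
    """Return True if the Host header (optionally with a port) matches an allowed name."""
    if not host_header:
        return False
    host = host_header.strip().lower()
    # Drop a ":port" suffix; bracketed IPv6 literals keep their brackets.
    if host.startswith("["):
        host = host.split("]", 1)[0] + "]"
    elif ":" in host:
        host = host.rsplit(":", 1)[0]
    for pattern in (p.lower() for p in allowed_hosts):
        if pattern == "*" or host == pattern:
            return True
        # "*.example.com" matches any subdomain of example.com
        if pattern.startswith("*.") and host.endswith(pattern[1:]):
            return True
    return False
```

Behind Traefik the Host header is the external name (MLFLOW_HOST), so that name must be on the allowed list or every proxied request is rejected.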

Authentication

User Login (OIDC)

  • Users authenticate via Keycloak
  • Standard OIDC flow with Authorization Code grant
  • Group membership retrieved from groups claim in UserInfo
  • Users automatically created on first login

Access Control

Group-based Permissions:

OIDC_ADMIN_GROUP_NAME = "mlflow-admins"
OIDC_GROUP_NAME = "mlflow-admins,mlflow-users"
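A sketch of how these two settings apply to a user's groups claim, assuming both are comma-separated lists of plain group names (this mirrors the documented behavior, not the plugin's source). Note that if the Keycloak group membership mapper has "Full group path" enabled, the claim carries values such as /mlflow-admins, which would not match these names.

```python
def parse_groups(setting):
    """Split a comma-separated group setting such as OIDC_GROUP_NAME."""
    return {group.strip() for group in setting.split(",") if group.strip()}


def authorize(user_groups,
              allowed_setting="mlflow-admins,mlflow-users",
              admin_setting="mlflow-admins"):
    """Return (may_login, is_admin) for one user's groups claim."""
    groups = set(user_groups)
    may_login = bool(groups & parse_groups(allowed_setting))
    is_admin = bool(groups & parse_groups(admin_setting))
    return may_login, is_admin
```

A user whose only group is mlflow-users can log in without admin rights; a user in neither group is refused at login.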

Default Permissions:

  • New resources: MANAGE permission for creator
  • Admins: Full access to all resources
  • Users: Access based on explicit permissions

Permission Management

Access the Permissions UI at /oidc/ui/:

  • View and manage user permissions
  • Assign permissions to experiments, models, and prompts
  • Create and manage groups
  • View audit logs

Management

Rebuild Custom Image

If you need to update the custom MLflow image:

export DOCKER_HOST=ssh://yourhost.com
just mlflow::build-and-push-image

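After rebuilding, restart MLflow to use the new image. If you reused the same MLFLOW_IMAGE_TAG, a node that already has that tag cached keeps using it under MLFLOW_IMAGE_PULL_POLICY=IfNotPresent; build with a new tag or set the pull policy to Always before restarting: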

kubectl rollout restart deployment/mlflow -n mlflow

Upgrade MLflow

just mlflow::upgrade

Updates the Helm deployment with current configuration.

Uninstall

# Keep PostgreSQL databases
just mlflow::uninstall false

# Delete PostgreSQL databases and user
just mlflow::uninstall true

Clean Up All Resources

just mlflow::cleanup

Deletes databases, users, secrets, and Keycloak client (with confirmation).

Troubleshooting

Check Pod Status

kubectl get pods -n mlflow

Expected pods:

  • mlflow-* - Main application (1 replica)
  • mlflow-db-migration-* - Database migration (Completed)
  • mlflow-dbchecker-* - Database connection check (Completed)
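The pod list can also be checked programmatically. kubectl shows finished Job pods as Completed, which appears as status.phase "Succeeded" in JSON output. This sketch assumes the pod name prefixes listed above; a missing migration pod can also mean the Job has already been cleaned up, and "Running" does not by itself mean the container is ready.

```python
import json
import subprocess


def check_pods(pods_json):
    """Return problems found in `kubectl get pods -o json` output."""
    problems = []
    seen = {"app": False, "db-migration": False, "dbchecker": False}
    for pod in json.loads(pods_json)["items"]:
        name = pod["metadata"]["name"]
        phase = pod["status"].get("phase", "Unknown")
        if name.startswith("mlflow-db-migration-"):
            kind, expected = "db-migration", "Succeeded"
        elif name.startswith("mlflow-dbchecker-"):
            kind, expected = "dbchecker", "Succeeded"
        elif name.startswith("mlflow-"):
            kind, expected = "app", "Running"
        else:
            continue  # unrelated pod in the namespace
        seen[kind] = True
        if phase != expected:
            problems.append(f"{name}: phase {phase}, expected {expected}")
    problems += [f"no {kind} pod found" for kind, found in seen.items() if not found]
    return problems


def main(namespace="mlflow"):
    """Run kubectl and print any problems (requires kubectl access)."""
    out = subprocess.run(["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    for line in check_pods(out) or ["all expected pods look healthy"]:
        print(line)
```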

OAuth Login Fails

Redirect Loop (Returns to Login Page)

Symptoms: User authenticates with Keycloak but returns to login page

Common Causes:

  1. Redirect URI Mismatch:

    • Check Keycloak client redirect URI matches /callback
    • Verify OIDC_REDIRECT_URI is https://{host}/callback
  2. Missing Groups Scope:

    • Ensure groups scope is added to Keycloak client
    • Check groups mapper is configured in Keycloak
  3. Group Membership:

    • User must be in mlflow-admins or mlflow-users group
    • Add user to group: just keycloak::add-user-to-group <user> mlflow-admins
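To tell causes 2 and 3 apart, it can help to look at the claims Keycloak actually issues, for example in a token obtained from Keycloak while debugging. The sketch decodes a JWT payload without verifying the signature (debugging only) and maps the groups claim onto the causes above. The "full group path" case reflects Keycloak's group mapper option; whether the plugin tolerates such paths is an assumption to verify.

```python
import base64
import json


def jwt_claims(token):
    """Decode a JWT's payload WITHOUT verifying its signature (debugging only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore padding stripped by JWT encoding
    return json.loads(base64.urlsafe_b64decode(payload))


def diagnose(claims, allowed=("mlflow-admins", "mlflow-users")):
    """Map a token's claims to the redirect-loop causes listed above."""
    groups = claims.get("groups")
    if groups is None:
        return "missing groups claim: check the groups scope and mapper"
    if set(groups) & set(allowed):
        return "ok"
    if any(group.lstrip("/") in allowed for group in groups):
        return "groups are full paths (e.g. /mlflow-users): check the mapper's Full group path option"
    return "user is not in mlflow-admins or mlflow-users"
```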

Session Errors

Error: Session module for filesystem could not be imported

Solution: Ensure session configuration is correct:

SESSION_TYPE: "cachelib"
SESSION_CACHE_DIR: "/tmp/session"
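With SESSION_TYPE set to cachelib, sessions are stored as files under SESSION_CACHE_DIR, so the container user must be able to create files there. A small stdlib check, assuming the default /tmp/session path shown above:

```python
import os
import tempfile


def check_session_dir(path="/tmp/session"):
    """Return None if path can hold session files, else a problem description."""
    try:
        os.makedirs(path, exist_ok=True)
        # Create and remove a probe file, as a filesystem cache must be able to do.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return f"cannot use {path} for sessions: {exc}"
    return None
```

Running it inside the pod (for example through kubectl exec with python -c) checks the directory under the same user and filesystem the server uses.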

Group Detection Errors

Error: Group detection error: No module named 'oidc'

Solution: Leave OIDC_GROUP_DETECTION_PLUGIN unset (remove it from the configuration if present)

Server Type Errors

Error: TypeError: Flask.__call__() missing 1 required positional argument: 'start_response'

Cause: Using Flask server type with Uvicorn (ASGI)

Solution: Ensure appName is set to "oidc-auth-fastapi" in the Helm values

Database Connection Issues

Check database credentials:

kubectl get secret mlflow-db-secret -n mlflow -o yaml

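Test database connectivity (this assumes the psql client is available in the MLflow image; psql may also need the password, e.g. via PGPASSWORD):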

kubectl exec -n mlflow deployment/mlflow -- \
  psql -h postgres-cluster-rw.postgres -U mlflow -d mlflow -c "SELECT 1"
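If psql is not available in the image, a plain TCP check at least separates network problems from credential problems. This sketch assumes PostgreSQL listens on its default port 5432 behind the postgres-cluster-rw.postgres service; a successful connect says nothing about the mlflow user's password or database grants.

```python
import socket


def tcp_reachable(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or DNS failure
        return False
```

Inside the pod, tcp_reachable("postgres-cluster-rw.postgres") returning False points at the Service, DNS, or network policies rather than at the credentials in mlflow-db-secret.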

Artifact Storage Issues

Check MinIO credentials:

kubectl get secret mlflow-s3-secret -n mlflow -o yaml

Test MinIO connectivity:

kubectl exec -n mlflow deployment/mlflow -- \
  python -c "import boto3; import os; \
  client = boto3.client('s3', \
    endpoint_url=os.getenv('MLFLOW_S3_ENDPOINT_URL'), \
    aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'), \
    aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY')); \
  print(client.list_buckets())"

Check Logs

# Application logs
kubectl logs -n mlflow deployment/mlflow --tail=100

# Database migration logs
kubectl logs -n mlflow job/mlflow-db-migration

# Real-time logs
kubectl logs -n mlflow deployment/mlflow -f

Common Log Messages

Normal:

  • Successfully created FastAPI app with OIDC integration
  • OIDC routes, authentication, and UI should now be available
  • Session module for cachelib imported
  • Redirect URI for OIDC login: https://{host}/callback

Issues:

  • Group detection error - Check OIDC configuration
  • Authorization error: User is not allowed to login - User not in required group
  • Session error - Session configuration issue
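The two lists above can be turned into a quick log check. The sketch matches substrings of the messages exactly as listed; the real log lines may carry extra prefixes (timestamps, levels), which substring matching tolerates.

```python
# Substrings taken from the "Normal" and "Issues" lists above.
NORMAL = (
    "Successfully created FastAPI app with OIDC integration",
    "OIDC routes, authentication, and UI should now be available",
    "Session module for cachelib imported",
    "Redirect URI for OIDC login:",
)
ISSUES = {
    "Group detection error": "check OIDC configuration",
    "User is not allowed to login": "user not in required group",
    "Session error": "session configuration issue",
}


def scan(lines):
    """Return (normal messages seen, [(line, hint), ...] for known issues)."""
    seen, issues = set(), []
    for line in lines:
        seen.update(msg for msg in NORMAL if msg in line)
        for needle, hint in ISSUES.items():
            if needle in line:
                issues.append((line.rstrip("\n"), hint))
    return seen, issues


def report(lines):
    """Human-readable summary lines for a stream of log lines."""
    seen, issues = scan(lines)
    out = [("found:   " if msg in seen else "missing: ") + msg for msg in NORMAL]
    out += [f"issue ({hint}): {line}" for line, hint in issues]
    return out
```

Passing sys.stdin to report() lets you pipe kubectl logs -n mlflow deployment/mlflow --tail=500 into it; the "missing" lines are only meaningful if the tail still covers server startup.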

Image Build Issues

If custom image build fails:

# Set Docker host
export DOCKER_HOST=ssh://yourhost.com

# Rebuild image manually
cd /path/to/buun-stack
just mlflow::build-and-push-image

# Check image exists on remote host
docker images localhost:30500/mlflow:3.6.0-oidc

# Test image on remote host
docker run --rm localhost:30500/mlflow:3.6.0-oidc mlflow --version

Note: All Docker commands run on the remote host specified by DOCKER_HOST.

Custom Image

Dockerfile

Located at mlflow/image/Dockerfile:

FROM burakince/mlflow:3.6.0

# Install mlflow-oidc-auth plugin with filesystem session support
RUN pip install --no-cache-dir \
    mlflow-oidc-auth[full]==5.6.1 \
    cachelib[filesystem]

Building Custom Image

Important: Set DOCKER_HOST to build on the remote k3s host:

export DOCKER_HOST=ssh://yourhost.com

just mlflow::build-image          # Build only
just mlflow::push-image            # Push only (requires prior build)
just mlflow::build-and-push-image  # Build and push

The image is built on the remote Docker host and pushed to the k3s local registry (localhost:30500).

References