diff --git a/mlflow/.gitignore b/mlflow/.gitignore
index 81821cf..ddf1c94 100644
--- a/mlflow/.gitignore
+++ b/mlflow/.gitignore
@@ -1,3 +1,5 @@
 values.yaml
 mlflow-db-external-secret.yaml
 mlflow-s3-external-secret.yaml
+mlflow-oidc-config.yaml
+image/.buildx-cache
diff --git a/mlflow/README.md b/mlflow/README.md
new file mode 100644
index 0000000..a2405b1
--- /dev/null
+++ b/mlflow/README.md
@@ -0,0 +1,484 @@
+# MLflow
+
+An open-source platform for managing the end-to-end machine learning lifecycle, deployed here with Keycloak OIDC authentication.
+
+## Overview
+
+This module deploys MLflow using the Community Charts Helm chart with:
+
+- **Keycloak OIDC authentication** for user login
+- **Custom Docker image** with the mlflow-oidc-auth plugin
+- **PostgreSQL backend** for tracking server and auth databases
+- **MinIO/S3 artifact storage** with proxied access
+- **FastAPI/ASGI server** with Uvicorn for production
+- **HTTPS reverse proxy support** via Traefik
+- **Group-based access control** via Keycloak groups
+- **Prometheus metrics** for monitoring
+
+## Prerequisites
+
+- Kubernetes cluster (k3s)
+- Keycloak installed and configured
+- PostgreSQL cluster (CloudNativePG)
+- MinIO object storage
+- External Secrets Operator (optional, for Vault integration)
+- Docker registry (local or remote)
+
+## Installation
+
+### Basic Installation
+
+1. **Build and Push Custom MLflow Image**:
+
+   Set `DOCKER_HOST` to your remote Docker host (where k3s is running):
+
+   ```bash
+   export DOCKER_HOST=ssh://yourhost.com
+   just mlflow::build-and-push-image
+   ```
+
+   This builds a custom MLflow image with the OIDC auth plugin and pushes it to your k3s registry.
+
+2. **Install MLflow**:
+
+   ```bash
+   just mlflow::install
+   ```
+
+   You will be prompted for:
+
+   - **MLflow host (FQDN)**: e.g., `mlflow.example.com`
+
+### What Gets Installed
+
+- MLflow tracking server (FastAPI with OIDC)
+- PostgreSQL databases:
+  - `mlflow` - Experiment tracking, models, and runs
+  - `mlflow_auth` - User authentication and permissions
+- PostgreSQL user `mlflow` with access to both databases
+- MinIO bucket `mlflow` for artifact storage
+- Custom MLflow Docker image with OIDC auth plugin
+- Keycloak OAuth client (confidential client)
+- Keycloak groups:
+  - `mlflow-admins` - Full administrative access
+  - `mlflow-users` - Basic user access
+
+## Configuration
+
+### Docker Build Environment
+
+For building and pushing the custom MLflow image:
+
+```bash
+DOCKER_HOST=ssh://yourhost.com   # Remote Docker host (where k3s is running)
+IMAGE_REGISTRY=localhost:30500   # k3s local registry
+```
+
+### Deployment Configuration
+
+Environment variables (set in `.env.local` or override):
+
+```bash
+MLFLOW_NAMESPACE=mlflow                 # Kubernetes namespace
+MLFLOW_CHART_VERSION=1.8.0              # Helm chart version
+MLFLOW_HOST=mlflow.example.com          # External hostname
+MLFLOW_IMAGE_TAG=3.6.0-oidc             # Custom image tag
+MLFLOW_IMAGE_PULL_POLICY=IfNotPresent   # Image pull policy
+KEYCLOAK_HOST=auth.example.com          # Keycloak hostname
+KEYCLOAK_REALM=buunstack                # Keycloak realm name
+```
+
+### Architecture Notes
+
+**MLflow 3.6.0 with OIDC**:
+
+- Uses the `mlflow-oidc-auth[full]==5.6.1` plugin
+- FastAPI/ASGI server with Uvicorn (not Gunicorn)
+- Server type: `oidc-auth-fastapi` for ASGI compatibility
+- Session management: `cachelib` with filesystem backend
+- Custom Docker image built from `burakince/mlflow:3.6.0`
+
+**Authentication Flow**:
+
+- OIDC Discovery: `/.well-known/openid-configuration` (see the check below)
+- Redirect URI: `/callback` (not `/oidc/callback`)
+- Required scopes: `openid profile email groups`
+- Group attribute: `groups` from UserInfo
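+
+A quick way to verify that Keycloak serves this discovery document is the sketch below. It is a convenience check, not part of the installer, and it assumes the example host and realm used elsewhere in this README:
+
+```python
+import json
+import urllib.request
+
+# auth.example.com and buunstack are the example host/realm from this README;
+# substitute your own values.
+url = "https://auth.example.com/realms/buunstack/.well-known/openid-configuration"
+with urllib.request.urlopen(url) as resp:
+    discovery = json.load(resp)
+
+# These are the endpoints the OIDC plugin discovers from this URL.
+print(discovery["issuer"])
+print(discovery["authorization_endpoint"])
+print(discovery["token_endpoint"])
+```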
+
+**Database Structure**:
+
+- `mlflow` database: Experiment tracking, models, parameters, metrics
+- `mlflow_auth` database: User accounts, groups, permissions
+
+## Usage
+
+### Access MLflow
+
+1. Navigate to `https://your-mlflow-host/`
+2. Click the "Keycloak" button to authenticate
+3. After a successful login:
+   - You are first redirected to the Permissions Management UI (`/oidc/ui/`)
+   - Click the "MLflow" button to open the main MLflow UI
+
+### Grant Admin Access
+
+Add users to the `mlflow-admins` group:
+
+```bash
+just keycloak::add-user-to-group mlflow-admins
+```
+
+Admin users have full privileges, including:
+
+- Experiment and model management
+- User and permission management
+- Access to all experiments and models
+
+### Log Experiments
+
+#### Using Python Client
+
+```python
+import mlflow
+
+# Set tracking URI
+mlflow.set_tracking_uri("https://mlflow.example.com")
+
+# Start experiment
+mlflow.set_experiment("my-experiment")
+
+# Log parameters, metrics, and artifacts
+with mlflow.start_run():
+    mlflow.log_param("learning_rate", 0.01)
+    mlflow.log_metric("accuracy", 0.95)
+    mlflow.log_artifact("model.pkl")
+```
+
+#### Authentication for API Access
+
+For programmatic access, create an access token:
+
+1. Log in to the MLflow UI
+2. Navigate to the Permissions UI → Create access token
+3. Use the token in your code:
+
+```python
+import os
+os.environ["MLFLOW_TRACKING_TOKEN"] = "your-token"
+```
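+
+To confirm a token works end to end, here is a minimal sketch; it assumes the example hostname used above and a token created via the Permissions UI:
+
+```python
+import os
+import mlflow
+
+# Placeholder values; substitute your own MLflow host and access token.
+os.environ["MLFLOW_TRACKING_TOKEN"] = "your-token"
+mlflow.set_tracking_uri("https://mlflow.example.com")
+
+# Searching experiments is a cheap call that exercises authentication.
+for experiment in mlflow.search_experiments():
+    print(experiment.experiment_id, experiment.name)
+```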
+
+### Model Registry
+
+Register and manage models:
+
+```python
+# Register model (replace <run_id> with the ID of the run that logged the model)
+mlflow.register_model(
+    model_uri="runs:/<run_id>/model",
+    name="my-model"
+)
+
+# Transition model stage
+from mlflow.tracking import MlflowClient
+client = MlflowClient()
+client.transition_model_version_stage(
+    name="my-model",
+    version=1,
+    stage="Production"
+)
+```
+
+## Features
+
+- **Experiment Tracking**: Log parameters, metrics, and artifacts
+- **Model Registry**: Version and manage ML models
+- **Model Serving**: Deploy models as REST APIs
+- **Project Reproducibility**: Package code, data, and environment
+- **Remote Execution**: Run experiments on remote platforms
+- **UI Dashboard**: Visual experiment comparison and analysis
+- **LLM Tracking**: Track LLM applications with traces
+- **Prompt Registry**: Manage and version prompts
+
+## Architecture
+
+```plain
+External Users
+    ↓
+Cloudflare Tunnel (HTTPS)
+    ↓
+Traefik Ingress (HTTPS)
+    ↓
+MLflow Server (HTTP inside cluster)
+  ├─ FastAPI/ASGI (Uvicorn)
+  ├─ mlflow-oidc-auth plugin
+  │   ├─ OAuth → Keycloak (authentication)
+  │   └─ Session → FileSystemCache
+  ├─ PostgreSQL (metadata)
+  │   ├─ mlflow (tracking)
+  │   └─ mlflow_auth (users/groups)
+  └─ MinIO (artifacts via proxied access)
+```
+
+**Key Components**:
+
+- **Server Type**: `oidc-auth-fastapi` for FastAPI/ASGI compatibility
+- **Allowed Hosts**: Validates the `Host` header for security
+- **Session Backend**: Cachelib with filesystem storage
+- **Artifact Storage**: Proxied through the MLflow server (no direct S3 access needed)
+
+## Authentication
+
+### User Login (OIDC)
+
+- Users authenticate via Keycloak
+- Standard OIDC flow with Authorization Code grant
+- Group membership retrieved from the `groups` claim in UserInfo
+- Users are automatically created on first login
+
+### Access Control
+
+**Group-based Permissions**:
+
+```python
+OIDC_ADMIN_GROUP_NAME = "mlflow-admins"
+OIDC_GROUP_NAME = "mlflow-admins,mlflow-users"
+```
+
+**Default Permissions**:
+
+- New resources: `MANAGE` permission for the creator
+- Admins: Full access to all resources
+- Users: Access based on explicit permissions
+
+### Permission Management
+
+Access the Permissions UI at `/oidc/ui/`:
+
+- View and manage user permissions
+- Assign permissions to experiments, models, and prompts
+- Create and manage groups
+- View audit logs
+
+## Management
+
+### Rebuild Custom Image
+
+If you need to update the custom MLflow image:
+
+```bash
+export DOCKER_HOST=ssh://yourhost.com
+just mlflow::build-and-push-image
+```
+
+After rebuilding, restart MLflow to use the new image:
+
+```bash
+kubectl rollout restart deployment/mlflow -n mlflow
+```
+
+### Upgrade MLflow
+
+```bash
+just mlflow::upgrade
+```
+
+Updates the Helm deployment with the current configuration.
+
+### Uninstall
+
+```bash
+# Keep PostgreSQL databases
+just mlflow::uninstall false
+
+# Delete PostgreSQL databases and user
+just mlflow::uninstall true
+```
+
+### Clean Up All Resources
+
+```bash
+just mlflow::cleanup
+```
+
+Deletes databases, users, secrets, and the Keycloak client (with confirmation).
+
+## Troubleshooting
+
+### Check Pod Status
+
+```bash
+kubectl get pods -n mlflow
+```
+
+Expected pods:
+
+- `mlflow-*` - Main application (1 replica)
+- `mlflow-db-migration-*` - Database migration (Completed)
+- `mlflow-dbchecker-*` - Database connection check (Completed)
+
+### OAuth Login Fails
+
+#### Redirect Loop (Returns to Login Page)
+
+**Symptoms**: The user authenticates with Keycloak but is returned to the login page
+
+**Common Causes**:
+
+1. **Redirect URI Mismatch**:
+   - Check that the Keycloak client redirect URI matches `/callback`
+   - Verify `OIDC_REDIRECT_URI` is `https://{host}/callback`
+
+2. **Missing Groups Scope**:
+   - Ensure the `groups` scope is added to the Keycloak client
+   - Check that the groups mapper is configured in Keycloak
+
+3. **Group Membership**:
+   - The user must be in the `mlflow-admins` or `mlflow-users` group
+   - Add the user to a group: `just keycloak::add-user-to-group mlflow-admins`
+
+#### Session Errors
+
+**Error**: `Session module for filesystem could not be imported`
+
+**Solution**: Ensure the session configuration is correct:
+
+```yaml
+SESSION_TYPE: "cachelib"
+SESSION_CACHE_DIR: "/tmp/session"
+```
+
+#### Group Detection Errors
+
+**Error**: `Group detection error: No module named 'oidc'`
+
+**Solution**: Leave `OIDC_GROUP_DETECTION_PLUGIN` unset (remove the setting from the configuration)
+
+### Server Type Errors
+
+**Error**: `TypeError: Flask.__call__() missing 1 required positional argument: 'start_response'`
+
+**Cause**: Using the Flask server type with Uvicorn (ASGI)
+
+**Solution**: Ensure `appName: "oidc-auth-fastapi"` is set in the Helm values
+
+### Database Connection Issues
+
+Check database credentials:
+
+```bash
+kubectl get secret mlflow-db-secret -n mlflow -o yaml
+```
+
+Test database connectivity:
+
+```bash
+kubectl exec -n mlflow deployment/mlflow -- \
+  psql -h postgres-cluster-rw.postgres -U mlflow -d mlflow -c "SELECT 1"
+```
+
+### Artifact Storage Issues
+
+Check MinIO credentials:
+
+```bash
+kubectl get secret mlflow-s3-secret -n mlflow -o yaml
+```
+
+Test MinIO connectivity:
+
+```bash
+kubectl exec -n mlflow deployment/mlflow -- \
+  python -c "import boto3; import os; \
+    client = boto3.client('s3', \
+      endpoint_url=os.getenv('MLFLOW_S3_ENDPOINT_URL'), \
+      aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'), \
+      aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY')); \
+    print(client.list_buckets())"
+```
+
+### Check Logs
+
+```bash
+# Application logs
+kubectl logs -n mlflow deployment/mlflow --tail=100
+
+# Database migration logs
+kubectl logs -n mlflow job/mlflow-db-migration
+
+# Real-time logs
+kubectl logs -n mlflow 
deployment/mlflow -f +``` + +### Common Log Messages + +**Normal**: + +- `Successfully created FastAPI app with OIDC integration` +- `OIDC routes, authentication, and UI should now be available` +- `Session module for cachelib imported` +- `Redirect URI for OIDC login: https://{host}/callback` + +**Issues**: + +- `Group detection error` - Check OIDC configuration +- `Authorization error: User is not allowed to login` - User not in required group +- `Session error` - Session configuration issue + +### Image Build Issues + +If custom image build fails: + +```bash +# Set Docker host +export DOCKER_HOST=ssh://yourhost.com + +# Rebuild image manually +cd /path/to/buun-stack/mlflow +just mlflow::build-and-push-image + +# Check image exists on remote host +docker images localhost:30500/mlflow:3.6.0-oidc + +# Test image on remote host +docker run --rm localhost:30500/mlflow:3.6.0-oidc mlflow --version +``` + +**Note**: All Docker commands run on the remote host specified by `DOCKER_HOST`. + +## Custom Image + +### Dockerfile + +Located at `mlflow/image/Dockerfile`: + +```dockerfile +FROM burakince/mlflow:3.6.0 + +# Install mlflow-oidc-auth plugin with filesystem session support +RUN pip install --no-cache-dir \ + mlflow-oidc-auth[full]==5.6.1 \ + cachelib[filesystem] +``` + +### Building Custom Image + +**Important**: Set `DOCKER_HOST` to build on the remote k3s host: + +```bash +export DOCKER_HOST=ssh://yourhost.com + +just mlflow::build-image # Build only +just mlflow::push-image # Push only (requires prior build) +just mlflow::build-and-push-image # Build and push +``` + +The image is built on the remote Docker host and pushed to the k3s local registry (`localhost:30500`). + +## References + +- [MLflow Documentation](https://mlflow.org/docs/latest/index.html) +- [MLflow GitHub](https://github.com/mlflow/mlflow) +- [mlflow-oidc-auth Plugin](https://github.com/mlflow-oidc/mlflow-oidc-auth) +- [mlflow-oidc-auth Documentation](https://mlflow-oidc.github.io/mlflow-oidc-auth/) +- [Community Charts MLflow](https://github.com/community-charts/helm-charts/tree/main/charts/mlflow) +- [Keycloak OIDC](https://www.keycloak.org/docs/latest/securing_apps/#_oidc) diff --git a/mlflow/image/Dockerfile b/mlflow/image/Dockerfile new file mode 100644 index 0000000..d2d2eb1 --- /dev/null +++ b/mlflow/image/Dockerfile @@ -0,0 +1,8 @@ +FROM burakince/mlflow:3.6.0 + +# Install mlflow-oidc-auth plugin with filesystem session support +RUN pip install --no-cache-dir \ + mlflow-oidc-auth[full]==5.6.1 \ + cachelib[filesystem] + +# Keep the original entrypoint diff --git a/mlflow/justfile b/mlflow/justfile index 1180916..3df5153 100644 --- a/mlflow/justfile +++ b/mlflow/justfile @@ -3,11 +3,18 @@ set fallback := true export MLFLOW_NAMESPACE := env("MLFLOW_NAMESPACE", "mlflow") export MLFLOW_CHART_VERSION := env("MLFLOW_CHART_VERSION", "1.8.0") export MLFLOW_HOST := env("MLFLOW_HOST", "") +export IMAGE_REGISTRY := env("IMAGE_REGISTRY", "localhost:30500") +export MLFLOW_IMAGE_TAG := env("MLFLOW_IMAGE_TAG", "3.6.0-oidc") +export MLFLOW_IMAGE_PULL_POLICY := env("MLFLOW_IMAGE_PULL_POLICY", "IfNotPresent") +export MLFLOW_OIDC_ENABLED := env("MLFLOW_OIDC_ENABLED", "true") export POSTGRES_NAMESPACE := env("POSTGRES_NAMESPACE", "postgres") export MINIO_NAMESPACE := env("MINIO_NAMESPACE", "minio") export EXTERNAL_SECRETS_NAMESPACE := env("EXTERNAL_SECRETS_NAMESPACE", "external-secrets") export K8S_VAULT_NAMESPACE := env("K8S_VAULT_NAMESPACE", "vault") +export MONITORING_ENABLED := env("MONITORING_ENABLED", "") export 
PROMETHEUS_NAMESPACE := env("PROMETHEUS_NAMESPACE", "monitoring") +export KEYCLOAK_REALM := env("KEYCLOAK_REALM", "buunstack") +export KEYCLOAK_HOST := env("KEYCLOAK_HOST", "") [private] default: @@ -22,6 +29,26 @@ add-helm-repo: remove-helm-repo: helm repo remove community-charts +# Build custom MLflow image with OIDC auth plugin +build-image: + #!/bin/bash + set -euo pipefail + echo "Building MLflow image with OIDC auth plugin..." + cd image + docker build -t ${IMAGE_REGISTRY}/mlflow:${MLFLOW_IMAGE_TAG} . + echo "Image built: ${IMAGE_REGISTRY}/mlflow:${MLFLOW_IMAGE_TAG}" + +# Push custom MLflow image to registry +push-image: + #!/bin/bash + set -euo pipefail + echo "Pushing MLflow image to registry..." + docker push ${IMAGE_REGISTRY}/mlflow:${MLFLOW_IMAGE_TAG} + echo "Image pushed: ${IMAGE_REGISTRY}/mlflow:${MLFLOW_IMAGE_TAG}" + +# Build and push custom MLflow image +build-and-push-image: build-image push-image + # Create namespace create-namespace: @kubectl get namespace ${MLFLOW_NAMESPACE} &>/dev/null || \ @@ -68,6 +95,16 @@ setup-postgres-db: echo "Ensuring database permissions..." just postgres::grant mlflow mlflow + # Create mlflow_auth database for OIDC user management + if just postgres::db-exists mlflow_auth &>/dev/null; then + echo "Database 'mlflow_auth' already exists." + else + echo "Creating new database 'mlflow_auth' for OIDC authentication..." + just postgres::create-db mlflow_auth + fi + echo "Granting permissions on mlflow_auth to mlflow user..." + just postgres::grant mlflow_auth mlflow + if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then echo "External Secrets available. Storing credentials in Vault..." just vault::put mlflow/postgres username=mlflow password="${db_password}" @@ -173,7 +210,7 @@ delete-s3-secret: @kubectl delete externalsecret mlflow-s3-external-secret -n ${MLFLOW_NAMESPACE} --ignore-not-found # Install MLflow -install: check-env +install: #!/bin/bash set -euo pipefail echo "Installing MLflow..." @@ -191,11 +228,30 @@ install: check-env exit 1 fi + if [ -z "${MLFLOW_HOST}" ]; then + while [ -z "${MLFLOW_HOST}" ]; do + MLFLOW_HOST=$( + gum input --prompt="MLflow host (FQDN): " --width=100 \ + --placeholder="e.g., mlflow.example.com" + ) + done + fi + if helm status kube-prometheus-stack -n ${PROMETHEUS_NAMESPACE} &>/dev/null; then + if [ -z "${MONITORING_ENABLED}" ]; then + if gum confirm "Enable Prometheus monitoring (ServiceMonitor)?"; then + MONITORING_ENABLED="true" + else + MONITORING_ENABLED="false" + fi + fi + else + MONITORING_ENABLED="false" + fi + just setup-postgres-db just create-db-secret just create-s3-secret - # Create mlflow bucket in MinIO if it doesn't exist if ! just minio::bucket-exists mlflow; then echo "Creating 'mlflow' bucket in MinIO..." just minio::create-bucket mlflow @@ -205,10 +261,81 @@ install: check-env just add-helm-repo - echo "Generating Helm values..." + just keycloak::delete-client "${KEYCLOAK_REALM}" "mlflow" || true + oidc_client_secret=$(just utils::random-password) + redirect_urls="https://${MLFLOW_HOST}/callback" + just keycloak::create-client \ + realm="${KEYCLOAK_REALM}" \ + client_id="mlflow" \ + redirect_url="${redirect_urls}" \ + client_secret="${oidc_client_secret}" + echo "✓ Keycloak client 'mlflow' created" + + if ! 
just keycloak::get-client-scope "${KEYCLOAK_REALM}" groups &>/dev/null; then + just keycloak::create-client-scope "${KEYCLOAK_REALM}" groups "User group memberships" + just keycloak::add-groups-mapper-to-scope "${KEYCLOAK_REALM}" groups + echo "✓ Groups client scope created" + else + echo "✓ Groups client scope already exists" + fi + just keycloak::add-scope-to-client "${KEYCLOAK_REALM}" mlflow groups + echo "✓ Groups scope added to mlflow client" + + echo "Setting up MLflow groups..." + just keycloak::create-group mlflow-admins "" "MLflow administrators with full access" || true + just keycloak::create-group mlflow-users "" "MLflow users with basic access" || true + echo "✓ MLflow groups configured" + + if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then + echo "External Secrets Operator detected. Storing OIDC config in Vault..." + + # Get PostgreSQL credentials for auth database + db_username=$(just vault::get mlflow/postgres username) + db_password=$(just vault::get mlflow/postgres password) + auth_db_uri="postgresql://${db_username}:${db_password}@postgres-cluster-rw.${POSTGRES_NAMESPACE}.svc.cluster.local:5432/mlflow_auth" + + just vault::put "mlflow/oidc" \ + client_id="mlflow" \ + client_secret="${oidc_client_secret}" \ + auth_db_uri="${auth_db_uri}" + + kubectl delete secret mlflow-oidc-config -n ${MLFLOW_NAMESPACE} --ignore-not-found + kubectl delete externalsecret mlflow-oidc-external-secret -n ${MLFLOW_NAMESPACE} --ignore-not-found + + export OIDC_CLIENT_SECRET="${oidc_client_secret}" + gomplate -f mlflow-oidc-external-secret.gomplate.yaml | kubectl apply -f - + + echo "Waiting for ExternalSecret to sync..." + kubectl wait --for=condition=Ready externalsecret/mlflow-oidc-external-secret \ + -n ${MLFLOW_NAMESPACE} --timeout=60s + else + echo "Creating Kubernetes secret directly..." + + # Get PostgreSQL credentials for auth database + db_username=$(just vault::get mlflow/postgres username 2>/dev/null || echo "mlflow") + db_password=$(just vault::get mlflow/postgres password) + auth_db_uri="postgresql://${db_username}:${db_password}@postgres-cluster-rw.${POSTGRES_NAMESPACE}.svc.cluster.local:5432/mlflow_auth" + + kubectl delete secret mlflow-oidc-config -n ${MLFLOW_NAMESPACE} --ignore-not-found + kubectl create secret generic mlflow-oidc-config -n ${MLFLOW_NAMESPACE} \ + --from-literal=OIDC_CLIENT_ID="mlflow" \ + --from-literal=OIDC_CLIENT_SECRET="${oidc_client_secret}" \ + --from-literal=OIDC_USERS_DB_URI="${auth_db_uri}" + + # Store in Vault for backup if available + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then + just vault::put "mlflow/oidc" \ + client_id="mlflow" \ + client_secret="${oidc_client_secret}" \ + auth_db_uri="${auth_db_uri}" + fi + fi + + export MLFLOW_OIDC_ENABLED="true" + echo "Generating Helm values with OIDC enabled..." gomplate -f values.gomplate.yaml -o values.yaml - echo "Installing MLflow Helm chart from Community Charts..." + echo "Installing MLflow Helm chart from Community Charts with OIDC..." helm upgrade --cleanup-on-fail --install mlflow \ community-charts/mlflow \ --version ${MLFLOW_CHART_VERSION} \ @@ -218,18 +345,35 @@ install: check-env -f values.yaml echo "" - echo "=== MLflow installed ===" + echo "=== MLflow installed with OIDC authentication ===" echo "MLflow URL: https://${MLFLOW_HOST}" echo "" - echo "Next steps:" - echo " 1. Configure OAuth2 Proxy for authentication (recommended)" - echo " 2. 
Access MLflow UI at https://${MLFLOW_HOST}" + echo "OIDC authentication is enabled using Keycloak" + echo "Users can sign in with their Keycloak credentials" # Upgrade MLflow -upgrade: check-env +upgrade: #!/bin/bash set -euo pipefail - echo "Upgrading MLflow..." + if [ -z "${MLFLOW_HOST}" ]; then + while [ -z "${MLFLOW_HOST}" ]; do + MLFLOW_HOST=$( + gum input --prompt="MLflow host (FQDN): " --width=100 \ + --placeholder="e.g., mlflow.example.com" + ) + done + fi + if helm status kube-prometheus-stack -n ${PROMETHEUS_NAMESPACE} &>/dev/null; then + if [ -z "${MONITORING_ENABLED}" ]; then + if gum confirm "Enable Prometheus monitoring (ServiceMonitor)?"; then + MONITORING_ENABLED="true" + else + MONITORING_ENABLED="false" + fi + fi + else + MONITORING_ENABLED="false" + fi echo "Generating Helm values..." gomplate -f values.gomplate.yaml -o values.yaml @@ -254,11 +398,14 @@ uninstall delete-db='true': helm uninstall mlflow -n ${MLFLOW_NAMESPACE} --ignore-not-found just delete-db-secret just delete-s3-secret + kubectl delete secret mlflow-oidc-config -n ${MLFLOW_NAMESPACE} --ignore-not-found + kubectl delete externalsecret mlflow-oidc-external-secret -n ${MLFLOW_NAMESPACE} --ignore-not-found just delete-namespace if [ "{{ delete-db }}" = "true" ]; then just postgres::delete-db mlflow || true just postgres::delete-user mlflow || true fi + just keycloak::delete-client "${KEYCLOAK_REALM}" "mlflow" || true echo "MLflow uninstalled" # Clean up all MLflow resources @@ -272,22 +419,9 @@ cleanup: just postgres::delete-user mlflow || true just vault::delete mlflow/postgres || true just vault::delete mlflow/s3 || true + just vault::delete mlflow/oidc || true + just keycloak::delete-client "${KEYCLOAK_REALM}" "mlflow" || true echo "Cleanup completed" else echo "Cleanup cancelled" fi - -# Check the environment -[private] -check-env: - #!/bin/bash - set -euo pipefail - if [ -z "${MLFLOW_HOST}" ]; then - while [ -z "${MLFLOW_HOST}" ]; do - MLFLOW_HOST=$( - gum input --prompt="MLflow host (FQDN): " --width=100 \ - --placeholder="e.g., mlflow.example.com" - ) - done - just env::set MLFLOW_HOST="${MLFLOW_HOST}" - fi diff --git a/mlflow/mlflow-oidc-external-secret.gomplate.yaml b/mlflow/mlflow-oidc-external-secret.gomplate.yaml new file mode 100644 index 0000000..720f5fc --- /dev/null +++ b/mlflow/mlflow-oidc-external-secret.gomplate.yaml @@ -0,0 +1,27 @@ +--- +apiVersion: external-secrets.io/v1 +kind: ExternalSecret +metadata: + name: mlflow-oidc-external-secret + namespace: {{ .Env.MLFLOW_NAMESPACE }} +spec: + refreshInterval: 1h + secretStoreRef: + name: vault-secret-store + kind: ClusterSecretStore + target: + name: mlflow-oidc-config + creationPolicy: Owner + data: + - secretKey: OIDC_CLIENT_ID + remoteRef: + key: mlflow/oidc + property: client_id + - secretKey: OIDC_CLIENT_SECRET + remoteRef: + key: mlflow/oidc + property: client_secret + - secretKey: OIDC_USERS_DB_URI + remoteRef: + key: mlflow/oidc + property: auth_db_uri diff --git a/mlflow/values.gomplate.yaml b/mlflow/values.gomplate.yaml index ace96d2..686a786 100644 --- a/mlflow/values.gomplate.yaml +++ b/mlflow/values.gomplate.yaml @@ -2,11 +2,18 @@ # Replica count replicaCount: 1 -# Image configuration (Community Charts uses burakince/mlflow) +# Image configuration +{{- if eq (.Env.MLFLOW_OIDC_ENABLED | default "false") "true" }} +image: + repository: {{ .Env.IMAGE_REGISTRY }}/mlflow + pullPolicy: {{ .Env.MLFLOW_IMAGE_PULL_POLICY }} + tag: "{{ .Env.MLFLOW_IMAGE_TAG }}" # Custom MLflow with OIDC +{{- else }} image: repository: burakince/mlflow - 
pullPolicy: IfNotPresent + pullPolicy: {{ .Env.MLFLOW_IMAGE_PULL_POLICY }} tag: "3.6.0" # MLflow 3.6.0 +{{- end }} # Backend store configuration (PostgreSQL) backendStore: @@ -44,12 +51,49 @@ artifactRoot: keyOfAccessKeyId: "AWS_ACCESS_KEY_ID" keyOfSecretAccessKey: "AWS_SECRET_ACCESS_KEY" +{{- if eq (.Env.MLFLOW_OIDC_ENABLED | default "false") "true" }} +# Disable MLflow logging to prevent gunicornOpts auto-injection +log: + enabled: false + +# A map of arguments to pass to the `mlflow server` command (OIDC enabled) +# Use oidc-auth-fastapi for FastAPI/ASGI compatibility with Uvicorn +extraArgs: + appName: "oidc-auth-fastapi" + # Allow connections from external hostname (with and without port) + allowedHosts: "{{ .Env.MLFLOW_HOST }},{{ .Env.MLFLOW_HOST }}:443" + +# Extra secrets for OIDC configuration +extraSecretNamesForEnvFrom: + - mlflow-oidc-config + +# Extra environment variables for OIDC and S3/MinIO configuration +extraEnvVars: + MLFLOW_S3_ENDPOINT_URL: "http://minio.{{ .Env.MINIO_NAMESPACE }}.svc.cluster.local:9000" + MLFLOW_S3_IGNORE_TLS: "true" + # OIDC Configuration - mlflow-oidc-auth uses OIDC Discovery + OIDC_DISCOVERY_URL: "https://{{ .Env.KEYCLOAK_HOST }}/realms/{{ .Env.KEYCLOAK_REALM }}/.well-known/openid-configuration" + OIDC_REDIRECT_URI: "https://{{ .Env.MLFLOW_HOST }}/callback" + OIDC_SCOPE: "openid profile email groups" + OIDC_PROVIDER_DISPLAY_NAME: "Keycloak" + # OIDC attribute mapping + OIDC_GROUPS_ATTRIBUTE: "groups" + # Group configuration - required for access control + OIDC_ADMIN_GROUP_NAME: "mlflow-admins" + OIDC_GROUP_NAME: "mlflow-admins,mlflow-users" + # Default permission for new resources + DEFAULT_MLFLOW_PERMISSION: "MANAGE" + # Session configuration - use cachelib with filesystem backend + SESSION_TYPE: "cachelib" + SESSION_CACHE_DIR: "/tmp/session" +{{- else }} # Extra environment variables for S3/MinIO configuration extraEnvVars: MLFLOW_S3_ENDPOINT_URL: "http://minio.{{ .Env.MINIO_NAMESPACE }}.svc.cluster.local:9000" MLFLOW_S3_IGNORE_TLS: "true" # Disable security middleware when using Gunicorn (env var approach) MLFLOW_SERVER_DISABLE_SECURITY_MIDDLEWARE: "true" +{{- end }} # Service configuration service: @@ -73,7 +117,7 @@ ingress: # ServiceMonitor for Prometheus serviceMonitor: - enabled: true + enabled: {{ .Env.MONITORING_ENABLED }} useServicePort: false namespace: "{{ .Env.PROMETHEUS_NAMESPACE }}" interval: 30s