diff --git a/README.md b/README.md
index 99806a1..6b8b499 100644
--- a/README.md
+++ b/README.md
@@ -40,6 +40,7 @@ A remotely accessible Kubernetes home lab with OIDC authentication. Build a mode
 ### Data & Analytics (Optional)
 
 - **[JupyterHub](https://jupyter.org/hub)**: Interactive computing with collaborative notebooks
+- **[MLflow](https://mlflow.org/)**: Machine learning lifecycle management with experiment tracking and model registry
 - **[Trino](https://trino.io/)**: Distributed SQL query engine for querying multiple data sources
 - **[Querybook](https://www.querybook.org/)**: Big data querying UI with notebook interface
 - **[ClickHouse](https://clickhouse.com/)**: High-performance columnar analytics database
@@ -170,6 +171,16 @@ Multi-user platform for interactive computing:
 
 [📖 See JupyterHub Documentation](./jupyterhub/README.md)
 
+### MLflow
+
+Machine learning lifecycle management platform:
+
+- **Experiment Tracking**: Log parameters, metrics, and artifacts for ML experiments
+- **Model Registry**: Version and manage ML models with deployment lifecycle
+- **Keycloak Authentication**: OAuth2 integration with group-based access control
+
+[📖 See MLflow Documentation](./mlflow/README.md)
+
 ### Apache Superset
 
 Modern business intelligence platform:
@@ -376,6 +387,7 @@ kubectl --context yourpc-oidc get nodes
 # Metabase: https://metabase.yourdomain.com
 # Airflow: https://airflow.yourdomain.com
 # JupyterHub: https://jupyter.yourdomain.com
+# MLflow: https://mlflow.yourdomain.com
 ```
 
 ## Customization
diff --git a/mlflow/README.md b/mlflow/README.md
index 1522a71..9587c9f 100644
--- a/mlflow/README.md
+++ b/mlflow/README.md
@@ -15,6 +15,8 @@ This module deploys MLflow using the Community Charts Helm chart with:
 - **Group-based access control** via Keycloak groups
 - **Prometheus metrics** for monitoring
 
+> **⚠️ Authentication Note**: This deployment uses `mlflow-oidc-auth` which replaces MLflow's standard authentication. For programmatic access, use HTTP Basic Auth with `MLFLOW_TRACKING_USERNAME` (full email) and `MLFLOW_TRACKING_PASSWORD` (access token from UI). See [Authentication for API Access](#authentication-for-api-access) for details.
+
 ## Prerequisites
 
 - Kubernetes cluster (k3s)
@@ -156,9 +158,13 @@ with mlflow.start_run():
 
 #### Authentication for API Access
 
-For programmatic access (Python scripts, notebooks, CI/CD), you need to create an access key.
+**IMPORTANT**: mlflow-oidc-auth replaces MLflow's standard token authentication system entirely. The "tokens" created in the Web UI are actually passwords for HTTP Basic Authentication, not Bearer tokens.
 
-**Step 1: Create Access Key via Web UI**
+For programmatic access (Python scripts, notebooks, CI/CD), use one of the following methods:
+
+##### Method 1: HTTP Basic Authentication with Access Token (Recommended)
+
+**Step 1: Create Access Token via Web UI**
 
 1. Navigate to `https://your-mlflow-host/` and log in via Keycloak
 2. You will be redirected to the MLflow Permission Manager UI
@@ -166,25 +172,22 @@ For programmatic access (Python scripts, notebooks, CI/CD), you need to create a
 4. In the dialog that appears:
    - Select an expiration date (maximum 1 year from today)
    - Click **"Request Token"**
-5. Copy the generated access key from the "Access Key" field
+5. Copy the generated access token (e.g., `PRI6u33USGwyxlzYqWzVwPrG`)
 6. Store it securely (you won't be able to retrieve it again)
 
-**Step 2: Use Access Key in Python**
+**Step 2: Use Access Token in Python**
 
-Set the access key as an environment variable or in your Python code:
+The access token is used as a **password** with HTTP Basic Authentication. Your username must be your **full email address** (e.g., `user@domain.com`):
 
 ```python
 import os
 import mlflow
 
-# Method 1: Set environment variable (recommended)
-os.environ["MLFLOW_TRACKING_TOKEN"] = "your-access-key-here"
-os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.example.com"
+# IMPORTANT: Username must be your full email address (as registered in Keycloak)
+os.environ["MLFLOW_TRACKING_USERNAME"] = "user@domain.com"
+os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-access-token-here"  # Token from Web UI
 
-# Method 2: Set tracking URI directly
 mlflow.set_tracking_uri("https://mlflow.example.com")
-
-# Now you can use MLflow client
 mlflow.set_experiment("my-experiment")
 
 with mlflow.start_run():
@@ -203,8 +206,11 @@ from sklearn.datasets import load_iris
 from sklearn.model_selection import train_test_split
 from sklearn.metrics import accuracy_score
 
-# Configure MLflow
-os.environ["MLFLOW_TRACKING_TOKEN"] = "your-access-key-here"
+# Configure MLflow authentication
+# Username MUST be your full email address (e.g., user@domain.com)
+os.environ["MLFLOW_TRACKING_USERNAME"] = "user@domain.com"
+os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-access-token-here"
+
 mlflow.set_tracking_uri("https://mlflow.example.com")
 mlflow.set_experiment("iris-classification")
 
@@ -229,8 +235,9 @@ with mlflow.start_run():
     accuracy = accuracy_score(y_test, y_pred)
     mlflow.log_metric("accuracy", accuracy)
 
-    # Log model
-    mlflow.sklearn.log_model(clf, "model")
+    # Log model with input example for signature inference
+    input_example = X_train[:5]
+    mlflow.sklearn.log_model(sk_model=clf, name="model", input_example=input_example)
 
     print(f"Model logged with accuracy: {accuracy}")
 ```
@@ -241,7 +248,8 @@ Create a `.env` file in your project:
 
 ```bash
 MLFLOW_TRACKING_URI=https://mlflow.example.com
-MLFLOW_TRACKING_TOKEN=your-access-key-here
+MLFLOW_TRACKING_USERNAME=user@domain.com
+MLFLOW_TRACKING_PASSWORD=your-access-token-here
 ```
 
 Load it in your Python code:
@@ -250,20 +258,52 @@ Load it in your Python code:
 from dotenv import load_dotenv
 import mlflow
 
-load_dotenv()  # Loads MLFLOW_TRACKING_URI and MLFLOW_TRACKING_TOKEN
+load_dotenv()  # Loads credentials from .env file
 
 mlflow.set_experiment("my-experiment")
 
 with mlflow.start_run():
     mlflow.log_param("param1", 5)
 ```
 
+##### Method 2: JWT Bearer Token from Keycloak
+
+For advanced use cases, you can obtain a JWT token directly from Keycloak:
+
+```python
+import os
+import requests
+import mlflow
+
+# Get JWT token from Keycloak
+token_response = requests.post(
+    "https://auth.example.com/realms/buunstack/protocol/openid-connect/token",
+    data={
+        'grant_type': 'password',
+        'client_id': 'mlflow',
+        'client_secret': 'your-client-secret',  # From Vault
+        'username': 'user@domain.com',
+        'password': 'your-keycloak-password',
+        'scope': 'openid profile email groups'
+    },
+    verify=False
+)
+
+access_token = token_response.json()['access_token']
+os.environ["MLFLOW_TRACKING_TOKEN"] = access_token
+
+mlflow.set_tracking_uri("https://mlflow.example.com")
+mlflow.set_experiment("my-experiment")
+```
+
 **Important Notes**
 
-- Access keys have an expiration date (max 1 year)
-- Store access keys securely (use environment variables or secret management)
-- Never commit access keys to version control
-- Each user should create their own access key
-- Expired keys need to be regenerated via the Web UI
+- **Username format**: Must be your full email address (e.g., `user@domain.com`), not just the username
+- **Access tokens expire**: Maximum lifetime is 1 year, needs regeneration via Web UI
+- **Token is a password**: The Web UI "token" is used with Basic Auth, not as a Bearer token
+- **MLflow standard tokens don't work**: mlflow-oidc-auth replaces MLflow's built-in authentication
+- **Security**: Store credentials in environment variables or secret management systems
+- **Never commit**: Don't commit credentials to version control
+- **Per-user tokens**: Each user should create and use their own access token
 
 ### Model Registry
 
@@ -326,12 +366,15 @@ MLflow Server (HTTP inside cluster)
 
 ## Authentication
 
+**IMPORTANT**: This MLflow deployment uses `mlflow-oidc-auth` plugin, which replaces MLflow's standard authentication system. MLflow's built-in token authentication does not work with this setup.
+
 ### User Login (OIDC)
 
 - Users authenticate via Keycloak
 - Standard OIDC flow with Authorization Code grant
 - Group membership retrieved from `groups` claim in UserInfo
 - Users automatically created on first login
+- Username is stored as full email address (e.g., `user@domain.com`)
 
 ### Access Control