docs(mlflow): write about MLflow

2025-11-09 22:37:21 +09:00
parent 96e60124e2
commit 36e3ee685d
2 changed files with 77 additions and 22 deletions
--- a/README.md
+++ b/README.md
@@ -40,6 +40,7 @@ A remotely accessible Kubernetes home lab with OIDC authentication. Build a mode
 ### Data & Analytics (Optional)
 - **[JupyterHub](https://jupyter.org/hub)**: Interactive computing with collaborative notebooks
 - **[MLflow](https://mlflow.org/)**: Machine learning lifecycle management with experiment tracking and model registry
 - **[Trino](https://trino.io/)**: Distributed SQL query engine for querying multiple data sources
 - **[Querybook](https://www.querybook.org/)**: Big data querying UI with notebook interface
 - **[ClickHouse](https://clickhouse.com/)**: High-performance columnar analytics database
@@ -170,6 +171,16 @@ Multi-user platform for interactive computing:
 [📖 See JupyterHub Documentation](./jupyterhub/README.md)
 ### MLflow
 Machine learning lifecycle management platform:
 - **Experiment Tracking**: Log parameters, metrics, and artifacts for ML experiments
 - **Model Registry**: Version and manage ML models with deployment lifecycle
 - **Keycloak Authentication**: OAuth2 integration with group-based access control
 [📖 See MLflow Documentation](./mlflow/README.md)
 ### Apache Superset
 Modern business intelligence platform:
@@ -376,6 +387,7 @@ kubectl --context yourpc-oidc get nodes
 # Metabase: https://metabase.yourdomain.com
 # Airflow: https://airflow.yourdomain.com
 # JupyterHub: https://jupyter.yourdomain.com
 # MLflow: https://mlflow.yourdomain.com
 ```
 ## Customization
--- a/mlflow/README.md
+++ b/mlflow/README.md
@@ -15,6 +15,8 @@ This module deploys MLflow using the Community Charts Helm chart with:
 - **Group-based access control** via Keycloak groups
 - **Prometheus metrics** for monitoring
 > **⚠️ Authentication Note**: This deployment uses `mlflow-oidc-auth` which replaces MLflow's standard authentication. For programmatic access, use HTTP Basic Auth with `MLFLOW_TRACKING_USERNAME` (full email) and `MLFLOW_TRACKING_PASSWORD` (access token from UI). See [Authentication for API Access](#authentication-for-api-access) for details.
 ## Prerequisites
 - Kubernetes cluster (k3s)
@@ -156,9 +158,13 @@ with mlflow.start_run():
 #### Authentication for API Access
-For programmatic access (Python scripts, notebooks, CI/CD), you need to create an access key.
+**IMPORTANT**: mlflow-oidc-auth replaces MLflow's standard token authentication system entirely. The "tokens" created in the Web UI are actually passwords for HTTP Basic Authentication, not Bearer tokens.
-**Step 1: Create Access Key via Web UI**
+For programmatic access (Python scripts, notebooks, CI/CD), use one of the following methods:
 ##### Method 1: HTTP Basic Authentication with Access Token (Recommended)
 **Step 1: Create Access Token via Web UI**
 1. Navigate to `https://your-mlflow-host/` and log in via Keycloak
 2. You will be redirected to the MLflow Permission Manager UI
@@ -166,25 +172,22 @@ For programmatic access (Python scripts, notebooks, CI/CD), you need to create a
 4. In the dialog that appears:
   - Select an expiration date (maximum 1 year from today)
   - Click **"Request Token"**
-5. Copy the generated access key from the "Access Key" field
+5. Copy the generated access token (e.g., `PRI6u33USGwyxlzYqWzVwPrG`)
 6. Store it securely (you won't be able to retrieve it again)
-**Step 2: Use Access Key in Python**
+**Step 2: Use Access Token in Python**
-Set the access key as an environment variable or in your Python code:
+The access token is used as a **password** with HTTP Basic Authentication. Your username must be your **full email address** (e.g., `user@domain.com`):
 ```python
 import os
 import mlflow
-# Method 1: Set environment variable (recommended)
+# IMPORTANT: Username must be your full email address (as registered in Keycloak)
-os.environ["MLFLOW_TRACKING_TOKEN"] = "your-access-key-here"
+os.environ["MLFLOW_TRACKING_USERNAME"] = "user@domain.com"
-os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.example.com"
+os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-access-token-here"  # Token from Web UI
 # Set the tracking URI directly in code
 mlflow.set_tracking_uri("https://mlflow.example.com")
 # Now you can use MLflow client
 mlflow.set_experiment("my-experiment")
 with mlflow.start_run():
@@ -203,8 +206,11 @@ from sklearn.datasets import load_iris
 from sklearn.model_selection import train_test_split
 from sklearn.metrics import accuracy_score
-# Configure MLflow
+# Configure MLflow authentication
-os.environ["MLFLOW_TRACKING_TOKEN"] = "your-access-key-here"
+# Username MUST be your full email address (e.g., user@domain.com)
 os.environ["MLFLOW_TRACKING_USERNAME"] = "user@domain.com"
 os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-access-token-here"
 mlflow.set_tracking_uri("https://mlflow.example.com")
 mlflow.set_experiment("iris-classification")
@@ -229,8 +235,9 @@ with mlflow.start_run():
    accuracy = accuracy_score(y_test, y_pred)
    mlflow.log_metric("accuracy", accuracy)
-    # Log model
+    # Log model with input example for signature inference
-    mlflow.sklearn.log_model(clf, "model")
+    input_example = X_train[:5]
    mlflow.sklearn.log_model(sk_model=clf, name="model", input_example=input_example)
    print(f"Model logged with accuracy: {accuracy}")
 ```
@@ -241,7 +248,8 @@ Create a `.env` file in your project:
 ```bash
 MLFLOW_TRACKING_URI=https://mlflow.example.com
-MLFLOW_TRACKING_TOKEN=your-access-key-here
+MLFLOW_TRACKING_USERNAME=user@domain.com
 MLFLOW_TRACKING_PASSWORD=your-access-token-here
 ```
 Load it in your Python code:
@@ -250,20 +258,52 @@ Load it in your Python code:
 from dotenv import load_dotenv
 import mlflow
-load_dotenv()  # Loads MLFLOW_TRACKING_URI and MLFLOW_TRACKING_TOKEN
+load_dotenv()  # Loads credentials from .env file
 mlflow.set_experiment("my-experiment")
 with mlflow.start_run():
    mlflow.log_param("param1", 5)
 ```
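A variable that is missing from `.env` may only show up later as an authentication error. A minimal sketch of an early check, assuming the three variable names used above; the helper name `missing_mlflow_settings` is hypothetical and not part of MLflow:

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_mlflow_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

One way to use it is to call it right after `load_dotenv()` and raise an error if the returned list is not empty.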
 ##### Method 2: JWT Bearer Token from Keycloak
 For advanced use cases, you can obtain a JWT token directly from Keycloak:
 ```python
 import os
 import requests
 import mlflow
 # Get JWT token from Keycloak
 token_response = requests.post(
    "https://auth.example.com/realms/buunstack/protocol/openid-connect/token",
    data={
        'grant_type': 'password',
        'client_id': 'mlflow',
        'client_secret': 'your-client-secret',  # From Vault
        'username': 'user@domain.com',
        'password': 'your-keycloak-password',
        'scope': 'openid profile email groups'
    },
    verify=False  # WARNING: disables TLS certificate verification; remove once the certificate is trusted
 )
 access_token = token_response.json()['access_token']
 os.environ["MLFLOW_TRACKING_TOKEN"] = access_token
 mlflow.set_tracking_uri("https://mlflow.example.com")
 mlflow.set_experiment("my-experiment")
 ```
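The token request above does not check for failures, so a rejected login would raise a `KeyError` on `access_token`. A hedged sketch of a wrapper that fails with a clearer error, assuming the same Keycloak token endpoint and form fields; the function name `fetch_access_token` and the injectable `post` parameter are hypothetical and exist only to make the helper testable:

```python
def fetch_access_token(token_url, payload, post=None, timeout=10):
    """POST the form payload to the token endpoint and return the access token."""
    if post is None:
        # Imported lazily so the helper can be exercised without the dependency.
        import requests
        post = requests.post

    response = post(token_url, data=payload, timeout=timeout)
    # Raise on 4xx/5xx instead of trying to parse an error body.
    response.raise_for_status()

    body = response.json()
    if "access_token" not in body:
        raise RuntimeError("token endpoint response has no 'access_token' field")
    return body["access_token"]
```

Because `verify` is not overridden here, `requests` keeps its default TLS certificate verification.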
 **Important Notes**
- Access keys have an expiration date (max 1 year)
+- **Username format**: Must be your full email address (e.g., `user@domain.com`), not just the username
- Store access keys securely (use environment variables or secret management)
+- **Access tokens expire**: Maximum lifetime is 1 year; regenerate expired tokens via the Web UI
- Never commit access keys to version control
+- **Token is a password**: The Web UI "token" is used with Basic Auth, not as a Bearer token
- Each user should create their own access key
+- **MLflow standard tokens don't work**: mlflow-oidc-auth replaces MLflow's built-in authentication
- Expired keys need to be regenerated via the Web UI
+- **Security**: Store credentials in environment variables or secret management systems
 - **Never commit**: Don't commit credentials to version control
 - **Per-user tokens**: Each user should create and use their own access token
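The Basic-versus-Bearer distinction in these notes comes down to which `Authorization` header the client sends. As an illustration only (the MLflow client builds these headers itself from the environment variables; the helper names below are hypothetical):

```python
import base64


def basic_auth_header(username, password):
    """Header value sent when username and password are configured (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def bearer_auth_header(token):
    """Header value sent when only a tracking token is configured."""
    return f"Bearer {token}"
```

With this deployment, the Web UI token belongs in the `password` position of the first form, paired with the full email address as the username.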
 ### Model Registry
@@ -326,12 +366,15 @@ MLflow Server (HTTP inside cluster)
 ## Authentication
 **IMPORTANT**: This MLflow deployment uses the `mlflow-oidc-auth` plugin, which replaces MLflow's standard authentication system. MLflow's built-in token authentication does not work with this setup.
 ### User Login (OIDC)
 - Users authenticate via Keycloak
 - Standard OIDC flow with Authorization Code grant
 - Group membership retrieved from `groups` claim in UserInfo
 - Users automatically created on first login
 - Username is stored as full email address (e.g., `user@domain.com`)
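Conceptually, group-based access comes down to comparing the `groups` claim against a set of permitted groups. The sketch below is not the plugin's code. It assumes the claim is a list of strings, that Keycloak may prefix names with `/` when full group paths are enabled, and it uses a hypothetical group name:

```python
def groups_from_userinfo(userinfo):
    """Return the user's groups, with any leading '/' removed."""
    raw = userinfo.get("groups") or []
    return {group.lstrip("/") for group in raw}


def is_allowed(userinfo, allowed_groups):
    """True if the user belongs to at least one permitted group."""
    return bool(groups_from_userinfo(userinfo) & set(allowed_groups))


# "mlflow-users" is a made-up group name for illustration.
example = {"email": "user@domain.com", "groups": ["/mlflow-users"]}
print(is_allowed(example, ["mlflow-users"]))
```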
 ### Access Control