diff --git a/docs/jupyterhub.md b/docs/jupyterhub.md index c3f9416..d9eb88d 100644 --- a/docs/jupyterhub.md +++ b/docs/jupyterhub.md @@ -15,13 +15,13 @@ This will prompt for: - JupyterHub host (FQDN) - NFS PV usage (if Longhorn is installed) - NFS server details (if NFS is enabled) -- Vault integration setup +- Vault integration setup (requires root token for initial setup) ### Prerequisites - Keycloak must be installed and configured - For NFS storage: Longhorn must be installed -- For Vault integration: Vault must be installed and configured +- For Vault integration: Vault and External Secrets Operator must be installed - Helm repository must be accessible ## Kernel Images @@ -52,13 +52,13 @@ Enable/disable profiles using environment variables: ```bash # Enable buun-stack profile (CPU version) -export JUPYTER_PROFILE_BUUN_STACK_ENABLED=true +JUPYTER_PROFILE_BUUN_STACK_ENABLED=true # Enable buun-stack CUDA profile (GPU version) -export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true +JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true # Disable default datascience profile -export JUPYTER_PROFILE_DATASCIENCE_ENABLED=false +JUPYTER_PROFILE_DATASCIENCE_ENABLED=false ``` Available profile variables: @@ -122,35 +122,66 @@ JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28 ### Overview -Vault integration enables secure secrets management directly from Jupyter notebooks using user-specific Vault tokens. Each user receives their own isolated Vault token during notebook spawn, ensuring complete separation of secrets between users. Users can store and retrieve API keys, database credentials, and other sensitive data securely with automatic token renewal. +Vault integration enables secure secrets management directly from Jupyter notebooks. 
The system uses: + +- **ExternalSecret** to fetch the admin token from Vault +- **Renewable tokens** with unlimited Max TTL to avoid 30-day system limitations +- **Token renewal script** that automatically renews tokens every 12 hours +- **User-specific tokens** created during notebook spawn with isolated access + +### Architecture + +```plain +┌──────────────────────────────────────────────────────────────────┐ +│ JupyterHub Hub Pod │ +│ │ +│ ┌──────────────┐ ┌────────────────┐ ┌────────────────────┐ │ +│ │ Hub │ │ Token Renewer │ │ ExternalSecret │ │ +│ │ Container │◄─┤ Sidecar │◄─┤ (mounted as │ │ +│ │ │ │ │ │ Secret) │ │ +│ └──────────────┘ └────────────────┘ └────────────────────┘ │ +│ │ │ ▲ │ +│ │ │ │ │ +│ ▼ ▼ │ │ +│ ┌──────────────────────────────────┐ │ │ +│ │ /vault/secrets/vault-token │ │ │ +│ │ (Admin token for user creation) │ │ │ +│ └──────────────────────────────────┘ │ │ +└────────────────────────────────────────────────────┼────────────┘ + │ + ┌───────────▼──────────┐ + │ Vault │ + │ secret/jupyterhub/ │ + │ vault-token │ + └──────────────────────┘ +``` ### Prerequisites Vault integration requires: - Vault server installed and configured -- Keycloak OIDC authentication configured +- External Secrets Operator installed +- ClusterSecretStore configured for Vault - **Buun-stack kernel images** (standard images don't include Vault integration) ### Setup -Vault integration is configured during JupyterHub installation. 
You have two options: - -#### Option 1: Interactive setup (recommended) +Vault integration is configured during JupyterHub installation: ```bash just jupyterhub::install # Answer "yes" when prompted about Vault integration +# Provide Vault root token when prompted ``` -#### Option 2: Pre-configured setup +The setup process: -```bash -export JUPYTERHUB_VAULT_INTEGRATION_ENABLED=true -just jupyterhub::install -``` - -**Note**: The `just jupyterhub::setup-vault-integration` command is called automatically during installation if Vault integration is enabled. This configures Vault Agent for automatic token renewal and user-specific token management. +1. Creates `jupyterhub-admin` policy with necessary permissions including `sudo` for orphan token creation +2. Creates renewable admin token with 24h TTL and unlimited Max TTL +3. Stores token in Vault at `secret/jupyterhub/vault-token` +4. Creates ExternalSecret to fetch token from Vault +5. Deploys token renewal sidecar for automatic renewal ### Usage in Notebooks @@ -182,11 +213,31 @@ secrets.delete('api-keys', field='github') # Delete only github field ### Security Features -- **User isolation**: Each user receives a unique Vault token with access only to their own secrets -- **Automatic token renewal**: Both admin and user tokens are automatically renewed by Vault Agent -- **Vault Agent integration**: JupyterHub admin token is automatically renewed using Kubernetes authentication +- **User isolation**: Each user receives an orphan token with access only to their namespace +- **Automatic renewal**: Token renewal script renews admin token every 12 hours +- **ExternalSecret integration**: Admin token fetched securely from Vault +- **Orphan tokens**: User tokens are orphan tokens, not limited by parent policy restrictions - **Audit trail**: All secret access is logged in Vault -- **Individual policies**: Each user has their own Vault policy restricting access to their namespace + +### Token Management + +#### Admin Token + 
+The admin token is managed through: + +1. **Creation**: `just jupyterhub::create-jupyterhub-vault-token` creates renewable token +2. **Storage**: Stored in Vault at `secret/jupyterhub/vault-token` +3. **Retrieval**: ExternalSecret fetches and mounts as Kubernetes Secret +4. **Renewal**: `vault-token-renewer.sh` script renews every 12 hours + +#### User Tokens + +User tokens are created dynamically: + +1. **Pre-spawn hook** reads admin token from `/vault/secrets/vault-token` +2. **Creates user policy** `jupyter-user-{username}` with restricted access +3. **Creates orphan token** with user policy (requires `sudo` permission) +4. **Sets environment variable** `NOTEBOOK_VAULT_TOKEN` in notebook container ## Storage Options @@ -199,9 +250,9 @@ Uses Kubernetes PersistentVolumes for user home directories. For shared storage across nodes, configure NFS: ```bash -export JUPYTERHUB_NFS_PV_ENABLED=true -export JUPYTER_NFS_IP=192.168.10.1 -export JUPYTER_NFS_PATH=/volume1/drive1/jupyter +JUPYTERHUB_NFS_PV_ENABLED=true +JUPYTER_NFS_IP=192.168.10.1 +JUPYTER_NFS_PATH=/volume1/drive1/jupyter ``` NFS storage requires: @@ -230,22 +281,18 @@ JUPYTERHUB_NFS_PV_ENABLED=false # Vault integration JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false -VAULT_ADDR=http://vault.vault.svc:8200 +VAULT_ADDR=https://vault.example.com # Image settings JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28 IMAGE_REGISTRY=localhost:30500 # Vault token TTL settings -JUPYTERHUB_VAULT_TOKEN_TTL=24h # Admin token: 1 day (auto-renewed by Vault Agent) -JUPYTERHUB_VAULT_TOKEN_MAX_TTL=720h # Admin token: 30 days (max renewal limit) -NOTEBOOK_VAULT_TOKEN_TTL=24h # User token: 1 day (auto-renewed) -NOTEBOOK_VAULT_TOKEN_MAX_TTL=168h # User token: 7 days (max renewal limit) +JUPYTERHUB_VAULT_TOKEN_TTL=24h # Admin token: renewed every 12h +NOTEBOOK_VAULT_TOKEN_TTL=24h # User token: 1 day +NOTEBOOK_VAULT_TOKEN_MAX_TTL=168h # User token: 7 days max -# Vault Agent logging -VAULT_AGENT_LOG_LEVEL=info # Options: trace, debug, info, warn, 
error - -# Application logging +# Logging JUPYTER_BUUNSTACK_LOG_LEVEL=warning # Options: debug, info, warning, error ``` @@ -261,6 +308,13 @@ Customize JupyterHub behavior by editing `jupyterhub-values.gomplate.yaml` templ just jupyterhub::uninstall ``` +This removes: + +- JupyterHub deployment +- User pods +- PVCs +- ExternalSecret + ### Update Upgrade to newer versions: @@ -277,6 +331,18 @@ just jupyterhub::push-kernel-images just jupyterhub::install ``` +### Manual Token Refresh + +If needed, manually refresh the admin token: + +```bash +# Create new renewable token +just jupyterhub::create-jupyterhub-vault-token + +# Restart JupyterHub to pick up new token +kubectl rollout restart deployment/hub -n jupyter +``` + ## Troubleshooting ### Image Pull Issues @@ -291,28 +357,33 @@ kubectl get pods -n jupyter kubectl describe pod -n jupyter # Increase timeout if needed -helm upgrade jupyterhub jupyterhub/jupyterhub \ - --timeout=30m -f jupyterhub-values.yaml +helm upgrade jupyterhub jupyterhub/jupyterhub --timeout=30m -f jupyterhub-values.yaml ``` ### Vault Integration Issues -Check Vault connectivity and authentication: +Check token and authentication: -```python -# In a notebook -import os -print("Vault Address:", os.getenv('VAULT_ADDR')) -print("JWT Token:", bool(os.getenv('NOTEBOOK_VAULT_JWT'))) -print("Vault Token:", bool(os.getenv('NOTEBOOK_VAULT_TOKEN'))) +```bash +# Check ExternalSecret status +kubectl get externalsecret -n jupyter jupyterhub-vault-token -# Test SecretStore -from buunstack import SecretStore -secrets = SecretStore() -status = secrets.get_status() -print(status) +# Check if Secret was created +kubectl get secret -n jupyter jupyterhub-vault-token + +# Check token renewal logs +kubectl logs -n jupyter -l app.kubernetes.io/component=hub -c vault-agent + +# In a notebook, verify environment +%env NOTEBOOK_VAULT_TOKEN ``` +Common issues: + +1. **"child policies must be subset of parent"**: Admin policy needs `sudo` permission for orphan tokens +2. 
**Token not found**: Check ExternalSecret and ClusterSecretStore configuration +3. **Permission denied**: Verify `jupyterhub-admin` policy has all required permissions + ### Authentication Issues Verify Keycloak client configuration: @@ -336,177 +407,38 @@ JupyterHub uses the official Zero to JupyterHub (Z2JH) Helm chart: - Version: `4.2.0` (configurable via `JUPYTERHUB_CHART_VERSION`) - Documentation: https://z2jh.jupyter.org/ -### User-Specific Vault Token System +### Token System Architecture -The `buunstack` SecretStore uses pre-created user-specific Vault tokens that are generated during notebook spawn, ensuring complete user isolation and secure access to individual secret namespaces. +The system uses a three-tier token approach: -#### Architecture Overview +1. **Renewable Admin Token**: + - Created with `explicit-max-ttl=0` (unlimited Max TTL) + - Renewed automatically every 12 hours + - Stored in Vault and fetched via ExternalSecret -```plain -┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐ -│ JupyterHub │ │ Notebook │ │ Vault │ -│ │ │ │ │ │ -│ ┌───────────┐ │ │ ┌────────────┐ │ │ ┌───────────┐ │ -│ │Pre-spawn │ │───►│ │SecretStore │ ├───►│ │User Token │ │ -│ │ Hook │ │ │ │ │ │ │ │ + Policy │ │ -│ └───────────┘ │ │ └────────────┘ │ │ └───────────┘ │ -└─────────────────┘ └──────────────────┘ └─────────────────┘ -``` +2. **Orphan User Tokens**: + - Created with `create_orphan()` API call + - Not limited by parent token policies + - Individual TTL and Max TTL settings -**Key Components**: +3. 
**Token Renewal Script**: + - Runs as sidecar container + - Reads token from ExternalSecret mount + - Handles renewal and re-retrieval on failure -- **JupyterHub Admin Token**: Automatically renewed by Vault Agent, read from file at `/vault/secrets/vault-token` -- **User-Specific Tokens**: Created dynamically during notebook spawn, available as `NOTEBOOK_VAULT_TOKEN` environment variable -- **User Policies**: Restrict access to `secret/data/jupyter/users/{username}/*` -- **Vault Agent**: Sidecar container that handles automatic token renewal using Kubernetes authentication +### Key Files -#### Token Lifecycle - -1. **Pre-spawn Hook Setup** - - JupyterHub uses admin Vault token to access Vault API - - Creates user-specific Vault policy with restricted path access - - Generates new user-specific Vault token with the created policy - - Passes user token to notebook environment via `NOTEBOOK_VAULT_TOKEN` - -2. **SecretStore Initialization** - - Reads user-specific token from environment variable: - - `NOTEBOOK_VAULT_TOKEN` (User-specific Vault token) - - Uses token for all Vault operations within user's namespace - -3. **Token Validation** - - Before operations, checks token validity using `lookup_self` - - Verifies token TTL and renewable status - -4. 
**Automatic Token Renewal** - - If token TTL is low (< 10 minutes) and renewable, renews token - - Uses `renew_self` capability granted by user policy - - Logs renewal success for monitoring - -#### Code Flow - -```python -def _ensure_authenticated(self): - # Check if current Vault token is valid - try: - if self.client.is_authenticated(): - # Check if token needs renewal - token_info = self.client.auth.token.lookup_self() - ttl = token_info.get("data", {}).get("ttl", 0) - renewable = token_info.get("data", {}).get("renewable", False) - - # Renew if TTL < 10 minutes and renewable - if renewable and ttl > 0 and ttl < 600: - self.client.auth.token.renew_self() - logger.info("✅ Vault token renewed successfully") - return - except Exception: - pass - - # Token expired and cannot be refreshed - raise Exception("User-specific Vault token expired and cannot be refreshed. Please restart your notebook server.") -``` - -#### Key Design Decisions - -##### 1. User-Specific Token Creation - -- Each user receives a unique Vault token during notebook spawn -- Individual policies ensure complete user isolation -- Admin token used only during pre-spawn hook for token creation - -##### 2. Policy-Based Access Control - -- User policies restrict access to `secret/data/jupyter/users/{username}/*` -- Each user can only access their own secret namespace -- Token management capabilities (`lookup_self`, `renew_self`) included - -##### 3. Singleton Pattern - -- Single SecretStore instance per notebook session -- Prevents multiple simultaneous authentications -- Maintains consistent token state - -##### 4. Pre-created User Tokens - -- Tokens are created during notebook spawn via pre-spawn hook -- Reduces initialization overhead in notebooks -- Provides immediate access to user's secret namespace - -#### Error Handling - -```python -# Primary error scenarios and responses: - -1. 
User token unavailable - → Token stored in NOTEBOOK_VAULT_TOKEN env var - → Prompt to restart notebook server if missing - -2. Vault token expired - → Automatic renewal using renew_self if renewable - → Restart notebook server required if not renewable - -3. Vault authentication failure - → Log detailed error information - → Check user policy and token configuration - -4. Network connectivity issues - → Built-in retry in hvac client - → Provide actionable error messages -``` - -#### Configuration - -Environment variables passed to notebooks: - -```yaml -# JupyterHub pre_spawn_hook sets: -spawner.environment: - # Core services - POSTGRES_HOST: 'postgres-cluster-rw.postgres' - POSTGRES_PORT: '5432' - JUPYTERHUB_API_URL: 'http://hub:8081/hub/api' - BUUNSTACK_LOG_LEVEL: 'info' # or 'debug' for detailed logging - - # Vault integration - NOTEBOOK_VAULT_TOKEN: '' - VAULT_ADDR: 'http://vault.vault.svc:8200' -``` - -#### Monitoring and Debugging - -Enable detailed logging for troubleshooting: - -```python -# In notebook -import os -os.environ['BUUNSTACK_LOG_LEVEL'] = 'DEBUG' - -# Restart kernel and check logs -from buunstack import SecretStore -secrets = SecretStore() - -# Check authentication status -status = secrets.get_status() -print("Username:", status['username']) -print("Vault Address:", status['vault_addr']) -print("Authentication Method:", status['authentication_method']) -print("Vault Authenticated:", status['vault_authenticated']) -``` - -#### Performance Characteristics - -- **Token renewal overhead**: ~10-50ms for renew_self call -- **Memory usage**: Minimal (single token stored as string) -- **Network traffic**: Only during token renewal (when TTL < 10 minutes) -- **Vault impact**: Standard token operations (lookup_self, renew_self) +- `jupyterhub-admin-policy.hcl`: Vault policy with admin permissions +- `user_policy.hcl`: Template for user-specific policies +- `vault-token-renewer.sh`: Token renewal script +- 
`jupyterhub-vault-token-external-secret.gomplate.yaml`: ExternalSecret configuration ## Performance Considerations - **Image Size**: Buun-stack images are ~13GB, plan storage accordingly - **Pull Time**: Initial pulls take 5-15 minutes depending on network - **Resource Usage**: Data science workloads require adequate CPU/memory -- **Storage**: NFS provides better performance for shared datasets -- **Token Renewal**: User token renewal adds minimal overhead +- **Token Renewal**: Minimal overhead (renewal every 12 hours) For production deployments, consider: @@ -514,87 +446,13 @@ For production deployments, consider: - Using faster storage backends - Configuring resource limits per user - Setting up monitoring and alerts -- Monitoring Vault token expiration and renewal patterns - -## Vault Agent Integration - -### Overview - -JupyterHub now uses Vault Agent for automatic token renewal, eliminating the need for manual token management. Vault Agent runs as a sidecar container in the JupyterHub hub pod and automatically renews the admin token using Kubernetes authentication. 
- -### Architecture - -```plain -┌─────────────────────────────────────────────────────────────┐ -│ JupyterHub Hub Pod │ -│ │ -│ ┌─────────────────┐ ┌─────────────────────┐ │ -│ │ Hub Container │ │ Vault Agent Sidecar │ │ -│ │ │ │ │ │ -│ │ Reads token │◄─────────────┤ Writes token │ │ -│ │ from file │ │ to shared volume │ │ -│ │ │ │ │ │ -│ └─────────────────┘ └─────────────────────┘ │ -│ │ │ │ -│ │ │ │ -│ │ │ │ -│ ┌─────────▼─────────┐ ┌─────────▼─────────┐ │ -│ │ /vault/secrets/ │ │ Kubernetes Auth │ │ -│ │ vault-token │ │ with Vault │ │ -│ └───────────────────┘ └───────────────────┘ │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Key Features - -- **Automatic Renewal**: Admin token is automatically renewed every TTL/2 interval -- **Kubernetes Authentication**: Uses Kubernetes ServiceAccount for secure token acquisition -- **File-based Token Sharing**: Vault Agent writes tokens to shared volume, Hub reads from file -- **Zero Downtime**: Token renewal happens in background without service interruption -- **Configurable Logging**: Vault Agent log level can be configured via `VAULT_AGENT_LOG_LEVEL` - -### Monitoring Token Renewal - -Check Vault Agent status and token renewal: - -```bash -# Monitor Vault Agent logs -kubectl logs -n jupyter -l app.kubernetes.io/component=hub -c vault-agent -f - -# Use monitoring script -cd jupyterhub -./monitor-vault-token.sh - -# Check token details -kubectl exec -n jupyter -c hub -- curl -s -H "X-Vault-Token: $(cat /vault/secrets/vault-token)" $VAULT_ADDR/v1/auth/token/lookup-self -``` - -### Testing Token Renewal - -For testing purposes, you can use shorter TTL values to observe rapid token renewal: - -```bash -# Test with 1-minute TTL (renews every 30 seconds) -JUPYTERHUB_VAULT_TOKEN_TTL=1m VAULT_AGENT_LOG_LEVEL=debug just jupyterhub::install - -# Monitor renewal activity -kubectl logs -n jupyter -l app.kubernetes.io/component=hub -c vault-agent -f | grep "renewed auth token" -``` - -### Configuration 
Files - -The Vault Agent integration uses several configuration files: - -- `vault-agent-config.gomplate.hcl`: Vault Agent configuration template -- `token-monitor.tpl`: Template for logging token information -- `monitor-vault-token.sh`: Monitoring script for token status ## Known Limitations -1. **Token Max TTL**: Even with Vault Agent auto-renewal, tokens cannot be renewed beyond `JUPYTERHUB_VAULT_TOKEN_MAX_TTL` (default: 720h/30 days). After this period, JupyterHub must be redeployed to acquire a new token from Vault. With the default 30-day limit, this requires monthly maintenance. +1. **Annual Token Recreation**: While tokens have unlimited Max TTL, best practice suggests recreating them annually -2. **Cull Settings**: Server idle timeout is set to 2 hours by default. Adjust `cull.timeout` and `cull.every` in the Helm values for different requirements. +2. **Cull Settings**: Server idle timeout is set to 2 hours by default. Adjust `cull.timeout` and `cull.every` in the Helm values for different requirements -3. **NFS Storage**: When using NFS storage, ensure proper permissions are set on the NFS server. The default `JUPYTER_FSGID` is 100. +3. **NFS Storage**: When using NFS storage, ensure proper permissions are set on the NFS server. The default `JUPYTER_FSGID` is 100 -4. **Vault Agent Resource Usage**: The Vault Agent sidecar uses minimal resources (50m CPU, 64Mi memory) but adds slight overhead to the hub pod. +4. 
**ExternalSecret Dependency**: Requires External Secrets Operator to be installed and configured diff --git a/jupyterhub/jupyterhub-admin-policy.hcl b/jupyterhub/jupyterhub-admin-policy.hcl new file mode 100644 index 0000000..348f52d --- /dev/null +++ b/jupyterhub/jupyterhub-admin-policy.hcl @@ -0,0 +1,50 @@ +# JupyterHub Minimal Admin Policy +# Provides only necessary permissions for JupyterHub operations + +# Read Keycloak credentials for OIDC authentication +path "secret/data/keycloak/admin" { + capabilities = ["read"] +} + +# Full access to user secrets namespace for notebook users +path "secret/data/jupyter/*" { + capabilities = ["create", "read", "update", "delete", "list", "patch"] +} + +# List secrets for user management +path "secret/metadata/jupyter/*" { + capabilities = ["list"] +} + +# Token creation and management for user-specific tokens +path "auth/token/create" { + capabilities = ["create", "update"] +} + +# Create orphan tokens (requires sudo for policy override) +path "auth/token/create-orphan" { + capabilities = ["create", "update", "sudo"] +} + +path "auth/token/lookup-self" { + capabilities = ["read"] +} + +path "auth/token/renew-self" { + capabilities = ["update"] +} + +# Create user-specific policies dynamically +path "sys/policies/acl/jupyter-user-*" { + capabilities = ["create", "read", "update", "delete"] +} + +# Read user policies to allow token creation with these policies +path "sys/policies/acl/*" { + capabilities = ["read", "list"] +} + +# System capabilities check +path "sys/capabilities-self" { + capabilities = ["read"] +} \ No newline at end of file diff --git a/jupyterhub/jupyterhub-values.gomplate.yaml b/jupyterhub/jupyterhub-values.gomplate.yaml index eaa56cb..761f3bb 100644 --- a/jupyterhub/jupyterhub-values.gomplate.yaml +++ b/jupyterhub/jupyterhub-values.gomplate.yaml @@ -5,11 +5,8 @@ hub: NOTEBOOK_VAULT_TOKEN_TTL: {{ .Env.NOTEBOOK_VAULT_TOKEN_TTL | quote }} NOTEBOOK_VAULT_TOKEN_MAX_TTL: {{ .Env.NOTEBOOK_VAULT_TOKEN_MAX_TTL | 
quote }} {{- if eq .Env.JUPYTERHUB_VAULT_INTEGRATION_ENABLED "true" }} - # Vault Agent will provide token via file + # Vault Agent provides renewable token via file (unlimited max TTL) VAULT_TOKEN_FILE: "/vault/secrets/vault-token" - {{- else }} - # Traditional token via environment variable - JUPYTERHUB_VAULT_TOKEN: {{ .Env.JUPYTERHUB_VAULT_TOKEN | quote }} {{- end }} # Install packages at container startup @@ -63,24 +60,24 @@ hub: # Set environment variables for spawned containers import hvac + {{- if eq .Env.JUPYTERHUB_VAULT_INTEGRATION_ENABLED "true" }} def get_vault_token(): - """Read Vault token from file written by Vault Agent""" + """Read Vault token from file""" import os - token_file = os.environ.get('VAULT_TOKEN_FILE', '/vault/secrets/vault-token') + + token_file = '/vault/secrets/vault-token' try: with open(token_file, 'r') as f: token = f.read().strip() if token: return token - else: - raise Exception(f"Empty token file: {token_file}") except FileNotFoundError: - # Fallback to environment variable for backward compatibility - return os.environ.get("JUPYTERHUB_VAULT_TOKEN") + print(f"Token file not found: {token_file}") except Exception as e: - # Log error but attempt fallback print(f"Error reading token file {token_file}: {e}") - return os.environ.get("JUPYTERHUB_VAULT_TOKEN") + + return None + {{- end }} async def pre_spawn_hook(spawner): """Set essential environment variables for spawned containers""" @@ -94,6 +91,7 @@ hub: # Logging configuration spawner.environment["BUUNSTACK_LOG_LEVEL"] = "{{ .Env.JUPYTER_BUUNSTACK_LOG_LEVEL }}" + {{- if eq .Env.JUPYTERHUB_VAULT_INTEGRATION_ENABLED "true" }} # Create user-specific Vault token directly try: username = spawner.user.name @@ -105,7 +103,7 @@ hub: spawner.log.info(f"pre_spawn_hook starting for {username}") spawner.log.info(f"Vault address: {vault_addr}") - spawner.log.info(f"Vault token source: {'file' if os.path.exists(os.environ.get('VAULT_TOKEN_FILE', '/vault/secrets/vault-token')) else 'env'}") + 
spawner.log.info(f"Vault token source: {'file' if os.path.exists('/vault/secrets/vault-token') else 'env'}") spawner.log.info(f"Vault token present: {bool(vault_token)}, length: {len(vault_token) if vault_token else 0}") if not vault_token: @@ -141,7 +139,7 @@ hub: user_token_ttl = os.environ.get("NOTEBOOK_VAULT_TOKEN_TTL", "24h") user_token_max_ttl = os.environ.get("NOTEBOOK_VAULT_TOKEN_MAX_TTL", "168h") - token_response = vault_client.auth.token.create( + token_response = vault_client.auth.token.create_orphan( policies=[user_policy_name], ttl=user_token_ttl, renewable=True, @@ -161,6 +159,7 @@ hub: spawner.log.error("Failed to create user-specific Vault token for {}: {}".format(spawner.user.name, e)) import traceback spawner.log.error("Full traceback: {}".format(traceback.format_exc())) + {{- end }} c.KubeSpawner.pre_spawn_hook = pre_spawn_hook @@ -172,12 +171,18 @@ hub: - name: vault-config configMap: name: vault-agent-config + - name: vault-admin-token + secret: + secretName: jupyterhub-vault-token extraVolumeMounts: - name: vault-secrets mountPath: /vault/secrets - name: vault-config mountPath: /vault/config + - name: vault-admin-token + mountPath: /vault/admin-token + readOnly: true extraContainers: - name: vault-agent @@ -195,8 +200,8 @@ hub: - /bin/sh - -c - | - # Start Vault Agent - vault agent -config=/vault/config/agent.hcl + # Start token renewal script (handles both retrieval and renewal) + exec sh /vault/config/vault-token-renewer.sh env: - name: VAULT_ADDR value: {{ .Env.VAULT_ADDR | quote }} @@ -205,6 +210,9 @@ hub: mountPath: /vault/secrets - name: vault-config mountPath: /vault/config + - name: vault-admin-token + mountPath: /vault/admin-token + readOnly: true resources: requests: cpu: 50m diff --git a/jupyterhub/jupyterhub-vault-token-external-secret.gomplate.yaml b/jupyterhub/jupyterhub-vault-token-external-secret.gomplate.yaml new file mode 100644 index 0000000..0938cc8 --- /dev/null +++ 
b/jupyterhub/jupyterhub-vault-token-external-secret.gomplate.yaml @@ -0,0 +1,18 @@ +apiVersion: external-secrets.io/v1 +kind: ExternalSecret +metadata: + name: jupyterhub-vault-token + namespace: {{ .Env.JUPYTERHUB_NAMESPACE }} +spec: + refreshInterval: 1h + secretStoreRef: + name: vault-secret-store + kind: ClusterSecretStore + target: + name: jupyterhub-vault-token + creationPolicy: Owner + data: + - secretKey: token + remoteRef: + key: jupyterhub/vault-token + property: token \ No newline at end of file diff --git a/jupyterhub/justfile b/jupyterhub/justfile index 0317d6d..cfba289 100644 --- a/jupyterhub/justfile +++ b/jupyterhub/justfile @@ -20,7 +20,6 @@ export JUPYTER_PROFILE_TENSORFLOW_ENABLED := env("JUPYTER_PROFILE_TENSORFLOW_ENA export JUPYTER_PROFILE_BUUN_STACK_ENABLED := env("JUPYTER_PROFILE_BUUN_STACK_ENABLED", "false") export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED := env("JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED", "false") export JUPYTERHUB_VAULT_TOKEN_TTL := env("JUPYTERHUB_VAULT_TOKEN_TTL", "24h") -export JUPYTERHUB_VAULT_TOKEN_MAX_TTL := env("JUPYTERHUB_VAULT_TOKEN_MAX_TTL", "720h") export NOTEBOOK_VAULT_TOKEN_TTL := env("NOTEBOOK_VAULT_TOKEN_TTL", "24h") export NOTEBOOK_VAULT_TOKEN_MAX_TTL := env("NOTEBOOK_VAULT_TOKEN_MAX_TTL", "168h") export VAULT_AGENT_LOG_LEVEL := env("VAULT_AGENT_LOG_LEVEL", "info") @@ -54,7 +53,7 @@ delete-namespace: kubectl delete namespace ${JUPYTERHUB_NAMESPACE} --ignore-not-found # Install JupyterHub -install: +install root_token='': #!/bin/bash set -euo pipefail export JUPYTERHUB_HOST=${JUPYTERHUB_HOST:-} @@ -129,8 +128,16 @@ install: if [ "${JUPYTERHUB_VAULT_INTEGRATION_ENABLED}" = "true" ]; then echo "Setting up Vault Agent for automatic token management..." 
echo " Token TTL: ${JUPYTERHUB_VAULT_TOKEN_TTL}" - echo " Token Max TTL: ${JUPYTERHUB_VAULT_TOKEN_MAX_TTL}" - just setup-vault-integration + export VAULT_TOKEN="{{ root_token }}" + while [ -z "${VAULT_TOKEN}" ]; do + VAULT_TOKEN=$(gum input --prompt="Vault root token: " --password --width=100) + done + just setup-vault-integration ${VAULT_TOKEN} + just create-jupyterhub-vault-token ${VAULT_TOKEN} + + # Create ExternalSecret for admin vault token + echo "Creating ExternalSecret for admin vault token..." + gomplate -f jupyterhub-vault-token-external-secret.gomplate.yaml | kubectl apply -f - # Read user policy template for Vault export USER_POLICY_HCL=$(cat user_policy.hcl) @@ -155,6 +162,7 @@ uninstall: helm uninstall jupyterhub -n ${JUPYTERHUB_NAMESPACE} --wait --ignore-not-found kubectl delete pods -n ${JUPYTERHUB_NAMESPACE} -l app.kubernetes.io/component=singleuser-server kubectl delete -n ${JUPYTERHUB_NAMESPACE} pvc jupyter-nfs-pvc --ignore-not-found + kubectl delete -n ${JUPYTERHUB_NAMESPACE} externalsecret jupyterhub-vault-token --ignore-not-found if kubectl get pv jupyter-nfs-pv &>/dev/null; then kubectl patch pv jupyter-nfs-pv -p '{"spec":{"claimRef":null}}' fi @@ -213,39 +221,43 @@ push-kernel-images: setup-vault-integration root_token='': #!/bin/bash set -euo pipefail - echo "Setting up Vault integration for JupyterHub..." - - # Create Kubernetes role for JupyterHub in Vault - echo "Creating Kubernetes authentication role for JupyterHub..." - echo " Service Account: hub" - echo " Namespace: jupyter" - echo " Policies: admin" - echo " TTL: ${JUPYTERHUB_VAULT_TOKEN_TTL}" - echo " Max TTL: ${JUPYTERHUB_VAULT_TOKEN_MAX_TTL}" export VAULT_TOKEN="{{ root_token }}" while [ -z "${VAULT_TOKEN}" ]; do VAULT_TOKEN=$(gum input --prompt="Vault root token: " --password --width=100) done - vault write auth/kubernetes/role/jupyterhub \ + + echo "Setting up Vault integration for JupyterHub..." 
+ + # Create JupyterHub-specific policy and Kubernetes role in Vault + echo "Creating JupyterHub-specific Vault policy and Kubernetes role..." + echo " Service Account: hub" + echo " Namespace: jupyter" + echo " Policy: jupyterhub-admin (custom policy with extended max TTL)" + echo " TTL: ${JUPYTERHUB_VAULT_TOKEN_TTL}" + + # Create JupyterHub-specific policy + echo "Creating jupyterhub-admin policy..." + vault policy write jupyterhub-admin jupyterhub-admin-policy.hcl + + # Create Kubernetes role (use system-safe max_ttl to avoid warnings) + echo "Creating Kubernetes role..." + vault write auth/kubernetes/role/jupyterhub-admin \ bound_service_account_names=hub \ bound_service_account_namespaces=jupyter \ - policies=admin \ + policies=jupyterhub-admin \ ttl=${JUPYTERHUB_VAULT_TOKEN_TTL} \ - max_ttl=${JUPYTERHUB_VAULT_TOKEN_MAX_TTL} + max_ttl=720h - # Create Vault Agent configuration with gomplate - echo "Creating Vault Agent configuration..." - gomplate -f vault-agent-config.gomplate.hcl -o vault-agent-config.hcl + # Create ConfigMap with token renewal script + echo "Creating ConfigMap with token renewal script..." 
kubectl create configmap vault-agent-config -n ${JUPYTERHUB_NAMESPACE} \ - --from-file=agent.hcl=vault-agent-config.hcl \ - --from-file=token-monitor.tpl=token-monitor.tpl \ + --from-file=vault-token-renewer.sh=vault-token-renewer.sh \ --dry-run=client -o yaml | kubectl apply -f - echo "✓ Vault integration configured (user-specific tokens + auto-renewal)" echo "" echo "Configuration Summary:" echo " JupyterHub Token TTL: ${JUPYTERHUB_VAULT_TOKEN_TTL}" - echo " JupyterHub Token Max TTL: ${JUPYTERHUB_VAULT_TOKEN_MAX_TTL}" echo " User Token TTL: ${NOTEBOOK_VAULT_TOKEN_TTL}" echo " User Token Max TTL: ${NOTEBOOK_VAULT_TOKEN_MAX_TTL}" echo " Vault Agent Log Level: ${VAULT_AGENT_LOG_LEVEL}" @@ -257,19 +269,60 @@ setup-vault-integration root_token='': echo " # Each user gets their own isolated Vault token and policy" echo " # Admin token is automatically renewed by Vault Agent" -# Create JupyterHub Vault token (uses admin policy for JWT operations) -create-jupyterhub-vault-token: +# Create JupyterHub Vault token (renewable with unlimited Max TTL) +create-jupyterhub-vault-token root_token='': #!/bin/bash set -euo pipefail - echo "Creating JupyterHub Vault token with admin policy..." 
-    echo " TTL: ${JUPYTERHUB_VAULT_TOKEN_TTL}"
-    echo " Max TTL: ${JUPYTERHUB_VAULT_TOKEN_MAX_TTL}"
+    export VAULT_TOKEN="{{ root_token }}"
+    while [ -z "${VAULT_TOKEN}" ]; do
+        VAULT_TOKEN=$(gum input --prompt="Vault root token: " --password --width=100)
+    done
-    # JupyterHub needs admin privileges to read Keycloak credentials from Vault
-    # Create token and store in Vault
-    just vault::create-token-and-store admin jupyterhub/vault-token ${JUPYTERHUB_VAULT_TOKEN_TTL} ${JUPYTERHUB_VAULT_TOKEN_MAX_TTL}
+    echo "Creating JupyterHub admin Vault token"
-    echo "✓ JupyterHub Vault token created and stored"
+    # jupyterhub-admin policy should exist (created by setup-vault-integration)
+
+    # Check if token already exists
+    if vault kv get secret/jupyterhub/vault-token >/dev/null 2>&1; then
+        echo "Existing admin token found at secret/jupyterhub/vault-token"
+        if gum confirm "Replace existing token with new one?"; then
+            echo "Creating new admin token..."
+        else
+            echo "Using existing token"
+            # exit, not return: the recipe body runs as a script, not inside a function
+            exit 0
+        fi
+    fi
+
+    # Create admin vault token with unlimited max TTL
     echo ""
-    echo "To use in JupyterHub deployment:"
-    echo " JUPYTERHUB_VAULT_TOKEN=\$(just vault::get jupyterhub/vault-token token)"
+    echo "Creating admin token (TTL: 24h, Max TTL: unlimited)..."
+    TOKEN_RESPONSE=$(vault token create \
+        -policy=jupyterhub-admin \
+        -ttl=24h \
+        -explicit-max-ttl=0 \
+        -display-name="jupyterhub-admin" \
+        -renewable=true \
+        -format=json)
+
+    # Extract token
+    ADMIN_TOKEN=$(echo "$TOKEN_RESPONSE" | jq -r .auth.client_token)
+
+    if [ -z "$ADMIN_TOKEN" ] || [ "$ADMIN_TOKEN" = "null" ]; then
+        echo "❌ Failed to create admin token"
+        exit 1
+    fi
+
+    # Store token in Vault for JupyterHub to retrieve
+    echo "Storing admin token in Vault..."
+    vault kv put secret/jupyterhub/vault-token token="$ADMIN_TOKEN"
+
+    echo ""
+    echo "✅ Admin token created and stored successfully!"
+    echo ""
+    echo "Token behavior:"
+    echo " - TTL: 24 hours (will expire in 24h without renewal)"
+    echo " - Max TTL: Unlimited (can be renewed forever)"
+    echo " - Vault Agent will renew every 12 hours"
+    echo " - No more 30-day limitation!"
+    echo ""
+    echo "Token stored at: secret/jupyterhub/vault-token"
diff --git a/jupyterhub/monitor-vault-token.sh b/jupyterhub/monitor-vault-token.sh
deleted file mode 100755
index 42fb162..0000000
--- a/jupyterhub/monitor-vault-token.sh
+++ /dev/null
@@ -1,73 +0,0 @@
-#!/bin/bash
-
-# JupyterHub Vault Token Monitor Script
-# Usage: ./monitor-vault-token.sh [pod-name]
-
-set -euo pipefail
-
-NAMESPACE="jupyter"
-POD_NAME=${1:-$(kubectl get pods -n ${NAMESPACE} -l app.kubernetes.io/component=hub -o jsonpath='{.items[0].metadata.name}')}
-
-echo "🔍 Monitoring Vault Agent for JupyterHub Pod: ${POD_NAME}"
-echo "=================================================="
-
-# Check if pod exists and is running
-if ! kubectl get pod ${POD_NAME} -n ${NAMESPACE} >/dev/null 2>&1; then
-    echo "❌ Pod ${POD_NAME} not found in namespace ${NAMESPACE}"
-    exit 1
-fi
-
-echo "📊 Pod Status:"
-kubectl get pod ${POD_NAME} -n ${NAMESPACE}
-echo ""
-
-echo "📄 Vault Secrets Directory:"
-kubectl exec -n ${NAMESPACE} ${POD_NAME} -c hub -- ls -la /vault/secrets/ 2>/dev/null || echo "❌ Cannot access /vault/secrets/"
-echo ""
-
-echo "🔐 Current Token Info:"
-kubectl exec -n ${NAMESPACE} ${POD_NAME} -c hub -- sh -c '
-    if [ -f /vault/secrets/vault-token ]; then
-        echo "Token file exists ($(wc -c < /vault/secrets/vault-token) bytes)"
-        echo "Last modified: $(stat -c %y /vault/secrets/vault-token 2>/dev/null || stat -f %Sm /vault/secrets/vault-token)"
-
-        # Test token validity
-        if command -v curl >/dev/null 2>&1; then
-            echo ""
-            echo "Token validation:"
-            RESPONSE=$(curl -s -w "%{http_code}" -H "X-Vault-Token: $(cat /vault/secrets/vault-token)" $VAULT_ADDR/v1/auth/token/lookup-self)
-            HTTP_CODE="${RESPONSE: -3}"
-            if [ "$HTTP_CODE" = "200" ]; then
-                echo "✅ Token is valid"
-                echo "$RESPONSE" | head -c -3 | grep -E "(ttl|expire_time|renewable)" | head -3
-            else
-                echo "❌ Token validation failed (HTTP $HTTP_CODE)"
-            fi
-        fi
-    else
-        echo "❌ Token file not found"
-    fi
-' 2>/dev/null || echo "❌ Cannot check token info"
-
-echo ""
-echo "📋 Recent Vault Agent Logs:"
-kubectl logs -n ${NAMESPACE} ${POD_NAME} -c vault-agent --tail=10 2>/dev/null || echo "❌ Cannot access vault-agent logs"
-
-echo ""
-echo "📋 Token Renewal Log (if exists):"
-kubectl exec -n ${NAMESPACE} ${POD_NAME} -c hub -- sh -c '
-    if [ -f /vault/secrets/renewal.log ]; then
-        echo "Recent renewal events:"
-        tail -10 /vault/secrets/renewal.log
-    else
-        echo "No renewal log file found yet"
-    fi
-' 2>/dev/null || echo "❌ Cannot check renewal logs"
-
-echo ""
-echo "🔄 To monitor token renewals in real-time, run:"
-echo " kubectl logs -n ${NAMESPACE} ${POD_NAME} -c vault-agent -f | grep 'renewed auth token'"
-echo ""
-echo "🔍 To check token info periodically, run:"
-echo " watch -n 30 \"kubectl exec -n ${NAMESPACE} ${POD_NAME} -c hub -- sh -c 'curl -s -H \\\"X-Vault-Token: \\\$(cat /vault/secrets/vault-token)\\\" \\\$VAULT_ADDR/v1/auth/token/lookup-self | grep -E \\\"(ttl|expire_time)\\\"'\""
-
diff --git a/jupyterhub/token-monitor.tpl b/jupyterhub/token-monitor.tpl
deleted file mode 100644
index 4a45bed..0000000
--- a/jupyterhub/token-monitor.tpl
+++ /dev/null
@@ -1,11 +0,0 @@
-{{- with secret "auth/token/lookup-self" -}}
-=== Vault Token Status ===
-TTL: {{ .Data.ttl }} seconds
-Renewable: {{ .Data.renewable }}
-Expire Time: {{ .Data.expire_time }}
-Policies: {{ range .Data.policies }}{{ . }} {{ end }}
-Display Name: {{ .Data.display_name }}
-Entity ID: {{ .Data.entity_id }}
-Token Type: {{ .Data.type }}
-===========================
-{{- end -}}
\ No newline at end of file
diff --git a/jupyterhub/vault-agent-config.gomplate.hcl b/jupyterhub/vault-agent-config.gomplate.hcl
deleted file mode 100644
index 6d557a8..0000000
--- a/jupyterhub/vault-agent-config.gomplate.hcl
+++ /dev/null
@@ -1,38 +0,0 @@
-vault {
-  address = "{{ .Env.VAULT_ADDR }}"
-}
-
-# Enable detailed logging
-log_level = "{{ .Env.VAULT_AGENT_LOG_LEVEL }}"
-log_format = "standard"
-
-auto_auth {
-  method "kubernetes" {
-    mount_path = "auth/kubernetes"
-    config = {
-      role = "jupyterhub"
-    }
-  }
-
-  sink "file" {
-    config = {
-      path = "/vault/secrets/vault-token"
-    }
-  }
-}
-
-cache {
-  use_auto_auth_token = true
-}
-
-listener "tcp" {
-  address = "127.0.0.1:8100"
-  tls_disable = true
-}
-
-# Add template for token monitoring
-template {
-  source = "/vault/config/token-monitor.tpl"
-  destination = "/vault/secrets/token-info.log"
-  perms = 0644
-}
\ No newline at end of file
diff --git a/jupyterhub/vault-token-renewer.sh b/jupyterhub/vault-token-renewer.sh
new file mode 100644
index 0000000..5e2387c
--- /dev/null
+++ b/jupyterhub/vault-token-renewer.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+# Script to handle admin token retrieval and renewal
+
+set -e
+
+echo "Starting Vault token management..."
+
+export VAULT_ADDR="${VAULT_ADDR}"
+
+# Wait for ExternalSecret to create the secret
+echo "Waiting for admin token from ExternalSecret..."
+while [ ! -f /vault/admin-token/token ]; do
+    echo "Waiting for /vault/admin-token/token..."
+    sleep 5
+done
+
+# Read admin token from mounted secret
+ADMIN_TOKEN=$(cat /vault/admin-token/token)
+
+if [ -z "$ADMIN_TOKEN" ]; then
+    echo "ERROR: No admin token found in mounted secret"
+    exit 1
+fi
+
+echo "Admin token retrieved from ExternalSecret"
+echo "$ADMIN_TOKEN" > /vault/secrets/vault-token
+
+# Start token renewal loop
+export VAULT_TOKEN="$ADMIN_TOKEN"
+while true; do
+    echo "$(date): Renewing admin token..."
+    if vault token renew >/dev/null 2>&1; then
+        echo "$(date): Token renewed successfully"
+    else
+        echo "$(date): Token renewal failed - trying to retrieve token again from ExternalSecret"
+        # Re-read token from mounted secret
+        ADMIN_TOKEN=$(cat /vault/admin-token/token 2>/dev/null || echo "")
+        if [ -n "$ADMIN_TOKEN" ]; then
+            echo "$ADMIN_TOKEN" > /vault/secrets/vault-token
+            export VAULT_TOKEN="$ADMIN_TOKEN"
+            echo "$(date): Token re-retrieved successfully from ExternalSecret"
+        else
+            echo "$(date): Failed to re-retrieve token from ExternalSecret"
+        fi
+    fi
+    sleep 43200 # 12 hours
+done
\ No newline at end of file
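
The renewal loop in `vault-token-renewer.sh` hard-codes a 43200-second sleep against the 24h token TTL and writes `/vault/secrets/vault-token` in place. The following is a minimal sketch, not part of this PR, of two helpers that could make those choices explicit. It assumes only POSIX `sh`; `renew_interval` and `write_token` are hypothetical names introduced here for illustration.

```bash
#!/bin/sh
# Hypothetical helpers (not part of this PR) sketching a more defensive renewal loop.

# Print half of a TTL given in seconds, with a 60-second floor.
# Renewing at half the TTL matches the 24h TTL / 12h interval used by the script.
renew_interval() {
    half=$(( $1 / 2 ))
    if [ "$half" -lt 60 ]; then
        half=60
    fi
    echo "$half"
}

# Write the token ($1) to the destination path ($2) through a temporary file
# in the same directory, then rename it into place so that a reader never
# observes a partially written token file.
write_token() {
    tmp="$2.tmp.$$"
    printf '%s\n' "$1" > "$tmp" && mv "$tmp" "$2"
}
```

With these, the loop could call `write_token "$ADMIN_TOKEN" /vault/secrets/vault-token` instead of redirecting `echo`, and `sleep "$(renew_interval 86400)"`, which evaluates to the same 43200 seconds the script currently hard-codes.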