# JupyterHub

JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.

## Table of Contents

- [Installation](#installation)
- [Prerequisites](#prerequisites)
- [Access](#access)
- [MCP Server Integration](#mcp-server-integration)
- [Programmatic API Access](#programmatic-api-access)
- [Kernel Images](#kernel-images)
- [Profile Configuration](#profile-configuration)
- [GPU Support](#gpu-support)
- [Buun-Stack Images](#buun-stack-images)
- [buunstack Package & SecretStore](#buunstack-package--secretstore)
- [Vault Integration](#vault-integration)
- [Token Renewal Implementation](#token-renewal-implementation)
- [Storage Options](#storage-options)
- [Configuration](#configuration)
- [Custom Container Images](#custom-container-images)
- [Management](#management)
- [Troubleshooting](#troubleshooting)
- [Technical Implementation Details](#technical-implementation-details)
- [Performance Considerations](#performance-considerations)
- [Known Limitations](#known-limitations)

## Installation

Install JupyterHub with interactive configuration:

```bash
just jupyterhub::install
```

This will prompt for:

- JupyterHub host (FQDN)
- NFS PV usage (if Longhorn is installed)
- NFS server details (if NFS is enabled)
- Vault integration setup (requires root token for initial setup)

## Prerequisites

- Keycloak must be installed and configured
- For NFS storage: Longhorn must be installed
- For Vault integration: Vault and External Secrets Operator must be installed
- Helm repository must be accessible

## Access

Access JupyterHub at your configured host (e.g., `https://jupyter.example.com`) and authenticate via Keycloak.

## MCP Server Integration

JupyterHub includes [jupyter-mcp-server](https://jupyter-mcp-server.datalayer.tech/) as a Jupyter Server Extension, enabling MCP (Model Context Protocol) clients to interact with Jupyter notebooks programmatically.

### Overview

The MCP server provides a standardized interface for AI assistants and other MCP clients to:

- List and manage files on the Jupyter server
- Create, read, and edit notebook cells
- Execute code in notebook kernels
- Manage kernel sessions

### Enabling MCP Server

MCP server support is controlled by the `JUPYTER_MCP_SERVER_ENABLED` environment variable. During installation, you will be prompted:

```bash
just jupyterhub::install
# "Enable jupyter-mcp-server for Claude Code integration? (y/N)"
```

Or set the environment variable before installation:

```bash
JUPYTER_MCP_SERVER_ENABLED=true just jupyterhub::install
```

### Kernel Image Requirements

The MCP server requires jupyter-mcp-server to be installed and enabled in the kernel image.

**Buun-Stack profiles** (`buun-stack`, `buun-stack-cuda`) include jupyter-mcp-server pre-installed and enabled. No additional setup is required.

**Other profiles** (minimal, base, datascience, pyspark, pytorch, tensorflow) do not include jupyter-mcp-server. To use MCP with these images, install the required packages in your notebook:

```bash
pip install 'jupyter-mcp-server==0.21.0' 'jupyter-mcp-tools>=0.1.4'
pip uninstall -y pycrdt datalayer_pycrdt
pip install 'datalayer_pycrdt==0.12.17'
jupyter server extension enable jupyter_mcp_server
```

After installation, restart your Jupyter server for the extension to take effect.
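To confirm the extension is active after the restart, you can list the server extensions from a terminal in your notebook session:

```bash
jupyter server extension list
# jupyter_mcp_server should be listed as enabled
```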
### MCP Endpoint

When enabled, each user's Jupyter server exposes an MCP endpoint at:

```text
https://<jupyterhub-host>/user/<username>/mcp
```

### Authentication

MCP clients must authenticate using a JupyterHub API token. Obtain a token using:

```bash
# Get token for a user (creates user if not exists)
just jupyterhub::get-token <username>
```

The token should be passed in the `Authorization` header:

```text
Authorization: token <your-api-token>
```

### Client Configuration

#### Generic MCP Client Configuration

For any MCP client that supports HTTP transport:

```bash
just jupyterhub::setup-mcp-server
```

This displays the MCP server URL, authentication details, and available tools.

#### Claude Code Configuration

For Claude Code specifically:

```bash
just jupyterhub::setup-claude-mcp-server
```

This provides a ready-to-use `.mcp.json` configuration:

```json
{
  "mcpServers": {
    "jupyter-<username>": {
      "type": "http",
      "url": "https://<jupyterhub-host>/user/<username>/mcp",
      "headers": {
        "Authorization": "token ${JUPYTERHUB_TOKEN}"
      }
    }
  }
}
```

Set the environment variable:

```bash
export JUPYTERHUB_TOKEN=<your-api-token>
```

### Checking MCP Status

Verify MCP server status for a user:

```bash
just jupyterhub::mcp-status <username>
```

This checks:

- User pod is running
- jupyter-mcp-server extension is enabled
- MCP endpoint is responding

### Technical Details

- **Transport**: HTTP (streamable-http)
- **Extension**: jupyter-mcp-server (installed in kernel images)
- **Environment Variable**: `JUPYTERHUB_ALLOW_TOKEN_IN_URL=1` enables WebSocket token authentication

## Programmatic API Access

buun-stack configures JupyterHub to allow programmatic API access, enabling token generation and user management without requiring users to log in first. This is achieved by registering a **Service** in JupyterHub.

### What is a JupyterHub Service?

In JupyterHub, a **Service** is a registered entity (external program or script) that can access the JupyterHub API using a pre-configured token. While regular users obtain tokens by logging in, services use tokens registered in the JupyterHub configuration.

```python
# Register a service with its API token
c.JupyterHub.services = [
    {
        'name': 'admin-service',
        'api_token': '<generated-token>',
    }
]

# Grant permissions to the service
c.JupyterHub.load_roles = [
    {
        'name': 'admin-service-role',
        'scopes': ['admin:users', 'tokens', 'admin:servers'],
        'services': ['admin-service'],
    }
]
```

When an API request includes `Authorization: token <service-token>`, JupyterHub identifies the token owner (in this case, `admin-service`) and applies the corresponding permissions.

### How It Works

```text
┌──────────────────────────────────────────────────────────────┐
│ just jupyterhub::get-token <username>                        │
│                                                              │
│ 1. Retrieve service token from Kubernetes Secret             │
│ 2. Call JupyterHub API with the token                        │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│ JupyterHub API                                               │
│                                                              │
│ 1. Receive: Authorization: token <service-token>             │
│ 2. Identify: This token belongs to "admin-service"           │
│ 3. Check permissions: admin-service has admin:users, tokens  │
│ 4. Execute: Create user token and return it                  │
└──────────────────────────────────────────────────────────────┘
```

### Service Configuration

The service is automatically configured during JupyterHub installation:

1. **Token Generation**: A random token is generated using `just utils::random-password`
2. **Secret Storage**: Token is stored in Vault (if External Secrets Operator is available) or as a Kubernetes Secret
3. **Service Registration**: JupyterHub is configured with the service and appropriate RBAC roles
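For illustration, the snippet below sketches what the `get-token` flow above does against the JupyterHub REST API. The host, username, and requested scopes are illustrative assumptions; the actual `just` recipe may differ in detail.

```python
import requests

HUB_API = "https://jupyter.example.com/hub/api"  # assumed host; adjust to your deployment
SERVICE_TOKEN = "<service-token>"                # from the jupyterhub-admin-service-token Secret
USERNAME = "alice"                               # illustrative user

headers = {"Authorization": f"token {SERVICE_TOKEN}"}

# Create the user if it does not exist yet
resp = requests.post(f"{HUB_API}/users/{USERNAME}", headers=headers)
if resp.status_code not in (201, 409):  # 409: user already exists
    resp.raise_for_status()

# Request a new API token for that user (scopes follow the description in this document)
resp = requests.post(
    f"{HUB_API}/users/{USERNAME}/tokens",
    headers=headers,
    json={"note": "mcp-client", "scopes": [f"access:servers!user={USERNAME}", "self"]},
)
resp.raise_for_status()
print(resp.json()["token"])
```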
### RBAC Permissions

The registered service has the following scopes:

- `admin:users` - Create, read, update, delete users
- `tokens` - Create and manage API tokens
- `admin:servers` - Start and stop user servers

### Usage

#### Get Token for a User

```bash
# Creates user if not exists, returns API token
just jupyterhub::get-token <username>
```

This command:

1. Checks if the user exists in JupyterHub
2. Creates the user if not found
3. Generates an API token with appropriate scopes
4. Returns the token for use with MCP or other API clients

#### Manual Token Management

The service token is stored in:

- **Vault path**: `secret/jupyterhub/admin-service` (key: `token`)
- **Kubernetes Secret**: `jupyterhub-admin-service-token` in the JupyterHub namespace

To recreate the service token:

```bash
just jupyterhub::create-admin-service-token-secret
```

### Security Considerations

- The service token has elevated privileges; protect it accordingly
- Tokens are stored encrypted in Vault when External Secrets Operator is available
- User tokens generated via the service have limited scopes (`access:servers!user=<username>`, `self`)

## Kernel Images

### Important Note

Building and using custom buun-stack images requires building the `buunstack` Python package first. The package wheel file will be included in the Docker image during build.

JupyterHub supports multiple kernel image profiles:

### Standard Profiles

- **minimal**: Basic Python environment
- **base**: Python with common data science packages
- **datascience**: Full data science stack (default)
- **pyspark**: PySpark for big data processing
- **pytorch**: PyTorch for machine learning
- **tensorflow**: TensorFlow for machine learning

### Buun-Stack Profiles

- **buun-stack**: Comprehensive data science environment with Vault integration
- **buun-stack-cuda**: CUDA-enabled version with GPU support

## Profile Configuration

Enable/disable profiles using environment variables:

```bash
# Enable buun-stack profile (CPU version)
JUPYTER_PROFILE_BUUN_STACK_ENABLED=true

# Enable buun-stack CUDA profile (GPU version)
JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true

# Disable default datascience profile
JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
```

Available profile variables:

- `JUPYTER_PROFILE_MINIMAL_ENABLED`
- `JUPYTER_PROFILE_BASE_ENABLED`
- `JUPYTER_PROFILE_DATASCIENCE_ENABLED`
- `JUPYTER_PROFILE_PYSPARK_ENABLED`
- `JUPYTER_PROFILE_PYTORCH_ENABLED`
- `JUPYTER_PROFILE_TENSORFLOW_ENABLED`
- `JUPYTER_PROFILE_BUUN_STACK_ENABLED`
- `JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED`

Only `JUPYTER_PROFILE_DATASCIENCE_ENABLED` is true by default.

## GPU Support

JupyterHub supports GPU-accelerated notebooks using NVIDIA GPUs. GPU support is automatically enabled during installation if the nvidia-device-plugin is detected.
### GPU Prerequisites

GPU support requires the following components to be installed:

#### NVIDIA Device Plugin

Install the NVIDIA device plugin for Kubernetes:

```bash
just nvidia-device-plugin::install
```

This plugin:

- Exposes NVIDIA GPUs to Kubernetes as schedulable resources
- Manages GPU allocation to pods
- Ensures proper GPU driver access within containers

#### RuntimeClass Configuration

The nvidia-device-plugin installation automatically creates the `nvidia` RuntimeClass, which:

- Configures containerd to use the NVIDIA container runtime
- Enables GPU access for containers using `runtimeClassName: nvidia`

### Enabling GPU Support

During JupyterHub installation, you will be prompted:

```bash
just jupyterhub::install
# When nvidia-device-plugin is installed, you'll see:
# "Enable GPU support for JupyterHub notebooks? (y/N)"
```

Alternatively, set the environment variables before installation:

```bash
JUPYTERHUB_GPU_ENABLED=true
JUPYTERHUB_GPU_LIMIT=1  # Number of GPUs per user (default: 1)
```

### GPU-Enabled Profiles

When GPU support is enabled:

1. **All notebook profiles** get GPU access via `runtimeClassName: nvidia`
2. **CUDA-specific profile** (buun-stack-cuda) additionally includes:
   - CUDA 12.x toolkit
   - PyTorch with CUDA support
   - GPU-optimized libraries

### Usage

#### Selecting a GPU Profile

When spawning a notebook, select a profile with GPU capabilities:

- **Buun-stack with CUDA**: Recommended for GPU workloads (requires custom image)
- **PyTorch**: Standard PyTorch notebook
- **TensorFlow**: Standard TensorFlow notebook

#### Verifying GPU Access

In your notebook, verify GPU availability:

```python
import torch

# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")

# Get GPU device count
print(f"GPU count: {torch.cuda.device_count()}")

# Get GPU device name
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

    # Test GPU operation
    torch.cuda.synchronize()
    print("GPU is working correctly!")
```

#### GPU Configuration

Default GPU configuration:

- **GPU limit per user**: 1 GPU (configurable via `JUPYTERHUB_GPU_LIMIT`)
- **Memory requests**: 1Gi (defined in singleuser settings)
- **RuntimeClass**: `nvidia` (automatically applied when GPU enabled)

### Building GPU-Enabled Custom Images

If using the buun-stack-cuda profile, build and push the CUDA-enabled image:

```bash
# Enable CUDA profile
export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true

# Build CUDA-enabled image (includes PyTorch with CUDA 12.x)
just jupyterhub::build-kernel-images

# Push to registry
just jupyterhub::push-kernel-images
```

The CUDA image:

- Based on `quay.io/jupyter/pytorch-notebook:x86_64-cuda12-python-3.12.10`
- Includes PyTorch with CUDA 12.4 support (`cu124`)
- Contains all standard buun-stack packages
- Supports GPU-accelerated deep learning

### Troubleshooting GPU Issues

#### Pod Not Scheduling

If GPU-enabled pods fail to schedule:

```bash
# Check if nvidia-device-plugin is running
kubectl get pods -n nvidia-device-plugin

# Verify GPU resources are advertised
kubectl describe nodes | grep nvidia.com/gpu

# Check RuntimeClass exists
kubectl get runtimeclass nvidia
```

#### CUDA Not Available

If `torch.cuda.is_available()` returns `False`:

1. Verify the image has CUDA support:

   ```bash
   # In notebook
   !nvcc --version
   # Should show CUDA compiler version
   ```

2. Check that the pod uses the nvidia RuntimeClass:

   ```bash
   kubectl get pod -n datastack -o yaml | grep runtimeClassName
   ```

3. Rebuild the image if using a custom buun-stack-cuda image
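As an additional driver-level check from inside the notebook (assuming the pod was started with the `nvidia` RuntimeClass so the driver is visible to the container):

```bash
# In notebook
!nvidia-smi
# Should list the GPU(s) plus driver and CUDA runtime versions
```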
#### GPU Memory Issues

Monitor GPU usage:

```python
import torch

# Check GPU memory
if torch.cuda.is_available():
    print(f"Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
    print(f"Reserved: {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")

    # Clear cache if needed
    torch.cuda.empty_cache()
```

## Buun-Stack Images

Buun-stack images provide comprehensive data science environments with:

- All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
- Deep learning frameworks (PyTorch, TensorFlow, Keras)
- Big data tools (PySpark, Apache Arrow)
- NLP and ML libraries (LangChain, Transformers, spaCy)
- Database connectors and tools
- **Vault integration** with `buunstack` Python package

### Building Custom Images

Build and push buun-stack images to your registry:

```bash
# Build images (includes building the buunstack Python package)
just jupyterhub::build-kernel-images

# Push to registry
just jupyterhub::push-kernel-images
```

The build process:

1. Builds the `buunstack` Python package wheel
2. Copies the wheel into the Docker build context
3. Installs the wheel in the Docker image
4. Cleans up temporary files

⚠️ **Note**: Buun-stack images are comprehensive and large (~13GB). Initial image pulls and deployments take significant time due to the extensive package set.

### Image Configuration

Configure image settings in `.env.local`:

```bash
# Image registry
IMAGE_REGISTRY=localhost:30500

# Image tag (current default)
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28
```

## buunstack Package & SecretStore

JupyterHub includes the **buunstack** Python package, which provides seamless integration with HashiCorp Vault for secure secrets management in your notebooks.

### Key Features

- 🔒 **Secure Secrets Management**: Store and retrieve secrets securely using HashiCorp Vault
- 🚀 **Pre-acquired Authentication**: Uses Vault tokens created automatically at notebook spawn
- 📱 **Simple API**: Easy-to-use interface similar to Google Colab's `userdata.get()`
- 🔄 **Automatic Token Renewal**: Built-in token refresh for long-running sessions

### Quick Example

```python
from buunstack import SecretStore

# Initialize with pre-acquired Vault token (automatic)
secrets = SecretStore()

# Store secrets
secrets.put('api-keys',
    openai_key='sk-your-key-here',
    github_token='ghp_your-token',
    database_url='postgresql://user:pass@host:5432/db'
)

# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = api_keys['openai_key']

# Or get a specific field directly
openai_key = secrets.get('api-keys', field='openai_key')
```

### Learn More

For detailed documentation, usage examples, and API reference, see:

[📖 buunstack Package Documentation](../python-package/README.md)
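As a small follow-on to the Quick Example above (assuming the `api-keys` secret from that snippet exists), retrieved fields can be exported as environment variables for libraries that expect them there:

```python
import os
from buunstack import SecretStore

secrets = SecretStore()

# Export stored fields into the notebook process environment
api_keys = secrets.get('api-keys')
os.environ['OPENAI_API_KEY'] = api_keys['openai_key']
os.environ['GITHUB_TOKEN'] = api_keys['github_token']
```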
## Vault Integration

### Overview

Vault integration enables secure secrets management directly from Jupyter notebooks. The system uses:

- **ExternalSecret** to fetch the admin token from Vault
- **Renewable tokens** with unlimited Max TTL to avoid 30-day system limitations
- **Token renewal script** that automatically renews tokens at TTL/2 intervals (minimum 30 seconds)
- **User-specific tokens** created during notebook spawn with isolated access

### Architecture

```plain
┌────────────────────────────────────────────────────────────────┐
│                       JupyterHub Hub Pod                       │
│                                                                │
│  ┌──────────────┐  ┌────────────────┐  ┌────────────────────┐  │
│  │ Hub          │  │ Token Renewer  │  │ ExternalSecret     │  │
│  │ Container    │◄─┤ Sidecar        │◄─┤ (mounted as        │  │
│  │              │  │                │  │  Secret)           │  │
│  └──────────────┘  └────────────────┘  └────────────────────┘  │
│         │                  │                     ▲             │
│         │                  │                     │             │
│         ▼                  ▼                     │             │
│  ┌──────────────────────────────────┐            │             │
│  │ /vault/secrets/vault-token       │            │             │
│  │ (Admin token for user creation)  │            │             │
│  └──────────────────────────────────┘            │             │
└──────────────────────────────────────────────────┼─────────────┘
                                                   │
                                       ┌───────────▼──────────┐
                                       │        Vault         │
                                       │  secret/jupyterhub/  │
                                       │  vault-token         │
                                       └──────────────────────┘
```

### Vault Integration Prerequisites

Vault integration requires:

- Vault server installed and configured
- External Secrets Operator installed
- ClusterSecretStore configured for Vault
- Buun-stack kernel images (standard images don't include Vault integration)

### Setup

Vault integration is configured during JupyterHub installation:

```bash
just jupyterhub::install
# Answer "yes" when prompted about Vault integration
# Provide Vault root token when prompted
```

The setup process:

1. Creates `jupyterhub-admin` policy with necessary permissions including `sudo` for orphan token creation
2. Creates renewable admin token with 24h TTL and unlimited Max TTL
3. Stores token in Vault at `secret/jupyterhub/vault-token`
4. Creates ExternalSecret to fetch token from Vault
5. Deploys token renewal sidecar for automatic renewal

### Usage in Notebooks

With Vault integration enabled, use the `buunstack` package in notebooks:

```python
from buunstack import SecretStore

# Initialize (uses pre-acquired user-specific token)
secrets = SecretStore()

# Store secrets
secrets.put('api-keys',
    openai='sk-...',
    github='ghp_...',
    database_url='postgresql://...')

# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = secrets.get('api-keys', field='openai')

# List all secrets
secret_names = secrets.list()

# Delete secrets or specific fields
secrets.delete('old-api-key')               # Delete entire secret
secrets.delete('api-keys', field='github')  # Delete only github field
```

### Security Features

- **User isolation**: Each user receives an orphan token with access only to their namespace
- **Automatic renewal**: Token renewal script renews admin token at TTL/2 intervals (minimum 30 seconds)
- **ExternalSecret integration**: Admin token fetched securely from Vault
- **Orphan tokens**: User tokens are orphan tokens, not limited by parent policy restrictions
- **Audit trail**: All secret access is logged in Vault

### Token Management

#### Admin Token

The admin token is managed through:

1. **Creation**: `just jupyterhub::create-jupyterhub-vault-token` creates renewable token
2. **Storage**: Stored in Vault at `secret/jupyterhub/vault-token`
3. **Retrieval**: ExternalSecret fetches and mounts as Kubernetes Secret
4. **Renewal**: `vault-token-renewer.sh` script renews at TTL/2 intervals

#### User Tokens

User tokens are created dynamically:

1. **Pre-spawn hook** reads admin token from `/vault/secrets/vault-token`
2. **Creates user policy** `jupyter-user-{username}` with restricted access
3. **Creates orphan token** with user policy (requires `sudo` permission)
4. **Sets environment variable** `NOTEBOOK_VAULT_TOKEN` in notebook container
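For orientation, the following is a simplified, illustrative sketch of that flow using the `hvac` client. The actual logic lives in `pre_spawn_hook.gomplate.py` and `user_policy.hcl`; the Vault address, secret paths, and policy contents below are assumptions, not the real template.

```python
import hvac

def render_user_policy(username: str) -> str:
    # Hypothetical stand-in for the user_policy.hcl template; the real path
    # layout is defined by that template, not by this sketch.
    return (
        f'path "secret/data/jupyter/{username}/*" {{\n'
        '  capabilities = ["create", "read", "update", "delete", "list"]\n'
        '}\n'
    )

def pre_spawn_hook(spawner):
    username = spawner.user.name

    # 1. Read the admin token mounted from the ExternalSecret
    with open("/vault/secrets/vault-token") as f:
        admin_token = f.read().strip()

    # 2. Create or update the user-specific policy
    client = hvac.Client(url="https://vault.example.com", token=admin_token)  # assumed address
    policy_name = f"jupyter-user-{username}"
    client.sys.create_or_update_policy(name=policy_name, policy=render_user_policy(username))

    # 3. Create an orphan token bound to that policy (needs sudo on the token endpoint)
    resp = client.auth.token.create_orphan(
        policies=[policy_name],
        ttl="24h",                # NOTEBOOK_VAULT_TOKEN_TTL
        explicit_max_ttl="168h",  # NOTEBOOK_VAULT_TOKEN_MAX_TTL
    )

    # 4. Expose the token to the notebook container
    spawner.environment["NOTEBOOK_VAULT_TOKEN"] = resp["auth"]["client_token"]
```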
## Token Renewal Implementation

### Admin Token Renewal

The admin token renewal is handled by a sidecar container (`vault-token-renewer`) running alongside the JupyterHub hub:

**Implementation Details:**

1. **Renewal Script**: `/vault/config/vault-token-renewer.sh`
   - Runs in the `vault-token-renewer` sidecar container
   - Uses Vault 1.17.5 image with HashiCorp Vault CLI

2. **Environment-Based TTL Configuration**:

   ```bash
   # Reads TTL from environment variable (set in .env.local)
   TTL_RAW="${JUPYTERHUB_VAULT_TOKEN_TTL}"  # e.g., "5m", "24h"

   # Converts to seconds and calculates renewal interval
   RENEWAL_INTERVAL=$((TTL_SECONDS / 2))  # TTL/2 with minimum 30s
   ```

3. **Token Source**: ExternalSecret → Kubernetes Secret → mounted file

   ```bash
   # Token retrieved from ExternalSecret-managed mount
   ADMIN_TOKEN=$(cat /vault/admin-token/token)
   ```

4. **Renewal Loop**:

   ```bash
   while true; do
     vault token renew >/dev/null 2>&1
     sleep $RENEWAL_INTERVAL
   done
   ```

5. **Error Handling**: If renewal fails, re-retrieves token from ExternalSecret mount

**Key Files:**

- `vault-token-renewer.sh`: Main renewal script
- `jupyterhub-vault-token-external-secret.gomplate.yaml`: ExternalSecret configuration
- `vault-token-renewer-config` ConfigMap: Contains the renewal script

### User Token Renewal

User token renewal is handled within the notebook environment by the `buunstack` Python package:

**Implementation Details:**

1. **Token Source**: Environment variable set by pre-spawn hook

   ```python
   # In pre_spawn_hook.gomplate.py
   spawner.environment["NOTEBOOK_VAULT_TOKEN"] = user_vault_token
   ```

2. **Automatic Renewal**: Built into `SecretStore` class operations

   ```python
   # In buunstack/secrets.py
   def _ensure_authenticated(self):
       token_info = self.client.auth.token.lookup_self()
       ttl = token_info.get("data", {}).get("ttl", 0)
       renewable = token_info.get("data", {}).get("renewable", False)

       # Renew if TTL < 10 minutes and renewable
       if renewable and ttl > 0 and ttl < 600:
           self.client.auth.token.renew_self()
   ```

3. **Renewal Trigger**: Every `SecretStore` operation (get, put, delete, list)
   - Checks token validity before operation
   - Automatically renews if TTL < 10 minutes
   - Transparent to user code

4. **Token Configuration** (set during creation):
   - **TTL**: `NOTEBOOK_VAULT_TOKEN_TTL` (default: 24h = 1 day)
   - **Max TTL**: `NOTEBOOK_VAULT_TOKEN_MAX_TTL` (default: 168h = 7 days)
   - **Policy**: User-specific `jupyter-user-{username}`
   - **Type**: Orphan token (independent of parent token lifecycle)

5. **Expiry Handling**: When the token reaches its Max TTL:
   - Cannot be renewed further
   - User must restart the notebook server (triggers new token creation)
   - Mitigated by the `JUPYTERHUB_CULL_MAX_AGE` setting, which keeps pod age within the 7-day Max TTL

**Key Files:**

- `pre_spawn_hook.gomplate.py`: User token creation logic
- `buunstack/secrets.py`: Token renewal implementation
- `user_policy.hcl`: User token permissions template
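If you want to observe this from the notebook side, you can inspect the pre-acquired token directly. This is an illustrative check; it assumes you know the Vault address (it may or may not be exported as `VAULT_ADDR` in the notebook environment).

```python
import os
import hvac

# Inspect the pre-acquired user token via its own lookup permissions
client = hvac.Client(
    url=os.environ.get("VAULT_ADDR", "https://vault.example.com"),  # assumed address
    token=os.environ["NOTEBOOK_VAULT_TOKEN"],
)

info = client.auth.token.lookup_self()["data"]
print(f"ttl={info['ttl']}s renewable={info['renewable']} policies={info['policies']}")
```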
### Token Lifecycle Summary

```plain
┌─────────────────┐  ┌──────────────────┐  ┌─────────────────┐
│   Admin Token   │  │   User Token     │  │  Pod Lifecycle  │
│                 │  │                  │  │                 │
│ Created: Manual │  │ Created: Spawn   │  │ Max Age: 7 days │
│ TTL: 5m-24h     │  │ TTL: 1 day       │  │ Auto-restart    │
│ Max TTL: ∞      │  │ Max TTL: 7 days  │  │ at Max TTL      │
│ Renewal: Auto   │  │ Renewal: Auto    │  │                 │
│ Interval: TTL/2 │  │ Trigger: Usage   │  │                 │
└─────────────────┘  └──────────────────┘  └─────────────────┘
         │                    │                     │
         ▼                    ▼                     ▼
vault-token-renewer      buunstack.py          cull.maxAge
      sidecar            SecretStore           pod restart
```

## Storage Options

### Default Storage

Uses Kubernetes PersistentVolumes for user home directories.

### NFS Storage

For shared storage across nodes, configure NFS:

```bash
JUPYTERHUB_NFS_PV_ENABLED=true
JUPYTER_NFS_IP=192.168.10.1
JUPYTER_NFS_PATH=/volume1/drive1/jupyter
```

NFS storage requires:

- Longhorn storage system installed
- NFS server accessible from cluster nodes
- Proper NFS export permissions configured

## Configuration

### Environment Variables

Key configuration variables:

```bash
# Basic settings
JUPYTERHUB_NAMESPACE=jupyter
JUPYTERHUB_CHART_VERSION=4.2.0
JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub

# Keycloak integration
KEYCLOAK_REALM=buunstack

# Storage
JUPYTERHUB_NFS_PV_ENABLED=false

# Vault integration
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
VAULT_ADDR=https://vault.example.com

# Image settings
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28
IMAGE_REGISTRY=localhost:30500

# Vault token TTL settings
JUPYTERHUB_VAULT_TOKEN_TTL=24h      # Admin token: renewed at TTL/2 intervals
NOTEBOOK_VAULT_TOKEN_TTL=24h        # User token: 1 day (renewed on usage)
NOTEBOOK_VAULT_TOKEN_MAX_TTL=168h   # User token: 7 days max

# Server pod lifecycle settings
JUPYTERHUB_CULL_MAX_AGE=604800      # Max pod age in seconds (7 days = 604800s)
                                    # Should be <= NOTEBOOK_VAULT_TOKEN_MAX_TTL

# Logging
JUPYTER_BUUNSTACK_LOG_LEVEL=warning # Options: debug, info, warning, error
```

### Advanced Configuration

Customize JupyterHub behavior by editing the `jupyterhub-values.gomplate.yaml` template before installation.

## Custom Container Images

JupyterHub uses custom container images with pre-installed data science tools and integrations:

### datastack-notebook (CPU)

Standard notebook image based on `jupyter/pytorch-notebook`:

- **PyTorch**: Deep learning framework
- **PySpark**: Apache Spark integration for big data processing
- **ClickHouse Client**: Direct database access
- **Python 3.12**: Latest Python runtime

[📖 See Image Documentation](./images/datastack-notebook/README.md)

### datastack-cuda-notebook (GPU)

GPU-enabled notebook image based on `jupyter/pytorch-notebook:cuda12`:

- **CUDA 12**: GPU acceleration support
- **PyTorch with GPU**: Hardware-accelerated deep learning
- **PySpark**: Apache Spark integration
- **ClickHouse Client**: Direct database access
- **Python 3.12**: Latest Python runtime

[📖 See Image Documentation](./images/datastack-cuda-notebook/README.md)

Both images are based on the official [Jupyter Docker Stacks](https://github.com/jupyter/docker-stacks) and include all standard data science libraries (NumPy, pandas, scikit-learn, matplotlib, etc.).
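Since both images ship PySpark, a quick smoke test from a notebook (local mode, no external Spark cluster assumed) looks like this:

```python
from pyspark.sql import SparkSession

# Local-mode Spark session inside the notebook container
spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.show()

spark.stop()
```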
## Management

### Uninstall

```bash
just jupyterhub::uninstall
```

This removes:

- JupyterHub deployment
- User pods
- PVCs
- ExternalSecret

### Update

Upgrade to newer versions:

```bash
# Update image tag in .env.local
export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-29

# Rebuild and push images
just jupyterhub::build-kernel-images
just jupyterhub::push-kernel-images

# Upgrade JupyterHub deployment
just jupyterhub::install
```

### Manual Token Refresh

If needed, manually refresh the admin token:

```bash
# Create new renewable token
just jupyterhub::create-jupyterhub-vault-token

# Restart JupyterHub to pick up new token
kubectl rollout restart deployment/hub -n jupyter
```

## Troubleshooting

### Image Pull Issues

Buun-stack images are large and may timeout:

```bash
# Check pod status
kubectl get pods -n jupyter

# Check image pull progress
kubectl describe pod -n jupyter

# Increase timeout if needed
helm upgrade jupyterhub jupyterhub/jupyterhub --timeout=30m -f jupyterhub-values.yaml
```

### Vault Integration Issues

Check token and authentication:

```bash
# Check ExternalSecret status
kubectl get externalsecret -n jupyter jupyterhub-vault-token

# Check if Secret was created
kubectl get secret -n jupyter jupyterhub-vault-token

# Check token renewal logs
kubectl logs -n jupyter -l app.kubernetes.io/component=hub -c vault-token-renewer

# In a notebook, verify environment
%env NOTEBOOK_VAULT_TOKEN
```

Common issues:

1. **"child policies must be subset of parent"**: Admin policy needs `sudo` permission for orphan tokens
2. **Token not found**: Check ExternalSecret and ClusterSecretStore configuration
3. **Permission denied**: Verify `jupyterhub-admin` policy has all required permissions

### Authentication Issues

Verify Keycloak client configuration:

```bash
# Check client exists
just keycloak::get-client buunstack jupyterhub

# Check redirect URIs
just keycloak::update-client buunstack jupyterhub \
  "https://your-jupyter-host/hub/oauth_callback"
```

## Technical Implementation Details

### Helm Chart Version

JupyterHub uses the official Zero to JupyterHub (Z2JH) Helm chart:

- Chart: `jupyterhub/jupyterhub`
- Version: `4.2.0` (configurable via `JUPYTERHUB_CHART_VERSION`)
- Documentation: https://z2jh.jupyter.org/

### Token System Architecture

The system uses a three-tier token approach:

1. **Renewable Admin Token**:
   - Created with `explicit-max-ttl=0` (unlimited Max TTL)
   - Renewed automatically at TTL/2 intervals (minimum 30 seconds)
   - Stored in Vault and fetched via ExternalSecret

2. **Orphan User Tokens**:
   - Created with `create_orphan()` API call
   - Not limited by parent token policies
   - Individual TTL and Max TTL settings

3. **Token Renewal Script**:
   - Runs as sidecar container
   - Reads token from ExternalSecret mount
   - Handles renewal and re-retrieval on failure
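For reference, creating such a renewable admin token by hand with the Vault CLI would look roughly like this; the `just jupyterhub::create-jupyterhub-vault-token` recipe automates the equivalent steps, so treat this as illustrative rather than the exact command used:

```bash
# Renewable service token with a 24h TTL; explicit-max-ttl=0 per the description above
vault token create \
  -policy=jupyterhub-admin \
  -ttl=24h \
  -explicit-max-ttl=0 \
  -renewable=true
```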
### Key Files

- `jupyterhub-admin-policy.hcl`: Vault policy with admin permissions
- `user_policy.hcl`: Template for user-specific policies
- `vault-token-renewer.sh`: Token renewal script
- `jupyterhub-vault-token-external-secret.gomplate.yaml`: ExternalSecret configuration

## Performance Considerations

- **Image Size**: Buun-stack images are ~13GB, plan storage accordingly
- **Pull Time**: Initial pulls take 5-15 minutes depending on network
- **Resource Usage**: Data science workloads require adequate CPU/memory
- **Token Renewal**: Minimal overhead (renewal at TTL/2 intervals)

For production deployments, consider:

- Pre-pulling images to all nodes
- Using faster storage backends
- Configuring resource limits per user
- Setting up monitoring and alerts

## Known Limitations

1. **Annual Token Recreation**: While tokens have unlimited Max TTL, best practice suggests recreating them annually
2. **Token Expiry and Pod Lifecycle**: User tokens have a TTL of 1 day (`NOTEBOOK_VAULT_TOKEN_TTL=24h`) and a maximum TTL of 7 days (`NOTEBOOK_VAULT_TOKEN_MAX_TTL=168h`). Daily usage extends the token for another day, allowing up to 7 days of continuous use. Server pods are automatically restarted after 7 days (`JUPYTERHUB_CULL_MAX_AGE=604800`, i.e. 604800 seconds) to refresh tokens
3. **Cull Settings**: Server idle timeout is set to 2 hours by default. Adjust `cull.timeout` and `cull.every` in the Helm values for different requirements
4. **NFS Storage**: When using NFS storage, ensure proper permissions are set on the NFS server. The default `JUPYTER_FSGID` is 100
5. **ExternalSecret Dependency**: Requires External Secrets Operator to be installed and configured