# JupyterHub

JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.

## Installation

Install JupyterHub with interactive configuration:

```bash
just jupyterhub::install
```

This will prompt for:

- JupyterHub host (FQDN)
- NFS PV usage (if Longhorn is installed)
- NFS server details (if NFS is enabled)
- Vault integration setup

### Prerequisites

- Keycloak must be installed and configured
- For NFS storage: Longhorn must be installed
- For Vault integration: Vault must be installed and configured

## Kernel Images

JupyterHub supports multiple kernel image profiles:

### Standard Profiles

- **minimal**: Basic Python environment
- **base**: Python with common data science packages
- **datascience**: Full data science stack (default)
- **pyspark**: PySpark for big data processing
- **pytorch**: PyTorch for machine learning
- **tensorflow**: TensorFlow for machine learning

### Buun-Stack Profiles

- **buun-stack**: Comprehensive data science environment with Vault integration
- **buun-stack-cuda**: CUDA-enabled version with GPU support

## Profile Configuration

Enable/disable profiles using environment variables:

```bash
# Enable buun-stack profile (CPU version)
export JUPYTER_PROFILE_BUUN_STACK_ENABLED=true

# Enable buun-stack CUDA profile (GPU version)
export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true

# Disable default datascience profile
export JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
```

Available profile variables:

- `JUPYTER_PROFILE_MINIMAL_ENABLED`
- `JUPYTER_PROFILE_BASE_ENABLED`
- `JUPYTER_PROFILE_DATASCIENCE_ENABLED`
- `JUPYTER_PROFILE_PYSPARK_ENABLED`
- `JUPYTER_PROFILE_PYTORCH_ENABLED`
- `JUPYTER_PROFILE_TENSORFLOW_ENABLED`
- `JUPYTER_PROFILE_BUUN_STACK_ENABLED`
- `JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED`

Only `JUPYTER_PROFILE_DATASCIENCE_ENABLED` is true by default.

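To preview which profiles a given shell environment would turn on before running the install, the flags listed above can be inspected with a short script. This is a minimal sketch and not part of the JupyterHub recipes: the helper name `enabled_profiles` is hypothetical, and the rule that only the string `true` (case-insensitive) counts as enabled is an assumption; the install recipe may parse values differently.

```python
import os

# Profile flags listed above, with the documented defaults
# (only datascience is enabled unless overridden).
PROFILE_DEFAULTS = {
    "MINIMAL": False,
    "BASE": False,
    "DATASCIENCE": True,
    "PYSPARK": False,
    "PYTORCH": False,
    "TENSORFLOW": False,
    "BUUN_STACK": False,
    "BUUN_STACK_CUDA": False,
}


def enabled_profiles(env=None):
    """Return the profile keys whose JUPYTER_PROFILE_<KEY>_ENABLED flag is on.

    Assumption: a value counts as enabled only if it equals "true"
    (case-insensitive); unset variables fall back to the documented default.
    """
    env = os.environ if env is None else env
    result = []
    for key, default in PROFILE_DEFAULTS.items():
        raw = env.get(f"JUPYTER_PROFILE_{key}_ENABLED")
        is_on = default if raw is None else raw.strip().lower() == "true"
        if is_on:
            result.append(key)
    return result


if __name__ == "__main__":
    print("Enabled profiles:", ", ".join(enabled_profiles()) or "(none)")
```

Running the script in the same shell where the `export` lines were executed shows the profile set that would result under these assumptions.
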
## Buun-Stack Images

Buun-stack images provide comprehensive data science environments with:

- All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
- Deep learning frameworks (PyTorch, TensorFlow, Keras)
- Big data tools (PySpark, Apache Arrow)
- NLP and ML libraries (LangChain, Transformers, spaCy)
- Database connectors and tools
- **Vault integration** with `buunstack` Python package

### Building Custom Images

Build and push buun-stack images to your registry:

```bash
# Build images
just jupyterhub::build-kernel-images

# Push to registry
just jupyterhub::push-kernel-images
```

⚠️ **Note**: Buun-stack images are comprehensive and large (~13GB). Initial image pulls and deployments take significant time due to the extensive package set.

### Image Configuration

Configure image settings in `.env.local`:

```bash
# Image registry
IMAGE_REGISTRY=localhost:30500

# Image tag
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-1
```

## Vault Integration

### Overview

Vault integration enables secure secrets management directly from Jupyter notebooks without re-authentication. Users can store and retrieve API keys, database credentials, and other sensitive data securely.

### Prerequisites

Vault integration requires:

- Vault server installed and configured
- Keycloak OIDC authentication configured
- **Buun-stack kernel images** (standard images don't include Vault integration)

### Setup

Enable Vault integration during installation:

```bash
# Set environment variable before installation or answer yes to prompt during install
export JUPYTERHUB_VAULT_INTEGRATION_ENABLED=true
just jupyterhub::install
```

Or configure manually:

```bash
# Setup Vault JWT authentication for JupyterHub
just jupyterhub::setup-vault-jwt-auth
```

### Usage in Notebooks

With Vault integration enabled, use the `buunstack` package in notebooks:

```python
from buunstack import SecretStore

# Initialize (uses JupyterHub session authentication)
secrets = SecretStore()

# Store secrets
secrets.put('api-keys',
            openai='sk-...',
            github='ghp_...',
            database_url='postgresql://...')

# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = secrets.get('api-keys', field='openai')

# List all secrets
secret_names = secrets.list()

# Delete secrets
secrets.delete('old-api-key')
```

### Security Features

- **User isolation**: Each user can only access their own secrets
- **Automatic token refresh**: Background token management prevents authentication failures
- **Audit trail**: All secret access is logged in Vault
- **No re-authentication**: Uses existing JupyterHub OIDC session

## Storage Options

### Default Storage

Uses Kubernetes PersistentVolumes for user home directories.

### NFS Storage

For shared storage across nodes, configure NFS:

```bash
export JUPYTERHUB_NFS_PV_ENABLED=true
export JUPYTER_NFS_IP=192.168.10.1
export JUPYTER_NFS_PATH=/volume1/drive1/jupyter
```

NFS storage requires:

- Longhorn storage system installed
- NFS server accessible from cluster nodes
- Proper NFS export permissions configured

## Configuration

### Environment Variables

Key configuration variables:

```bash
# Basic settings
JUPYTERHUB_NAMESPACE=jupyter
JUPYTERHUB_CHART_VERSION=4.2.0
JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub

# Keycloak integration
KEYCLOAK_REALM=buunstack

# Storage
JUPYTERHUB_NFS_PV_ENABLED=false

# Vault integration
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
VAULT_ADDR=http://vault.vault.svc:8200

# Image settings
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-6
IMAGE_REGISTRY=localhost:30500
```

### Advanced Configuration

Customize JupyterHub behavior by editing the `jupyterhub-values.gomplate.yaml` template before installation.

## Management

### Uninstall

```bash
just jupyterhub::uninstall
```

### Update

Upgrade to a newer version:

```bash
# Update image tag
export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-2

# Rebuild and push images
just jupyterhub::build-kernel-images
just jupyterhub::push-kernel-images

# Upgrade JupyterHub deployment
just jupyterhub::install
```

## Troubleshooting

### Image Pull Issues

Buun-stack images are large, and pulls may time out:

```bash
# Check pod status
kubectl get pods -n jupyter

# Check image pull progress (see the Events section of each pod)
kubectl describe pod -n jupyter

# Increase timeout if needed
helm upgrade jupyterhub jupyterhub/jupyterhub \
  --timeout=30m -f jupyterhub-values.yaml
```

### Vault Integration Issues

Check Vault connectivity and authentication:

```python
# In a notebook
import os

print("Vault Address:", os.getenv('VAULT_ADDR'))
# Prints True/False only, so the token itself is never displayed
print("Access Token present:", bool(os.getenv('JUPYTERHUB_OIDC_ACCESS_TOKEN')))

# Test SecretStore
from buunstack import SecretStore

secrets = SecretStore()
status = secrets.get_status()
print(status)
```

### Authentication Issues

Verify Keycloak client configuration:

```bash
# Check client exists
just keycloak::get-client buunstack jupyterhub

# Update the redirect URI if needed
just keycloak::update-client buunstack jupyterhub \
  "https://your-jupyter-host/hub/oauth_callback"
```

## Performance Considerations

- **Image Size**: Buun-stack images are ~13GB; plan storage accordingly
- **Pull Time**: Initial pulls take 5-15 minutes depending on network
- **Resource Usage**: Data science workloads require adequate CPU/memory
- **Storage**: NFS provides better performance for shared datasets

For production deployments, consider:

- Pre-pulling images to all nodes
- Using faster storage backends
- Configuring resource limits per user
- Setting up monitoring and alerts
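
As a rough sanity check on the pull times quoted above, the transfer time for a ~13GB image can be estimated from the effective network throughput. This is a back-of-the-envelope sketch, not a measurement: the function name `pull_time_minutes` is hypothetical, and it assumes 1 GB = 10^9 bytes and a steady throughput while ignoring layer decompression and registry latency, which usually add time.

```python
def pull_time_minutes(image_gb, throughput_mbit_s):
    """Estimate transfer time in minutes for an image of `image_gb` gigabytes.

    Assumes 1 GB = 10**9 bytes and a steady effective throughput;
    layer decompression and registry latency are ignored.
    """
    if throughput_mbit_s <= 0:
        raise ValueError("throughput must be positive")
    bits = image_gb * 1e9 * 8
    seconds = bits / (throughput_mbit_s * 1e6)
    return seconds / 60


if __name__ == "__main__":
    # Example throughputs are illustrative, not measured values.
    for mbit in (100, 200, 500, 1000):
        print(f"{mbit:>5} Mbit/s: {pull_time_minutes(13, mbit):.1f} min")
```

Under these assumptions, a 13GB image takes about 8.7 minutes at 200 Mbit/s, and the 5-15 minute range corresponds to roughly 115 to 350 Mbit/s of effective throughput. Nodes on slower links will see longer pulls, which is one reason to pre-pull images.
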