JupyterHub
JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.
Installation
Install JupyterHub with interactive configuration:
just jupyterhub::install
This will prompt for:
- JupyterHub host (FQDN)
- NFS PV usage (if Longhorn is installed)
- NFS server details (if NFS is enabled)
- Vault integration setup
Prerequisites
- Keycloak must be installed and configured
- For NFS storage: Longhorn must be installed
- For Vault integration: Vault must be installed and configured
Kernel Images
JupyterHub supports multiple kernel image profiles:
Standard Profiles
- minimal: Basic Python environment
- base: Python with common data science packages
- datascience: Full data science stack (default)
- pyspark: PySpark for big data processing
- pytorch: PyTorch for machine learning
- tensorflow: TensorFlow for machine learning
Buun-Stack Profiles
- buun-stack: Comprehensive data science environment with Vault integration
- buun-stack-cuda: CUDA-enabled version with GPU support
Profile Configuration
Enable/disable profiles using environment variables:
# Enable buun-stack profile (CPU version)
export JUPYTER_PROFILE_BUUN_STACK_ENABLED=true
# Enable buun-stack CUDA profile (GPU version)
export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true
# Disable default datascience profile
export JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
Available profile variables:
- JUPYTER_PROFILE_MINIMAL_ENABLED
- JUPYTER_PROFILE_BASE_ENABLED
- JUPYTER_PROFILE_DATASCIENCE_ENABLED
- JUPYTER_PROFILE_PYSPARK_ENABLED
- JUPYTER_PROFILE_PYTORCH_ENABLED
- JUPYTER_PROFILE_TENSORFLOW_ENABLED
- JUPYTER_PROFILE_BUUN_STACK_ENABLED
- JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED
Only JUPYTER_PROFILE_DATASCIENCE_ENABLED is true by default.
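To make the defaults concrete, here is a minimal Python sketch that resolves which profiles end up enabled for a given environment. It is not how the install recipe reads these variables; it only assumes that the string `true` enables a profile and that unset variables fall back to the defaults listed above. The helper name `enabled_profiles` is hypothetical.

```python
import os

# Variable names come from the list above; only datascience defaults to true.
PROFILE_DEFAULTS = {
    "MINIMAL": False,
    "BASE": False,
    "DATASCIENCE": True,
    "PYSPARK": False,
    "PYTORCH": False,
    "TENSORFLOW": False,
    "BUUN_STACK": False,
    "BUUN_STACK_CUDA": False,
}


def enabled_profiles(env=None):
    """Return the profile keys whose JUPYTER_PROFILE_<KEY>_ENABLED resolves to true."""
    env = os.environ if env is None else env
    result = []
    for key, default in PROFILE_DEFAULTS.items():
        raw = env.get(f"JUPYTER_PROFILE_{key}_ENABLED")
        # Assumption: only the string "true" (case-insensitive) enables a profile.
        value = default if raw is None else raw.strip().lower() == "true"
        if value:
            result.append(key)
    return result
```

Calling `enabled_profiles()` before running the install is a quick way to confirm that the exports in your shell match what you expect.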
Buun-Stack Images
Buun-stack images provide comprehensive data science environments with:
- All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
- Deep learning frameworks (PyTorch, TensorFlow, Keras)
- Big data tools (PySpark, Apache Arrow)
- NLP and ML libraries (LangChain, Transformers, spaCy)
- Database connectors and tools
- Vault integration with the buunstack Python package
Building Custom Images
Build and push buun-stack images to your registry:
# Build images
just jupyterhub::build-kernel-images
# Push to registry
just jupyterhub::push-kernel-images
⚠️ Note: Buun-stack images are comprehensive and large (~13GB). Expect initial image pulls to take 5-15 minutes depending on the network, which lengthens the first deployment.
Image Configuration
Configure image settings in .env.local:
# Image registry
IMAGE_REGISTRY=localhost:30500
# Image tag
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-1
Vault Integration
Overview
Vault integration enables secure secrets management directly from Jupyter notebooks without re-authentication. Users can store and retrieve API keys, database credentials, and other sensitive data securely.
Prerequisites
Vault integration requires:
- Vault server installed and configured
- Keycloak OIDC authentication configured
- Buun-stack kernel images (standard images don't include Vault integration)
Setup
Enable Vault integration during installation:
# Set this variable before installation, or answer yes to the prompt during install
export JUPYTERHUB_VAULT_INTEGRATION_ENABLED=true
just jupyterhub::install
Or configure manually:
# Setup Vault JWT authentication for JupyterHub
just jupyterhub::setup-vault-jwt-auth
Usage in Notebooks
With Vault integration enabled, use the buunstack package in notebooks:
from buunstack import SecretStore
# Initialize (uses JupyterHub session authentication)
secrets = SecretStore()
# Store secrets
secrets.put('api-keys',
            openai='sk-...',
            github='ghp_...',
            database_url='postgresql://...')
# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = secrets.get('api-keys', field='openai')
# List all secrets
secret_names = secrets.list()
# Delete secrets
secrets.delete('old-api-key')
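Notebooks that also need to run outside JupyterHub may want a fallback when the store is unavailable. The sketch below wraps the `get(name, field=...)` call shown above and falls back to an environment variable. The helper `secret_or_env` is hypothetical and not part of buunstack, and how `SecretStore.get` reports a missing secret is an assumption here, so the code handles both an exception and a `None` result.

```python
import os


def secret_or_env(store, name, field, env_var):
    """Read one field from a SecretStore-like object, else fall back to an env var."""
    try:
        value = store.get(name, field=field)
    except Exception:
        # Assumption: a missing secret or an auth failure may surface as an exception.
        value = None
    if value is None:
        value = os.environ.get(env_var)
    if value is None:
        raise KeyError(f"{name}/{field} not found in the store and {env_var} is not set")
    return value
```

For example, `secret_or_env(secrets, 'api-keys', 'openai', 'OPENAI_API_KEY')` returns the stored key when Vault is reachable and the environment value otherwise.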
Security Features
- User isolation: Each user can only access their own secrets
- Automatic token refresh: Background token management prevents authentication failures
- Audit trail: All secret access is logged in Vault
- No re-authentication: Uses existing JupyterHub OIDC session
Storage Options
Default Storage
Uses Kubernetes PersistentVolumes for user home directories.
NFS Storage
For shared storage across nodes, configure NFS:
export JUPYTERHUB_NFS_PV_ENABLED=true
export JUPYTER_NFS_IP=192.168.10.1
export JUPYTER_NFS_PATH=/volume1/drive1/jupyter
NFS storage requires:
- Longhorn storage system installed
- NFS server accessible from cluster nodes
- Proper NFS export permissions configured
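A quick sanity check of the NFS variables before installation can catch typos early. This is a sketch under two assumptions: that `JUPYTER_NFS_IP` must be an IP literal rather than a hostname (the name suggests so, but the install recipe may accept more), and that `JUPYTER_NFS_PATH` must be an absolute export path. The function name `validate_nfs_settings` is hypothetical, and the check does not test reachability or export permissions.

```python
import ipaddress
import os
import posixpath


def validate_nfs_settings(env=None):
    """Return a list of problems in the NFS-related variables (empty if none)."""
    env = os.environ if env is None else env
    if env.get("JUPYTERHUB_NFS_PV_ENABLED", "false").lower() != "true":
        return []  # NFS is disabled, nothing to check
    problems = []
    ip = env.get("JUPYTER_NFS_IP", "")
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        problems.append(f"JUPYTER_NFS_IP is not a valid IP address: {ip!r}")
    path = env.get("JUPYTER_NFS_PATH", "")
    if not posixpath.isabs(path):
        problems.append(f"JUPYTER_NFS_PATH must be an absolute path: {path!r}")
    return problems
```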
Configuration
Environment Variables
Key configuration variables:
# Basic settings
JUPYTERHUB_NAMESPACE=jupyter
JUPYTERHUB_CHART_VERSION=4.2.0
JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub
# Keycloak integration
KEYCLOAK_REALM=buunstack
# Storage
JUPYTERHUB_NFS_PV_ENABLED=false
# Vault integration
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
VAULT_ADDR=http://vault.vault.svc:8200
# Image settings
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-6
IMAGE_REGISTRY=localhost:30500
Advanced Configuration
Customize JupyterHub behavior by editing the jupyterhub-values.gomplate.yaml template before installation.
Management
Uninstall
just jupyterhub::uninstall
Update
Upgrade to newer versions:
# Update image tag
export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-2
# Rebuild and push images
just jupyterhub::build-kernel-images
just jupyterhub::push-kernel-images
# Upgrade JupyterHub deployment
just jupyterhub::install
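The tags in this document (`python-3.12-1`, `python-3.12-2`) appear to end in a build number. Assuming that convention holds, the next tag can be derived rather than typed by hand. The helper `bump_kernel_tag` below is hypothetical, and the tag format is inferred from the examples, not documented.

```python
import re

# Assumed format: anything, followed by "-<build number>" at the end.
_TAG_RE = re.compile(r"^(?P<prefix>.+)-(?P<build>\d+)$")


def bump_kernel_tag(tag):
    """Increment the trailing build number of a kernel image tag."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"tag does not end in -<number>: {tag!r}")
    return f"{match['prefix']}-{int(match['build']) + 1}"
```

The result can then be exported as `JUPYTER_PYTHON_KERNEL_TAG` before rebuilding.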
Troubleshooting
Image Pull Issues
Buun-stack images are large, so pulls may time out:
# Check pod status
kubectl get pods -n jupyter
# Check image pull progress
kubectl describe pod <pod-name> -n jupyter
# Increase timeout if needed
helm upgrade jupyterhub jupyterhub/jupyterhub \
    --timeout=30m -f jupyterhub-values.yaml
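When many pods are involved, reading `kubectl describe` output one pod at a time gets tedious. The sketch below summarizes why containers are waiting, using the JSON form of `kubectl get pods`. A pod that is still pulling typically reports `ContainerCreating`, while a failed pull shows `ErrImagePull` or `ImagePullBackOff`. The function names are hypothetical, and init containers are not inspected.

```python
import json
import subprocess


def waiting_reasons(pod_list):
    """Map pod name -> waiting reasons, given parsed `kubectl get pods -o json` output."""
    reasons = {}
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        found = [
            s["state"]["waiting"].get("reason", "Unknown")
            for s in statuses
            if "waiting" in s.get("state", {})
        ]
        if found:
            reasons[pod["metadata"]["name"]] = found
    return reasons


def fetch_pods(namespace="jupyter"):
    """Run kubectl and parse its JSON output (requires cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

With cluster access, `waiting_reasons(fetch_pods())` returns an empty dict once every container has started.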
Vault Integration Issues
Check Vault connectivity and authentication:
# In a notebook
import os
print("Vault Address:", os.getenv('VAULT_ADDR'))
print("Access Token:", bool(os.getenv('JUPYTERHUB_OIDC_ACCESS_TOKEN')))
# Test SecretStore
from buunstack import SecretStore
secrets = SecretStore()
status = secrets.get_status()
print(status)
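If `SecretStore` itself fails, it helps to separate network problems from authentication problems. Vault exposes an unauthenticated health endpoint at `/v1/sys/health`, so a plain HTTP request from the notebook shows whether the server is reachable and unsealed. This is a sketch using only the standard library; the helper names are hypothetical, and connection errors are left to propagate so the cause stays visible.

```python
import json
import os
import urllib.error
import urllib.request


def vault_health_url(addr):
    """Build the URL of Vault's health endpoint from a base address."""
    return addr.rstrip("/") + "/v1/sys/health"


def vault_health(addr=None, timeout=5.0):
    """Return (http_status, parsed_body) from Vault's health endpoint."""
    addr = addr or os.environ["VAULT_ADDR"]
    try:
        with urllib.request.urlopen(vault_health_url(addr), timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # Vault answers with non-200 codes for standby, sealed, or uninitialized nodes.
        return err.code, json.loads(err.read() or b"{}")
```

A 200 status means an initialized, unsealed, active node; the body's `sealed` and `initialized` fields explain most other responses.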
Authentication Issues
Verify Keycloak client configuration:
# Check client exists
just keycloak::get-client buunstack jupyterhub
# Update redirect URIs
just keycloak::update-client buunstack jupyterhub \
    "https://your-jupyter-host/hub/oauth_callback"
Performance Considerations
- Image Size: Buun-stack images are ~13GB; plan node and registry storage accordingly
- Pull Time: Initial pulls take 5-15 minutes depending on network
- Resource Usage: Data science workloads require adequate CPU/memory
- Storage: NFS provides better performance for shared datasets
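As a rough check on the pull-time figure, transfer time follows directly from image size and bandwidth. The sketch below is plain arithmetic with decimal units. It ignores layer compression and unpacking time, and this document does not say whether ~13GB is the compressed or unpacked size, so treat the result as an order-of-magnitude estimate.

```python
def pull_minutes(image_gb, mbit_per_s):
    """Transfer time in minutes for image_gb gigabytes at mbit_per_s megabits per second."""
    bits = image_gb * 8 * 1000**3
    seconds = bits / (mbit_per_s * 1000**2)
    return seconds / 60
```

For a 13GB image, `pull_minutes(13, 200)` comes to just under 9 minutes, which falls inside the 5-15 minute range quoted above.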
For production deployments, consider:
- Pre-pulling images to all nodes
- Using faster storage backends
- Configuring resource limits per user
- Setting up monitoring and alerts
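For per-user resource limits, the upstream JupyterHub Helm chart reads `singleuser.cpu` and `singleuser.memory` values. The sketch below builds such a fragment and prints it as JSON, which is also valid YAML. How this fragment would be merged into jupyterhub-values.gomplate.yaml is not covered here, and the numbers are illustrative only. The function name `resource_overrides` is hypothetical.

```python
import json


def resource_overrides(cpu_limit, memory_limit, cpu_guarantee=None, memory_guarantee=None):
    """Build a Helm values fragment with per-user CPU and memory settings."""
    cpu = {"limit": cpu_limit}
    memory = {"limit": memory_limit}
    if cpu_guarantee is not None:
        cpu["guarantee"] = cpu_guarantee
    if memory_guarantee is not None:
        memory["guarantee"] = memory_guarantee
    return {"singleuser": {"cpu": cpu, "memory": memory}}


if __name__ == "__main__":
    # Illustrative numbers only; size them for your workloads.
    print(json.dumps(resource_overrides(2, "4G", 0.5, "1G"), indent=2))
```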