JupyterHub
JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.
Installation
Install JupyterHub with interactive configuration:
just jupyterhub::install
This will prompt for:
- JupyterHub host (FQDN)
- NFS PV usage (if Longhorn is installed)
- NFS server details (if NFS is enabled)
- Vault integration setup (requires root token for initial setup)
Prerequisites
- Keycloak must be installed and configured
- For NFS storage: Longhorn must be installed
- For Vault integration: Vault and External Secrets Operator must be installed
- Helm repository must be accessible
Kernel Images
Important Note
Building and using custom buun-stack images requires building the buunstack Python package first. The package wheel file will be included in the Docker image during build.
JupyterHub supports multiple kernel image profiles:
Standard Profiles
- minimal: Basic Python environment
- base: Python with common data science packages
- datascience: Full data science stack (default)
- pyspark: PySpark for big data processing
- pytorch: PyTorch for machine learning
- tensorflow: TensorFlow for machine learning
Buun-Stack Profiles
- buun-stack: Comprehensive data science environment with Vault integration
- buun-stack-cuda: CUDA-enabled version with GPU support
Profile Configuration
Enable/disable profiles using environment variables:
# Enable buun-stack profile (CPU version)
JUPYTER_PROFILE_BUUN_STACK_ENABLED=true
# Enable buun-stack CUDA profile (GPU version)
JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true
# Disable default datascience profile
JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
Available profile variables:
- JUPYTER_PROFILE_MINIMAL_ENABLED
- JUPYTER_PROFILE_BASE_ENABLED
- JUPYTER_PROFILE_DATASCIENCE_ENABLED
- JUPYTER_PROFILE_PYSPARK_ENABLED
- JUPYTER_PROFILE_PYTORCH_ENABLED
- JUPYTER_PROFILE_TENSORFLOW_ENABLED
- JUPYTER_PROFILE_BUUN_STACK_ENABLED
- JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED
Only JUPYTER_PROFILE_DATASCIENCE_ENABLED is true by default.
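The flag handling above can be sketched in Python. This is an illustration only, not the project's actual logic: the function name `enabled_profiles` is hypothetical, and it assumes that only the literal string "true" enables a profile and that an unset variable falls back to the defaults listed above.

```python
import os

# Profile variable suffixes and their defaults, as listed in this section.
PROFILE_DEFAULTS = {
    "MINIMAL": False,
    "BASE": False,
    "DATASCIENCE": True,  # the only profile enabled by default
    "PYSPARK": False,
    "PYTORCH": False,
    "TENSORFLOW": False,
    "BUUN_STACK": False,
    "BUUN_STACK_CUDA": False,
}


def enabled_profiles(env=None):
    """Return the names of profiles whose JUPYTER_PROFILE_*_ENABLED flag is true."""
    env = os.environ if env is None else env
    result = []
    for suffix, default in PROFILE_DEFAULTS.items():
        raw = env.get(f"JUPYTER_PROFILE_{suffix}_ENABLED")
        # Assumption: only "true" (case-insensitive) enables a profile.
        enabled = default if raw is None else raw.strip().lower() == "true"
        if enabled:
            result.append(suffix.lower().replace("_", "-"))
    return result
```

Calling `enabled_profiles({})` returns only the datascience profile, matching the stated default.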
Buun-Stack Images
Buun-stack images provide comprehensive data science environments with:
- All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
- Deep learning frameworks (PyTorch, TensorFlow, Keras)
- Big data tools (PySpark, Apache Arrow)
- NLP and ML libraries (LangChain, Transformers, spaCy)
- Database connectors and tools
- Vault integration with the buunstack Python package
Building Custom Images
Build and push buun-stack images to your registry:
# Build images (includes building the buunstack Python package)
just jupyterhub::build-kernel-images
# Push to registry
just jupyterhub::push-kernel-images
The build process:
- Builds the buunstack Python package wheel
- Copies the wheel into the Docker build context
- Installs the wheel in the Docker image
- Cleans up temporary files
⚠️ Note: Buun-stack images are comprehensive and large (~13GB). Initial image pulls and deployments take significant time due to the extensive package set.
Image Configuration
Configure image settings in .env.local:
# Image registry
IMAGE_REGISTRY=localhost:30500
# Image tag (current default)
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28
Vault Integration
Overview
Vault integration enables secure secrets management directly from Jupyter notebooks. The system uses:
- ExternalSecret to fetch the admin token from Vault
- Renewable tokens with unlimited Max TTL to avoid 30-day system limitations
- Token renewal script that automatically renews tokens every 12 hours
- User-specific tokens created during notebook spawn with isolated access
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ JupyterHub Hub Pod │
│ │
│ ┌──────────────┐ ┌────────────────┐ ┌────────────────────┐ │
│ │ Hub │ │ Token Renewer │ │ ExternalSecret │ │
│ │ Container │◄─┤ Sidecar │◄─┤ (mounted as │ │
│ │ │ │ │ │ Secret) │ │
│ └──────────────┘ └────────────────┘ └────────────────────┘ │
│ │ │ ▲ │
│ │ │ │ │
│ ▼ ▼ │ │
│ ┌──────────────────────────────────┐ │ │
│ │ /vault/secrets/vault-token │ │ │
│ │ (Admin token for user creation) │ │ │
│ └──────────────────────────────────┘ │ │
└────────────────────────────────────────────────────┼────────────┘
│
┌───────────▼──────────┐
│ Vault │
│ secret/jupyterhub/ │
│ vault-token │
└──────────────────────┘
Prerequisites
Vault integration requires:
- Vault server installed and configured
- External Secrets Operator installed
- ClusterSecretStore configured for Vault
- Buun-stack kernel images (standard images don't include Vault integration)
Setup
Vault integration is configured during JupyterHub installation:
just jupyterhub::install
# Answer "yes" when prompted about Vault integration
# Provide Vault root token when prompted
The setup process:
- Creates the jupyterhub-admin policy with the necessary permissions, including sudo for orphan token creation
- Creates a renewable admin token with 24h TTL and unlimited Max TTL
- Stores the token in Vault at secret/jupyterhub/vault-token
- Creates an ExternalSecret to fetch the token from Vault
- Deploys token renewal sidecar for automatic renewal
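As a rough illustration of the admin token step, the sketch below builds a request against Vault's HTTP token-create endpoint without sending it. The setup recipe itself may well use the Vault CLI instead. The function name and the display name are hypothetical, and the TTL values are the ones stated in the steps above.

```python
import json
import urllib.request


def build_admin_token_request(vault_addr, root_token):
    """Build (but do not send) a token-create request for the admin token."""
    payload = {
        "policies": ["jupyterhub-admin"],
        "ttl": "24h",  # initial TTL from the setup steps above
        # explicit-max-ttl=0, which this document describes as unlimited Max TTL
        "explicit_max_ttl": "0",
        "renewable": True,
        "display_name": "jupyterhub-admin",  # hypothetical label
    }
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/token/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": root_token, "Content-Type": "application/json"},
        method="POST",
    )
```

Keeping request construction separate from sending makes the payload easy to inspect before any call reaches Vault.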
Usage in Notebooks
With Vault integration enabled, use the buunstack package in notebooks:
from buunstack import SecretStore
# Initialize (uses pre-acquired user-specific token)
secrets = SecretStore()
# Store secrets
secrets.put('api-keys',
openai='sk-...',
github='ghp_...',
database_url='postgresql://...')
# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = secrets.get('api-keys', field='openai')
# List all secrets
secret_names = secrets.list()
# Delete secrets or specific fields
secrets.delete('old-api-key') # Delete entire secret
secrets.delete('api-keys', field='github') # Delete only github field
Security Features
- User isolation: Each user receives an orphan token with access only to their namespace
- Automatic renewal: Token renewal script renews admin token every 12 hours
- ExternalSecret integration: Admin token fetched securely from Vault
- Orphan tokens: User tokens are orphan tokens, not limited by parent policy restrictions
- Audit trail: All secret access is logged in Vault
Token Management
Admin Token
The admin token is managed through:
- Creation: just jupyterhub::create-jupyterhub-vault-token creates renewable token
- Storage: Stored in Vault at secret/jupyterhub/vault-token
- Retrieval: ExternalSecret fetches and mounts as Kubernetes Secret
- Renewal: vault-token-renewer.sh script renews every 12 hours
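The renewal step can be pictured with the Python sketch below. It is not a port of vault-token-renewer.sh, whose contents are not shown here. It assumes renewal goes through Vault's renew-self endpoint with a 24h increment, and both function names are hypothetical.

```python
import json
import urllib.request

RENEW_INTERVAL_SECONDS = 12 * 60 * 60  # renewal cadence stated above


def build_renew_self_request(vault_addr, token, increment="24h"):
    """Build (but do not send) a renew-self request for the given token."""
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/token/renew-self",
        data=json.dumps({"increment": increment}).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def seconds_until_next_renewal(last_renewed_at, now):
    """Return how long to wait before the next renewal, never negative."""
    elapsed = now - last_renewed_at
    return max(0, RENEW_INTERVAL_SECONDS - elapsed)
```

A renewal loop built on these helpers would sleep for `seconds_until_next_renewal(...)`, send the request, and re-read the mounted token if the renewal fails.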
User Tokens
User tokens are created dynamically:
- Pre-spawn hook reads admin token from /vault/secrets/vault-token
- Creates user policy jupyter-user-{username} with restricted access
- Creates orphan token with user policy (requires sudo permission)
- Sets environment variable NOTEBOOK_VAULT_TOKEN in notebook container
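The user token flow above can be sketched as follows. This is a simplified illustration rather than the hook shipped with this stack: it targets Vault's HTTP create-orphan endpoint directly, omits the policy creation step, and takes a caller-supplied `send` function so that no network call is made here. The function names, the `send` and `read_token` parameters, and the mapping of the Max TTL setting to `explicit_max_ttl` are all assumptions.

```python
import json
import urllib.request
from pathlib import Path

ADMIN_TOKEN_PATH = "/vault/secrets/vault-token"


def read_admin_token(path=ADMIN_TOKEN_PATH):
    """Read the admin token from the mounted secret file."""
    return Path(path).read_text(encoding="utf-8").strip()


def user_policy_name(username):
    """Policy naming scheme described above."""
    return f"jupyter-user-{username}"


def build_orphan_token_request(vault_addr, admin_token, username,
                               ttl="24h", max_ttl="168h"):
    """Build (but do not send) a create-orphan request for a user token."""
    payload = {
        "policies": [user_policy_name(username)],
        "ttl": ttl,
        "explicit_max_ttl": max_ttl,  # assumption: Max TTL maps to this field
        "renewable": True,
    }
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/token/create-orphan",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": admin_token, "Content-Type": "application/json"},
        method="POST",
    )


def pre_spawn_hook_sketch(spawner, vault_addr, send, read_token=read_admin_token):
    """Create a user token and expose it to the notebook container.

    `send` is a hypothetical callable that performs the request and
    returns Vault's parsed JSON response.
    """
    admin_token = read_token()
    request = build_orphan_token_request(vault_addr, admin_token, spawner.user.name)
    response = send(request)
    spawner.environment["NOTEBOOK_VAULT_TOKEN"] = response["auth"]["client_token"]
```

In a real hub configuration, a hook of this shape would be registered through `c.Spawner.pre_spawn_hook`, which receives only the spawner, so the remaining arguments would need to be bound beforehand.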
Storage Options
Default Storage
Uses Kubernetes PersistentVolumes for user home directories.
NFS Storage
For shared storage across nodes, configure NFS:
JUPYTERHUB_NFS_PV_ENABLED=true
JUPYTER_NFS_IP=192.168.10.1
JUPYTER_NFS_PATH=/volume1/drive1/jupyter
NFS storage requires:
- Longhorn storage system installed
- NFS server accessible from cluster nodes
- Proper NFS export permissions configured
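To show how these variables could map onto a Kubernetes object, the sketch below builds a PersistentVolume manifest as a Python dictionary. It is not the manifest this stack generates: the volume name, capacity, access mode, and reclaim policy are illustrative assumptions, and only the NFS server and path come from the variables above.

```python
def nfs_persistent_volume(env, name="jupyter-nfs-pv", capacity="10Gi"):
    """Return a PersistentVolume manifest dict, or None when NFS is disabled.

    `name` and `capacity` are hypothetical placeholders.
    """
    if env.get("JUPYTERHUB_NFS_PV_ENABLED", "false").lower() != "true":
        return None
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": capacity},
            "accessModes": ["ReadWriteMany"],  # assumption: shared across nodes
            "persistentVolumeReclaimPolicy": "Retain",  # assumption
            "nfs": {
                "server": env["JUPYTER_NFS_IP"],
                "path": env["JUPYTER_NFS_PATH"],
            },
        },
    }
```

With the example values above, the `nfs` block would point at server 192.168.10.1 and path /volume1/drive1/jupyter.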
Configuration
Environment Variables
Key configuration variables:
# Basic settings
JUPYTERHUB_NAMESPACE=jupyter
JUPYTERHUB_CHART_VERSION=4.2.0
JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub
# Keycloak integration
KEYCLOAK_REALM=buunstack
# Storage
JUPYTERHUB_NFS_PV_ENABLED=false
# Vault integration
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
VAULT_ADDR=https://vault.example.com
# Image settings
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28
IMAGE_REGISTRY=localhost:30500
# Vault token TTL settings
JUPYTERHUB_VAULT_TOKEN_TTL=24h # Admin token: renewed every 12h
NOTEBOOK_VAULT_TOKEN_TTL=24h # User token: 1 day
NOTEBOOK_VAULT_TOKEN_MAX_TTL=168h # User token: 7 days max
# Logging
JUPYTER_BUUNSTACK_LOG_LEVEL=warning # Options: debug, info, warning, error
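The TTL settings interact: the admin token should outlive the 12-hour renewal interval, and the user token TTL should not exceed its Max TTL. The following Python sketch checks both relationships. It is a hypothetical helper, not part of the stack, and its parser accepts only a single number followed by s, m, h, or d, which is narrower than what Vault itself accepts.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value):
    """Convert a simple duration such as '24h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit]


def check_token_ttls(env, renew_interval="12h"):
    """Return a list of human-readable problems; empty means consistent."""
    admin_ttl = parse_duration(env.get("JUPYTERHUB_VAULT_TOKEN_TTL", "24h"))
    user_ttl = parse_duration(env.get("NOTEBOOK_VAULT_TOKEN_TTL", "24h"))
    user_max = parse_duration(env.get("NOTEBOOK_VAULT_TOKEN_MAX_TTL", "168h"))
    problems = []
    if parse_duration(renew_interval) >= admin_ttl:
        problems.append("admin token TTL must exceed the renewal interval")
    if user_ttl > user_max:
        problems.append("user token TTL exceeds its Max TTL")
    return problems
```

With the defaults shown above, `check_token_ttls({})` reports no problems.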
Advanced Configuration
Customize JupyterHub behavior by editing the jupyterhub-values.gomplate.yaml template before installation.
Management
Uninstall
just jupyterhub::uninstall
This removes:
- JupyterHub deployment
- User pods
- PVCs
- ExternalSecret
Update
Upgrade to newer versions:
# Update image tag (set it in .env.local, or export it for the current shell)
export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-29
# Rebuild and push images
just jupyterhub::build-kernel-images
just jupyterhub::push-kernel-images
# Upgrade JupyterHub deployment
just jupyterhub::install
Manual Token Refresh
If needed, manually refresh the admin token:
# Create new renewable token
just jupyterhub::create-jupyterhub-vault-token
# Restart JupyterHub to pick up new token
kubectl rollout restart deployment/hub -n jupyter
Troubleshooting
Image Pull Issues
Buun-stack images are large, and pulls may time out:
# Check pod status
kubectl get pods -n jupyter
# Check image pull progress
kubectl describe pod <pod-name> -n jupyter
# Increase timeout if needed
helm upgrade jupyterhub jupyterhub/jupyterhub -n jupyter --timeout=30m -f jupyterhub-values.yaml
Vault Integration Issues
Check token and authentication:
# Check ExternalSecret status
kubectl get externalsecret -n jupyter jupyterhub-vault-token
# Check if Secret was created
kubectl get secret -n jupyter jupyterhub-vault-token
# Check token renewal logs
kubectl logs -n jupyter -l app.kubernetes.io/component=hub -c vault-agent
# In a notebook, verify environment
%env NOTEBOOK_VAULT_TOKEN
Common issues:
- "child policies must be subset of parent": Admin policy needs sudo permission for orphan tokens
- Token not found: Check ExternalSecret and ClusterSecretStore configuration
- Permission denied: Verify jupyterhub-admin policy has all required permissions
Authentication Issues
Verify Keycloak client configuration:
# Check client exists
just keycloak::get-client buunstack jupyterhub
# Update redirect URIs if they are missing or wrong
just keycloak::update-client buunstack jupyterhub \
"https://your-jupyter-host/hub/oauth_callback"
Technical Implementation Details
Helm Chart Version
JupyterHub uses the official Zero to JupyterHub (Z2JH) Helm chart:
- Chart: jupyterhub/jupyterhub
- Version: 4.2.0 (configurable via JUPYTERHUB_CHART_VERSION)
- Documentation: https://z2jh.jupyter.org/
Token System Architecture
The token system has three components:
- Renewable Admin Token:
  - Created with explicit-max-ttl=0 (unlimited Max TTL)
  - Renewed automatically every 12 hours
  - Stored in Vault and fetched via ExternalSecret
- Orphan User Tokens:
  - Created with create_orphan() API call
  - Not limited by parent token policies
  - Individual TTL and Max TTL settings
- Token Renewal Script:
  - Runs as sidecar container
  - Reads token from ExternalSecret mount
  - Handles renewal and re-retrieval on failure
Key Files
- jupyterhub-admin-policy.hcl: Vault policy with admin permissions
- user_policy.hcl: Template for user-specific policies
- vault-token-renewer.sh: Token renewal script
- jupyterhub-vault-token-external-secret.gomplate.yaml: ExternalSecret configuration
Performance Considerations
- Image Size: Buun-stack images are ~13GB, plan storage accordingly
- Pull Time: Initial pulls take 5-15 minutes depending on network
- Resource Usage: Data science workloads require adequate CPU/memory
- Token Renewal: Minimal overhead (renewal every 12 hours)
For production deployments, consider:
- Pre-pulling images to all nodes
- Using faster storage backends
- Configuring resource limits per user
- Setting up monitoring and alerts
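The pull-time range quoted above can be related to network throughput with a small back-of-the-envelope calculation. The Python sketch below assumes a sustained transfer rate and decimal units (1 GB = 1000 MB), and it ignores layer decompression and registry overhead, so real pulls will be slower. The function name is hypothetical.

```python
def pull_minutes(image_gb, throughput_mb_per_s):
    """Estimate transfer time in minutes for an image of the given size."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = image_gb * 1000 / throughput_mb_per_s
    return seconds / 60


def required_throughput(image_gb, minutes):
    """Return the MB/s needed to transfer the image within the time budget."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return image_gb * 1000 / (minutes * 60)
```

Under these assumptions, a 13 GB image needs roughly 43 MB/s to arrive in 5 minutes and roughly 14 MB/s to arrive in 15 minutes.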
Known Limitations
- Annual Token Recreation: While tokens have unlimited Max TTL, best practice suggests recreating them annually
- Cull Settings: Server idle timeout is set to 2 hours by default. Adjust cull.timeout and cull.every in the Helm values for different requirements
- NFS Storage: When using NFS storage, ensure proper permissions are set on the NFS server. The default JUPYTER_FSGID is 100
- ExternalSecret Dependency: Requires External Secrets Operator to be installed and configured