JupyterHub
JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.
Table of Contents
- Installation
- Prerequisites
- Access
- MCP Server Integration
- Programmatic API Access
- Kernel Images
- Profile Configuration
- GPU Support
- Buun-Stack Images
- buunstack Package & SecretStore
- Vault Integration
- Token Renewal Implementation
- Storage Options
- Configuration
- Custom Container Images
- Management
- Troubleshooting
- Technical Implementation Details
- Performance Considerations
- Known Limitations
Installation
Install JupyterHub with interactive configuration:
just jupyterhub::install
This will prompt for:
- JupyterHub host (FQDN)
- NFS PV usage (if Longhorn is installed)
- NFS server details (if NFS is enabled)
- Vault integration setup (requires root token for initial setup)
Prerequisites
- Keycloak must be installed and configured
- For NFS storage: Longhorn must be installed
- For Vault integration: Vault and External Secrets Operator must be installed
- Helm repository must be accessible
Access
Access JupyterHub at your configured host (e.g., https://jupyter.example.com) and authenticate via Keycloak.
MCP Server Integration
JupyterHub includes jupyter-mcp-server as a Jupyter Server Extension, enabling MCP (Model Context Protocol) clients to interact with Jupyter notebooks programmatically.
Overview
The MCP server provides a standardized interface for AI assistants and other MCP clients to:
- List and manage files on the Jupyter server
- Create, read, and edit notebook cells
- Execute code in notebook kernels
- Manage kernel sessions
Enabling MCP Server
MCP server support is controlled by the JUPYTER_MCP_SERVER_ENABLED environment variable. During installation, you will be prompted:
just jupyterhub::install
# "Enable jupyter-mcp-server for Claude Code integration? (y/N)"
Or set the environment variable before installation:
JUPYTER_MCP_SERVER_ENABLED=true just jupyterhub::install
Kernel Image Requirements
The MCP server requires jupyter-mcp-server to be installed and enabled in the kernel image.
Buun-Stack profiles (buun-stack, buun-stack-cuda) include jupyter-mcp-server pre-installed and enabled. No additional setup is required.
Other profiles (minimal, base, datascience, pyspark, pytorch, tensorflow) do not include jupyter-mcp-server. To use MCP with these images, install the required packages in your notebook:
pip install 'jupyter-mcp-server==0.21.0' 'jupyter-mcp-tools>=0.1.4'
pip uninstall -y pycrdt datalayer_pycrdt
pip install 'datalayer_pycrdt==0.12.17'
jupyter server extension enable jupyter_mcp_server
After installation, restart your Jupyter server for the extension to take effect.
MCP Endpoint
When enabled, each user's Jupyter server exposes an MCP endpoint at:
https://<JUPYTERHUB_HOST>/user/<username>/mcp
Authentication
MCP clients must authenticate using a JupyterHub API token. Obtain a token using:
# Get token for a user (creates user if not exists)
just jupyterhub::get-token <username>
The token should be passed in the Authorization header:
Authorization: token <JUPYTERHUB_TOKEN>
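As an illustrative sketch (not code from the repo) of how a client assembles the endpoint and header — note the `token` scheme rather than `Bearer`:

```python
import os

def mcp_endpoint(hub_host: str, username: str) -> str:
    """Per-user MCP endpoint exposed by the Jupyter server extension."""
    return f"https://{hub_host}/user/{username}/mcp"

def auth_headers(token: str) -> dict:
    """JupyterHub API tokens use the 'token' scheme, not 'Bearer'."""
    return {"Authorization": f"token {token}"}

# Example: read the token from the environment, as described above.
token = os.environ.get("JUPYTERHUB_TOKEN", "<your-token>")
print(mcp_endpoint("jupyter.example.com", "alice"))
# → https://jupyter.example.com/user/alice/mcp
```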
Client Configuration
Generic MCP Client Configuration
For any MCP client that supports HTTP transport:
just jupyterhub::setup-mcp-server <username>
This displays the MCP server URL, authentication details, and available tools.
Claude Code Configuration
For Claude Code specifically:
just jupyterhub::setup-claude-mcp-server <username>
This provides a ready-to-use .mcp.json configuration:
{
"mcpServers": {
"jupyter-<username>": {
"type": "http",
"url": "https://<JUPYTERHUB_HOST>/user/<username>/mcp",
"headers": {
"Authorization": "token ${JUPYTERHUB_TOKEN}"
}
}
}
}
Set the environment variable:
export JUPYTERHUB_TOKEN=<your-token>
Checking MCP Status
Verify MCP server status for a user:
just jupyterhub::mcp-status <username>
This checks:
- User pod is running
- jupyter-mcp-server extension is enabled
- MCP endpoint is responding
Technical Details
- Transport: HTTP (streamable-http)
- Extension: jupyter-mcp-server (installed in kernel images)
- Environment Variable: JUPYTERHUB_ALLOW_TOKEN_IN_URL=1 enables WebSocket token authentication
Programmatic API Access
buun-stack configures JupyterHub to allow programmatic API access, enabling token generation and user management without requiring users to log in first. This is achieved by registering a Service in JupyterHub.
What is a JupyterHub Service?
In JupyterHub, a Service is a registered entity (external program or script) that can access the JupyterHub API using a pre-configured token. While regular users obtain tokens by logging in, services use tokens registered in the JupyterHub configuration.
# Register a service with its API token
c.JupyterHub.services = [
{
'name': 'admin-service',
'api_token': '<token>',
}
]
# Grant permissions to the service
c.JupyterHub.load_roles = [
{
'name': 'admin-service-role',
'scopes': ['admin:users', 'tokens', 'admin:servers'],
'services': ['admin-service'],
}
]
When an API request includes Authorization: token <token>, JupyterHub identifies the token owner (in this case, admin-service) and applies the corresponding permissions.
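That identification step can be observed directly: a minimal stdlib sketch, assuming only the standard `GET /hub/api/user` endpoint of the JupyterHub REST API (the hostname and token below are placeholders):

```python
import urllib.request

def whoami_request(hub_url: str, token: str) -> urllib.request.Request:
    """Build GET /hub/api/user; JupyterHub resolves the token to its
    owner (a user or a registered service) and returns that identity."""
    return urllib.request.Request(
        f"{hub_url}/hub/api/user",
        headers={"Authorization": f"token {token}"},
    )

# Usage (requires network access to the hub):
#   with urllib.request.urlopen(whoami_request(hub, token)) as resp:
#       identity = json.load(resp)
#   For the service token, identity["name"] would be "admin-service"
#   and identity["kind"] would be "service".
```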
How It Works
┌─────────────────────────────────────────────────────────────────┐
│ just jupyterhub::get-token <username> │
│ │
│ 1. Retrieve service token from Kubernetes Secret │
│ 2. Call JupyterHub API with the token │
└──────────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ JupyterHub API │
│ │
│ 1. Receive: Authorization: token <service-token> │
│ 2. Identify: This token belongs to "admin-service" │
│ 3. Check permissions: admin-service has admin:users, tokens │
│ 4. Execute: Create user token and return it │
└─────────────────────────────────────────────────────────────────┘
Service Configuration
The service is automatically configured during JupyterHub installation:
- Token Generation: A random token is generated using just utils::random-password
- Secret Storage: The token is stored in Vault (if External Secrets Operator is available) or as a Kubernetes Secret
- Service Registration: JupyterHub is configured with the service and appropriate RBAC roles
RBAC Permissions
The registered service has the following scopes:
- admin:users - Create, read, update, and delete users
- tokens - Create and manage API tokens
- admin:servers - Start and stop user servers
Usage
Get Token for a User
# Creates user if not exists, returns API token
just jupyterhub::get-token <username>
This command:
- Checks if the user exists in JupyterHub
- Creates the user if not found
- Generates an API token with appropriate scopes
- Returns the token for use with MCP or other API clients
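The steps above can be approximated against the JupyterHub REST API. This is an illustrative stdlib sketch, not the recipe's actual implementation (the endpoints used — `GET/POST /users/{name}` and `POST /users/{name}/tokens` — are standard JupyterHub API routes):

```python
import json
import urllib.error
import urllib.request

def build_request(hub_url, token, method, path, body=None):
    """Assemble a JupyterHub REST API request with token auth."""
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        f"{hub_url}/hub/api{path}", data=data, method=method,
        headers={"Authorization": f"token {token}"},
    )

def get_user_token(hub_url, service_token, username):
    """Ensure the user exists, then mint an API token for them."""
    def call(method, path, body=None):
        req = build_request(hub_url, service_token, method, path, body)
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            return json.loads(raw) if raw else None

    try:
        call("GET", f"/users/{username}")        # does the user exist?
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
        call("POST", f"/users/{username}")       # create the user
    # POST /users/{name}/tokens returns {"token": "...", ...}
    created = call("POST", f"/users/{username}/tokens",
                   {"note": "minted via admin service"})
    return created["token"]
```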
Manual Token Management
The service token is stored in:
- Vault path: secret/jupyterhub/admin-service (key: token)
- Kubernetes Secret: jupyterhub-admin-service-token in the JupyterHub namespace
To recreate the service token:
just jupyterhub::create-admin-service-token-secret
Security Considerations
- The service token has elevated privileges; protect it accordingly
- Tokens are stored encrypted in Vault when External Secrets Operator is available
- User tokens generated via the service have limited scopes (access:servers!user=<username>, self)
Kernel Images
Important Note
Building and using custom buun-stack images requires building the buunstack Python package first. The package wheel file will be included in the Docker image during build.
JupyterHub supports multiple kernel image profiles:
Standard Profiles
- minimal: Basic Python environment
- base: Python with common data science packages
- datascience: Full data science stack (default)
- pyspark: PySpark for big data processing
- pytorch: PyTorch for machine learning
- tensorflow: TensorFlow for machine learning
Buun-Stack Profiles
- buun-stack: Comprehensive data science environment with Vault integration
- buun-stack-cuda: CUDA-enabled version with GPU support
Profile Configuration
Enable/disable profiles using environment variables:
# Enable buun-stack profile (CPU version)
JUPYTER_PROFILE_BUUN_STACK_ENABLED=true
# Enable buun-stack CUDA profile (GPU version)
JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true
# Disable default datascience profile
JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
Available profile variables:
- JUPYTER_PROFILE_MINIMAL_ENABLED
- JUPYTER_PROFILE_BASE_ENABLED
- JUPYTER_PROFILE_DATASCIENCE_ENABLED
- JUPYTER_PROFILE_PYSPARK_ENABLED
- JUPYTER_PROFILE_PYTORCH_ENABLED
- JUPYTER_PROFILE_TENSORFLOW_ENABLED
- JUPYTER_PROFILE_BUUN_STACK_ENABLED
- JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED
Only JUPYTER_PROFILE_DATASCIENCE_ENABLED is true by default.
GPU Support
JupyterHub supports GPU-accelerated notebooks using NVIDIA GPUs. GPU support is automatically enabled during installation if the nvidia-device-plugin is detected.
GPU Prerequisites
GPU support requires the following components to be installed:
NVIDIA Device Plugin
Install the NVIDIA device plugin for Kubernetes:
just nvidia-device-plugin::install
This plugin:
- Exposes NVIDIA GPUs to Kubernetes as schedulable resources
- Manages GPU allocation to pods
- Ensures proper GPU driver access within containers
RuntimeClass Configuration
The nvidia-device-plugin installation automatically creates the nvidia RuntimeClass, which:
- Configures containerd to use the NVIDIA container runtime
- Enables GPU access for containers using runtimeClassName: nvidia
Enabling GPU Support
During JupyterHub installation, you will be prompted:
just jupyterhub::install
# When nvidia-device-plugin is installed, you'll see:
# "Enable GPU support for JupyterHub notebooks? (y/N)"
Alternatively, set the environment variable before installation:
JUPYTERHUB_GPU_ENABLED=true
JUPYTERHUB_GPU_LIMIT=1 # Number of GPUs per user (default: 1)
GPU-Enabled Profiles
When GPU support is enabled:
- All notebook profiles get GPU access via runtimeClassName: nvidia
- The CUDA-specific profile (buun-stack-cuda) additionally includes:
- CUDA 12.x toolkit
- PyTorch with CUDA support
- GPU-optimized libraries
Usage
Selecting a GPU Profile
When spawning a notebook, select a profile with GPU capabilities:
- Buun-stack with CUDA: Recommended for GPU workloads (requires custom image)
- PyTorch: Standard PyTorch notebook
- TensorFlow: Standard TensorFlow notebook
Verifying GPU Access
In your notebook, verify GPU availability:
import torch
# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
# Get GPU device count
print(f"GPU count: {torch.cuda.device_count()}")
# Get GPU device name
if torch.cuda.is_available():
print(f"GPU name: {torch.cuda.get_device_name(0)}")
# Test GPU operation
torch.cuda.synchronize()
print("GPU is working correctly!")
GPU Configuration
Default GPU configuration:
- GPU limit per user: 1 GPU (configurable via JUPYTERHUB_GPU_LIMIT)
- Memory requests: 1Gi (defined in singleuser settings)
- RuntimeClass: nvidia (automatically applied when GPU support is enabled)
Building GPU-Enabled Custom Images
If using the buun-stack-cuda profile, build and push the CUDA-enabled image:
# Enable CUDA profile
export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true
# Build CUDA-enabled image (includes PyTorch with CUDA 12.x)
just jupyterhub::build-kernel-images
# Push to registry
just jupyterhub::push-kernel-images
The CUDA image:
- Based on quay.io/jupyter/pytorch-notebook:x86_64-cuda12-python-3.12.10
- Includes PyTorch with CUDA 12.4 support (cu124)
- Contains all standard buun-stack packages
- Supports GPU-accelerated deep learning
Troubleshooting GPU Issues
Pod Not Scheduling
If GPU-enabled pods fail to schedule:
# Check if nvidia-device-plugin is running
kubectl get pods -n nvidia-device-plugin
# Verify GPU resources are advertised
kubectl describe nodes | grep nvidia.com/gpu
# Check RuntimeClass exists
kubectl get runtimeclass nvidia
CUDA Not Available
If torch.cuda.is_available() returns False:
- Verify the image has CUDA support:
  # In a notebook
  !nvcc --version  # Should show the CUDA compiler version
- Check that the pod uses the nvidia RuntimeClass:
  kubectl get pod <pod-name> -n datastack -o yaml | grep runtimeClassName
- Rebuild the image if using a custom buun-stack-cuda image
GPU Memory Issues
Monitor GPU usage:
import torch
# Check GPU memory
if torch.cuda.is_available():
print(f"Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
print(f"Reserved: {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")
# Clear cache if needed
torch.cuda.empty_cache()
Buun-Stack Images
Buun-stack images provide comprehensive data science environments with:
- All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
- Deep learning frameworks (PyTorch, TensorFlow, Keras)
- Big data tools (PySpark, Apache Arrow)
- NLP and ML libraries (LangChain, Transformers, spaCy)
- Database connectors and tools
- Vault integration via the buunstack Python package
Building Custom Images
Build and push buun-stack images to your registry:
# Build images (includes building the buunstack Python package)
just jupyterhub::build-kernel-images
# Push to registry
just jupyterhub::push-kernel-images
The build process:
- Builds the buunstack Python package wheel
- Copies the wheel into the Docker build context
- Installs the wheel in the Docker image
- Cleans up temporary files
⚠️ Note: Buun-stack images are comprehensive and large (~13GB). Initial image pulls and deployments take significant time due to the extensive package set.
Image Configuration
Configure image settings in .env.local:
# Image registry
IMAGE_REGISTRY=localhost:30500
# Image tag (current default)
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28
buunstack Package & SecretStore
JupyterHub includes the buunstack Python package, which provides seamless integration with HashiCorp Vault for secure secrets management in your notebooks.
Key Features
- 🔒 Secure Secrets Management: Store and retrieve secrets securely using HashiCorp Vault
- 🚀 Pre-acquired Authentication: Uses Vault tokens created automatically at notebook spawn
- 📱 Simple API: Easy-to-use interface similar to Google Colab's userdata.get()
- 🔄 Automatic Token Renewal: Built-in token refresh for long-running sessions
Quick Example
from buunstack import SecretStore
# Initialize with pre-acquired Vault token (automatic)
secrets = SecretStore()
# Store secrets
secrets.put('api-keys',
openai_key='sk-your-key-here',
github_token='ghp_your-token',
database_url='postgresql://user:pass@host:5432/db'
)
# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = api_keys['openai_key']
# Or get a specific field directly
openai_key = secrets.get('api-keys', field='openai_key')
Learn More
For detailed documentation, usage examples, and API reference, see:
📖 buunstack Package Documentation
Vault Integration
Overview
Vault integration enables secure secrets management directly from Jupyter notebooks. The system uses:
- ExternalSecret to fetch the admin token from Vault
- Renewable tokens with unlimited Max TTL to avoid 30-day system limitations
- Token renewal script that automatically renews tokens at TTL/2 intervals (minimum 30 seconds)
- User-specific tokens created during notebook spawn with isolated access
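The TTL/2 rule with its 30-second floor can be stated as a one-liner (illustrative helper, not the shipped shell script):

```python
def renewal_interval(ttl_seconds: int) -> int:
    """Renew at half the token TTL, but never more often than every
    30 seconds, mirroring the rule described above."""
    return max(ttl_seconds // 2, 30)

print(renewal_interval(86400))  # 24h TTL → renew every 12h (43200 s)
print(renewal_interval(40))     # very short TTL → clamped to 30 s
```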
Architecture
┌────────────────────────────────────────────────────────────────┐
│ JupyterHub Hub Pod │
│ │
│ ┌──────────────┐ ┌────────────────┐ ┌────────────────────┐ │
│ │ Hub │ │ Token Renewer │ │ ExternalSecret │ │
│ │ Container │◄─┤ Sidecar │◄─┤ (mounted as │ │
│ │ │ │ │ │ Secret) │ │
│ └──────────────┘ └────────────────┘ └────────────────────┘ │
│ │ │ ▲ │
│ │ │ │ │
│ ▼ ▼ │ │
│ ┌──────────────────────────────────┐ │ │
│ │ /vault/secrets/vault-token │ │ │
│ │ (Admin token for user creation) │ │ │
│ └──────────────────────────────────┘ │ │
└────────────────────────────────────────────────────┼───────────┘
│
┌───────────▼──────────┐
│ Vault │
│ secret/jupyterhub/ │
│ vault-token │
└──────────────────────┘
Vault Integration Prerequisites
Vault integration requires:
- Vault server installed and configured
- External Secrets Operator installed
- ClusterSecretStore configured for Vault
- Buun-stack kernel images (standard images don't include Vault integration)
Setup
Vault integration is configured during JupyterHub installation:
just jupyterhub::install
# Answer "yes" when prompted about Vault integration
# Provide Vault root token when prompted
The setup process:
- Creates the jupyterhub-admin policy with the necessary permissions, including sudo for orphan token creation
- Creates a renewable admin token with a 24h TTL and unlimited Max TTL
- Stores the token in Vault at secret/jupyterhub/vault-token
- Creates an ExternalSecret to fetch the token from Vault
- Deploys token renewal sidecar for automatic renewal
Usage in Notebooks
With Vault integration enabled, use the buunstack package in notebooks:
from buunstack import SecretStore
# Initialize (uses pre-acquired user-specific token)
secrets = SecretStore()
# Store secrets
secrets.put('api-keys',
openai='sk-...',
github='ghp_...',
database_url='postgresql://...')
# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = secrets.get('api-keys', field='openai')
# List all secrets
secret_names = secrets.list()
# Delete secrets or specific fields
secrets.delete('old-api-key') # Delete entire secret
secrets.delete('api-keys', field='github') # Delete only github field
Security Features
- User isolation: Each user receives an orphan token with access only to their namespace
- Automatic renewal: Token renewal script renews admin token at TTL/2 intervals (minimum 30 seconds)
- ExternalSecret integration: Admin token fetched securely from Vault
- Orphan tokens: User tokens are orphan tokens, not limited by parent policy restrictions
- Audit trail: All secret access is logged in Vault
Token Management
Admin Token
The admin token is managed through:
- Creation: just jupyterhub::create-jupyterhub-vault-token creates a renewable token
- Storage: Stored in Vault at secret/jupyterhub/vault-token
- Retrieval: ExternalSecret fetches the token and mounts it as a Kubernetes Secret
- Renewal: The vault-token-renewer.sh script renews at TTL/2 intervals
User Tokens
User tokens are created dynamically:
- Pre-spawn hook reads the admin token from /vault/secrets/vault-token
- Creates a user policy jupyter-user-{username} with restricted access
- Creates an orphan token with the user policy (requires sudo permission)
- Sets the environment variable NOTEBOOK_VAULT_TOKEN in the notebook container
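The spawn-time flow can be sketched against Vault's HTTP token API. This is an illustration only — the actual hook uses hvac's `create_orphan()` — and `VAULT_ADDR` here is a placeholder:

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.com"  # placeholder; set from VAULT_ADDR

def user_policy_name(username: str) -> str:
    """Matches the per-user policy naming described above."""
    return f"jupyter-user-{username}"

def orphan_token_request(admin_token: str, username: str,
                         ttl: str = "24h", max_ttl: str = "168h"):
    """POST /v1/auth/token/create-orphan — the HTTP equivalent of hvac's
    create_orphan(); calling it requires the admin policy's sudo capability."""
    body = {
        "policies": [user_policy_name(username)],
        "ttl": ttl,
        "explicit_max_ttl": max_ttl,
        "renewable": True,
    }
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/auth/token/create-orphan",
        data=json.dumps(body).encode(),
        headers={"X-Vault-Token": admin_token},
        method="POST",
    )

# Usage (network): urllib.request.urlopen(orphan_token_request(tok, "alice"))
# The response's auth.client_token becomes NOTEBOOK_VAULT_TOKEN.
```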
Token Renewal Implementation
Admin Token Renewal
The admin token renewal is handled by a sidecar container (vault-token-renewer) running alongside the JupyterHub hub:
Implementation Details:
- Renewal Script: /vault/config/vault-token-renewer.sh
  - Runs in the vault-token-renewer sidecar container
  - Uses the Vault 1.17.5 image with the HashiCorp Vault CLI
- Environment-Based TTL Configuration:
  # Reads TTL from environment variable (set in .env.local)
  TTL_RAW="${JUPYTERHUB_VAULT_TOKEN_TTL}"  # e.g., "5m", "24h"
  # Converts to seconds and calculates the renewal interval
  RENEWAL_INTERVAL=$((TTL_SECONDS / 2))  # TTL/2 with a minimum of 30s
- Token Source: ExternalSecret → Kubernetes Secret → mounted file
  # Token retrieved from the ExternalSecret-managed mount
  ADMIN_TOKEN=$(cat /vault/admin-token/token)
- Renewal Loop:
  while true; do
    vault token renew >/dev/null 2>&1
    sleep $RENEWAL_INTERVAL
  done
- Error Handling: If renewal fails, the script re-retrieves the token from the ExternalSecret mount
Key Files:
- vault-token-renewer.sh: Main renewal script
- jupyterhub-vault-token-external-secret.gomplate.yaml: ExternalSecret configuration
- vault-token-renewer-config ConfigMap: Contains the renewal script
User Token Renewal
User token renewal is handled within the notebook environment by the buunstack Python package:
Implementation Details:
- Token Source: Environment variable set by the pre-spawn hook
  # In pre_spawn_hook.gomplate.py
  spawner.environment["NOTEBOOK_VAULT_TOKEN"] = user_vault_token
- Automatic Renewal: Built into SecretStore class operations
  # In buunstack/secrets.py
  def _ensure_authenticated(self):
      token_info = self.client.auth.token.lookup_self()
      ttl = token_info.get("data", {}).get("ttl", 0)
      renewable = token_info.get("data", {}).get("renewable", False)
      # Renew if TTL < 10 minutes and renewable
      if renewable and ttl > 0 and ttl < 600:
          self.client.auth.token.renew_self()
- Renewal Trigger: Every SecretStore operation (get, put, delete, list)
  - Checks token validity before the operation
  - Automatically renews if TTL < 10 minutes
  - Transparent to user code
- Token Configuration (set during creation):
  - TTL: NOTEBOOK_VAULT_TOKEN_TTL (default: 24h = 1 day)
  - Max TTL: NOTEBOOK_VAULT_TOKEN_MAX_TTL (default: 168h = 7 days)
  - Policy: User-specific jupyter-user-{username}
  - Type: Orphan token (independent of the parent token lifecycle)
- Expiry Handling: When the token reaches its Max TTL:
  - It cannot be renewed further
  - The user must restart their notebook server (triggers new token creation)
  - Prevented by the JUPYTERHUB_CULL_MAX_AGE setting (6 days < 7 day Max TTL)
Key Files:
- pre_spawn_hook.gomplate.py: User token creation logic
- buunstack/secrets.py: Token renewal implementation
- user_policy.hcl: User token permissions template
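The renewal predicate buunstack applies before each operation can be stated as (a paraphrase of `_ensure_authenticated`, not the shipped code):

```python
def should_renew(ttl_seconds: int, renewable: bool, threshold: int = 600) -> bool:
    """Renew only renewable tokens whose remaining TTL is positive
    but under 10 minutes (600 s), per the rule above."""
    return renewable and 0 < ttl_seconds < threshold

print(should_renew(300, True))    # 5 min left, renewable → True
print(should_renew(3600, True))   # 1 h left → False, no renewal needed
print(should_renew(300, False))   # not renewable → False
```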
Token Lifecycle Summary
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Admin Token │ │ User Token │ │ Pod Lifecycle │
│ │ │ │ │ │
│ Created: Manual │ │ Created: Spawn │ │ Max Age: 7 days │
│ TTL: 5m-24h │ │ TTL: 1 day │ │ Auto-restart │
│ Max TTL: ∞ │ │ Max TTL: 7 days │ │ at Max TTL │
│ Renewal: Auto │ │ Renewal: Auto │ │ │
│ Interval: TTL/2 │ │ Trigger: Usage │ │ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
vault-token-renewer buunstack.py cull.maxAge
sidecar SecretStore pod restart
Storage Options
Default Storage
Uses Kubernetes PersistentVolumes for user home directories.
NFS Storage
For shared storage across nodes, configure NFS:
JUPYTERHUB_NFS_PV_ENABLED=true
JUPYTER_NFS_IP=192.168.10.1
JUPYTER_NFS_PATH=/volume1/drive1/jupyter
NFS storage requires:
- Longhorn storage system installed
- NFS server accessible from cluster nodes
- Proper NFS export permissions configured
Configuration
Environment Variables
Key configuration variables:
# Basic settings
JUPYTERHUB_NAMESPACE=jupyter
JUPYTERHUB_CHART_VERSION=4.2.0
JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub
# Keycloak integration
KEYCLOAK_REALM=buunstack
# Storage
JUPYTERHUB_NFS_PV_ENABLED=false
# Vault integration
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
VAULT_ADDR=https://vault.example.com
# Image settings
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28
IMAGE_REGISTRY=localhost:30500
# Vault token TTL settings
JUPYTERHUB_VAULT_TOKEN_TTL=24h # Admin token: renewed at TTL/2 intervals
NOTEBOOK_VAULT_TOKEN_TTL=24h # User token: 1 day (renewed on usage)
NOTEBOOK_VAULT_TOKEN_MAX_TTL=168h # User token: 7 days max
# Server pod lifecycle settings
JUPYTERHUB_CULL_MAX_AGE=604800 # Max pod age in seconds (7 days = 604800s)
# Should be <= NOTEBOOK_VAULT_TOKEN_MAX_TTL
# Logging
JUPYTER_BUUNSTACK_LOG_LEVEL=warning # Options: debug, info, warning, error
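A quick sanity check of the relationship between these settings (illustrative helper, not part of buun-stack):

```python
def ttl_to_seconds(value: str) -> int:
    """Parse the suffixed TTL strings ("5m", "24h") used in .env.local."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]

# JUPYTERHUB_CULL_MAX_AGE (in seconds) should not exceed the user-token
# Max TTL, otherwise pods would outlive their Vault tokens.
cull_max_age = 604800                         # 7 days
assert cull_max_age <= ttl_to_seconds("168h")  # 168h == 604800 s
```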
Advanced Configuration
Customize JupyterHub behavior by editing jupyterhub-values.gomplate.yaml template before installation.
Custom Container Images
JupyterHub uses custom container images with pre-installed data science tools and integrations:
datastack-notebook (CPU)
Standard notebook image based on jupyter/pytorch-notebook:
- PyTorch: Deep learning framework
- PySpark: Apache Spark integration for big data processing
- ClickHouse Client: Direct database access
- Python 3.12: Latest Python runtime
datastack-cuda-notebook (GPU)
GPU-enabled notebook image based on jupyter/pytorch-notebook:cuda12:
- CUDA 12: GPU acceleration support
- PyTorch with GPU: Hardware-accelerated deep learning
- PySpark: Apache Spark integration
- ClickHouse Client: Direct database access
- Python 3.12: Latest Python runtime
Both images are based on the official Jupyter Docker Stacks and include all standard data science libraries (NumPy, pandas, scikit-learn, matplotlib, etc.).
Management
Uninstall
just jupyterhub::uninstall
This removes:
- JupyterHub deployment
- User pods
- PVCs
- ExternalSecret
Update
Upgrade to newer versions:
# Update image tag in .env.local
export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-29
# Rebuild and push images
just jupyterhub::build-kernel-images
just jupyterhub::push-kernel-images
# Upgrade JupyterHub deployment
just jupyterhub::install
Manual Token Refresh
If needed, manually refresh the admin token:
# Create new renewable token
just jupyterhub::create-jupyterhub-vault-token
# Restart JupyterHub to pick up new token
kubectl rollout restart deployment/hub -n jupyter
Troubleshooting
Image Pull Issues
Buun-stack images are large and pulls may time out:
# Check pod status
kubectl get pods -n jupyter
# Check image pull progress
kubectl describe pod <pod-name> -n jupyter
# Increase timeout if needed
helm upgrade jupyterhub jupyterhub/jupyterhub --timeout=30m -f jupyterhub-values.yaml
Vault Integration Issues
Check token and authentication:
# Check ExternalSecret status
kubectl get externalsecret -n jupyter jupyterhub-vault-token
# Check if Secret was created
kubectl get secret -n jupyter jupyterhub-vault-token
# Check token renewal logs
kubectl logs -n jupyter -l app.kubernetes.io/component=hub -c vault-token-renewer
# In a notebook, verify environment
%env NOTEBOOK_VAULT_TOKEN
Common issues:
- "child policies must be subset of parent": The admin policy needs sudo permission for orphan tokens
- Token not found: Check the ExternalSecret and ClusterSecretStore configuration
- Permission denied: Verify the jupyterhub-admin policy has all required permissions
Authentication Issues
Verify Keycloak client configuration:
# Check client exists
just keycloak::get-client buunstack jupyterhub
# Check redirect URIs
just keycloak::update-client buunstack jupyterhub \
"https://your-jupyter-host/hub/oauth_callback"
Technical Implementation Details
Helm Chart Version
JupyterHub uses the official Zero to JupyterHub (Z2JH) Helm chart:
- Chart: jupyterhub/jupyterhub
- Version: 4.2.0 (configurable via JUPYTERHUB_CHART_VERSION)
- Documentation: https://z2jh.jupyter.org/
Token System Architecture
The system uses a three-tier token approach:
- Renewable Admin Token:
  - Created with explicit-max-ttl=0 (unlimited Max TTL)
  - Renewed automatically at TTL/2 intervals (minimum 30 seconds)
  - Stored in Vault and fetched via ExternalSecret
- Orphan User Tokens:
  - Created with the create_orphan() API call
  - Not limited by parent token policies
  - Individual TTL and Max TTL settings
- Token Renewal Script:
  - Runs as a sidecar container
  - Reads the token from the ExternalSecret mount
  - Handles renewal and re-retrieval on failure
Key Files
- jupyterhub-admin-policy.hcl: Vault policy with admin permissions
- user_policy.hcl: Template for user-specific policies
- vault-token-renewer.sh: Token renewal script
- jupyterhub-vault-token-external-secret.gomplate.yaml: ExternalSecret configuration
Performance Considerations
- Image Size: Buun-stack images are ~13GB, plan storage accordingly
- Pull Time: Initial pulls take 5-15 minutes depending on network
- Resource Usage: Data science workloads require adequate CPU/memory
- Token Renewal: Minimal overhead (renewal at TTL/2 intervals)
For production deployments, consider:
- Pre-pulling images to all nodes
- Using faster storage backends
- Configuring resource limits per user
- Setting up monitoring and alerts
Known Limitations
- Annual Token Recreation: While tokens have unlimited Max TTL, best practice suggests recreating them annually
- Token Expiry and Pod Lifecycle: User tokens have a TTL of 1 day (NOTEBOOK_VAULT_TOKEN_TTL=24h) and a maximum TTL of 7 days (NOTEBOOK_VAULT_TOKEN_MAX_TTL=168h). Daily usage extends the token for another day, allowing up to 7 days of continuous use. Server pods are automatically restarted after 7 days (JUPYTERHUB_CULL_MAX_AGE=604800s) to refresh tokens.
- Cull Settings: The server idle timeout is set to 2 hours by default. Adjust cull.timeout and cull.every in the Helm values for different requirements
- NFS Storage: When using NFS storage, ensure proper permissions are set on the NFS server. The default JUPYTER_FSGID is 100
- ExternalSecret Dependency: Requires External Secrets Operator to be installed and configured