buun-stack/docs/jupyterhub.md
2025-08-31 22:34:11 +09:00

JupyterHub

JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.

Installation

Install JupyterHub with interactive configuration:

just jupyterhub::install

This will prompt for:

  • JupyterHub host (FQDN)
  • NFS PV usage (if Longhorn is installed)
  • NFS server details (if NFS is enabled)
  • Vault integration setup

Prerequisites

  • Keycloak must be installed and configured
  • For NFS storage: Longhorn must be installed
  • For Vault integration: Vault must be installed and configured

Kernel Images

JupyterHub supports multiple kernel image profiles:

Standard Profiles

  • minimal: Basic Python environment
  • base: Python with common data science packages
  • datascience: Full data science stack (default)
  • pyspark: PySpark for big data processing
  • pytorch: PyTorch for machine learning
  • tensorflow: TensorFlow for machine learning

Buun-Stack Profiles

  • buun-stack: Comprehensive data science environment with Vault integration
  • buun-stack-cuda: CUDA-enabled version with GPU support

Profile Configuration

Enable or disable profiles using environment variables:

# Enable buun-stack profile (CPU version)
export JUPYTER_PROFILE_BUUN_STACK_ENABLED=true

# Enable buun-stack CUDA profile (GPU version)
export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true

# Disable default datascience profile
export JUPYTER_PROFILE_DATASCIENCE_ENABLED=false

Available profile variables:

  • JUPYTER_PROFILE_MINIMAL_ENABLED
  • JUPYTER_PROFILE_BASE_ENABLED
  • JUPYTER_PROFILE_DATASCIENCE_ENABLED
  • JUPYTER_PROFILE_PYSPARK_ENABLED
  • JUPYTER_PROFILE_PYTORCH_ENABLED
  • JUPYTER_PROFILE_TENSORFLOW_ENABLED
  • JUPYTER_PROFILE_BUUN_STACK_ENABLED
  • JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED

Only JUPYTER_PROFILE_DATASCIENCE_ENABLED is true by default.
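
To illustrate how these flags combine with the documented default, the following sketch resolves the set of enabled profiles from an environment mapping. It is not the installer's code: the `PROFILE_DEFAULTS` table, the `enabled_profiles` helper, and the rule that only the string `true` (case-insensitive) enables a profile are assumptions made for the example.

```python
import os

# Defaults as documented above: only datascience is enabled.
# The table itself is illustrative, not taken from the installer.
PROFILE_DEFAULTS = {
    "MINIMAL": False,
    "BASE": False,
    "DATASCIENCE": True,
    "PYSPARK": False,
    "PYTORCH": False,
    "TENSORFLOW": False,
    "BUUN_STACK": False,
    "BUUN_STACK_CUDA": False,
}


def enabled_profiles(env=None):
    """Return the profile keys whose JUPYTER_PROFILE_<KEY>_ENABLED flag is on."""
    env = os.environ if env is None else env
    result = []
    for key, default in PROFILE_DEFAULTS.items():
        raw = env.get(f"JUPYTER_PROFILE_{key}_ENABLED")
        # Assumption: an unset variable keeps the default; only "true" enables.
        enabled = default if raw is None else raw.strip().lower() == "true"
        if enabled:
            result.append(key)
    return result


# No overrides: only the default profile is reported.
print(enabled_profiles({}))

# The overrides from the example above: buun-stack on, datascience off.
print(enabled_profiles({
    "JUPYTER_PROFILE_BUUN_STACK_ENABLED": "true",
    "JUPYTER_PROFILE_DATASCIENCE_ENABLED": "false",
}))
```

Calling `enabled_profiles()` with no argument reads the current process environment, which can help confirm what a shell session would pass to the installer.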

Buun-Stack Images

Buun-stack images provide comprehensive data science environments with:

  • All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
  • Deep learning frameworks (PyTorch, TensorFlow, Keras)
  • Big data tools (PySpark, Apache Arrow)
  • NLP and ML libraries (LangChain, Transformers, spaCy)
  • Database connectors and tools
  • Vault integration with buunstack Python package

Building Custom Images

Build and push buun-stack images to your registry:

# Build images
just jupyterhub::build-kernel-images

# Push to registry
just jupyterhub::push-kernel-images

⚠️ Note: Buun-stack images are comprehensive and large (~13GB). Initial image pulls and deployments take significant time due to the extensive package set.

Image Configuration

Configure image settings in .env.local:

# Image registry
IMAGE_REGISTRY=localhost:30500

# Image tag
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-1
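
If you want to inspect these settings from a script, a minimal sketch such as the one below can read them. It assumes `.env.local` contains only plain `KEY=VALUE` lines and `#` comments, as in the snippet above; `parse_env` is a hypothetical helper, not part of buun-stack, and it does not handle `export` prefixes or multi-line values.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore the line
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


sample = """
# Image registry
IMAGE_REGISTRY=localhost:30500

# Image tag
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-1
"""

settings = parse_env(sample)
print(settings["IMAGE_REGISTRY"], settings["JUPYTER_PYTHON_KERNEL_TAG"])
```

To read the real file, pass its contents instead of `sample`, for example `parse_env(open(".env.local").read())`.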

Vault Integration

Overview

Vault integration enables secure secrets management directly from Jupyter notebooks without re-authentication. Users can store and retrieve API keys, database credentials, and other sensitive data securely.

Prerequisites

Vault integration requires:

  • Vault server installed and configured
  • Keycloak OIDC authentication configured
  • Buun-stack kernel images (standard images don't include Vault integration)

Setup

Enable Vault integration during installation:

# Set the environment variable before installation, or answer yes at the prompt during install
export JUPYTERHUB_VAULT_INTEGRATION_ENABLED=true
just jupyterhub::install

Or configure manually:

# Setup Vault JWT authentication for JupyterHub
just jupyterhub::setup-vault-jwt-auth

Usage in Notebooks

With Vault integration enabled, use the buunstack package in notebooks:

from buunstack import SecretStore

# Initialize (uses JupyterHub session authentication)
secrets = SecretStore()

# Store secrets
secrets.put(
    'api-keys',
    openai='sk-...',
    github='ghp_...',
    database_url='postgresql://...',
)

# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = secrets.get('api-keys', field='openai')

# List all secrets
secret_names = secrets.list()

# Delete secrets
secrets.delete('old-api-key')
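
Notebooks that also run outside JupyterHub may want a fallback when a secret is unavailable. The sketch below wraps the `get(name, field=...)` call shown above and falls back to an environment variable. `get_secret` and `FakeStore` are hypothetical names introduced for the example, and how `SecretStore` reports a missing secret is not documented here, so the helper assumes either an exception or a `None` result.

```python
import os


def get_secret(store, name, field, env_var=None):
    """Fetch one field via store.get(name, field=...), else fall back to env_var.

    `store` is any object with the get() signature shown above, such as a
    SecretStore instance. Assumption: a missing secret shows up as either an
    exception or a None result; both are treated as "not found".
    """
    try:
        value = store.get(name, field=field)
    except Exception:
        value = None
    if value is None and env_var is not None:
        value = os.environ.get(env_var)
    if value is None:
        raise KeyError(f"secret {name!r} field {field!r} not found")
    return value


class FakeStore:
    """In-memory stand-in so the sketch runs without JupyterHub or Vault."""

    def __init__(self, data):
        self._data = data

    def get(self, name, field=None):
        secret = self._data[name]
        return secret if field is None else secret[field]


store = FakeStore({"api-keys": {"openai": "sk-example"}})
print(get_secret(store, "api-keys", "openai"))
```

In a JupyterHub notebook, you would pass `SecretStore()` in place of `FakeStore(...)`.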

Security Features

  • User isolation: Each user can only access their own secrets
  • Automatic token refresh: Background token management prevents authentication failures
  • Audit trail: All secret access is logged in Vault
  • No re-authentication: Uses existing JupyterHub OIDC session

Storage Options

Default Storage

By default, user home directories are stored on Kubernetes PersistentVolumes.

NFS Storage

For shared storage across nodes, configure NFS:

export JUPYTERHUB_NFS_PV_ENABLED=true
export JUPYTER_NFS_IP=192.168.10.1
export JUPYTER_NFS_PATH=/volume1/drive1/jupyter

NFS storage requires:

  • Longhorn storage system installed
  • NFS server accessible from cluster nodes
  • Proper NFS export permissions configured
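
To check the second requirement before installing, a small reachability probe can be run from a cluster node or from a pod. This is a sketch, not part of buun-stack: `tcp_port_open` is a hypothetical helper, and it assumes the server accepts NFS over TCP on the standard port 2049. It says nothing about export permissions, and an NFSv3 setup may depend on additional ports.

```python
import socket


def tcp_port_open(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, using the address from the configuration above:
#   tcp_port_open("192.168.10.1")
```

A `False` result from a node that should reach the server usually points to a firewall or routing issue rather than to JupyterHub itself.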

Configuration

Environment Variables

Key configuration variables:

# Basic settings
JUPYTERHUB_NAMESPACE=jupyter
JUPYTERHUB_CHART_VERSION=4.2.0
JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub

# Keycloak integration
KEYCLOAK_REALM=buunstack

# Storage
JUPYTERHUB_NFS_PV_ENABLED=false

# Vault integration
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
VAULT_ADDR=http://vault.vault.svc:8200

# Image settings
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-6
IMAGE_REGISTRY=localhost:30500
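
Some of these settings depend on each other: Vault integration needs a buun-stack kernel image, and NFS storage needs the server address and path. The sketch below flags such combinations before you run the installer. `check_config` and `truthy` are hypothetical helpers, and treating only the string `true` as enabled is an assumption.

```python
def truthy(value):
    """Assumption: only the string "true" (any case) counts as enabled."""
    return str(value).strip().lower() == "true"


def check_config(env):
    """Return human-readable warnings for inconsistent settings."""
    warnings = []

    # Standard images do not include Vault integration.
    if truthy(env.get("JUPYTERHUB_VAULT_INTEGRATION_ENABLED", "false")):
        buun_flags = (
            "JUPYTER_PROFILE_BUUN_STACK_ENABLED",
            "JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED",
        )
        if not any(truthy(env.get(name, "false")) for name in buun_flags):
            warnings.append(
                "Vault integration is enabled but no buun-stack profile is enabled"
            )

    # NFS storage needs the server address and export path.
    if truthy(env.get("JUPYTERHUB_NFS_PV_ENABLED", "false")):
        for name in ("JUPYTER_NFS_IP", "JUPYTER_NFS_PATH"):
            if not env.get(name):
                warnings.append(f"{name} is not set; the installer will need it")

    return warnings


for message in check_config({
    "JUPYTERHUB_VAULT_INTEGRATION_ENABLED": "true",
    "JUPYTERHUB_NFS_PV_ENABLED": "true",
    "JUPYTER_NFS_IP": "192.168.10.1",
}):
    print("warning:", message)
```

Passing `dict(os.environ)` checks the current shell environment instead of a hand-written mapping.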

Advanced Configuration

Customize JupyterHub behavior by editing the jupyterhub-values.gomplate.yaml template before installation.

Management

Uninstall

just jupyterhub::uninstall

Update

Upgrade to newer versions:

# Update image tag (example value)
export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-2

# Rebuild and push images
just jupyterhub::build-kernel-images
just jupyterhub::push-kernel-images

# Upgrade JupyterHub deployment
just jupyterhub::install

Troubleshooting

Image Pull Issues

Buun-stack images are large, so image pulls may time out:

# Check pod status
kubectl get pods -n jupyter

# Check image pull progress
kubectl describe pod <pod-name> -n jupyter

# Increase timeout if needed
helm upgrade jupyterhub jupyterhub/jupyterhub \
  -n jupyter --timeout=30m -f jupyterhub-values.yaml
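
When many user pods are starting at once, scanning `kubectl describe` output by hand gets tedious. The sketch below summarizes which containers are still waiting, and why, from the JSON form of the pod list (`kubectl get pods -n jupyter -o json`). It relies on the standard `status.containerStatuses[].state.waiting.reason` field of the Kubernetes Pod API; `waiting_containers` and the pod names in the sample are hypothetical.

```python
import json


def waiting_containers(pod_list_json):
    """Return (pod, container, reason) for every container in a waiting state."""
    found = []
    for pod in json.loads(pod_list_json).get("items", []):
        pod_name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            waiting = status.get("state", {}).get("waiting")
            if waiting:
                found.append((pod_name, status["name"], waiting.get("reason", "")))
    return found


# Hand-written sample shaped like `kubectl get pods -o json` output.
sample = json.dumps({
    "items": [
        {
            "metadata": {"name": "jupyter-alice"},
            "status": {
                "phase": "Pending",
                "containerStatuses": [
                    {"name": "notebook",
                     "state": {"waiting": {"reason": "ImagePullBackOff"}}}
                ],
            },
        },
        {
            "metadata": {"name": "hub-0"},
            "status": {
                "phase": "Running",
                "containerStatuses": [
                    {"name": "hub", "state": {"running": {}}}
                ],
            },
        },
    ]
})

for pod, container, reason in waiting_containers(sample):
    print(f"{pod}/{container}: {reason}")
```

Reasons such as `ErrImagePull` or `ImagePullBackOff` point to a registry or tag problem, while `ContainerCreating` on a fresh node often just means the pull is still in progress.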

Vault Integration Issues

Check Vault connectivity and authentication:

# In a notebook
import os
print("Vault Address:", os.getenv('VAULT_ADDR'))
print("Access token present:", bool(os.getenv('JUPYTERHUB_OIDC_ACCESS_TOKEN')))

# Test SecretStore
from buunstack import SecretStore
secrets = SecretStore()
status = secrets.get_status()
print(status)

Authentication Issues

Verify Keycloak client configuration:

# Check client exists
just keycloak::get-client buunstack jupyterhub

# Update the redirect URI (replace your-jupyter-host with your JupyterHub FQDN)
just keycloak::update-client buunstack jupyterhub \
  "https://your-jupyter-host/hub/oauth_callback"

Performance Considerations

  • Image Size: Buun-stack images are ~13GB; plan storage accordingly
  • Pull Time: Initial pulls take 5-15 minutes depending on network
  • Resource Usage: Data science workloads require adequate CPU/memory
  • Storage: NFS provides better performance for shared datasets
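
As a rough cross-check of the pull-time figure, transfer time can be estimated from image size and network throughput. The sketch below is back-of-the-envelope arithmetic only: `pull_minutes` is a hypothetical helper, it uses decimal units (1 GB = 8000 Mbit), it treats ~13GB as the amount transferred, and it ignores layer extraction and registry limits.

```python
def pull_minutes(image_gb, mbit_per_s):
    """Transfer time in minutes for image_gb gigabytes at mbit_per_s megabits/s."""
    megabits = image_gb * 8 * 1000  # decimal units: 1 GB = 8000 Mbit
    return megabits / mbit_per_s / 60


for rate in (100, 200, 350, 1000):
    print(f"{rate:>5} Mbit/s: {pull_minutes(13, rate):.1f} min")
```

Under these assumptions, the 5-15 minute range corresponds to an effective throughput of roughly 115-350 Mbit/s between the node and the registry.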

For production deployments, consider:

  • Pre-pulling images to all nodes
  • Using faster storage backends
  • Configuring resource limits per user
  • Setting up monitoring and alerts