JupyterHub
JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.
Table of Contents
- Installation
- Prerequisites
- Access
- MCP Server Integration
- Programmatic API Access
- Kernel Images
- Profile Configuration
- GPU Support
- Buun-Stack Images
- buunstack Package & SecretStore
- Vault Integration
- Token Renewal Implementation
- Storage Options
- Configuration
- Custom Container Images
- Management
- Troubleshooting
- Technical Implementation Details
- Performance Considerations
- Known Limitations
Installation
Install JupyterHub with interactive configuration:
just jupyterhub::install
This will prompt for:
- JupyterHub host (FQDN)
- NFS PV usage (if Longhorn is installed)
- NFS server details (if NFS is enabled)
- Vault integration setup (requires root token for initial setup)
Prerequisites
- Keycloak must be installed and configured
- For NFS storage: Longhorn must be installed
- For Vault integration: Vault and External Secrets Operator must be installed
- Helm repository must be accessible
Access
Access JupyterHub at your configured host (e.g., https://jupyter.example.com) and authenticate via Keycloak.
MCP Server Integration
JupyterHub includes jupyter-mcp-server as a Jupyter Server Extension, enabling MCP (Model Context Protocol) clients to interact with Jupyter notebooks programmatically.
Overview
The MCP server provides a standardized interface for AI assistants and other MCP clients to:
- List and manage files on the Jupyter server
- Create, read, and edit notebook cells
- Execute code in notebook kernels
- Manage kernel sessions
Enabling MCP Server
MCP server support is controlled by the JUPYTER_MCP_SERVER_ENABLED environment variable. During installation, you will be prompted:
just jupyterhub::install
# "Enable jupyter-mcp-server for Claude Code integration? (y/N)"
Or set the environment variable before installation:
JUPYTER_MCP_SERVER_ENABLED=true just jupyterhub::install
Kernel Image Requirements
The MCP server requires jupyter-mcp-server to be installed and enabled in the kernel image.
Buun-Stack profiles (buun-stack, buun-stack-cuda) include jupyter-mcp-server pre-installed and enabled. No additional setup is required.
Other profiles (minimal, base, datascience, pyspark, pytorch, tensorflow) do not include jupyter-mcp-server. To use MCP with these images, install the required packages in your notebook:
pip install 'jupyter-mcp-server==0.21.0' 'jupyter-mcp-tools>=0.1.4'
pip uninstall -y pycrdt datalayer_pycrdt
pip install 'datalayer_pycrdt==0.12.17'
jupyter server extension enable jupyter_mcp_server
After installation, restart your Jupyter server for the extension to take effect.
MCP Endpoint
When enabled, each user's Jupyter server exposes an MCP endpoint at:
https://<JUPYTERHUB_HOST>/user/<username>/mcp
Authentication
MCP clients must authenticate using a JupyterHub API token. Obtain a token using:
# Get token for a user (creates user if not exists)
just jupyterhub::get-token <username>
The token should be passed in the Authorization header:
Authorization: token <JUPYTERHUB_TOKEN>
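As an illustrative sketch (not code from the repo) of how a client assembles the endpoint and header — note the `token` scheme rather than `Bearer`:

```python
import os

def mcp_endpoint(hub_host: str, username: str) -> str:
    """Per-user MCP endpoint exposed by the Jupyter server extension."""
    return f"https://{hub_host}/user/{username}/mcp"

def auth_headers(token: str) -> dict:
    """JupyterHub API tokens use the 'token' scheme, not 'Bearer'."""
    return {"Authorization": f"token {token}"}

# Example: read the token from the environment, as described above.
token = os.environ.get("JUPYTERHUB_TOKEN", "<your-token>")
print(mcp_endpoint("jupyter.example.com", "alice"))
# → https://jupyter.example.com/user/alice/mcp
```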
Client Configuration
Generic MCP Client Configuration
For any MCP client that supports HTTP transport:
just jupyterhub::setup-mcp-server <username>
This displays the MCP server URL, authentication details, and available tools.
Claude Code Configuration
For Claude Code specifically:
just jupyterhub::setup-claude-mcp-server <username>
This provides a ready-to-use .mcp.json configuration:
{
"mcpServers": {
"jupyter-<username>": {
"type": "http",
"url": "https://<JUPYTERHUB_HOST>/user/<username>/mcp",
"headers": {
"Authorization": "token ${JUPYTERHUB_TOKEN}"
}
}
}
}
Set the environment variable:
export JUPYTERHUB_TOKEN=<your-token>
Checking MCP Status
Verify MCP server status for a user:
just jupyterhub::mcp-status <username>
This checks:
- User pod is running
- jupyter-mcp-server extension is enabled
- MCP endpoint is responding
Technical Details
- Transport: HTTP (streamable-http)
- Extension: jupyter-mcp-server (installed in kernel images)
- Environment Variable: JUPYTERHUB_ALLOW_TOKEN_IN_URL=1 enables WebSocket token authentication
Programmatic API Access
buun-stack configures JupyterHub to allow programmatic API access, enabling token generation and user management without requiring users to log in first. This is achieved by registering a Service in JupyterHub.
What is a JupyterHub Service?
In JupyterHub, a Service is a registered entity (external program or script) that can access the JupyterHub API using a pre-configured token. While regular users obtain tokens by logging in, services use tokens registered in the JupyterHub configuration.
# Register a service with its API token
c.JupyterHub.services = [
{
'name': 'admin-service',
'api_token': '<token>',
}
]
# Grant permissions to the service
c.JupyterHub.load_roles = [
{
'name': 'admin-service-role',
'scopes': ['admin:users', 'tokens', 'admin:servers'],
'services': ['admin-service'],
}
]
When an API request includes Authorization: token <token>, JupyterHub identifies the token owner (in this case, admin-service) and applies the corresponding permissions.
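That identification step can be observed directly: a minimal stdlib sketch, assuming only the standard `GET /hub/api/user` endpoint of the JupyterHub REST API (the hostname and token below are placeholders):

```python
import urllib.request

def whoami_request(hub_url: str, token: str) -> urllib.request.Request:
    """Build GET /hub/api/user; JupyterHub resolves the token to its
    owner (a user or a registered service) and returns that identity."""
    return urllib.request.Request(
        f"{hub_url}/hub/api/user",
        headers={"Authorization": f"token {token}"},
    )

# Usage (requires network access to the hub):
#   with urllib.request.urlopen(whoami_request(hub, token)) as resp:
#       identity = json.load(resp)
#   For the service token, identity["name"] would be "admin-service"
#   and identity["kind"] would be "service".
```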
How It Works
┌─────────────────────────────────────────────────────────────────┐
│ just jupyterhub::get-token <username> │
│ │
│ 1. Retrieve service token from Kubernetes Secret │
│ 2. Call JupyterHub API with the token │
└──────────────────────────────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ JupyterHub API │
│ │
│ 1. Receive: Authorization: token <service-token> │
│ 2. Identify: This token belongs to "admin-service" │
│ 3. Check permissions: admin-service has admin:users, tokens │
│ 4. Execute: Create user token and return it │
└─────────────────────────────────────────────────────────────────┘
Service Configuration
The service is automatically configured during JupyterHub installation:
- Token Generation: A random token is generated using just utils::random-password
- Secret Storage: The token is stored in Vault (if External Secrets Operator is available) or as a Kubernetes Secret
- Service Registration: JupyterHub is configured with the service and appropriate RBAC roles
RBAC Permissions
The registered service has the following scopes:
- admin:users - Create, read, update, and delete users
- tokens - Create and manage API tokens
- admin:servers - Start and stop user servers
Usage
Get Token for a User
# Creates user if not exists, returns API token
just jupyterhub::get-token <username>
This command:
- Checks if the user exists in JupyterHub
- Creates the user if not found
- Generates an API token with appropriate scopes
- Returns the token for use with MCP or other API clients
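The steps above can be approximated against the JupyterHub REST API. This is an illustrative stdlib sketch, not the recipe's actual implementation (the endpoints used — `GET/POST /users/{name}` and `POST /users/{name}/tokens` — are standard JupyterHub API routes):

```python
import json
import urllib.error
import urllib.request

def build_request(hub_url, token, method, path, body=None):
    """Assemble a JupyterHub REST API request with token auth."""
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        f"{hub_url}/hub/api{path}", data=data, method=method,
        headers={"Authorization": f"token {token}"},
    )

def get_user_token(hub_url, service_token, username):
    """Ensure the user exists, then mint an API token for them."""
    def call(method, path, body=None):
        req = build_request(hub_url, service_token, method, path, body)
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            return json.loads(raw) if raw else None

    try:
        call("GET", f"/users/{username}")        # does the user exist?
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
        call("POST", f"/users/{username}")       # create the user
    # POST /users/{name}/tokens returns {"token": "...", ...}
    created = call("POST", f"/users/{username}/tokens",
                   {"note": "minted via admin service"})
    return created["token"]
```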
Manual Token Management
The service token is stored in:
- Vault path: secret/jupyterhub/admin-service (key: token)
- Kubernetes Secret: jupyterhub-admin-service-token in the JupyterHub namespace
To recreate the service token:
just jupyterhub::create-admin-service-token-secret
Security Considerations
- The service token has elevated privileges; protect it accordingly
- Tokens are stored encrypted in Vault when External Secrets Operator is available
- User tokens generated via the service have limited scopes (access:servers!user=<username>, self)
Kernel Images
Important Note
Building and using custom buun-stack images requires building the buunstack Python package first. The package wheel file will be included in the Docker image during build.
JupyterHub supports multiple kernel image profiles:
Standard Profiles
- minimal: Basic Python environment
- base: Python with common data science packages
- datascience: Full data science stack (default)
- pyspark: PySpark for big data processing
- pytorch: PyTorch for machine learning
- tensorflow: TensorFlow for machine learning
Buun-Stack Profiles
- buun-stack: Comprehensive data science environment with Vault integration
- buun-stack-cuda: CUDA-enabled version with GPU support
Profile Configuration
Enable/disable profiles using environment variables:
# Enable buun-stack profile (CPU version)
JUPYTER_PROFILE_BUUN_STACK_ENABLED=true
# Enable buun-stack CUDA profile (GPU version)
JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true
# Disable default datascience profile
JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
Available profile variables:
- JUPYTER_PROFILE_MINIMAL_ENABLED
- JUPYTER_PROFILE_BASE_ENABLED
- JUPYTER_PROFILE_DATASCIENCE_ENABLED
- JUPYTER_PROFILE_PYSPARK_ENABLED
- JUPYTER_PROFILE_PYTORCH_ENABLED
- JUPYTER_PROFILE_TENSORFLOW_ENABLED
- JUPYTER_PROFILE_BUUN_STACK_ENABLED
- JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED
Only JUPYTER_PROFILE_DATASCIENCE_ENABLED is true by default.
GPU Support
JupyterHub supports GPU-accelerated notebooks using NVIDIA GPUs. GPU support is automatically enabled during installation if the nvidia-device-plugin is detected.
GPU Prerequisites
GPU support requires the following components to be installed:
NVIDIA Device Plugin
Install the NVIDIA device plugin for Kubernetes:
just nvidia-device-plugin::install
This plugin:
- Exposes NVIDIA GPUs to Kubernetes as schedulable resources
- Manages GPU allocation to pods
- Ensures proper GPU driver access within containers
RuntimeClass Configuration
The nvidia-device-plugin installation automatically creates the nvidia RuntimeClass, which:
- Configures containerd to use the NVIDIA container runtime
- Enables GPU access for containers using runtimeClassName: nvidia
Enabling GPU Support
During JupyterHub installation, you will be prompted:
just jupyterhub::install
# When nvidia-device-plugin is installed, you'll see:
# "Enable GPU support for JupyterHub notebooks? (y/N)"
Alternatively, set the environment variable before installation:
JUPYTERHUB_GPU_ENABLED=true
JUPYTERHUB_GPU_LIMIT=1 # Number of GPUs per user (default: 1)
GPU-Enabled Profiles
When GPU support is enabled:
- All notebook profiles get GPU access via runtimeClassName: nvidia
- The CUDA-specific profile (buun-stack-cuda) additionally includes:
- CUDA 12.x toolkit
- PyTorch with CUDA support
- GPU-optimized libraries
Usage
Selecting a GPU Profile
When spawning a notebook, select a profile with GPU capabilities:
- Buun-stack with CUDA: Recommended for GPU workloads (requires custom image)
- PyTorch: Standard PyTorch notebook
- TensorFlow: Standard TensorFlow notebook
Verifying GPU Access
In your notebook, verify GPU availability:
import torch
# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
# Get GPU device count
print(f"GPU count: {torch.cuda.device_count()}")
# Get GPU device name
if torch.cuda.is_available():
print(f"GPU name: {torch.cuda.get_device_name(0)}")
# Test GPU operation
torch.cuda.synchronize()
print("GPU is working correctly!")
GPU Configuration
Default GPU configuration:
- GPU limit per user: 1 GPU (configurable via JUPYTERHUB_GPU_LIMIT)
- Memory requests: 1Gi (defined in singleuser settings)
- RuntimeClass: nvidia (automatically applied when GPU support is enabled)
Building GPU-Enabled Custom Images
If using the buun-stack-cuda profile, build and push the CUDA-enabled image:
# Enable CUDA profile
export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true
# Build CUDA-enabled image (includes PyTorch with CUDA 12.x)
just jupyterhub::build-kernel-images
# Push to registry
just jupyterhub::push-kernel-images
The CUDA image:
- Based on quay.io/jupyter/pytorch-notebook:x86_64-cuda12-python-3.12.10
- Includes PyTorch with CUDA 12.4 support (cu124)
- Contains all standard buun-stack packages
- Supports GPU-accelerated deep learning
Troubleshooting GPU Issues
Pod Not Scheduling
If GPU-enabled pods fail to schedule:
# Check if nvidia-device-plugin is running
kubectl get pods -n nvidia-device-plugin
# Verify GPU resources are advertised
kubectl describe nodes | grep nvidia.com/gpu
# Check RuntimeClass exists
kubectl get runtimeclass nvidia
CUDA Not Available
If torch.cuda.is_available() returns False:
- Verify the image has CUDA support:
  # In a notebook
  !nvcc --version  # Should show the CUDA compiler version
- Check that the pod uses the nvidia RuntimeClass:
  kubectl get pod <pod-name> -n datastack -o yaml | grep runtimeClassName
- Rebuild the image if using a custom buun-stack-cuda image
GPU Memory Issues
Monitor GPU usage:
import torch
# Check GPU memory
if torch.cuda.is_available():
print(f"Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
print(f"Reserved: {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")
# Clear cache if needed
torch.cuda.empty_cache()
Buun-Stack Images
Buun-stack images provide comprehensive data science environments with:
- All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
- Deep learning frameworks (PyTorch, TensorFlow, Keras)
- Big data tools (PySpark, Apache Arrow)
- NLP and ML libraries (LangChain, Transformers, spaCy)
- Database connectors and tools
- Vault integration via the buunstack Python package
Building Custom Images
Build and push buun-stack images to your registry:
# Build images (includes building the buunstack Python package)
just jupyterhub::build-kernel-images
# Push to registry
just jupyterhub::push-kernel-images
The build process:
- Builds the buunstack Python package wheel
- Copies the wheel into the Docker build context
- Installs the wheel in the Docker image
- Cleans up temporary files
⚠️ Note: Buun-stack images are comprehensive and large (~13GB). Initial image pulls and deployments take significant time due to the extensive package set.
Image Configuration
Configure image settings in .env.local:
# Image registry
IMAGE_REGISTRY=localhost:30500
# Image tag (current default)
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28
buunstack Package & SecretStore
JupyterHub includes the buunstack Python package, which provides seamless integration with HashiCorp Vault for secure secrets management in your notebooks.
Key Features
- 🔒 Secure Secrets Management: Store and retrieve secrets securely using HashiCorp Vault
- 🚀 Pre-acquired Authentication: Uses Vault tokens created automatically at notebook spawn
- 📱 Simple API: Easy-to-use interface similar to Google Colab's userdata.get()
- 🔄 Automatic Token Renewal: Built-in token refresh for long-running sessions
Quick Example
from buunstack import SecretStore
# Initialize with pre-acquired Vault token (automatic)
secrets = SecretStore()
# Store secrets
secrets.put('api-keys',
openai_key='sk-your-key-here',
github_token='ghp_your-token',
database_url='postgresql://user:pass@host:5432/db'
)
# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = api_keys['openai_key']
# Or get a specific field directly
openai_key = secrets.get('api-keys', field='openai_key')
Learn More
For detailed documentation, usage examples, and API reference, see:
📖 buunstack Package Documentation
Vault Integration
Overview
Vault integration enables secure secrets management directly from Jupyter notebooks. The system uses:
- ExternalSecret to fetch the admin token from Vault
- Renewable tokens with unlimited Max TTL to avoid 30-day system limitations
- Token renewal script that automatically renews tokens at TTL/2 intervals (minimum 30 seconds)
- User-specific tokens created during notebook spawn with isolated access
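The TTL/2 rule with its 30-second floor can be stated as a one-liner (illustrative helper, not the shipped shell script):

```python
def renewal_interval(ttl_seconds: int) -> int:
    """Renew at half the token TTL, but never more often than every
    30 seconds, mirroring the rule described above."""
    return max(ttl_seconds // 2, 30)

print(renewal_interval(86400))  # 24h TTL → renew every 12h (43200 s)
print(renewal_interval(40))     # very short TTL → clamped to 30 s
```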
Architecture
┌────────────────────────────────────────────────────────────────┐
│ JupyterHub Hub Pod │
│ │
│ ┌──────────────┐ ┌────────────────┐ ┌────────────────────┐ │
│ │ Hub │ │ Token Renewer │ │ ExternalSecret │ │
│ │ Container │◄─┤ Sidecar │◄─┤ (mounted as │ │
│ │ │ │ │ │ Secret) │ │
│ └──────────────┘ └────────────────┘ └────────────────────┘ │
│ │ │ ▲ │
│ │ │ │ │
│ ▼ ▼ │ │
│ ┌──────────────────────────────────┐ │ │
│ │ /vault/secrets/vault-token │ │ │
│ │ (Admin token for user creation) │ │ │
│ └──────────────────────────────────┘ │ │
└────────────────────────────────────────────────────┼───────────┘
│
┌───────────▼──────────┐
│ Vault │
│ secret/jupyterhub/ │
│ vault-token │
└──────────────────────┘
Vault Integration Prerequisites
Vault integration requires:
- Vault server installed and configured
- External Secrets Operator installed
- ClusterSecretStore configured for Vault
- Buun-stack kernel images (standard images don't include Vault integration)
Setup
Vault integration is configured during JupyterHub installation:
just jupyterhub::install
# Answer "yes" when prompted about Vault integration
# Provide Vault root token when prompted
The setup process:
- Creates the jupyterhub-admin policy with the necessary permissions, including sudo for orphan token creation
- Creates a renewable admin token with a 24h TTL and unlimited Max TTL
- Stores the token in Vault at secret/jupyterhub/vault-token
- Creates an ExternalSecret to fetch the token from Vault
- Deploys token renewal sidecar for automatic renewal
Usage in Notebooks
With Vault integration enabled, use the buunstack package in notebooks:
from buunstack import SecretStore
# Initialize (uses pre-acquired user-specific token)
secrets = SecretStore()
# Store secrets
secrets.put('api-keys',
openai='sk-...',
github='ghp_...',
database_url='postgresql://...')
# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = secrets.get('api-keys', field='openai')
# List all secrets
secret_names = secrets.list()
# Delete secrets or specific fields
secrets.delete('old-api-key') # Delete entire secret
secrets.delete('api-keys', field='github') # Delete only github field
Security Features
- User isolation: Each user receives an orphan token with access only to their namespace
- Automatic renewal: Token renewal script renews admin token at TTL/2 intervals (minimum 30 seconds)
- ExternalSecret integration: Admin token fetched securely from Vault
- Orphan tokens: User tokens are orphan tokens, not limited by parent policy restrictions
- Audit trail: All secret access is logged in Vault
Token Management
Admin Token
The admin token is managed through:
- Creation: just jupyterhub::create-jupyterhub-vault-token creates a renewable token
- Storage: Stored in Vault at secret/jupyterhub/vault-token
- Retrieval: ExternalSecret fetches the token and mounts it as a Kubernetes Secret
- Renewal: The vault-token-renewer.sh script renews at TTL/2 intervals
User Tokens
User tokens are created dynamically:
- Pre-spawn hook reads the admin token from /vault/secrets/vault-token
- Creates a user policy jupyter-user-{username} with restricted access
- Creates an orphan token with the user policy (requires sudo permission)
- Sets the environment variable NOTEBOOK_VAULT_TOKEN in the notebook container
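The spawn-time flow can be sketched against Vault's HTTP token API. This is an illustration only — the actual hook uses hvac's `create_orphan()` — and `VAULT_ADDR` here is a placeholder:

```python
import json
import urllib.request

VAULT_ADDR = "https://vault.example.com"  # placeholder; set from VAULT_ADDR

def user_policy_name(username: str) -> str:
    """Matches the per-user policy naming described above."""
    return f"jupyter-user-{username}"

def orphan_token_request(admin_token: str, username: str,
                         ttl: str = "24h", max_ttl: str = "168h"):
    """POST /v1/auth/token/create-orphan — the HTTP equivalent of hvac's
    create_orphan(); calling it requires the admin policy's sudo capability."""
    body = {
        "policies": [user_policy_name(username)],
        "ttl": ttl,
        "explicit_max_ttl": max_ttl,
        "renewable": True,
    }
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/auth/token/create-orphan",
        data=json.dumps(body).encode(),
        headers={"X-Vault-Token": admin_token},
        method="POST",
    )

# Usage (network): urllib.request.urlopen(orphan_token_request(tok, "alice"))
# The response's auth.client_token becomes NOTEBOOK_VAULT_TOKEN.
```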
Token Renewal Implementation
Admin Token Renewal
The admin token renewal is handled by a sidecar container (vault-token-renewer) running alongside the JupyterHub hub:
Implementation Details:
- Renewal Script: /vault/config/vault-token-renewer.sh
  - Runs in the vault-token-renewer sidecar container
  - Uses the Vault 1.17.5 image with the HashiCorp Vault CLI
- Environment-Based TTL Configuration:
  # Reads TTL from environment variable (set in .env.local)
  TTL_RAW="${JUPYTERHUB_VAULT_TOKEN_TTL}"  # e.g., "5m", "24h"
  # Converts to seconds and calculates the renewal interval
  RENEWAL_INTERVAL=$((TTL_SECONDS / 2))  # TTL/2 with a minimum of 30s
- Token Source: ExternalSecret → Kubernetes Secret → mounted file
  # Token retrieved from the ExternalSecret-managed mount
  ADMIN_TOKEN=$(cat /vault/admin-token/token)
- Renewal Loop:
  while true; do
    vault token renew >/dev/null 2>&1
    sleep $RENEWAL_INTERVAL
  done
- Error Handling: If renewal fails, the script re-retrieves the token from the ExternalSecret mount
Key Files:
- vault-token-renewer.sh: Main renewal script
- jupyterhub-vault-token-external-secret.gomplate.yaml: ExternalSecret configuration
- vault-token-renewer-config ConfigMap: Contains the renewal script
User Token Renewal
User token renewal is handled within the notebook environment by the buunstack Python package:
Implementation Details:
- Token Source: Environment variable set by the pre-spawn hook
  # In pre_spawn_hook.gomplate.py
  spawner.environment["NOTEBOOK_VAULT_TOKEN"] = user_vault_token
- Automatic Renewal: Built into SecretStore class operations
  # In buunstack/secrets.py
  def _ensure_authenticated(self):
      token_info = self.client.auth.token.lookup_self()
      ttl = token_info.get("data", {}).get("ttl", 0)
      renewable = token_info.get("data", {}).get("renewable", False)
      # Renew if TTL < 10 minutes and renewable
      if renewable and ttl > 0 and ttl < 600:
          self.client.auth.token.renew_self()
- Renewal Trigger: Every SecretStore operation (get, put, delete, list)
  - Checks token validity before the operation
  - Automatically renews if TTL < 10 minutes
  - Transparent to user code
- Token Configuration (set during creation):
  - TTL: NOTEBOOK_VAULT_TOKEN_TTL (default: 24h = 1 day)
  - Max TTL: NOTEBOOK_VAULT_TOKEN_MAX_TTL (default: 168h = 7 days)
  - Policy: User-specific jupyter-user-{username}
  - Type: Orphan token (independent of the parent token lifecycle)
- Expiry Handling: When the token reaches its Max TTL:
  - It cannot be renewed further
  - The user must restart their notebook server (triggers new token creation)
  - Prevented by the JUPYTERHUB_CULL_MAX_AGE setting (6 days < 7 day Max TTL)
Key Files:
- pre_spawn_hook.gomplate.py: User token creation logic
- buunstack/secrets.py: Token renewal implementation
- user_policy.hcl: User token permissions template
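The renewal predicate buunstack applies before each operation can be stated as (a paraphrase of `_ensure_authenticated`, not the shipped code):

```python
def should_renew(ttl_seconds: int, renewable: bool, threshold: int = 600) -> bool:
    """Renew only renewable tokens whose remaining TTL is positive
    but under 10 minutes (600 s), per the rule above."""
    return renewable and 0 < ttl_seconds < threshold

print(should_renew(300, True))    # 5 min left, renewable → True
print(should_renew(3600, True))   # 1 h left → False, no renewal needed
print(should_renew(300, False))   # not renewable → False
```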
Token Lifecycle Summary
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Admin Token │ │ User Token │ │ Pod Lifecycle │
│ │ │ │ │ │
│ Created: Manual │ │ Created: Spawn │ │ Max Age: 7 days │
│ TTL: 5m-24h │ │ TTL: 1 day │ │ Auto-restart │
│ Max TTL: ∞ │ │ Max TTL: 7 days │ │ at Max TTL │
│ Renewal: Auto │ │ Renewal: Auto │ │ │
│ Interval: TTL/2 │ │ Trigger: Usage │ │ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
vault-token-renewer buunstack.py cull.maxAge
sidecar SecretStore pod restart
Storage Options
Default Storage
Uses Kubernetes PersistentVolumes for user home directories.
NFS Storage
For shared storage across nodes, configure NFS:
JUPYTERHUB_NFS_PV_ENABLED=true
JUPYTER_NFS_IP=192.168.10.1
JUPYTER_NFS_PATH=/volume1/drive1/jupyter
NFS storage requires:
- Longhorn storage system installed
- NFS server accessible from cluster nodes
- Proper NFS export permissions configured
Configuration
Environment Variables
Key configuration variables:
# Basic settings
JUPYTERHUB_NAMESPACE=jupyter
JUPYTERHUB_CHART_VERSION=4.2.0
JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub
# Keycloak integration
KEYCLOAK_REALM=buunstack
# Storage
JUPYTERHUB_NFS_PV_ENABLED=false
# Vault integration
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
VAULT_ADDR=https://vault.example.com
# Image settings
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28
IMAGE_REGISTRY=localhost:30500
# Vault token TTL settings
JUPYTERHUB_VAULT_TOKEN_TTL=24h # Admin token: renewed at TTL/2 intervals
NOTEBOOK_VAULT_TOKEN_TTL=24h # User token: 1 day (renewed on usage)
NOTEBOOK_VAULT_TOKEN_MAX_TTL=168h # User token: 7 days max
# Server pod lifecycle settings
JUPYTERHUB_CULL_MAX_AGE=604800 # Max pod age in seconds (7 days = 604800s)
# Should be <= NOTEBOOK_VAULT_TOKEN_MAX_TTL
# Logging
JUPYTER_BUUNSTACK_LOG_LEVEL=warning # Options: debug, info, warning, error
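A quick sanity check of the relationship between these settings (illustrative helper, not part of buun-stack):

```python
def ttl_to_seconds(value: str) -> int:
    """Parse the suffixed TTL strings ("5m", "24h") used in .env.local."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(value[:-1]) * units[value[-1]]

# JUPYTERHUB_CULL_MAX_AGE (in seconds) should not exceed the user-token
# Max TTL, otherwise pods would outlive their Vault tokens.
cull_max_age = 604800                         # 7 days
assert cull_max_age <= ttl_to_seconds("168h")  # 168h == 604800 s
```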
Advanced Configuration
Customize JupyterHub behavior by editing jupyterhub-values.gomplate.yaml template before installation.
Custom Container Images
JupyterHub uses custom container images with pre-installed data science tools and integrations:
datastack-notebook (CPU)
Standard notebook image based on jupyter/pytorch-notebook:
- PyTorch: Deep learning framework
- PySpark: Apache Spark integration for big data processing
- ClickHouse Client: Direct database access
- Python 3.12: Latest Python runtime
datastack-cuda-notebook (GPU)
GPU-enabled notebook image based on jupyter/pytorch-notebook:cuda12:
- CUDA 12: GPU acceleration support
- PyTorch with GPU: Hardware-accelerated deep learning
- PySpark: Apache Spark integration
- ClickHouse Client: Direct database access
- Python 3.12: Latest Python runtime
Both images are based on the official Jupyter Docker Stacks and include all standard data science libraries (NumPy, pandas, scikit-learn, matplotlib, etc.).
Management
Uninstall
just jupyterhub::uninstall
This removes:
- JupyterHub deployment
- User pods
- PVCs
- ExternalSecret
Update
Upgrade to newer versions:
# Update image tag in .env.local
export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-29
# Rebuild and push images
just jupyterhub::build-kernel-images
just jupyterhub::push-kernel-images
# Upgrade JupyterHub deployment
just jupyterhub::install
Manual Token Refresh
If needed, manually refresh the admin token:
# Create new renewable token
just jupyterhub::create-jupyterhub-vault-token
# Restart JupyterHub to pick up new token
kubectl rollout restart deployment/hub -n jupyter
Troubleshooting
Image Pull Issues
Buun-stack images are large and pulls may time out:
# Check pod status
kubectl get pods -n jupyter
# Check image pull progress
kubectl describe pod <pod-name> -n jupyter
# Increase timeout if needed
helm upgrade jupyterhub jupyterhub/jupyterhub --timeout=30m -f jupyterhub-values.yaml
Vault Integration Issues
Check token and authentication:
# Check ExternalSecret status
kubectl get externalsecret -n jupyter jupyterhub-vault-token
# Check if Secret was created
kubectl get secret -n jupyter jupyterhub-vault-token
# Check token renewal logs
kubectl logs -n jupyter -l app.kubernetes.io/component=hub -c vault-token-renewer
# In a notebook, verify environment
%env NOTEBOOK_VAULT_TOKEN
Common issues:
- "child policies must be subset of parent": The admin policy needs sudo permission for orphan tokens
- Token not found: Check the ExternalSecret and ClusterSecretStore configuration
- Permission denied: Verify the jupyterhub-admin policy has all required permissions
Authentication Issues
Verify Keycloak client configuration:
# Check client exists
just keycloak::get-client buunstack jupyterhub
# Check redirect URIs
just keycloak::update-client buunstack jupyterhub \
"https://your-jupyter-host/hub/oauth_callback"
Technical Implementation Details
Helm Chart Version
JupyterHub uses the official Zero to JupyterHub (Z2JH) Helm chart:
- Chart: jupyterhub/jupyterhub
- Version: 4.2.0 (configurable via JUPYTERHUB_CHART_VERSION)
- Documentation: https://z2jh.jupyter.org/
Token System Architecture
The system uses a three-tier token approach:
- Renewable Admin Token:
  - Created with explicit-max-ttl=0 (unlimited Max TTL)
  - Renewed automatically at TTL/2 intervals (minimum 30 seconds)
  - Stored in Vault and fetched via ExternalSecret
- Orphan User Tokens:
  - Created with the create_orphan() API call
  - Not limited by parent token policies
  - Individual TTL and Max TTL settings
- Token Renewal Script:
  - Runs as a sidecar container
  - Reads the token from the ExternalSecret mount
  - Handles renewal and re-retrieval on failure
Key Files
- jupyterhub-admin-policy.hcl: Vault policy with admin permissions
- user_policy.hcl: Template for user-specific policies
- vault-token-renewer.sh: Token renewal script
- jupyterhub-vault-token-external-secret.gomplate.yaml: ExternalSecret configuration
Performance Considerations
- Image Size: Buun-stack images are ~13GB, plan storage accordingly
- Pull Time: Initial pulls take 5-15 minutes depending on network
- Resource Usage: Data science workloads require adequate CPU/memory
- Token Renewal: Minimal overhead (renewal at TTL/2 intervals)
For production deployments, consider:
- Pre-pulling images to all nodes
- Using faster storage backends
- Configuring resource limits per user
- Setting up monitoring and alerts
Known Limitations
- Annual Token Recreation: While tokens have unlimited Max TTL, best practice suggests recreating them annually
- Token Expiry and Pod Lifecycle: User tokens have a TTL of 1 day (NOTEBOOK_VAULT_TOKEN_TTL=24h) and a maximum TTL of 7 days (NOTEBOOK_VAULT_TOKEN_MAX_TTL=168h). Daily usage extends the token for another day, allowing up to 7 days of continuous use. Server pods are automatically restarted after 7 days (JUPYTERHUB_CULL_MAX_AGE=604800s) to refresh tokens.
- Cull Settings: The server idle timeout is set to 2 hours by default. Adjust cull.timeout and cull.every in the Helm values for different requirements
- NFS Storage: When using NFS storage, ensure proper permissions are set on the NFS server. The default JUPYTER_FSGID is 100
- ExternalSecret Dependency: Requires External Secrets Operator to be installed and configured