From e6d130c3a85124f0845bb1bd5993627549d997bc Mon Sep 17 00:00:00 2001
From: Masaki Yatsu
Date: Sun, 31 Aug 2025 22:34:11 +0900
Subject: [PATCH] docs: write jupyterhub doc

---
 .markdownlint.yaml |   3 +-
 CLAUDE.md          |   1 +
 README.md          |   3 +
 docs/jupyterhub.md | 310 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 316 insertions(+), 1 deletion(-)
 create mode 100644 docs/jupyterhub.md

diff --git a/.markdownlint.yaml b/.markdownlint.yaml
index b03e995..152b285 100644
--- a/.markdownlint.yaml
+++ b/.markdownlint.yaml
@@ -1,6 +1,7 @@
 default: true
-no-bare-urls: false
 line-length: false
+no-bare-urls: false
+no-duplicate-heading: false
 no-inline-html: false
 ul-indent:
   indent: 4
diff --git a/CLAUDE.md b/CLAUDE.md
index 6b92903..d06d336 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -143,4 +143,5 @@ When adding new services:
 - It must pass the command: `just --fmt --check --unstable`
 - Follow existing Justfile patterns
 - Only write code comments when necessary, as the code should be self-explanatory
+  (Avoid trivial comments for each code block)
 - Write output messages and code comments in English
diff --git a/README.md b/README.md
index 62a3439..1bd3c85 100644
--- a/README.md
+++ b/README.md
@@ -112,6 +112,9 @@ Multi-user platform for interactive computing:
 - Integrated with Keycloak for OIDC authentication
 - Persistent storage for user workspaces
 - Support for multiple kernels and environments
+- Vault integration for secure secrets management
+
+See [JupyterHub Documentation](./docs/jupyterhub.md) for detailed setup and configuration.
 
 ## Common Operations
 
diff --git a/docs/jupyterhub.md b/docs/jupyterhub.md
new file mode 100644
index 0000000..68880f8
--- /dev/null
+++ b/docs/jupyterhub.md
@@ -0,0 +1,310 @@
+# JupyterHub
+
+JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.
+
+## Installation
+
+Install JupyterHub with interactive configuration:
+
+```bash
+just jupyterhub::install
+```
+
+This will prompt for:
+
+- JupyterHub host (FQDN)
+- NFS PV usage (if Longhorn is installed)
+- NFS server details (if NFS is enabled)
+- Vault integration setup
+
+### Prerequisites
+
+- Keycloak must be installed and configured
+- For NFS storage: Longhorn must be installed
+- For Vault integration: Vault must be installed and configured
+
+## Kernel Images
+
+JupyterHub supports multiple kernel image profiles:
+
+### Standard Profiles
+
+- **minimal**: Basic Python environment
+- **base**: Python with common data science packages
+- **datascience**: Full data science stack (default)
+- **pyspark**: PySpark for big data processing
+- **pytorch**: PyTorch for machine learning
+- **tensorflow**: TensorFlow for machine learning
+
+### Buun-Stack Profiles
+
+- **buun-stack**: Comprehensive data science environment with Vault integration
+- **buun-stack-cuda**: CUDA-enabled version with GPU support
+
+## Profile Configuration
+
+Enable/disable profiles using environment variables:
+
+```bash
+# Enable buun-stack profile (CPU version)
+export JUPYTER_PROFILE_BUUN_STACK_ENABLED=true
+
+# Enable buun-stack CUDA profile (GPU version)
+export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true
+
+# Disable default datascience profile
+export JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
+```
+
+Available profile variables:
+
+- `JUPYTER_PROFILE_MINIMAL_ENABLED`
+- `JUPYTER_PROFILE_BASE_ENABLED`
+- `JUPYTER_PROFILE_DATASCIENCE_ENABLED`
+- `JUPYTER_PROFILE_PYSPARK_ENABLED`
+- `JUPYTER_PROFILE_PYTORCH_ENABLED`
+- `JUPYTER_PROFILE_TENSORFLOW_ENABLED`
+- `JUPYTER_PROFILE_BUUN_STACK_ENABLED`
+- `JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED`
+
+Only `JUPYTER_PROFILE_DATASCIENCE_ENABLED` is true by default.
+
+## Buun-Stack Images
+
+Buun-stack images provide comprehensive data science environments with:
+
+- All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
+- Deep learning frameworks (PyTorch, TensorFlow, Keras)
+- Big data tools (PySpark, Apache Arrow)
+- NLP and ML libraries (LangChain, Transformers, spaCy)
+- Database connectors and tools
+- **Vault integration** with the `buunstack` Python package
+
+### Building Custom Images
+
+Build and push buun-stack images to your registry:
+
+```bash
+# Build images
+just jupyterhub::build-kernel-images
+
+# Push to registry
+just jupyterhub::push-kernel-images
+```
+
+⚠️ **Note**: Buun-stack images are comprehensive and large (~13GB). Because of the extensive package set, initial image pulls and deployments take significant time.
+
+### Image Configuration
+
+Configure image settings in `.env.local`:
+
+```bash
+# Image registry
+IMAGE_REGISTRY=localhost:30500
+
+# Image tag
+JUPYTER_PYTHON_KERNEL_TAG=python-3.12-1
+```
+
+## Vault Integration
+
+### Overview
+
+Vault integration lets users manage secrets directly from Jupyter notebooks, reusing their JupyterHub login instead of authenticating to Vault separately. Users can store and retrieve API keys, database credentials, and other sensitive data.
+
+### Prerequisites
+
+Vault integration requires:
+
+- Vault server installed and configured
+- Keycloak OIDC authentication configured
+- **Buun-stack kernel images** (standard images don't include Vault integration)
+
+### Setup
+
+Enable Vault integration during installation:
+
+```bash
+# Set this variable before installation, or answer yes to the prompt during installation
+export JUPYTERHUB_VAULT_INTEGRATION_ENABLED=true
+just jupyterhub::install
+```
+
+Or configure manually:
+
+```bash
+# Set up Vault JWT authentication for JupyterHub
+just jupyterhub::setup-vault-jwt-auth
+```
+
+### Usage in Notebooks
+
+With Vault integration enabled, use the `buunstack` package in notebooks:
+
+```python
+from buunstack import SecretStore
+
+# Initialize (uses JupyterHub session authentication)
+secrets = SecretStore()
+
+# Store secrets
+secrets.put('api-keys',
+            openai='sk-...',
+            github='ghp_...',
+            database_url='postgresql://...')
+
+# Retrieve secrets
+api_keys = secrets.get('api-keys')
+openai_key = secrets.get('api-keys', field='openai')
+
+# List all secrets
+secret_names = secrets.list()
+
+# Delete secrets
+secrets.delete('old-api-key')
+```
+
+### Security Features
+
+- **User isolation**: Each user can only access their own secrets
+- **Automatic token refresh**: Background token management prevents authentication failures
+- **Audit trail**: All secret access is logged in Vault (requires a Vault audit device to be enabled)
+- **No re-authentication**: Uses the existing JupyterHub OIDC session
+
+## Storage Options
+
+### Default Storage
+
+User home directories are stored on Kubernetes PersistentVolumes.
+
+### NFS Storage
+
+For shared storage across nodes, configure NFS:
+
+```bash
+export JUPYTERHUB_NFS_PV_ENABLED=true
+export JUPYTER_NFS_IP=192.168.10.1
+export JUPYTER_NFS_PATH=/volume1/drive1/jupyter
+```
+
+NFS storage requires:
+
+- Longhorn storage system installed
+- NFS server accessible from cluster nodes
+- Proper NFS export permissions configured
+
+## Configuration
+
+### Environment Variables
+
+Key configuration variables:
+
+```bash
+# Basic settings
+JUPYTERHUB_NAMESPACE=jupyter
+JUPYTERHUB_CHART_VERSION=4.2.0
+JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub
+
+# Keycloak integration
+KEYCLOAK_REALM=buunstack
+
+# Storage
+JUPYTERHUB_NFS_PV_ENABLED=false
+
+# Vault integration
+JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
+VAULT_ADDR=http://vault.vault.svc:8200
+
+# Image settings
+JUPYTER_PYTHON_KERNEL_TAG=python-3.12-6
+IMAGE_REGISTRY=localhost:30500
+```
+
+### Advanced Configuration
+
+Customize JupyterHub behavior by editing the `jupyterhub-values.gomplate.yaml` template before installation.
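+
+The profile variables listed under Profile Configuration can be illustrated with a short sketch. It is hypothetical: `env_var_for` and `enabled_profiles` are illustrative names, the snippet assumes that only the string `true` (case-insensitive) enables a profile, and it does not reproduce the actual logic of `just jupyterhub::install`. It only encodes the documented rule that the datascience profile is the one enabled by default.
+
+```python
+import os
+from typing import List, Mapping
+
+# Profile names as listed in this document (assumed to map 1:1 to variable names).
+PROFILES = [
+    "minimal", "base", "datascience", "pyspark",
+    "pytorch", "tensorflow", "buun-stack", "buun-stack-cuda",
+]
+
+# Documented default: only the datascience profile is enabled.
+DEFAULTS = {name: name == "datascience" for name in PROFILES}
+
+
+def env_var_for(profile: str) -> str:
+    # "buun-stack-cuda" -> "JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED"
+    return f"JUPYTER_PROFILE_{profile.upper().replace('-', '_')}_ENABLED"
+
+
+def enabled_profiles(env: Mapping[str, str] = os.environ) -> List[str]:
+    """Hypothetical resolver: unset variables fall back to DEFAULTS;
+    set variables count as enabled only when equal to 'true' (any case)."""
+    result = []
+    for profile in PROFILES:
+        raw = env.get(env_var_for(profile))
+        enabled = DEFAULTS[profile] if raw is None else raw.strip().lower() == "true"
+        if enabled:
+            result.append(profile)
+    return result
+```
+
+For example, `enabled_profiles({"JUPYTER_PROFILE_BUUN_STACK_ENABLED": "true", "JUPYTER_PROFILE_DATASCIENCE_ENABLED": "false"})` returns `["buun-stack"]`, while an empty mapping returns `["datascience"]`.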
+
+## Management
+
+### Uninstall
+
+```bash
+just jupyterhub::uninstall
+```
+
+### Update
+
+Upgrade to newer versions:
+
+```bash
+# Update image tag
+export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-2
+
+# Rebuild and push images
+just jupyterhub::push-kernel-images
+
+# Upgrade JupyterHub deployment
+just jupyterhub::install
+```
+
+## Troubleshooting
+
+### Image Pull Issues
+
+Buun-stack images are large, so pulls may time out:
+
+```bash
+# Check pod status
+kubectl get pods -n jupyter
+
+# Check image pull progress
+kubectl describe pod -n jupyter
+
+# Increase timeout if needed
+helm upgrade jupyterhub jupyterhub/jupyterhub \
+  -n jupyter --timeout=30m -f jupyterhub-values.yaml
+```
+
+### Vault Integration Issues
+
+Check Vault connectivity and authentication:
+
+```python
+# In a notebook
+import os
+print("Vault Address:", os.getenv('VAULT_ADDR'))
+print("Access Token:", bool(os.getenv('JUPYTERHUB_OIDC_ACCESS_TOKEN')))
+
+# Test SecretStore
+from buunstack import SecretStore
+secrets = SecretStore()
+status = secrets.get_status()
+print(status)
+```
+
+### Authentication Issues
+
+Verify Keycloak client configuration:
+
+```bash
+# Check that the client exists
+just keycloak::get-client buunstack jupyterhub
+
+# Update redirect URIs
+just keycloak::update-client buunstack jupyterhub \
+  "https://your-jupyter-host/hub/oauth_callback"
+```
+
+## Performance Considerations
+
+- **Image Size**: Buun-stack images are ~13GB; plan node disk space accordingly
+- **Pull Time**: Initial pulls take 5-15 minutes depending on network bandwidth
+- **Resource Usage**: Data science workloads require adequate CPU and memory
+- **Storage**: NFS is better suited to datasets shared across nodes
+
+For production deployments, consider:
+
+- Pre-pulling images to all nodes
+- Using faster storage backends
+- Configuring resource limits per user
+- Setting up monitoring and alerts
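+
+When the notebook check under Vault Integration Issues fails, it helps to confirm that Vault itself is reachable from the notebook pod. The sketch below assumes the pod can reach `VAULT_ADDR` over HTTP and calls Vault's unauthenticated `GET /v1/sys/health` endpoint using only the Python standard library; `vault_health` is a hypothetical helper and is not part of `buunstack`.
+
+```python
+import json
+import os
+import urllib.error
+import urllib.request
+from typing import Optional
+
+
+def vault_health(addr: Optional[str] = None, timeout: float = 5.0) -> dict:
+    """Hypothetical helper: probe Vault's /v1/sys/health and summarize the result."""
+    # Fall back to VAULT_ADDR, then to the in-cluster default used in this document.
+    base = (addr or os.getenv("VAULT_ADDR", "http://vault.vault.svc:8200")).rstrip("/")
+    try:
+        with urllib.request.urlopen(f"{base}/v1/sys/health", timeout=timeout) as resp:
+            body = json.loads(resp.read().decode("utf-8"))
+            return {
+                "reachable": True,
+                "status": resp.status,
+                "initialized": body.get("initialized"),
+                "sealed": body.get("sealed"),
+            }
+    except urllib.error.HTTPError as err:
+        # Vault answers with non-200 codes (e.g. 503 when sealed, 429 for an
+        # unsealed standby), so the server is reachable even though it is not
+        # an active, unsealed node.
+        return {"reachable": True, "status": err.code}
+    except (urllib.error.URLError, OSError) as err:
+        # DNS failures, refused connections, and timeouts end up here.
+        return {"reachable": False, "error": str(err)}
+```
+
+In a notebook cell, `print(vault_health())` separates network problems (`"reachable": False`) from Vault-side state: a `status` of 503 means Vault is sealed, which has to be resolved before `SecretStore` can work.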