docs: write jupyterhub doc
@@ -1,6 +1,7 @@
default: true
no-bare-urls: false
line-length: false
no-duplicate-heading: false
no-inline-html: false
ul-indent:
  indent: 4
@@ -143,4 +143,5 @@ When adding new services:
- It must pass the command: `just --fmt --check --unstable`
- Follow existing Justfile patterns
- Only write code comments when necessary, as the code should be self-explanatory
  (avoid trivial comments for each code block)
- Write output messages and code comments in English
@@ -112,6 +112,9 @@ Multi-user platform for interactive computing:
- Integrated with Keycloak for OIDC authentication
- Persistent storage for user workspaces
- Support for multiple kernels and environments
- Vault integration for secure secrets management

See [JupyterHub Documentation](./docs/jupyterhub.md) for detailed setup and configuration.

## Common Operations
docs/jupyterhub.md (new file, 310 lines)
@@ -0,0 +1,310 @@
# JupyterHub

JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.

## Installation

Install JupyterHub with interactive configuration:

```bash
just jupyterhub::install
```

This will prompt for:

- JupyterHub host (FQDN)
- NFS PV usage (if Longhorn is installed)
- NFS server details (if NFS is enabled)
- Vault integration setup

### Prerequisites

- Keycloak must be installed and configured
- For NFS storage: Longhorn must be installed
- For Vault integration: Vault must be installed and configured
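The prompt values can also be exported up front (the Vault and NFS sections below use this pattern). A minimal sketch, assuming the installer picks these up instead of prompting; `JUPYTERHUB_HOST` is an assumed name for the host variable, so check `.env.local` or the Justfile recipe for the exact one:

```bash
# Assumed variable names -- verify against .env.local / the jupyterhub Justfile module
export JUPYTERHUB_HOST=jupyter.example.com        # JupyterHub host (FQDN); name is an assumption
export JUPYTERHUB_NFS_PV_ENABLED=false            # NFS PV usage (documented below)
export JUPYTERHUB_VAULT_INTEGRATION_ENABLED=true  # Vault integration (documented below)

just jupyterhub::install
```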
## Kernel Images

JupyterHub supports multiple kernel image profiles:

### Standard Profiles

- **minimal**: Basic Python environment
- **base**: Python with common data science packages
- **datascience**: Full data science stack (default)
- **pyspark**: PySpark for big data processing
- **pytorch**: PyTorch for machine learning
- **tensorflow**: TensorFlow for machine learning

### Buun-Stack Profiles

- **buun-stack**: Comprehensive data science environment with Vault integration
- **buun-stack-cuda**: CUDA-enabled version with GPU support

## Profile Configuration

Enable/disable profiles using environment variables:

```bash
# Enable buun-stack profile (CPU version)
export JUPYTER_PROFILE_BUUN_STACK_ENABLED=true

# Enable buun-stack CUDA profile (GPU version)
export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true

# Disable default datascience profile
export JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
```

Available profile variables:

- `JUPYTER_PROFILE_MINIMAL_ENABLED`
- `JUPYTER_PROFILE_BASE_ENABLED`
- `JUPYTER_PROFILE_DATASCIENCE_ENABLED`
- `JUPYTER_PROFILE_PYSPARK_ENABLED`
- `JUPYTER_PROFILE_PYTORCH_ENABLED`
- `JUPYTER_PROFILE_TENSORFLOW_ENABLED`
- `JUPYTER_PROFILE_BUUN_STACK_ENABLED`
- `JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED`

Only `JUPYTER_PROFILE_DATASCIENCE_ENABLED` is true by default.
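To keep a profile selection across reinstalls, the same variables can be stored in `.env.local`. This is a sketch under the assumption that the installer reads profile variables from `.env.local`, as it does for the image settings shown below:

```bash
# Persist the profile selection (assumes .env.local is read by the installer)
cat >> .env.local <<'EOF'
JUPYTER_PROFILE_BUUN_STACK_ENABLED=true
JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
EOF

# Re-run the install to apply the change
just jupyterhub::install
```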
## Buun-Stack Images

Buun-stack images provide comprehensive data science environments with:

- All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
- Deep learning frameworks (PyTorch, TensorFlow, Keras)
- Big data tools (PySpark, Apache Arrow)
- NLP and ML libraries (LangChain, Transformers, spaCy)
- Database connectors and tools
- **Vault integration** with the `buunstack` Python package

### Building Custom Images

Build and push buun-stack images to your registry:

```bash
# Build images
just jupyterhub::build-kernel-images

# Push to registry
just jupyterhub::push-kernel-images
```

⚠️ **Note**: Buun-stack images are comprehensive and large (~13GB). Initial image pulls and deployments take significant time due to the extensive package set.

### Image Configuration

Configure image settings in `.env.local`:

```bash
# Image registry
IMAGE_REGISTRY=localhost:30500

# Image tag
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-1
```
## Vault Integration

### Overview

Vault integration enables secure secrets management directly from Jupyter notebooks without re-authentication. Users can store and retrieve API keys, database credentials, and other sensitive data securely.

### Prerequisites

Vault integration requires:

- Vault server installed and configured
- Keycloak OIDC authentication configured
- **Buun-stack kernel images** (standard images don't include Vault integration)

### Setup

Enable Vault integration during installation:

```bash
# Set the environment variable before installation, or answer yes at the prompt during install
export JUPYTERHUB_VAULT_INTEGRATION_ENABLED=true
just jupyterhub::install
```

Or configure manually:

```bash
# Setup Vault JWT authentication for JupyterHub
just jupyterhub::setup-vault-jwt-auth
```

### Usage in Notebooks

With Vault integration enabled, use the `buunstack` package in notebooks:

```python
from buunstack import SecretStore

# Initialize (uses JupyterHub session authentication)
secrets = SecretStore()

# Store secrets
secrets.put('api-keys',
            openai='sk-...',
            github='ghp_...',
            database_url='postgresql://...')

# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = secrets.get('api-keys', field='openai')

# List all secrets
secret_names = secrets.list()

# Delete secrets
secrets.delete('old-api-key')
```
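Retrieved values are ordinary Python objects, so they can be handed straight to client libraries. A small sketch using only the secret names and fields from the example above (`OPENAI_API_KEY` is just an illustration of exporting a key for libraries that read environment variables):

```python
import os

from buunstack import SecretStore

secrets = SecretStore()

# Export a stored key so libraries that read environment variables can pick it up
os.environ['OPENAI_API_KEY'] = secrets.get('api-keys', field='openai')

# Or use a retrieved field directly as configuration
db_url = secrets.get('api-keys', field='database_url')
print('Database configured:', bool(db_url))
```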
### Security Features

- **User isolation**: Each user can only access their own secrets
- **Automatic token refresh**: Background token management prevents authentication failures
- **Audit trail**: All secret access is logged in Vault
- **No re-authentication**: Uses existing JupyterHub OIDC session

## Storage Options

### Default Storage

Uses Kubernetes PersistentVolumes for user home directories.

### NFS Storage

For shared storage across nodes, configure NFS:

```bash
export JUPYTERHUB_NFS_PV_ENABLED=true
export JUPYTER_NFS_IP=192.168.10.1
export JUPYTER_NFS_PATH=/volume1/drive1/jupyter
```

NFS storage requires:

- Longhorn storage system installed
- NFS server accessible from cluster nodes
- Proper NFS export permissions configured
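After installation it is worth confirming that the user-storage volumes actually bound, whichever backend is in use. A quick check with standard kubectl commands (the `jupyter` namespace matches `JUPYTERHUB_NAMESPACE` below):

```bash
# List claims created for user home directories and their bound volumes
kubectl get pvc -n jupyter

# Inspect the backing PersistentVolumes (storage class, NFS server/path, etc.)
kubectl get pv
```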
## Configuration

### Environment Variables

Key configuration variables:

```bash
# Basic settings
JUPYTERHUB_NAMESPACE=jupyter
JUPYTERHUB_CHART_VERSION=4.2.0
JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub

# Keycloak integration
KEYCLOAK_REALM=buunstack

# Storage
JUPYTERHUB_NFS_PV_ENABLED=false

# Vault integration
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
VAULT_ADDR=http://vault.vault.svc:8200

# Image settings
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-6
IMAGE_REGISTRY=localhost:30500
```

### Advanced Configuration

Customize JupyterHub behavior by editing the `jupyterhub-values.gomplate.yaml` template before installation.
## Management

### Uninstall

```bash
just jupyterhub::uninstall
```

### Update

Upgrade to newer versions:

```bash
# Update image tag
export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-2

# Rebuild and push images
just jupyterhub::push-kernel-images

# Upgrade JupyterHub deployment
just jupyterhub::install
```
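After the upgrade, a quick way to confirm that pods are running the updated image is to list the images in use in the namespace (standard kubectl; running user servers may need a restart to pick up the new tag):

```bash
# Show which images the pods in the jupyter namespace are currently running
kubectl get pods -n jupyter \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```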
## Troubleshooting

### Image Pull Issues

Buun-stack images are large and may time out:

```bash
# Check pod status
kubectl get pods -n jupyter

# Check image pull progress
kubectl describe pod <pod-name> -n jupyter

# Increase timeout if needed
helm upgrade jupyterhub jupyterhub/jupyterhub \
  -n jupyter --timeout=30m -f jupyterhub-values.yaml
```

### Vault Integration Issues

Check Vault connectivity and authentication:

```python
# In a notebook
import os
print("Vault Address:", os.getenv('VAULT_ADDR'))
print("Access Token:", bool(os.getenv('JUPYTERHUB_OIDC_ACCESS_TOKEN')))

# Test SecretStore
from buunstack import SecretStore
secrets = SecretStore()
status = secrets.get_status()
print(status)
```

### Authentication Issues

Verify Keycloak client configuration:

```bash
# Check client exists
just keycloak::get-client buunstack jupyterhub

# Update redirect URIs if needed
just keycloak::update-client buunstack jupyterhub \
  "https://your-jupyter-host/hub/oauth_callback"
```
## Performance Considerations

- **Image Size**: Buun-stack images are ~13GB; plan storage accordingly
- **Pull Time**: Initial pulls take 5-15 minutes depending on network
- **Resource Usage**: Data science workloads require adequate CPU/memory
- **Storage**: NFS provides better performance for shared datasets

For production deployments, consider:

- Pre-pulling images to all nodes (see the sketch after this list)
- Using faster storage backends
- Configuring resource limits per user
- Setting up monitoring and alerts
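As a starting point for pre-pulling, the kernel image can be pulled on every node before an upgrade. A rough sketch, assuming SSH access to the nodes and a CRI-compatible runtime; the node list and image reference are placeholders to adjust to your registry and tag settings:

```bash
# Placeholder node names and image reference -- adjust to your cluster and registry
NODES="node1 node2 node3"
IMAGE="localhost:30500/<kernel-image>:python-3.12-1"

for node in $NODES; do
  # crictl talks to the node's container runtime directly
  ssh "$node" "sudo crictl pull $IMAGE"
done
```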