docs: write jupyterhub doc
@@ -1,6 +1,7 @@
 default: true
-no-bare-urls: false
 line-length: false
+no-bare-urls: false
+no-duplicate-heading: false
 no-inline-html: false
 ul-indent:
   indent: 4
@@ -143,4 +143,5 @@ When adding new services:
 - It must pass the command: `just --fmt --check --unstable`
 - Follow existing Justfile patterns
 - Only write code comments when necessary, as the code should be self-explanatory
+  (Avoid trivial comments for each code block)
 - Write output messages and code comments in English
@@ -112,6 +112,9 @@ Multi-user platform for interactive computing:
 - Integrated with Keycloak for OIDC authentication
 - Persistent storage for user workspaces
 - Support for multiple kernels and environments
+- Vault integration for secure secrets management
+
+See [JupyterHub Documentation](./docs/jupyterhub.md) for detailed setup and configuration.
 
 ## Common Operations
 
310
docs/jupyterhub.md
Normal file
@@ -0,0 +1,310 @@
# JupyterHub

JupyterHub provides a multi-user Jupyter notebook environment with Keycloak OIDC authentication, Vault integration for secure secrets management, and custom kernel images for data science workflows.

## Installation

Install JupyterHub with interactive configuration:

```bash
just jupyterhub::install
```

This will prompt for:

- JupyterHub host (FQDN)
- NFS PV usage (if Longhorn is installed)
- NFS server details (if NFS is enabled)
- Vault integration setup

### Prerequisites

- Keycloak must be installed and configured
- For NFS storage: Longhorn must be installed
- For Vault integration: Vault must be installed and configured

## Kernel Images

JupyterHub supports multiple kernel image profiles:

### Standard Profiles

- **minimal**: Basic Python environment
- **base**: Python with common data science packages
- **datascience**: Full data science stack (default)
- **pyspark**: PySpark for big data processing
- **pytorch**: PyTorch for machine learning
- **tensorflow**: TensorFlow for machine learning

### Buun-Stack Profiles

- **buun-stack**: Comprehensive data science environment with Vault integration
- **buun-stack-cuda**: CUDA-enabled version with GPU support

## Profile Configuration

Enable/disable profiles using environment variables:

```bash
# Enable buun-stack profile (CPU version)
export JUPYTER_PROFILE_BUUN_STACK_ENABLED=true

# Enable buun-stack CUDA profile (GPU version)
export JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true

# Disable default datascience profile
export JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
```

Available profile variables:

- `JUPYTER_PROFILE_MINIMAL_ENABLED`
- `JUPYTER_PROFILE_BASE_ENABLED`
- `JUPYTER_PROFILE_DATASCIENCE_ENABLED`
- `JUPYTER_PROFILE_PYSPARK_ENABLED`
- `JUPYTER_PROFILE_PYTORCH_ENABLED`
- `JUPYTER_PROFILE_TENSORFLOW_ENABLED`
- `JUPYTER_PROFILE_BUUN_STACK_ENABLED`
- `JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED`

Only `JUPYTER_PROFILE_DATASCIENCE_ENABLED` is true by default.

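To preview which profiles a given set of variables would turn on, the naming rule and the documented default can be modelled in a few lines. This is only a sketch: `profile_variable` and `enabled_profiles` are hypothetical helpers, not part of the stack, and it assumes that only the literal value `true` (in any case) counts as enabled, which may differ from how the install recipe parses the variables.

```python
import os

# Profile names and defaults as documented in this guide:
# only "datascience" is enabled unless a variable says otherwise.
PROFILE_DEFAULTS = {
    "minimal": False,
    "base": False,
    "datascience": True,
    "pyspark": False,
    "pytorch": False,
    "tensorflow": False,
    "buun-stack": False,
    "buun-stack-cuda": False,
}


def profile_variable(profile: str) -> str:
    """Map a profile name to its JUPYTER_PROFILE_*_ENABLED variable name."""
    return f"JUPYTER_PROFILE_{profile.upper().replace('-', '_')}_ENABLED"


def enabled_profiles(env=None) -> list[str]:
    """Return the profiles that would be enabled for the given environment.

    Assumption: only the value "true" (case-insensitive) enables a profile.
    """
    env = os.environ if env is None else env
    result = []
    for profile, default in PROFILE_DEFAULTS.items():
        raw = env.get(profile_variable(profile))
        enabled = default if raw is None else raw.strip().lower() == "true"
        if enabled:
            result.append(profile)
    return result


# Example: swap the default profile for the CPU buun-stack profile.
example_env = {
    "JUPYTER_PROFILE_BUUN_STACK_ENABLED": "true",
    "JUPYTER_PROFILE_DATASCIENCE_ENABLED": "false",
}
print(enabled_profiles(example_env))
```

Passing an explicit dictionary keeps the check independent of the current shell; calling `enabled_profiles()` with no argument reads `os.environ` instead.
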
## Buun-Stack Images

Buun-stack images provide comprehensive data science environments with:

- All standard data science packages (NumPy, Pandas, Scikit-learn, etc.)
- Deep learning frameworks (PyTorch, TensorFlow, Keras)
- Big data tools (PySpark, Apache Arrow)
- NLP and ML libraries (LangChain, Transformers, spaCy)
- Database connectors and tools
- **Vault integration** with `buunstack` Python package

### Building Custom Images

Build and push buun-stack images to your registry:

```bash
# Build images
just jupyterhub::build-kernel-images

# Push to registry
just jupyterhub::push-kernel-images
```

⚠️ **Note**: Buun-stack images are comprehensive and large (~13GB). Initial image pulls and deployments take significant time due to the extensive package set.

### Image Configuration

Configure image settings in `.env.local`:

```bash
# Image registry
IMAGE_REGISTRY=localhost:30500

# Image tag
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-1
```

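Before building or pushing, it can help to confirm that both image settings are actually present. The sketch below parses `KEY=VALUE` text of the kind shown above and reports missing entries. `parse_env_file` and `check_image_settings` are hypothetical helpers, and the parser assumes plain `KEY=VALUE` lines with `#` comments; the stack's own loader may accept more syntax (for example `export` prefixes).

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_image_settings(values: dict[str, str]) -> list[str]:
    """Return the names of image settings that are missing or empty."""
    required = ("IMAGE_REGISTRY", "JUPYTER_PYTHON_KERNEL_TAG")
    return [name for name in required if not values.get(name)]


# Example input mirroring the .env.local fragment in this section.
sample_text = """
# Image registry
IMAGE_REGISTRY=localhost:30500

# Image tag
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-1
"""
settings = parse_env_file(sample_text)
print("missing settings:", check_image_settings(settings) or "none")
```

To check a real file, read it with `pathlib.Path(".env.local").read_text()` and pass the result to `parse_env_file`.
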
## Vault Integration

### Overview

Vault integration enables secure secrets management directly from Jupyter notebooks without re-authentication. Users can store and retrieve API keys, database credentials, and other sensitive data securely.

### Prerequisites

Vault integration requires:

- Vault server installed and configured
- Keycloak OIDC authentication configured
- **Buun-stack kernel images** (standard images don't include Vault integration)

### Setup

Enable Vault integration during installation:

```bash
# Set environment variable before installation or answer yes to prompt during install
export JUPYTERHUB_VAULT_INTEGRATION_ENABLED=true
just jupyterhub::install
```

Or configure manually:

```bash
# Setup Vault JWT authentication for JupyterHub
just jupyterhub::setup-vault-jwt-auth
```

### Usage in Notebooks

With Vault integration enabled, use the `buunstack` package in notebooks:

```python
from buunstack import SecretStore

# Initialize (uses JupyterHub session authentication)
secrets = SecretStore()

# Store secrets
secrets.put('api-keys',
            openai='sk-...',
            github='ghp_...',
            database_url='postgresql://...')

# Retrieve secrets
api_keys = secrets.get('api-keys')
openai_key = secrets.get('api-keys', field='openai')

# List all secrets
secret_names = secrets.list()

# Delete secrets
secrets.delete('old-api-key')
```

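Notebooks are sometimes also run outside JupyterHub, where no `SecretStore` session exists. One possible pattern is a small wrapper that tries the store first and falls back to an environment variable. The sketch below is an illustration only: `read_secret` and `FakeStore` are hypothetical, and it assumes that a missing secret either raises an exception or returns `None`, since the behaviour of `SecretStore.get` for missing entries is not specified here.

```python
import os


def read_secret(store, name: str, field: str, env_var: str):
    """Read one field from a SecretStore-like object, else fall back to env.

    `store` is any object exposing get(name, field=...), as in the usage
    example. Assumption: a missing secret raises or returns None; both
    cases fall through to the environment variable.
    """
    try:
        value = store.get(name, field=field)
    except Exception:
        value = None
    if value is not None:
        return value
    return os.environ.get(env_var)


class FakeStore:
    """In-memory stand-in for local testing; not part of buunstack."""

    def __init__(self, data):
        self._data = data

    def get(self, name, field=None):
        secret = self._data[name]  # raises KeyError when the secret is absent
        return secret if field is None else secret.get(field)


store = FakeStore({"api-keys": {"openai": "sk-example"}})
print(read_secret(store, "api-keys", "openai", "OPENAI_API_KEY"))
```

Inside JupyterHub, a real `SecretStore()` instance could be passed in place of `FakeStore`. Catching every exception keeps the sketch short; in practice it is safer to narrow the `except` clause once the library's error types are known, so that authentication failures are not silently masked.
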
### Security Features

- **User isolation**: Each user can only access their own secrets
- **Automatic token refresh**: Background token management prevents authentication failures
- **Audit trail**: All secret access is logged in Vault
- **No re-authentication**: Uses existing JupyterHub OIDC session

## Storage Options

### Default Storage

Uses Kubernetes PersistentVolumes for user home directories.

### NFS Storage

For shared storage across nodes, configure NFS:

```bash
export JUPYTERHUB_NFS_PV_ENABLED=true
export JUPYTER_NFS_IP=192.168.10.1
export JUPYTER_NFS_PATH=/volume1/drive1/jupyter
```

NFS storage requires:

- Longhorn storage system installed
- NFS server accessible from cluster nodes
- Proper NFS export permissions configured

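A quick sanity check of the NFS variables before installation can catch typos early. The following sketch uses a hypothetical helper, `check_nfs_settings`, and assumes that `JUPYTER_NFS_IP` must be an IP literal (as its name suggests) and that `JUPYTER_NFS_PATH` must be an absolute POSIX path. It does not test that the server is reachable or that the export permissions are correct.

```python
import ipaddress
from pathlib import PurePosixPath


def check_nfs_settings(env: dict[str, str]) -> list[str]:
    """Return a list of problems found in the NFS-related settings."""
    problems = []
    if env.get("JUPYTERHUB_NFS_PV_ENABLED", "false").lower() != "true":
        return problems  # NFS disabled: nothing to validate

    ip = env.get("JUPYTER_NFS_IP", "")
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        problems.append(f"JUPYTER_NFS_IP is not a valid IP address: {ip!r}")

    path = env.get("JUPYTER_NFS_PATH", "")
    if not PurePosixPath(path).is_absolute():
        problems.append(f"JUPYTER_NFS_PATH must be an absolute path: {path!r}")
    return problems


# Example using the values from this section.
example = {
    "JUPYTERHUB_NFS_PV_ENABLED": "true",
    "JUPYTER_NFS_IP": "192.168.10.1",
    "JUPYTER_NFS_PATH": "/volume1/drive1/jupyter",
}
print(check_nfs_settings(example) or "NFS settings look consistent")
```
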
## Configuration

### Environment Variables

Key configuration variables:

```bash
# Basic settings
JUPYTERHUB_NAMESPACE=jupyter
JUPYTERHUB_CHART_VERSION=4.2.0
JUPYTERHUB_OIDC_CLIENT_ID=jupyterhub

# Keycloak integration
KEYCLOAK_REALM=buunstack

# Storage
JUPYTERHUB_NFS_PV_ENABLED=false

# Vault integration
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=false
VAULT_ADDR=http://vault.vault.svc:8200

# Image settings
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-6
IMAGE_REGISTRY=localhost:30500
```

### Advanced Configuration

Customize JupyterHub behavior by editing the `jupyterhub-values.gomplate.yaml` template before installation.

## Management

### Uninstall

```bash
just jupyterhub::uninstall
```

### Update

Upgrade to newer versions:

```bash
# Update image tag
export JUPYTER_PYTHON_KERNEL_TAG=python-3.12-2

# Rebuild and push images
just jupyterhub::build-kernel-images
just jupyterhub::push-kernel-images

# Upgrade JupyterHub deployment
just jupyterhub::install
```

## Troubleshooting

### Image Pull Issues

Buun-stack images are large, so image pulls may time out:

```bash
# Check pod status
kubectl get pods -n jupyter

# Check image pull progress
kubectl describe pod <pod-name> -n jupyter

# Increase timeout if needed
helm upgrade jupyterhub jupyterhub/jupyterhub \
  --namespace jupyter --timeout=30m -f jupyterhub-values.yaml
```

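When many user pods are running, scanning `kubectl describe` output by hand gets tedious. The sketch below filters the JSON form of a pod list for containers waiting with an image-pull error. It relies on the standard Kubernetes pod status fields (`status.containerStatuses[].state.waiting.reason`) and the `ErrImagePull` / `ImagePullBackOff` reasons; `pods_with_pull_errors` is a hypothetical helper, the sample data is made up for illustration, and init containers are not inspected.

```python
import json

PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def pods_with_pull_errors(pod_list: dict) -> list[tuple[str, str]]:
    """Return (pod name, reason) pairs for containers stuck on image pulls."""
    stuck = []
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "<unknown>")
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in PULL_REASONS:
                stuck.append((name, waiting["reason"]))
    return stuck


# Illustrative sample in the shape of `kubectl get pods -o json` output.
SAMPLE = """
{"items": [
  {"metadata": {"name": "jupyter-alice"},
   "status": {"containerStatuses": [
     {"state": {"waiting": {"reason": "ImagePullBackOff"}}}]}},
  {"metadata": {"name": "hub-0"},
   "status": {"containerStatuses": [{"state": {"running": {}}}]}}
]}
"""
for pod_name, reason in pods_with_pull_errors(json.loads(SAMPLE)):
    print(f"{pod_name}: {reason}")
```

For a live cluster, the input would be the parsed output of `kubectl get pods -n jupyter -o json`.
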
### Vault Integration Issues

Check Vault connectivity and authentication:

```python
# In a notebook
import os
print("Vault Address:", os.getenv('VAULT_ADDR'))
print("Access Token:", bool(os.getenv('JUPYTERHUB_OIDC_ACCESS_TOKEN')))

# Test SecretStore
from buunstack import SecretStore
secrets = SecretStore()
status = secrets.get_status()
print(status)
```

### Authentication Issues

Verify Keycloak client configuration:

```bash
# Check client exists
just keycloak::get-client buunstack jupyterhub

# Update the client's redirect URI if it does not match
just keycloak::update-client buunstack jupyterhub \
  "https://your-jupyter-host/hub/oauth_callback"
```

## Performance Considerations

- **Image Size**: Buun-stack images are ~13GB; plan storage accordingly
- **Pull Time**: Initial pulls take 5-15 minutes depending on network
- **Resource Usage**: Data science workloads require adequate CPU/memory
- **Storage**: NFS provides better performance for shared datasets

For production deployments, consider:

- Pre-pulling images to all nodes
- Using faster storage backends
- Configuring resource limits per user
- Setting up monitoring and alerts
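
The image-size and pull-time figures above can be related with simple arithmetic. The sketch below computes a lower bound on transfer time from image size and link bandwidth. `estimate_transfer_seconds` is a hypothetical helper; it assumes decimal gigabytes, treats the ~13GB figure as the amount actually transferred, and ignores registry throughput, layer decompression, and extraction, all of which add time in practice.

```python
def estimate_transfer_seconds(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Lower-bound transfer time in seconds for an image of size_gb gigabytes."""
    if bandwidth_mbit_s <= 0:
        raise ValueError("bandwidth must be positive")
    size_bits = size_gb * 1e9 * 8          # gigabytes -> bits
    return size_bits / (bandwidth_mbit_s * 1e6)  # bits / (bits per second)


for mbit in (100, 200, 1000):
    minutes = estimate_transfer_seconds(13, mbit) / 60
    print(f"13 GB at {mbit} Mbit/s: about {minutes:.1f} min")
```

At 200 Mbit/s the raw transfer alone comes to 520 seconds, a little under 9 minutes, which sits inside the 5-15 minute range quoted above and suggests why pre-pulling images to nodes is worthwhile.
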