feat(jupyterhub): vault token w/o keycloak auth

Masaki Yatsu
2025-09-03 10:11:06 +09:00
parent 02ec5eb1e2
commit d233373219
15 changed files with 583 additions and 612 deletions


@@ -110,7 +110,7 @@ JUPYTER_PYTHON_KERNEL_TAG=python-3.12-1
### Overview

-Vault integration enables secure secrets management directly from Jupyter notebooks without re-authentication. Users can store and retrieve API keys, database credentials, and other sensitive data securely.
+Vault integration enables secure secrets management directly from Jupyter notebooks using user-specific Vault tokens. Each user receives their own isolated Vault token during notebook spawn, ensuring complete separation of secrets between users. Users can store and retrieve API keys, database credentials, and other sensitive data securely with automatic token renewal.

### Prerequisites
@@ -133,7 +133,7 @@ just jupyterhub::install
Or configure manually:

```bash
-# Setup Vault JWT authentication for JupyterHub
+# Setup Vault integration (creates user-specific tokens)
just jupyterhub::setup-vault-jwt-auth
```
@@ -144,7 +144,7 @@ With Vault integration enabled, use the `buunstack` package in notebooks:
```python
from buunstack import SecretStore

-# Initialize (uses JupyterHub session authentication)
+# Initialize (uses pre-acquired user-specific token)
secrets = SecretStore()

# Store secrets
@@ -160,16 +160,17 @@ openai_key = secrets.get('api-keys', field='openai')
# List all secrets
secret_names = secrets.list()

-# Delete secrets
-secrets.delete('old-api-key')
+# Delete secrets or specific fields
+secrets.delete('old-api-key')  # Delete entire secret
+secrets.delete('api-keys', field='github')  # Delete only github field
```

### Security Features

-- **User isolation**: Each user can only access their own secrets
-- **Automatic token refresh**: Background token management prevents authentication failures
+- **User isolation**: Each user receives a unique Vault token with access only to their own secrets
+- **Automatic token renewal**: Tokens can be renewed to extend session lifetime
- **Audit trail**: All secret access is logged in Vault
-- **No re-authentication**: Uses existing JupyterHub OIDC session
+- **Individual policies**: Each user has their own Vault policy restricting access to their namespace

## Storage Options
@@ -273,7 +274,8 @@ Check Vault connectivity and authentication:
# In a notebook
import os
print("Vault Address:", os.getenv('VAULT_ADDR'))
-print("Access Token:", bool(os.getenv('JUPYTERHUB_OIDC_ACCESS_TOKEN')))
+print("JWT Token:", bool(os.getenv('NOTEBOOK_VAULT_JWT')))
+print("Vault Token:", bool(os.getenv('NOTEBOOK_VAULT_TOKEN')))

# Test SecretStore
from buunstack import SecretStore
@@ -295,12 +297,172 @@ just keycloak::update-client buunstack jupyterhub \
  "https://your-jupyter-host/hub/oauth_callback"
```
## Implementation
### User-Specific Vault Token System
The `buunstack` SecretStore uses pre-created user-specific Vault tokens that are generated during notebook spawn, ensuring complete user isolation and secure access to individual secret namespaces.
#### Architecture Overview
```plain
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ JupyterHub │ │ Notebook │ │ Vault │
│ │ │ │ │ │
│ ┌───────────┐ │ │ ┌────────────┐ │ │ ┌───────────┐ │
│ │Pre-spawn │ │───►│ │SecretStore │ ├───►│ │User Token │ │
│ │ Hook │ │ │ │ │ │ │ │ + Policy │ │
│ └───────────┘ │ │ └────────────┘ │ │ └───────────┘ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
```
#### Token Lifecycle
1. **Pre-spawn Hook Setup**
- JupyterHub uses admin Vault token to access Vault API
- Creates user-specific Vault policy with restricted path access
- Generates new user-specific Vault token with the created policy
- Passes user token to notebook environment via `NOTEBOOK_VAULT_TOKEN` (see the sketch after this list)
2. **SecretStore Initialization**
- Reads user-specific token from environment variable:
- `NOTEBOOK_VAULT_TOKEN` (User-specific Vault token)
- Uses token for all Vault operations within user's namespace
3. **Token Validation**
- Before operations, checks token validity using `lookup_self`
- Verifies token TTL and renewable status
4. **Automatic Token Renewal**
- If token TTL is low (< 10 minutes) and renewable, renews token
- Uses `renew_self` capability granted by user policy
- Logs renewal success for monitoring
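
A minimal sketch of the spawn-side flow above, using the `hvac` client the way the pre-spawn hook in this commit does. The function name `issue_user_token` is illustrative and the policy body is abbreviated; the real policy and hook live in the Helm values below.

```python
import hvac


def issue_user_token(vault_addr: str, admin_token: str, username: str) -> str:
    """Create a per-user policy and a short-lived renewable token (sketch)."""
    client = hvac.Client(url=vault_addr, token=admin_token)

    policy_name = f"jupyter-user-{username}"
    # Abbreviated policy: the full rules also cover metadata paths and shared secrets.
    policy = f'''
path "secret/data/jupyter/users/{username}/*" {{
  capabilities = ["create", "update", "read", "delete", "list"]
}}
path "auth/token/renew-self" {{
  capabilities = ["update"]
}}
'''
    client.sys.create_or_update_policy(name=policy_name, policy=policy)

    # Token is bound to the user's policy only, renewable, and expires in 1h.
    created = client.auth.token.create(
        policies=[policy_name],
        ttl="1h",
        renewable=True,
        display_name=f"notebook-{username}",
    )
    return created["auth"]["client_token"]


# The hook then exports the result to the spawned notebook:
#   spawner.environment["NOTEBOOK_VAULT_TOKEN"] = issue_user_token(...)
```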
#### Code Flow
```python
def _ensure_authenticated(self):
    # Check if current Vault token is valid
    try:
        if self.client.is_authenticated():
            # Check if token needs renewal
            token_info = self.client.auth.token.lookup_self()
            ttl = token_info.get("data", {}).get("ttl", 0)
            renewable = token_info.get("data", {}).get("renewable", False)

            # Renew if TTL < 10 minutes and renewable
            if renewable and ttl > 0 and ttl < 600:
                self.client.auth.token.renew_self()
                logger.info("✅ Vault token renewed successfully")
            return
    except Exception:
        pass

    # Token expired and cannot be refreshed
    raise Exception(
        "User-specific Vault token expired and cannot be refreshed. "
        "Please restart your notebook server."
    )
```
#### Key Design Decisions
##### 1. User-Specific Token Creation
- Each user receives a unique Vault token during notebook spawn
- Individual policies ensure complete user isolation
- Admin token used only during pre-spawn hook for token creation
##### 2. Policy-Based Access Control
- User policies restrict access to `secret/data/jupyter/users/{username}/*` (see the sketch below)
- Each user can only access their own secret namespace
- Token management capabilities (`lookup_self`, `renew_self`) included
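
For illustration, a sketch of the policy rendered for a hypothetical user `alice` (the real policy text is generated inside the pre-spawn hook; paths follow the layout above):

```python
def build_user_policy(username: str) -> str:
    """Render the per-user Vault policy text (illustrative sketch)."""
    data = f"secret/data/jupyter/users/{username}"
    meta = f"secret/metadata/jupyter/users/{username}"
    return f'''
# User-specific policy for {username}
path "{data}/*" {{
  capabilities = ["create", "update", "read", "delete", "list"]
}}
path "{meta}/*" {{
  capabilities = ["list", "read", "delete", "update"]
}}
path "{meta}" {{
  capabilities = ["list"]
}}
# Token self-management
path "auth/token/lookup-self" {{
  capabilities = ["read"]
}}
path "auth/token/renew-self" {{
  capabilities = ["update"]
}}
'''


print(build_user_policy("alice"))
```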
##### 3. Singleton Pattern
- Single SecretStore instance per notebook session
- Prevents multiple simultaneous authentications
- Maintains consistent token state
##### 4. Pre-created User Tokens
- Tokens are created during notebook spawn via pre-spawn hook
- Reduces initialization overhead in notebooks
- Provides immediate access to user's secret namespace
#### Error Handling
```plain
# Primary error scenarios and responses:

1. User token unavailable
   - Token stored in NOTEBOOK_VAULT_TOKEN env var
   - Prompt to restart notebook server if missing

2. Vault token expired
   - Automatic renewal using renew_self if renewable
   - Restart notebook server required if not renewable

3. Vault authentication failure
   - Log detailed error information
   - Check user policy and token configuration

4. Network connectivity issues
   - Built-in retry in hvac client
   - Provide actionable error messages
```
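
A minimal sketch of how a notebook might surface these failures; the printed guidance here is illustrative, while the actual error text comes from `SecretStore`:

```python
from buunstack import SecretStore

try:
    secrets = SecretStore()
    api_keys = secrets.get("api-keys")
except Exception as err:
    # Covers scenarios 1-3 above: a missing NOTEBOOK_VAULT_TOKEN, an expired
    # non-renewable token, or a policy/permission error reported by Vault.
    print(f"Vault access failed: {err}")
    print("If the token cannot be renewed, restart the notebook server so "
          "JupyterHub issues a fresh user-specific token.")
```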
#### Configuration
Environment variables passed to notebooks:
```yaml
# JupyterHub pre_spawn_hook sets:
spawner.environment:
  # Core services
  POSTGRES_HOST: 'postgres-cluster-rw.postgres'
  POSTGRES_PORT: '5432'
  JUPYTERHUB_API_URL: 'http://hub:8081/hub/api'
  BUUNSTACK_LOG_LEVEL: 'info'  # or 'debug' for detailed logging

  # Vault integration
  NOTEBOOK_VAULT_TOKEN: '<User-specific Vault token>'
  VAULT_ADDR: 'http://vault.vault.svc:8200'
```
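
Under the hood, the notebook side only needs the two Vault variables; roughly, this mirrors what `SecretStore` does internally when it builds its client:

```python
import os

import hvac

# Build a client from the variables injected at spawn time.
client = hvac.Client(
    url=os.environ["VAULT_ADDR"],
    token=os.environ["NOTEBOOK_VAULT_TOKEN"],
)

# True while the pre-created user token is still valid.
print("Authenticated:", client.is_authenticated())
```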
#### Monitoring and Debugging
Enable detailed logging for troubleshooting:
```python
# In notebook
import os
os.environ['BUUNSTACK_LOG_LEVEL'] = 'DEBUG'
# Restart kernel and check logs
from buunstack import SecretStore
secrets = SecretStore()
# Check authentication status
status = secrets.get_status()
print("Username:", status['username'])
print("Vault Address:", status['vault_addr'])
print("Authentication Method:", status['authentication_method'])
print("Vault Authenticated:", status['vault_authenticated'])
```
#### Performance Characteristics
- **Token renewal overhead**: ~10-50ms for renew_self call
- **Memory usage**: Minimal (single token stored as string)
- **Network traffic**: Only during token renewal (when TTL < 10 minutes)
- **Vault impact**: Standard token operations (lookup_self, renew_self)
## Performance Considerations

- **Image Size**: Buun-stack images are ~13GB, plan storage accordingly
- **Pull Time**: Initial pulls take 5-15 minutes depending on network
- **Resource Usage**: Data science workloads require adequate CPU/memory
- **Storage**: NFS provides better performance for shared datasets
+- **Token Renewal**: User token renewal adds minimal overhead

For production deployments, consider:
@@ -308,3 +470,4 @@ For production deployments, consider:
- Using faster storage backends
- Configuring resource limits per user
- Setting up monitoring and alerts
+- Monitoring Vault token expiration and renewal patterns

env/justfile vendored

@@ -78,6 +78,7 @@ setup:
    gomplate -f env.local.gomplate -o ../.env.local
    npm i
+    pip install build

# Set a specific key in .env.local
[working-directory("..")]


@@ -146,12 +146,6 @@ RUN pip install \
    tavily-python \
    tweet-preprocessor

-# Install buunstack package
-COPY *.whl /opt/
-RUN pip install /opt/*.whl && \
-    fix-permissions "${CONDA_DIR}" && \
-    fix-permissions "/home/${NB_USER}"

# Install PyTorch with pip (https://pytorch.org/get-started/locally/)
# langchain-openai must be updated to avoid pydantic v2 error
# https://github.com/run-llama/llama_index/issues/16540https://github.com/run-llama/llama_index/issues/16540
@@ -164,6 +158,11 @@ RUN pip install --no-cache-dir --extra-index-url=https://pypi.nvidia.com --index
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

+# Install buunstack package
+COPY *.whl /opt/
+RUN pip install /opt/*.whl && \
+    fix-permissions "${CONDA_DIR}" && \
+    fix-permissions "/home/${NB_USER}"

WORKDIR "${HOME}"

EXPOSE 4040


@@ -146,12 +146,6 @@ RUN pip install \
    tavily-python \
    tweet-preprocessor

-# Install buunstack package
-COPY *.whl /opt/
-RUN pip install /opt/*.whl && \
-    fix-permissions "${CONDA_DIR}" && \
-    fix-permissions "/home/${NB_USER}"

# Install PyTorch with pip (https://pytorch.org/get-started/locally/)
# langchain-openai must be updated to avoid pydantic v2 error
# https://github.com/run-llama/llama_index/issues/16540https://github.com/run-llama/llama_index/issues/16540
@@ -164,5 +158,11 @@ RUN pip install --no-cache-dir --index-url 'https://download.pytorch.org/whl/cpu
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

+# Install buunstack package
+COPY *.whl /opt/
+RUN pip install /opt/*.whl && \
+    fix-permissions "${CONDA_DIR}" && \
+    fix-permissions "/home/${NB_USER}"

WORKDIR "${HOME}"

EXPOSE 4040


@@ -1,4 +1,21 @@
hub:
+  extraEnv:
+    JUPYTERHUB_CRYPT_KEY: {{ .Env.JUPYTERHUB_CRYPT_KEY | quote }}
+  # Install packages at container startup
+  extraFiles:
+    startup.sh:
+      mountPath: /usr/local/bin/startup.sh
+      mode: 0755
+      stringData: |
+        #!/bin/bash
+        pip install --no-cache-dir hvac==2.3.0
+        exec jupyterhub --config /usr/local/etc/jupyterhub/jupyterhub_config.py --upgrade-db
+  # Override the default command to run our startup script first
+  command:
+    - /usr/local/bin/startup.sh
  config:
    JupyterHub:
      authenticator_class: generic-oauth
@@ -24,48 +41,97 @@ hub:
    - profile
    - email
-{{- if eq .Env.JUPYTERHUB_VAULT_INTEGRATION_ENABLED "true" }}
  extraConfig:
-    01-vault-integration: |
+    pre-spawn-hook: |
+      # Set environment variables for spawned containers
-      import os
+      import hvac
      async def pre_spawn_hook(spawner):
-        """Pass OIDC tokens and Vault config to notebook environment"""
-        auth_state = await spawner.user.get_auth_state()
-        if auth_state:
-            if 'access_token' in auth_state:
-                spawner.environment['JUPYTERHUB_OIDC_ACCESS_TOKEN'] = auth_state['access_token']
-            if 'refresh_token' in auth_state:
-                spawner.environment['JUPYTERHUB_OIDC_REFRESH_TOKEN'] = auth_state['refresh_token']
-            if 'id_token' in auth_state:
-                spawner.environment['JUPYTERHUB_OIDC_ID_TOKEN'] = auth_state['id_token']
-            if 'expires_at' in auth_state:
-                spawner.environment['JUPYTERHUB_OIDC_TOKEN_EXPIRES_AT'] = str(auth_state['expires_at'])
-        # Add Keycloak configuration for token refresh
-        spawner.environment['KEYCLOAK_HOST'] = '{{ .Env.KEYCLOAK_HOST }}'
-        spawner.environment['KEYCLOAK_REALM'] = '{{ .Env.KEYCLOAK_REALM }}'
-        spawner.environment['KEYCLOAK_CLIENT_ID'] = 'jupyterhub'
+        """Set essential environment variables for spawned containers"""
+        # PostgreSQL configuration
+        spawner.environment["POSTGRES_HOST"] = "postgres-cluster-rw.postgres"
+        spawner.environment["POSTGRES_PORT"] = "5432"
+        # JupyterHub API configuration
+        spawner.environment["JUPYTERHUB_API_URL"] = "http://hub:8081/hub/api"
+        # Logging configuration
+        spawner.environment["BUUNSTACK_LOG_LEVEL"] = "{{ .Env.JUPYTER_BUUNSTACK_LOG_LEVEL }}"
# Create user-specific Vault token directly
try:
username = spawner.user.name
# Step 1: Initialize admin Vault client
vault_client = hvac.Client(url="{{ .Env.VAULT_ADDR }}", verify=False)
vault_client.token = "{{ .Env.JUPYTERHUB_VAULT_TOKEN }}"
if not vault_client.is_authenticated():
raise Exception("Admin token is not authenticated")
# Step 2: Create user-specific policy
user_policy_name = "jupyter-user-{}".format(username)
user_path = "secret/data/jupyter/users/{}/*".format(username)
user_metadata_path = "secret/metadata/jupyter/users/{}/*".format(username)
user_base_path = "secret/metadata/jupyter/users/{}".format(username)
user_policy = (
"# User-specific policy for {}\n".format(username) +
"path \"{}\" ".format(user_path) + "{\n" +
" capabilities = [\"create\", \"update\", \"read\", \"delete\", \"list\"]\n" +
"}\n\n" +
"path \"{}\" ".format(user_metadata_path) + "{\n" +
" capabilities = [\"list\", \"read\", \"delete\", \"update\"]\n" +
"}\n\n" +
"path \"{}\" ".format(user_base_path) + "{\n" +
" capabilities = [\"list\"]\n" +
"}\n\n" +
"# Read access to shared resources\n" +
"path \"secret/data/jupyter/shared/*\" {\n" +
" capabilities = [\"read\", \"list\"]\n" +
"}\n\n" +
"path \"secret/metadata/jupyter/shared\" {\n" +
" capabilities = [\"list\"]\n" +
"}\n\n" +
"# Token management capabilities\n" +
"path \"auth/token/lookup-self\" {\n" +
" capabilities = [\"read\"]\n" +
"}\n\n" +
"path \"auth/token/renew-self\" {\n" +
" capabilities = [\"update\"]\n" +
"}"
)
# Write user-specific policy
try:
vault_client.sys.create_or_update_policy(user_policy_name, user_policy)
spawner.log.info("✅ Created policy: {}".format(user_policy_name))
except Exception as policy_e:
spawner.log.warning("Policy creation failed (may already exist): {}".format(policy_e))
# Step 3: Create user-specific token
token_response = vault_client.auth.token.create(
policies=[user_policy_name],
ttl="1h",
renewable=True,
display_name="notebook-{}".format(username)
)
user_vault_token = token_response["auth"]["client_token"]
lease_duration = token_response["auth"].get("lease_duration", 3600)
# Set user-specific Vault token as environment variable
spawner.environment["NOTEBOOK_VAULT_TOKEN"] = user_vault_token
spawner.log.info("✅ User-specific Vault token created for {} (expires in {}s, renewable)".format(username, lease_duration))
except Exception as e:
spawner.log.error("Failed to create user-specific Vault token for {}: {}".format(spawner.user.name, e))
import traceback
spawner.log.error("Full traceback: {}".format(traceback.format_exc()))
      c.Spawner.pre_spawn_hook = pre_spawn_hook
-{{- end }}
-    02-postgres-integration: |
-      from functools import wraps
-      # Store the original pre_spawn_hook if it exists
-      original_hook = c.Spawner.pre_spawn_hook if hasattr(c.Spawner, 'pre_spawn_hook') else None
-      async def postgres_pre_spawn_hook(spawner):
-          """Add PostgreSQL connection information to notebook environment"""
-          # Call the original hook first if it exists
-          if original_hook:
-              await original_hook(spawner)
-          # Add PostgreSQL configuration
-          spawner.environment['POSTGRES_HOST'] = 'postgres-cluster-rw.postgres'
-          spawner.environment['POSTGRES_PORT'] = '5432'
-      c.Spawner.pre_spawn_hook = postgres_pre_spawn_hook

  podSecurityContext:
    fsGroup: {{ .Env.JUPYTER_FSGID }}
@@ -85,23 +151,8 @@ singleuser:
  {{ end -}}
      capacity: 10Gi
-{{- if eq .Env.JUPYTERHUB_VAULT_INTEGRATION_ENABLED "true" }}
  extraEnv:
    VAULT_ADDR: "{{ .Env.VAULT_ADDR }}"
-    KEYCLOAK_HOST: "{{ .Env.KEYCLOAK_HOST }}"
-    KEYCLOAK_REALM: "{{ .Env.KEYCLOAK_REALM }}"
-  # lifecycleHooks:
-  #   postStart:
-  #     exec:
-  #       command:
-  #         - /bin/bash
-  #         - -c
-  #         - |
-  #           # Install hvac for Vault integration
-  #           mamba install hvac requests
-  #           echo "Vault integration ready"
-{{- end }}
  networkPolicy:
    egress:
      - to:
@@ -129,7 +180,6 @@ singleuser:
        ports:
          - port: 4000
            protocol: TCP
-{{- if eq .Env.JUPYTERHUB_VAULT_INTEGRATION_ENABLED "true" }}
      - to:
          - namespaceSelector:
              matchLabels:
@@ -137,9 +187,6 @@ singleuser:
        ports:
          - port: 8200
            protocol: TCP
-          - port: 8201
-            protocol: TCP
-{{- end }}
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0


@@ -5,7 +5,7 @@ export JUPYTERHUB_CHART_VERSION := env("JUPYTERHUB_CHART_VERSION", "4.2.0")
export JUPYTERHUB_OIDC_CLIENT_ID := env("JUPYTERHUB_OIDC_CLIENT_ID", "jupyterhub")
export JUPYTERHUB_NFS_PV_ENABLED := env("JUPYTERHUB_NFS_PV_ENABLED", "")
export JUPYTERHUB_VAULT_INTEGRATION_ENABLED := env("JUPYTERHUB_VAULT_INTEGRATION_ENABLED", "")
-export JUPYTER_PYTHON_KERNEL_TAG := env("JUPYTER_PYTHON_KERNEL_TAG", "python-3.12-8")
+export JUPYTER_PYTHON_KERNEL_TAG := env("JUPYTER_PYTHON_KERNEL_TAG", "python-3.12-24")
export KERNEL_IMAGE_BUUN_STACK_REPOSITORY := env("KERNEL_IMAGE_BUUN_STACK_REPOSITORY", "buun-stack-notebook")
export KERNEL_IMAGE_BUUN_STACK_CUDA_REPOSITORY := env("KERNEL_IMAGE_BUUN_STACK_CUDA_REPOSITORY", "buun-stack-cuda-notebook")
export JUPYTER_PROFILE_MINIMAL_ENABLED := env("JUPYTER_PROFILE_MINIMAL_ENABLED", "false")
@@ -20,6 +20,7 @@ export IMAGE_REGISTRY := env("IMAGE_REGISTRY", "localhost:30500")
export KEYCLOAK_REALM := env("KEYCLOAK_REALM", "buunstack")
export LONGHORN_NAMESPACE := env("LONGHORN_NAMESPACE", "longhorn")
export VAULT_ADDR := env("VAULT_ADDR", "http://vault.vault.svc:8200")
+export JUPYTER_BUUNSTACK_LOG_LEVEL := env("JUPYTER_BUUNSTACK_LOG_LEVEL", "info")

[private]
default:
@@ -54,6 +55,15 @@ install:
            --placeholder="e.g., jupyter.example.com"
        )
    done
+    # Generate JUPYTERHUB_CRYPT_KEY if not exists
+    if [ -z "${JUPYTERHUB_CRYPT_KEY:-}" ]; then
+        echo "Generating JUPYTERHUB_CRYPT_KEY..."
+        export JUPYTERHUB_CRYPT_KEY=$(just utils::random-password)
+        echo "JUPYTERHUB_CRYPT_KEY=${JUPYTERHUB_CRYPT_KEY}" >> ../../.env.local
+        echo "✓ JUPYTERHUB_CRYPT_KEY generated and saved to .env.local"
+    fi
    just create-namespace
    # just k8s::copy-regcred ${JUPYTERHUB_NAMESPACE}
    just keycloak::create-client ${KEYCLOAK_REALM} ${JUPYTERHUB_OIDC_CLIENT_ID} \
@@ -96,8 +106,17 @@ install:
        fi
        kubectl apply -n ${JUPYTERHUB_NAMESPACE} -f nfs-pvc.yaml
    fi
+    # Create or get JupyterHub Vault token before gomplate
+    if ! just vault::exist jupyterhub/vault-token &>/dev/null; then
+        echo "Creating JupyterHub Vault token..."
+        just create-jupyterhub-vault-token
+    fi
+    export JUPYTERHUB_VAULT_TOKEN=$(just vault::get jupyterhub/vault-token token)
    # https://z2jh.jupyter.org/en/stable/
    gomplate -f jupyterhub-values.gomplate.yaml -o jupyterhub-values.yaml
    helm upgrade --cleanup-on-fail --install jupyterhub jupyterhub/jupyterhub \
        --version ${JUPYTERHUB_CHART_VERSION} -n ${JUPYTERHUB_NAMESPACE} \
        --timeout=20m -f jupyterhub-values.yaml
@@ -138,62 +157,68 @@ delete-pv:
# Build Jupyter notebook kernel images
build-kernel-images:
    #!/bin/bash
-    set -euo pipefail
+    set -euxo pipefail
-    # Build python package wheel
+    (
    cd ../python-package
    rm -rf dist/ build/ *.egg-info/
    SETUPTOOLS_SCM_PRETEND_VERSION_FOR_BUUNSTACK=0.1.0 python -m build --wheel
-    cd ../jupyterhub
+    )
-    # Copy built wheel to image directories
-    cp ../python-package/dist/*.whl ./images/datastack-notebook/
-    cp ../python-package/dist/*.whl ./images/datastack-cuda-notebook/
    (
        cd ./images/datastack-notebook
+        cp ../../../python-package/dist/*.whl ./
        docker build -t \
            ${IMAGE_REGISTRY}/${KERNEL_IMAGE_BUUN_STACK_REPOSITORY}:${JUPYTER_PYTHON_KERNEL_TAG} \
            --build-arg spark_version="3.5.4" \
            --build-arg spark_download_url="https://archive.apache.org/dist/spark/" \
            .
    )
+    rm -f ./images/datastack-notebook/*.whl
+    if [ "${JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED}" = "true" ]; then
    (
        cd ./images/datastack-cuda-notebook
+        cp ../../../python-package/dist/*.whl ./
        docker build -t \
            ${IMAGE_REGISTRY}/${KERNEL_IMAGE_BUUN_STACK_CUDA_REPOSITORY}:${JUPYTER_PYTHON_KERNEL_TAG} \
            --build-arg spark_version="3.5.4" \
            --build-arg spark_download_url="https://archive.apache.org/dist/spark/" \
            .
    )
-    # Clean up copied wheel files
-    rm -f ./images/datastack-notebook/*.whl
    rm -f ./images/datastack-cuda-notebook/*.whl
+    fi

# Push Jupyter notebook kernel images
push-kernel-images: build-kernel-images
-    docker push ${IMAGE_REGISTRY}/${KERNEL_IMAGE_BUUN_STACK_REPOSITORY}:${JUPYTER_PYTHON_KERNEL_TAG}
-    docker push ${IMAGE_REGISTRY}/${KERNEL_IMAGE_BUUN_STACK_CUDA_REPOSITORY}:${JUPYTER_PYTHON_KERNEL_TAG}
+    #!/bin/bash
+    set -euo pipefail
+    docker push ${IMAGE_REGISTRY}/${KERNEL_IMAGE_BUUN_STACK_REPOSITORY}:${JUPYTER_PYTHON_KERNEL_TAG}
+    if [ "${JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED}" = "true" ]; then
+        docker push ${IMAGE_REGISTRY}/${KERNEL_IMAGE_BUUN_STACK_CUDA_REPOSITORY}:${JUPYTER_PYTHON_KERNEL_TAG}
+    fi

-# Configure Vault for JupyterHub integration
-setup-vault-integration:
-    #!/bin/bash
-    set -euo pipefail
-    echo "Creating JupyterHub Vault policy..."
-    just vault::write-policy jupyter-user $(pwd)/vault-policy.hcl
-    echo "✓ JupyterHub policy created"

-# Setup JWT auth for JupyterHub tokens (no re-authentication needed)
+# Setup Vault integration for JupyterHub (user-specific tokens)
setup-vault-jwt-auth:
    #!/bin/bash
    set -euo pipefail
    echo "Setting up Vault integration for JupyterHub..."
-    just setup-vault-integration
-    just vault::setup-jwt-auth "jupyterhub" "jupyter-token" "jupyter-user"
-    echo "✓ Vault integration configured"
+    echo "✓ Vault integration configured (user-specific tokens)"
    echo ""
    echo "Users can now access Vault from notebooks using:"
-    echo "  import os, hvac"
-    echo "  client = hvac.Client(url=os.getenv('VAULT_ADDR'), verify=False)"
-    echo "  client.auth.jwt.jwt_login("
-    echo "      role='jupyter-token',"
-    echo "      jwt=os.getenv('JUPYTERHUB_OIDC_ACCESS_TOKEN'),"
-    echo "      path='jwt'"
-    echo "  )"
+    echo "  from buunstack import SecretStore"
+    echo "  secrets = SecretStore()"
+    echo "  # Each user gets their own isolated Vault token and policy"

+# Create JupyterHub Vault token (uses admin policy for JWT operations)
+create-jupyterhub-vault-token ttl="720h":
+    #!/bin/bash
+    set -euo pipefail
+    echo "Creating JupyterHub Vault token with admin policy..."
+    # JupyterHub needs admin privileges to read Keycloak credentials from Vault
+    # Create token and store in Vault
+    just vault::create-token-and-store admin jupyterhub/vault-token {{ ttl }}
+    echo "✓ JupyterHub Vault token created and stored"
+    echo ""
+    echo "To use in JupyterHub deployment:"
+    echo "  JUPYTERHUB_VAULT_TOKEN=\$(just vault::get jupyterhub/vault-token token)"

@@ -1,26 +0,0 @@
# JupyterHub user policy for Vault access
# Read access to shared jupyter resources
path "secret/data/jupyter/shared/*" {
capabilities = ["read", "list"]
}
# Allow users to list shared directory
path "secret/metadata/jupyter/shared" {
capabilities = ["list"]
}
# Full access to user-specific paths
path "secret/data/jupyter/users/{{identity.entity.aliases.auth_jwt_*.metadata.username}}/*" {
capabilities = ["create", "update", "read", "delete", "list"]
}
# Allow users to list their own directory
path "secret/metadata/jupyter/users/{{identity.entity.aliases.auth_jwt_*.metadata.username}}/*" {
capabilities = ["list", "read", "delete"]
}
# Allow users to list only their own user directory for navigation
path "secret/metadata/jupyter/users/{{identity.entity.aliases.auth_jwt_*.metadata.username}}" {
capabilities = ["list"]
}


@@ -6,4 +6,5 @@ just = "1.42.4"
k3sup = "0.13.10"
kubelogin = "1.34.0"
node = "22.18.0"
+python = "3.12.11"
vault = "1.20.2"


@@ -1,14 +1,14 @@
# buunstack

-A Python package for buun-stack that provides secure secrets management with HashiCorp Vault and automatic Keycloak OIDC token refresh for JupyterHub users.
+A Python package for buun-stack that provides secure secrets management with HashiCorp Vault using pre-acquired Vault tokens from JupyterHub for seamless authentication.

## Features

- 🔒 **Secure Secrets Management**: Integration with HashiCorp Vault
-- 🔄 **Automatic Token Refresh**: Seamless Keycloak OIDC token management
+- 🚀 **Pre-acquired Authentication**: Uses Vault tokens created at notebook spawn
- 📱 **Simple API**: Easy-to-use interface for secrets storage and retrieval
+- 🔄 **Automatic Token Renewal**: Built-in token refresh for long-running sessions
- 🏢 **Enterprise Ready**: Built for production environments
-- 🚀 **JupyterHub Integration**: Native support for JupyterHub workflows

## Quick Start
@@ -23,15 +23,15 @@ pip install buunstack
```python
from buunstack import SecretStore

-# Initialize with automatic token refresh (default)
+# Initialize with pre-acquired Vault token (automatic)
secrets = SecretStore()

# Put API keys and configuration
-secrets.put('api-keys', {
-    'openai_key': 'sk-your-key-here',
-    'github_token': 'ghp_your-token',
-    'database_url': 'postgresql://user:pass@host:5432/db'
-})
+secrets.put('api-keys',
+    openai_key='sk-your-key-here',
+    github_token='ghp_your-token',
+    database_url='postgresql://user:pass@host:5432/db'
+)

# Get secrets
api_keys = secrets.get('api-keys')
@@ -44,18 +44,19 @@ all_secrets = secrets.list()
### Configuration Options

```python
-# Manual token management
-secrets = SecretStore(auto_token_refresh=False)
+# Disable JupyterHub token synchronization
+secrets = SecretStore(sync_with_jupyterhub=False)

-# Custom refresh timing
+# Custom token validity buffer
secrets = SecretStore(
-    auto_token_refresh=True,
-    refresh_buffer_seconds=600,  # Refresh 10 minutes before expiry
-    background_refresh_interval=3600  # Background refresh every hour
+    sync_with_jupyterhub=True,
+    refresh_buffer_seconds=600  # Sync tokens 10 minutes before expiry
)

-# Start background auto-refresh
-refresher = secrets.start_background_refresh()
+# Check synchronization status
+status = secrets.get_status()
+print(f"JupyterHub sync enabled: {status['sync_with_jupyterhub']}")
+print(f"API configured: {status.get('jupyterhub_api_configured', False)}")
```

### Environment Variables Helper

python-package/buunstack/.gitignore vendored (new file)

@@ -0,0 +1 @@
/examples/


@@ -8,6 +8,6 @@ try:
    from ._version import __version__
except ImportError:
    __version__ = "unknown"

-__author__ = "Buun Stack Team"
+__author__ = "Buun ch."

__all__ = ["SecretStore", "get_env_from_secrets", "put_env_to_secrets"]


@@ -12,7 +12,7 @@ def quickstart_example():
print("🚀 buunstack QuickStart Example") print("🚀 buunstack QuickStart Example")
print("=" * 40) print("=" * 40)
# Initialize SecretStore (auto-refresh enabled by default) # Initialize SecretStore (JupyterHub sync enabled by default)
secrets = SecretStore() secrets = SecretStore()
print(f"✅ SecretStore initialized for user: {secrets.username}") print(f"✅ SecretStore initialized for user: {secrets.username}")
@@ -87,32 +87,29 @@ def advanced_example():
print("\n🔧 Advanced Configuration Example") print("\n🔧 Advanced Configuration Example")
print("=" * 40) print("=" * 40)
# Manual token management # Manual token management (disable JupyterHub sync)
print("\n1⃣ Manual token management:") print("\n1⃣ Manual token management:")
manual_secrets = SecretStore(auto_token_refresh=False) manual_secrets = SecretStore(sync_with_jupyterhub=False)
print(f" Auto-refresh: {manual_secrets.auto_token_refresh}") print(f" JupyterHub sync: {manual_secrets.sync_with_jupyterhub}")
# Custom timing # Custom timing
print("\n2⃣ Custom refresh timing:") print("\n2⃣ Custom refresh timing:")
custom_secrets = SecretStore( custom_secrets = SecretStore(
auto_token_refresh=True, sync_with_jupyterhub=True,
refresh_buffer_seconds=600, # Refresh 10 minutes before expiry refresh_buffer_seconds=600, # Sync 10 minutes before expiry
background_refresh_interval=3600, # Background refresh every hour
) )
print(f" Refresh buffer: {custom_secrets.refresh_buffer_seconds}s") print(f" Refresh buffer: {custom_secrets.refresh_buffer_seconds}s")
print(f" Background interval: {custom_secrets.background_refresh_interval}s") print(f" JupyterHub sync: {custom_secrets.sync_with_jupyterhub}")
# Background refresh (if auto_token_refresh is enabled) # Check JupyterHub API configuration
if custom_secrets.auto_token_refresh and custom_secrets.refresh_token: print("\n3⃣ JupyterHub API configuration:")
print("\n3⃣ Starting background refresher:") status = custom_secrets.get_status()
refresher = custom_secrets.start_background_refresh() api_configured = status.get('jupyterhub_api_configured', False)
refresher_status = refresher.get_status() print(f" API configured: {api_configured}")
print(f" Running: {refresher_status['running']}") if api_configured:
print(f" Interval: {refresher_status['interval_seconds']}s") print(f" API URL: {custom_secrets.jupyterhub_api_url}")
else:
# Stop the refresher print(" API token or URL not configured")
custom_secrets.stop_background_refresh()
print(" Stopped background refresher")
if __name__ == "__main__": if __name__ == "__main__":


@@ -1,61 +1,60 @@
""" """
Secrets management for JupyterHub with Vault backend Secrets management with user-specific Vault token authentication
""" """
import logging import logging
import os import os
import threading
import warnings import warnings
from datetime import datetime, timedelta
from typing import Any, overload from typing import Any, overload
import hvac import hvac
import jwt
import requests
# Suppress SSL warnings for self-signed certificates # Suppress SSL warnings for self-signed certificates
warnings.filterwarnings("ignore", message="Unverified HTTPS request") warnings.filterwarnings("ignore", message="Unverified HTTPS request")
# Set up logging (disabled by default)
logger = logging.getLogger("buunstack") logger = logging.getLogger("buunstack")
logger.addHandler(logging.NullHandler()) # Default to no output log_level_str = os.getenv("BUUNSTACK_LOG_LEVEL", "warning").upper()
log_level = getattr(logging, log_level_str, logging.WARNING)
logger.setLevel(log_level)
# For Jupyter notebooks, we need to ensure proper logging configuration
# Always add handler if none exists, regardless of conditions
if not logger.handlers:
handler = logging.StreamHandler()
handler.setLevel(log_level)
formatter = logging.Formatter(
"%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
handler.setFormatter(formatter)
logger.addHandler(handler)
# Disable propagation to avoid root logger interference in notebooks
logger.propagate = False
# Debug: Log the handler addition
if log_level <= logging.DEBUG:
print(f"DEBUG: Added StreamHandler to buunstack logger (level={log_level})")
logging.getLogger().setLevel(log_level)
# Additional debug information for troubleshooting
if log_level <= logging.DEBUG:
print(
f"DEBUG: buunstack logger initialized - level={logger.level}, handlers={len(logger.handlers)}"
)
class SecretStore: class SecretStore:
""" """
Simple secrets management for JupyterHub with Vault backend. Secure secrets management with JupyterHub API authentication.
SecretStore provides a secure interface for managing secrets in JupyterHub Uses JupyterHub's vault-token API endpoint to obtain Vault tokens
environments using HashiCorp Vault as the backend storage. It supports by exchanging auth_state JWT. Implements singleton pattern for
automatic OIDC token refresh via Keycloak integration and provides both consistent state across imports.
manual and background token management options.
This class implements the singleton pattern to ensure only one instance
exists per user session, preventing duplicate background refresh threads.
Attributes
----------
auto_token_refresh : bool
Whether automatic token refresh is enabled.
refresh_buffer_seconds : int
Seconds before token expiry to trigger refresh.
background_refresh_interval : int
Seconds between background refresh checks.
username : str or None
JupyterHub username from environment.
vault_addr : str or None
Vault server address from environment.
base_path : str
Base path for user's secrets in Vault.
Examples Examples
-------- --------
>>> secrets = SecretStore() >>> secrets = SecretStore()
>>> secrets.put('api-keys', openai='sk-123', github='ghp-456') >>> secrets.put('api-keys', openai='sk-123', github='ghp-456')
>>> data = secrets.get('api-keys')
>>> print(data['openai'])
'sk-123'
>>> # Or get specific field directly
>>> openai_key = secrets.get('api-keys', field='openai') >>> openai_key = secrets.get('api-keys', field='openai')
>>> print(openai_key) >>> print(openai_key)
'sk-123' 'sk-123'
@@ -65,203 +64,84 @@ class SecretStore:
_initialized = False _initialized = False
def __new__(cls, *args, **kwargs): def __new__(cls, *args, **kwargs):
"""Return singleton SecretStore instance."""
if cls._instance is None: if cls._instance is None:
cls._instance = super().__new__(cls) cls._instance = super().__new__(cls)
return cls._instance return cls._instance
def __init__( def __init__(self):
self,
auto_token_refresh: bool = True,
refresh_buffer_seconds: int = 300,
background_refresh_interval: int = 1800,
):
""" """
Initialize SecretStore with authentication and configuration. Initialize SecretStore with JupyterHub API authentication.
Note: Due to singleton pattern, parameters are only used on the first Uses JupyterHub's vault-token API endpoint to exchange
instantiation. Subsequent calls return the existing instance with auth_state JWT for Vault tokens.
its original configuration.
Parameters
----------
auto_token_refresh : bool, optional
Enable automatic token refresh using Keycloak OIDC, by default True.
Requires KEYCLOAK_HOST, KEYCLOAK_REALM, and JUPYTERHUB_OIDC_REFRESH_TOKEN
environment variables. Only used on first instantiation.
refresh_buffer_seconds : int, optional
Seconds before token expiry to trigger refresh, by default 300.
Only used when auto_token_refresh is True. Only used on first instantiation.
background_refresh_interval : int, optional
Seconds between background refresh checks, by default 1800.
Only used when background refresh is started. Only used on first instantiation.
Raises
------
ValueError
If required environment variables are missing:
- JUPYTERHUB_USER: JupyterHub username
- VAULT_ADDR: Vault server address
- JUPYTERHUB_OIDC_ACCESS_TOKEN: Initial access token
- KEYCLOAK_HOST, KEYCLOAK_REALM: Required for auto_token_refresh
ConnectionError
If unable to connect to Vault server or authenticate.
Examples
--------
>>> # Basic usage with auto-refresh
>>> secrets = SecretStore()
>>> # Manual token management
>>> secrets = SecretStore(auto_token_refresh=False)
>>> # Custom timing
>>> secrets = SecretStore(
... refresh_buffer_seconds=600,
... background_refresh_interval=3600
... )
""" """
if self._initialized: if self._initialized:
return return
self.auto_token_refresh = auto_token_refresh
self.refresh_buffer_seconds = refresh_buffer_seconds
self.background_refresh_interval = background_refresh_interval
self.username = os.getenv("JUPYTERHUB_USER") self.username = os.getenv("JUPYTERHUB_USER")
self.vault_addr = os.getenv("VAULT_ADDR") self.vault_addr = os.getenv("VAULT_ADDR")
if self.auto_token_refresh:
self.keycloak_host = os.getenv("KEYCLOAK_HOST")
self.keycloak_realm = os.getenv("KEYCLOAK_REALM")
self.keycloak_client_id = os.getenv("KEYCLOAK_CLIENT_ID", "jupyterhub")
self.refresh_token = os.getenv("JUPYTERHUB_OIDC_REFRESH_TOKEN")
self.access_token = os.getenv("JUPYTERHUB_OIDC_ACCESS_TOKEN")
self.token_expiry = (
self._get_token_expiry(self.access_token) if self.access_token else None
)
self.client = hvac.Client(url=self.vault_addr, verify=False)
self._background_refresher = None
self._authenticate_vault()
self.base_path = f"jupyter/users/{self.username}" self.base_path = f"jupyter/users/{self.username}"
logger.info(f"SecretStore initialized for user: {self.username}") # Using pre-acquired Vault token from notebook spawn
logger.info(
f"Auto token refresh: {'enabled' if self.auto_token_refresh else 'disabled'}"
)
if self.auto_token_refresh and self.token_expiry: # Initialize Vault client
logger.info(f"Token expires at: {self.token_expiry}") self.client = hvac.Client(url=self.vault_addr, verify=False)
# Attempt authentication
self._authenticate_vault()
logger.info(f"SecretStore initialized for user: {self.username}")
logger.info("Using user-specific Vault token authentication")
self._initialized = True self._initialized = True
def _get_token_expiry(self, token: str) -> datetime | None:
"""Extract expiry time from JWT token"""
if not token:
return None
try:
payload = jwt.decode(token, options={"verify_signature": False})
exp = payload.get("exp")
if exp:
return datetime.fromtimestamp(exp)
# Fallback to iat + 1 hour
iat = payload.get("iat")
if iat:
return datetime.fromtimestamp(iat + 3600)
except Exception as e:
logger.warning(f"Could not decode token expiry: {e}")
return datetime.now() + timedelta(hours=1)
def _is_token_valid(self) -> bool:
"""Check if current token is still valid"""
if not self.auto_token_refresh or not self.token_expiry:
return True # Assume valid if refresh is disabled
time_until_expiry = (self.token_expiry - datetime.now()).total_seconds()
return time_until_expiry > self.refresh_buffer_seconds
def _refresh_keycloak_tokens(self) -> bool:
"""Refresh tokens using Keycloak refresh token"""
if not self.auto_token_refresh:
return False
if not self.refresh_token or not self.keycloak_host or not self.keycloak_realm:
logger.error("Missing refresh token or Keycloak configuration")
return False
token_url = f"https://{self.keycloak_host}/realms/{self.keycloak_realm}/protocol/openid-connect/token"
try:
logger.info("Refreshing tokens from Keycloak...")
response = requests.post(
token_url,
data={
"grant_type": "refresh_token",
"refresh_token": self.refresh_token,
"client_id": self.keycloak_client_id,
},
verify=False,
)
if response.status_code == 200:
tokens = response.json()
# Update tokens
self.access_token = tokens["access_token"]
if "refresh_token" in tokens:
self.refresh_token = tokens["refresh_token"]
# Update environment variables
os.environ["JUPYTERHUB_OIDC_ACCESS_TOKEN"] = self.access_token
if "refresh_token" in tokens:
os.environ["JUPYTERHUB_OIDC_REFRESH_TOKEN"] = self.refresh_token
# Update token expiry
self.token_expiry = self._get_token_expiry(self.access_token)
logger.info("✅ Tokens refreshed successfully")
return True
else:
logger.error(
f"Token refresh failed: {response.status_code} - {response.text}"
)
return False
except Exception as e:
logger.error(f"Exception during token refresh: {e}")
return False
def _authenticate_vault(self): def _authenticate_vault(self):
"""Authenticate with Vault using current access token""" """
if not self.access_token: Authenticate with Vault using user-specific token from notebook spawn.
raise ValueError("No access token available")
try: Raises
self.client.auth.jwt.jwt_login( ------
role="jupyter-token", jwt=self.access_token, path="jwt" Exception
If user-specific Vault token is not available.
"""
vault_token = os.getenv("NOTEBOOK_VAULT_TOKEN")
if not vault_token:
raise Exception(
"No user-specific Vault token available. "
"Please restart your notebook server."
) )
logger.info("✅ Authenticated with Vault successfully")
except Exception as e: self.client.token = vault_token
logger.error(f"Vault authentication failed: {e}") logger.info("✅ Using user-specific Vault token from notebook spawn")
raise
def _ensure_authenticated(self): def _ensure_authenticated(self):
"""Ensure we have valid tokens and Vault authentication""" """
if self.auto_token_refresh and not self._is_token_valid(): Ensure we have valid Vault authentication with token renewal.
logger.info("Token invalid or expiring soon") """
try:
if self.client.is_authenticated():
# Check if token needs renewal (if renewable and close to expiry)
try:
token_info = self.client.auth.token.lookup_self()
ttl = token_info.get("data", {}).get("ttl", 0)
renewable = token_info.get("data", {}).get("renewable", False)
if self._refresh_keycloak_tokens(): # Renew if TTL < 10 minutes and renewable
self._authenticate_vault() if renewable and ttl > 0 and ttl < 600:
else: logger.info(f"Renewing Vault token (TTL: {ttl}s)")
self.client.auth.token.renew_self()
logger.info("✅ Vault token renewed successfully")
except Exception as e:
logger.warning(f"Token renewal check failed: {e}")
return
except Exception:
pass
# Token expired or invalid - no fallback available with user-specific tokens
raise Exception( raise Exception(
"Failed to refresh tokens. Manual re-authentication required." "User-specific Vault token expired and cannot be refreshed. Please restart your notebook server."
) )
def put(self, key: str, **kwargs: Any) -> None: def put(self, key: str, **kwargs: Any) -> None:
@@ -432,20 +312,24 @@ class SecretStore:
logger.warning(f'Could not get secret "{key}": {e}') logger.warning(f'Could not get secret "{key}": {e}')
raise KeyError(f"Secret '{key}' not found") from e raise KeyError(f"Secret '{key}' not found") from e
def delete(self, key: str) -> None: def delete(self, key: str, field: str | None = None) -> None:
""" """
Delete a secret from your personal storage. Delete a secret or a specific field from your personal storage.
Permanently removes the secret and all its versions from Vault. If field is None, permanently removes the entire secret and all its versions.
This operation cannot be undone. If field is specified, removes only that field from the secret.
Parameters Parameters
---------- ----------
key : str key : str
The key/name of the secret to delete. The key/name of the secret to delete or modify.
field : str, optional
Specific field to delete from the secret. If None, deletes entire secret.
Raises Raises
------ ------
KeyError
If the key or field doesn't exist.
ConnectionError ConnectionError
If unable to connect to Vault server. If unable to connect to Vault server.
hvac.exceptions.Forbidden hvac.exceptions.Forbidden
@@ -456,12 +340,20 @@ class SecretStore:
Examples Examples
-------- --------
>>> secrets = SecretStore() >>> secrets = SecretStore()
>>> # Delete entire secret
>>> secrets.delete('old-api-key') >>> secrets.delete('old-api-key')
>>> # Secret is permanently removed >>>
>>> # Delete only specific field
>>> secrets.put('credentials', github='token123', aws='secret456')
>>> secrets.delete('credentials', field='github')
>>> # Now only 'aws' field remains
""" """
self._ensure_authenticated() self._ensure_authenticated()
path = f"{self.base_path}/{key}" path = f"{self.base_path}/{key}"
if field is None:
# Delete entire secret
try: try:
self.client.secrets.kv.v2.delete_metadata_and_all_versions( self.client.secrets.kv.v2.delete_metadata_and_all_versions(
path=path, mount_point="secret" path=path, mount_point="secret"
@@ -470,6 +362,44 @@ class SecretStore:
except Exception as e: except Exception as e:
logger.error(f'Failed to delete secret "{key}": {e}') logger.error(f'Failed to delete secret "{key}": {e}')
raise raise
else:
# Delete specific field only
try:
# First, get the current secret
response = self.client.secrets.kv.v2.read_secret_version(
path=path, mount_point="secret", raise_on_deleted_version=False
)
if response and "data" in response and "data" in response["data"]:
data = response["data"]["data"]
# Check if field exists
if field not in data:
raise KeyError(f"Field '{field}' not found in secret '{key}'")
# Remove the field
del data[field]
# If no fields remain, delete the entire secret
if not data:
self.client.secrets.kv.v2.delete_metadata_and_all_versions(
path=path, mount_point="secret"
)
logger.info(f"Deleted secret '{key}' (no fields remaining)")
else:
# Update the secret without the deleted field
self.client.secrets.kv.v2.create_or_update_secret(
path=path, secret=data, mount_point="secret"
)
logger.info(f"Deleted field '{field}' from secret '{key}'")
else:
raise KeyError(f"Secret '{key}' not found")
except KeyError:
raise
except Exception as e:
logger.error(
f"Failed to delete field '{field}' from secret '{key}': {e}"
)
raise
def list(self) -> list[str]: def list(self) -> list[str]:
""" """
@@ -505,236 +435,35 @@ class SecretStore:
def get_status(self) -> dict[str, Any]: def get_status(self) -> dict[str, Any]:
""" """
Get comprehensive status information about the SecretStore instance. Get status information about the SecretStore instance.
Returns detailed information about configuration, authentication status,
token validity, and background refresh status.
Returns Returns
------- -------
dict[str, Any] dict[str, Any]
Status dictionary containing: Status dictionary containing:
- username: JupyterHub username - username: JupyterHub username
- auto_token_refresh: Whether auto-refresh is enabled
- has_access_token: Whether access token is available
- vault_addr: Vault server address - vault_addr: Vault server address
- has_refresh_token: Whether refresh token is available (if auto_token_refresh=True) - authentication_method: Authentication method used
- keycloak_configured: Whether Keycloak settings are configured (if auto_token_refresh=True) - vault_authenticated: Whether Vault client is authenticated
- token_expires_at: Token expiration time (if available)
- token_expires_in_seconds: Seconds until token expires (if available)
- background_refresher_running: Whether background refresher is active
Examples Examples
-------- --------
>>> secrets = SecretStore() >>> secrets = SecretStore()
>>> status = secrets.get_status() >>> status = secrets.get_status()
>>> print(f"User: {status['username']}") >>> print(f"User: {status['username']}")
>>> print(f"Token expires in: {status.get('token_expires_in_seconds', 'N/A')} seconds")
""" """
status = { status = {
"username": self.username, "username": self.username,
"auto_token_refresh": self.auto_token_refresh,
"has_access_token": bool(self.access_token),
"vault_addr": self.vault_addr, "vault_addr": self.vault_addr,
"authentication_method": "User-specific Vault token",
} }
if self.auto_token_refresh:
status.update(
{
"has_refresh_token": bool(self.refresh_token),
"keycloak_configured": bool(
self.keycloak_host and self.keycloak_realm
),
}
)
if self.token_expiry:
time_remaining = (self.token_expiry - datetime.now()).total_seconds()
status.update(
{
"token_valid": self._is_token_valid(),
"token_expiry": self.token_expiry.isoformat(),
"seconds_remaining": max(0, time_remaining),
"minutes_remaining": max(0, time_remaining / 60),
}
)
return status
def start_background_refresh(self) -> "BackgroundRefresher":
"""
Start automatic background token refreshing.
Begins a background thread that periodically checks and refreshes
the access token before it expires. Only available when
auto_token_refresh is enabled.
Returns
-------
BackgroundRefresher
The background refresher instance that can be used to monitor
or control the refresh process.
Raises
------
ValueError
If auto_token_refresh is False. Background refresh requires
automatic token refresh to be enabled.
Examples
--------
>>> secrets = SecretStore(auto_token_refresh=True)
>>> refresher = secrets.start_background_refresh()
>>> status = refresher.get_status()
>>> print(f"Background refresh running: {status['running']}")
"""
if not self.auto_token_refresh:
raise ValueError("Background refresh requires auto_token_refresh=True")
if self._background_refresher is None:
self._background_refresher = BackgroundRefresher(
self, interval_seconds=self.background_refresh_interval
)
self._background_refresher.start()
return self._background_refresher
def stop_background_refresh(self) -> None:
"""
Stop the background token refresher.
Stops the background thread that was refreshing tokens automatically.
It's safe to call this method even if no background refresher is running.
Examples
--------
>>> secrets = SecretStore()
>>> refresher = secrets.start_background_refresh()
>>> # ... do some work ...
>>> secrets.stop_background_refresh()
"""
if self._background_refresher:
self._background_refresher.stop()
class BackgroundRefresher:
"""
Background token refresher for automatic token management.
This class runs in a separate daemon thread and periodically checks if
the access token needs to be refreshed, automatically handling the refresh
process to maintain uninterrupted access to Vault.
Attributes
----------
secret_store : SecretStore
The SecretStore instance to refresh tokens for.
interval_seconds : int
Seconds between refresh checks.
refresh_count : int
Number of successful refreshes performed.
last_refresh : datetime or None
Timestamp of the last successful refresh.
Examples
--------
>>> secrets = SecretStore(auto_token_refresh=True)
>>> refresher = secrets.start_background_refresh()
>>> # Refresher runs automatically in background
>>> status = refresher.get_status()
>>> print(f"Refreshes performed: {status['refresh_count']}")
"""
def __init__(self, secret_store: SecretStore, interval_seconds: int = 1800):
"""
Initialize the background refresher.
Parameters
----------
secret_store : SecretStore
The SecretStore instance to manage tokens for.
interval_seconds : int, optional
Seconds between refresh checks, by default 1800 (30 minutes).
"""
self.secret_store = secret_store
self.interval_seconds = interval_seconds
self._stop_event = threading.Event()
self._thread = None
self.refresh_count = 0
self.last_refresh = None
def start(self) -> None:
"""
Start the background refresh thread.
Creates and starts a daemon thread that will periodically check
and refresh tokens. Safe to call multiple times.
"""
if self._thread is None or not self._thread.is_alive():
self._stop_event.clear()
self._thread = threading.Thread(target=self._refresh_loop, daemon=True)
self._thread.start()
logger.info(
f"Started background refresher (interval: {self.interval_seconds}s)"
)
def stop(self) -> None:
"""
Stop the background refresh thread.
Signals the refresh thread to stop and waits up to 5 seconds
for it to finish gracefully.
"""
if self._thread and self._thread.is_alive():
self._stop_event.set()
self._thread.join(timeout=5)
logger.info("Stopped background refresher")
def _refresh_loop(self):
while not self._stop_event.is_set():
if self._stop_event.wait(self.interval_seconds):
break
try: try:
if self.secret_store._refresh_keycloak_tokens(): status["vault_authenticated"] = self.client.is_authenticated()
self.secret_store._authenticate_vault() except Exception:
self.refresh_count += 1 status["vault_authenticated"] = False
self.last_refresh = datetime.now()
logger.info(
f"✅ Background refresh #{self.refresh_count} successful"
)
else:
logger.error("❌ Background refresh failed")
except Exception as e:
logger.error(f"Exception in background refresh: {e}")
def get_status(self) -> dict[str, Any]: return status
"""
Get the current status of the background refresher.
Returns
-------
dict[str, Any]
Status dictionary containing:
- running: Whether the refresh thread is active
- refresh_count: Number of successful refreshes performed
- last_refresh: ISO timestamp of last successful refresh (or None)
- interval_seconds: Configured refresh interval
Examples
--------
>>> refresher = secrets.start_background_refresh()
>>> status = refresher.get_status()
>>> print(f"Running: {status['running']}, Count: {status['refresh_count']}")
"""
return {
"running": self._thread and self._thread.is_alive(),
"refresh_count": self.refresh_count,
"last_refresh": self.last_refresh.isoformat()
if self.last_refresh
else None,
"interval_seconds": self.interval_seconds,
}
# Utility functions # Utility functions
@@ -817,7 +546,7 @@ def put_env_to_secrets(
    >>> # Store with custom key
    >>> put_env_to_secrets(secrets, {'API_KEY': 'secret'}, 'production-config')
-    'jupyter/users/username/environment'
+    'jupyter/users/username/production-config'
    """
    # Convert all values to strings and use **kwargs for put()
    string_env_dict = {k: str(v) for k, v in env_dict.items()}


@@ -76,3 +76,7 @@ strict_equality = true
minversion = "6.0"
addopts = "-ra -q"
testpaths = ["tests"]

+[tool.pyright]
+reportUnusedParameter = "none"
+reportUnusedVariable = "warning"


@@ -136,6 +136,29 @@ create-admin-token root_token='': check-env
    # Create token with admin policy
    vault token create -policy=admin
# Create token with specified policy and store in Vault
create-token-and-store policy path ttl="24h" root_token='': check-env
#!/bin/bash
set -euo pipefail
{{ _vault_root_env_setup }}
echo "Creating token with policy '{{ policy }}'..."
# Create token with specified policy
token_output=$(vault token create -policy={{ policy }} -ttl={{ ttl }} -format=json)
service_token=$(echo "${token_output}" | jq -r '.auth.client_token')
echo "Storing token in Vault at path '{{ path }}'..."
# Store the token in Vault itself for later retrieval
vault kv put -mount=secret {{ path }} token="${service_token}"
echo "✓ Token created and stored in Vault"
echo "Policy: {{ policy }}"
echo "Path: secret/{{ path }}"
echo "Token (first 20 chars): ${service_token:0:20}..."
echo ""
echo "To retrieve the token later:"
echo " just vault::get {{ path }} token"
# Create admin policy for Vault
create-admin-policy root_token='':
    #!/bin/bash
@@ -160,6 +183,12 @@ create-admin-policy root_token='':
path "sys/policies/acl/*" { path "sys/policies/acl/*" {
capabilities = ["create", "read", "update", "delete", "list"] capabilities = ["create", "read", "update", "delete", "list"]
} }
path "auth/token/create" {
capabilities = ["create", "update"]
}
path "auth/token/create/*" {
capabilities = ["create", "update"]
}
EOF EOF
echo "Admin policy created successfully" echo "Admin policy created successfully"
@@ -287,7 +316,7 @@ setup-jwt-auth audience role policy='default':
        user_claim="preferred_username" \
        token_policies="{{ policy }}" \
        ttl="1h" \
-        max_ttl="24h"
+        max_ttl="48h"
    echo "✓ JWT authentication configured"
    echo "  Audience: {{ audience }}"