feat(querybook): install Querybook

Masaki Yatsu
2025-10-18 13:10:46 +09:00
parent c4e27f348f
commit 8d29fe25c0
10 changed files with 979 additions and 0 deletions


@@ -36,6 +36,7 @@ A remotely accessible Kubernetes home lab with OIDC authentication. Build a mode
- **[JupyterHub](https://jupyter.org/hub)**: Interactive computing with collaborative notebooks
- **[Trino](https://trino.io/)**: Distributed SQL query engine for querying multiple data sources
- **[Querybook](https://www.querybook.org/)**: Big data querying UI with notebook interface
- **[ClickHouse](https://clickhouse.com/)**: High-performance columnar analytics database
- **[Qdrant](https://qdrant.tech/)**: Vector database for AI/ML applications
- **[Lakekeeper](https://lakekeeper.io/)**: Apache Iceberg REST Catalog for data lake management
@@ -152,6 +153,17 @@ Business intelligence and data visualization platform with PostgreSQL integratio
[📖 See Metabase Documentation](./metabase/README.md)
### Querybook
Pinterest's big data querying UI with notebook interface for collaborative data exploration:
- **Trino Integration**: Execute SQL queries against multiple data sources with user impersonation
- **Notebook Interface**: Create shareable datadocs with queries, visualizations, and documentation
- **Keycloak Authentication**: OAuth2 integration with group-based admin access
- **Real-time Execution**: WebSocket-based query execution with live progress updates
[📖 See Querybook Documentation](./querybook/README.md)
### Trino
Fast distributed SQL query engine for big data analytics with:
@@ -299,6 +311,7 @@ kubectl --context yourpc-oidc get nodes
# Vault: https://vault.yourdomain.com
# Keycloak: https://auth.yourdomain.com
# Trino: https://trino.yourdomain.com
# Querybook: https://querybook.yourdomain.com
# Metabase: https://metabase.yourdomain.com
# Airflow: https://airflow.yourdomain.com
# JupyterHub: https://jupyter.yourdomain.com


@@ -23,6 +23,7 @@ mod minio
mod oauth2-proxy
mod postgres
mod qdrant
mod querybook
mod trino
mod utils
mod vault

querybook/.gitignore (new file, 10 lines)

@@ -0,0 +1,10 @@
# Generated Helm values (contains OAuth client secret)
querybook-values.yaml
# Generated Kubernetes manifests
querybook-config-external-secret.yaml
keycloak-auth-configmap.yaml
traefik-middleware.yaml
# Cloned Helm chart repository
querybook-repo/

querybook/README.md (new file, 252 lines)

@@ -0,0 +1,252 @@
# Querybook
Pinterest's big data querying UI with notebook interface, Keycloak OAuth authentication, and Trino integration.
## Overview
This module deploys Querybook using the official Helm chart from Pinterest with:
- **Keycloak OAuth2 authentication** for user login
- **Trino integration** with user impersonation for query attribution
- **PostgreSQL backend** for metadata storage
- **Redis** for caching and session management
- **Traefik integration** with WebSocket support for real-time query execution
- **Group-based admin access** via Keycloak groups
## Prerequisites
- Kubernetes cluster (k3s)
- Keycloak installed and configured
- PostgreSQL cluster (CloudNativePG)
- Trino with access control configured
- External Secrets Operator (optional, for Vault integration)
## Installation
### Basic Installation
```bash
just querybook::install
```
You will be prompted for:
1. **Querybook host (FQDN)**: e.g., `querybook.example.com`
2. **Keycloak host (FQDN)**: e.g., `auth.example.com`
### What Gets Installed
- Querybook web service
- Querybook scheduler (background jobs)
- Querybook workers (query execution)
- PostgreSQL database for Querybook metadata
- Redis for caching and sessions
- Keycloak OAuth2 client (confidential client)
- `querybook-admin` group in Keycloak for admin access
- Traefik Middleware for WebSocket and header forwarding
## Configuration
Environment variables (set in `.env.local` or override):
```bash
QUERYBOOK_NAMESPACE=querybook # Kubernetes namespace
QUERYBOOK_HOST=querybook.example.com # External hostname
KEYCLOAK_HOST=auth.example.com # Keycloak hostname
KEYCLOAK_REALM=buunstack # Keycloak realm name
```
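As a rough illustration of how these settings resolve, the sketch below mirrors the defaults declared at the top of the module's justfile (for example `env("QUERYBOOK_NAMESPACE", "querybook")`). The `resolve_settings` helper is hypothetical and is not part of the module.

```python
import os

# Defaults as declared at the top of querybook/justfile.
DEFAULTS = {
    "QUERYBOOK_NAMESPACE": "querybook",
    "QUERYBOOK_HOST": "",
    "KEYCLOAK_HOST": "",
    "KEYCLOAK_REALM": "buunstack",
}


def resolve_settings(environ=None):
    """Return each setting from the environment, falling back to its default."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}
```

An empty `QUERYBOOK_HOST` or `KEYCLOAK_HOST` is what makes the install recipe prompt for a value.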
## Usage
### Access Querybook
1. Navigate to `https://your-querybook-host/`
2. Click "Login with OAuth" to authenticate with Keycloak
3. Create datadocs (notebooks) and execute queries
### Grant Admin Access
Add users to the `querybook-admin` group:
```bash
just keycloak::add-user-to-group <username> querybook-admin
```
Admin users can:
- Manage query engines
- Configure data sources
- Manage user permissions
- View all datadocs
### Configure Trino Query Engine
1. Log in as an admin user
2. Navigate to Admin → Query Engines
3. Click "Add Query Engine"
4. Configure:
```plain
Name: Trino
Language: Trino
Environment: production (or your preferred environment name)
```
5. Navigate to Admin → Environments → [your environment]
6. Add new query engine connection:
```plain
Connection String: trino://trino.example.com:443?SSL=true
Username: admin
Password: [from just trino::admin-password]
```
7. Optional: Configure additional connection parameters:
- **Catalog**: Specify default catalog (e.g., `postgresql` or `iceberg`)
- **Schema**: Specify default schema
- **Proxy_user_id**: Leave empty to disable user impersonation, or set it to enable impersonation
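To sanity-check a connection string before pasting it into the admin UI, a small sketch like the following can split it into its parts. It only illustrates the URL structure shown above; `describe_trino_url` is a hypothetical helper, and how Querybook itself parses the string is not covered here.

```python
from urllib.parse import parse_qs, urlsplit


def describe_trino_url(conn: str) -> dict:
    """Break a trino:// connection string into scheme, host, port, and SSL flag."""
    parts = urlsplit(conn)
    query = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Treat a missing SSL parameter as "false" (an assumption of this sketch).
        "ssl": query.get("SSL", ["false"])[0].lower() == "true",
    }
```

A string that reports `ssl: False` or a port such as 8080 would likely hit the "cannot use authentication with HTTP" error described under Troubleshooting.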
### User Impersonation
Querybook connects to Trino as `admin` but executes queries as the logged-in user via Trino's impersonation feature. This provides:
- **Query Attribution**: Queries are attributed to the actual user, not the admin account
- **Audit Logging**: Trino logs show the real user who executed each query
- **Access Control**: Future per-user access policies can be enforced
**How it Works**:
1. User logs into Querybook with Keycloak
2. Querybook connects to Trino using admin credentials
3. Querybook sends queries with `X-Trino-User: <username>` header
4. Trino impersonates the user (allowed by access control rules)
5. Query runs as if executed by the actual user
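The steps above can be sketched at the HTTP level as follows, assuming the admin credentials are sent with HTTP Basic authentication. In practice the Python Trino client builds these headers; `trino_request_headers` is a hypothetical helper for illustration only.

```python
import base64


def trino_request_headers(admin_user: str, admin_password: str, session_user: str) -> dict:
    """Authenticate as the admin account while naming the logged-in user as session user."""
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return {
        # Who is authenticating (the shared admin account).
        "Authorization": f"Basic {token}",
        # Who the query should run as (the Querybook user).
        "X-Trino-User": session_user,
    }
```

Trino only honors the `X-Trino-User` value if its access control rules allow the authenticated account to impersonate that user.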
## Architecture
```
External Users
      ↓
Cloudflare Tunnel (HTTPS)
      ↓
Traefik Ingress (HTTPS)
  ├─ Traefik Middleware (X-Forwarded-*, WebSocket upgrade)
  └─ Backend: HTTP
      ↓
Querybook Web
  ├─ OAuth2 → Keycloak (authentication)
  ├─ PostgreSQL (metadata)
  ├─ Redis (cache/sessions)
  └─ WebSocket (real-time query updates)
      ↓
Querybook Workers
      ↓
Trino (HTTPS via external hostname)
  └─ Password auth + User impersonation
```
**Key Components**:
- **Traefik Middleware**: Handles WebSocket upgrade headers and X-Forwarded-* headers
- **OAuth2 Integration**: Uses standard OIDC scopes (openid, email, profile) with groups mapper
- **Trino Connection**: Must use external HTTPS hostname (not internal service name)
- **User Impersonation**: Admin credentials with X-Trino-User header for query attribution
## Authentication
### User Login (OAuth2)
- Users authenticate via Keycloak
- Standard OIDC flow with Authorization Code grant
- Group membership included in UserInfo endpoint response
- Session stored in Redis
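For reference, the first leg of the Authorization Code flow is a browser redirect to Keycloak's authorization endpoint. The sketch below builds that URL from the endpoint pattern used in this module's Helm values; `authorization_url` and the `state` value are illustrative assumptions, not Querybook's actual implementation.

```python
from urllib.parse import urlencode


def authorization_url(keycloak_host, realm, client_id, redirect_uri, state):
    """Build the Keycloak authorization request for the Authorization Code grant."""
    base = f"https://{keycloak_host}/realms/{realm}/protocol/openid-connect/auth"
    params = {
        "response_type": "code",          # Authorization Code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,     # must match the client's redirect URL
        "scope": "openid email profile",  # standard OIDC scopes
        "state": state,                   # opaque anti-CSRF value
    }
    return f"{base}?{urlencode(params)}"
```

If login fails with a redirect error, comparing the `redirect_uri` in the browser's address bar against the client's configured redirect URL is usually the quickest check.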
### Admin Access
- Controlled by Keycloak group membership
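- Users in the `querybook-admin` group are intended to have full admin privileges; note that the bundled auth backend currently only logs Keycloak group membership, and Querybook makes the first user an admin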
- Regular users can create and manage their own datadocs
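A group check against the UserInfo response could look like the sketch below. It assumes the groups mapper emits a `groups` claim, and it tolerates a leading slash because Keycloak can be configured to emit full group paths. `is_querybook_admin` is a hypothetical helper and does not describe what the bundled auth backend does.

```python
ADMIN_GROUP = "querybook-admin"


def is_querybook_admin(userinfo: dict) -> bool:
    """Return True if the UserInfo 'groups' claim contains the admin group."""
    groups = userinfo.get("groups", [])
    # "/querybook-admin" (full path) and "querybook-admin" are both accepted.
    return any(group.lstrip("/") == ADMIN_GROUP for group in groups)
```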
### Trino Connection
- Uses password authentication (admin user)
- Connects via external HTTPS hostname (Traefik provides TLS)
- Python Trino client enforces HTTPS when authentication is used
- User impersonation via X-Trino-User header
## Management
### Upgrade Querybook
```bash
just querybook::upgrade
```
Updates the Helm deployment with current configuration.
### Uninstall
```bash
# Keep PostgreSQL database
just querybook::uninstall false
# Delete PostgreSQL database too
just querybook::uninstall true
```
## Troubleshooting
### Check Pod Status
```bash
kubectl get pods -n querybook
```
### WebSocket Connection Fails
- Verify Traefik middleware exists: `kubectl get middleware querybook-headers -n querybook`
- Check WebSocket upgrade headers in middleware configuration
- Ensure Ingress annotation references middleware: `querybook-querybook-headers@kubernetescrd`
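The doubled `querybook-querybook-headers` in the annotation is not a typo: with Traefik's Kubernetes CRD provider, a middleware is referenced as `<namespace>-<name>@kubernetescrd`. A tiny sketch of that naming rule (the `middleware_ref` helper is hypothetical):

```python
def middleware_ref(namespace: str, name: str) -> str:
    """Build the reference Traefik expects in the router.middlewares annotation."""
    return f"{namespace}-{name}@kubernetescrd"
```

If `QUERYBOOK_NAMESPACE` is changed from its default, the annotation in the Helm values would need to change accordingly.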
### OAuth Login Fails
- Verify Keycloak client exists: `just keycloak::list-clients`
- Check redirect URL: `https://<querybook-host>/oauth2callback`
- Verify client secret matches: Compare Vault/K8s secret with Keycloak
- Check Keycloak is accessible from Querybook pods
### Trino Connection Fails
- **Error: "cannot use authentication with HTTP"**
- Must use external hostname with HTTPS: `trino://trino.example.com:443?SSL=true`
- Do NOT use internal service name (e.g., `trino.trino.svc.cluster.local:8080`)
- Python Trino client enforces HTTPS when authentication is used
- **Error: "500 Internal Server Error"**
- Verify Trino is accessible via external hostname
- Check Trino admin password: `just trino::admin-password`
- Test Trino connection manually with curl
- **Error: "Access Denied: User admin cannot impersonate user X"**
- Verify Trino access control is configured
- Check impersonation rules: `kubectl exec -n trino deployment/trino-coordinator -- cat /etc/trino/access-control/rules.json`
- Ensure admin can impersonate all users
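When reading `rules.json`, it can help to evaluate the impersonation rules by hand. The sketch below is a simplified model of Trino's file-based rules (first matching rule wins, no match means denied), assuming an `impersonation` list with `original_user`, `new_user`, and `allow` fields. It ignores more advanced features such as capture-group substitution, and `can_impersonate` is a hypothetical helper.

```python
import json
import re


def can_impersonate(rules_json: str, original_user: str, new_user: str) -> bool:
    """Evaluate simplified impersonation rules: first full match decides."""
    rules = json.loads(rules_json).get("impersonation", [])
    for rule in rules:
        if not re.fullmatch(rule.get("original_user", ".*"), original_user):
            continue
        if not re.fullmatch(rule["new_user"], new_user):
            continue
        return rule.get("allow", True)
    # No rule matched: impersonation is not allowed.
    return False
```

For this setup, a rule with `"original_user": "admin"` and `"new_user": ".*"` is the kind of entry that lets the admin account act for any Querybook user.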
### Query Execution Stuck
- Check worker pod logs: `just querybook::logs worker`
- Verify Redis is running: `kubectl get pods -n querybook | grep redis`
- Check Trino coordinator health: `kubectl get pods -n trino`
### Database Connection Issues
- Verify PostgreSQL cluster is running: `kubectl get cluster -n postgres`
- Check database exists: `just postgres::list-databases | grep querybook`
- Verify secret exists: `kubectl get secret querybook-secret -n querybook`
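One thing worth checking is the `DATABASE_CONN` value itself. The `create-secrets` recipe interpolates the PostgreSQL username and password directly into the URL, so a password containing characters such as `@` or `/` would produce an invalid connection string. The sketch below shows the same URL built with percent-encoding; `database_conn` is a hypothetical helper, and whether the generated passwords ever contain such characters is not shown in this module.

```python
from urllib.parse import quote


def database_conn(user, password, host="postgres-cluster-rw.postgres",
                  port=5432, database="querybook"):
    """Build a PostgreSQL URL with the credentials percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```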
## References
- [Querybook Documentation](https://www.querybook.org/)
- [Querybook GitHub](https://github.com/pinterest/querybook)
- [Trino Integration](../trino/README.md)
- [Keycloak OAuth2](https://www.keycloak.org/docs/latest/securing_apps/#_oidc)


@@ -0,0 +1,46 @@
"""
Keycloak OIDC authentication backend for Querybook
"""

from app.auth.oauth_auth import OAuthLoginManager, OAUTH_CALLBACK_PATH
from env import QuerybookSettings


class KeycloakLoginManager(OAuthLoginManager):
    @property
    def oauth_config(self):
        return {
            "callback_url": "{}{}".format(
                QuerybookSettings.PUBLIC_URL, OAUTH_CALLBACK_PATH
            ),
            "client_id": QuerybookSettings.OAUTH_CLIENT_ID,
            "client_secret": QuerybookSettings.OAUTH_CLIENT_SECRET,
            "authorization_url": QuerybookSettings.OAUTH_AUTHORIZATION_URL,
            "token_url": QuerybookSettings.OAUTH_TOKEN_URL,
            "profile_url": QuerybookSettings.OAUTH_USER_PROFILE,
            "scope": ["openid", "email", "profile"],
        }

    def _parse_user_profile(self, resp):
        """Parse standard OIDC UserInfo response from Keycloak"""
        user = resp.json()

        # Keycloak returns standard OIDC claims:
        # - preferred_username: username
        # - email: email address
        # - name: full name (optional)
        username = user.get("preferred_username") or user.get("email", "").split("@")[0]
        email = user.get("email", "")
        fullname = user.get("name", username)

        return username, email, fullname


login_manager = KeycloakLoginManager()

ignore_paths = [OAUTH_CALLBACK_PATH]


def init_app(app):
    login_manager.init_app(app)


def login(request):
    return login_manager.login(request)

querybook/justfile (new file, 327 lines)

@@ -0,0 +1,327 @@
set fallback := true
export QUERYBOOK_NAMESPACE := env("QUERYBOOK_NAMESPACE", "querybook")
export QUERYBOOK_HOST := env("QUERYBOOK_HOST", "")
export QUERYBOOK_CHART_REPO := env("QUERYBOOK_CHART_REPO", "https://github.com/pinterest/querybook")
export QUERYBOOK_CHART_PATH := env("QUERYBOOK_CHART_PATH", "helm")
export EXTERNAL_SECRETS_NAMESPACE := env("EXTERNAL_SECRETS_NAMESPACE", "external-secrets")
export K8S_VAULT_NAMESPACE := env("K8S_VAULT_NAMESPACE", "vault")
export KEYCLOAK_REALM := env("KEYCLOAK_REALM", "buunstack")
export KEYCLOAK_HOST := env("KEYCLOAK_HOST", "")
[private]
default:
@just --list --unsorted --list-submodules
# Create Querybook namespace
create-namespace:
@kubectl get namespace ${QUERYBOOK_NAMESPACE} >/dev/null 2>&1 || \
kubectl create namespace ${QUERYBOOK_NAMESPACE}
# Delete Querybook namespace
delete-namespace:
@kubectl delete namespace ${QUERYBOOK_NAMESPACE} --ignore-not-found
# Clone Querybook Helm chart repository
clone-chart-repo:
#!/bin/bash
set -euo pipefail
if [ ! -d "querybook-repo" ]; then
echo "Cloning Querybook Helm chart repository..."
git clone --depth 1 ${QUERYBOOK_CHART_REPO} querybook-repo
else
echo "Querybook repository already exists. Pulling latest changes..."
cd querybook-repo && git pull
fi
# Remove cloned chart repository
remove-chart-repo:
rm -rf querybook-repo
# Create Keycloak client and OAuth secret for Querybook
create-keycloak-client:
#!/bin/bash
set -euo pipefail
while [ -z "${QUERYBOOK_HOST}" ]; do
QUERYBOOK_HOST=$(
gum input --prompt="Querybook host (FQDN): " --width=100 \
--placeholder="e.g., querybook.example.com"
)
done
echo "Creating Keycloak client for Querybook..."
# Delete existing client if present
just keycloak::delete-client ${KEYCLOAK_REALM} querybook || true
# Generate client secret
CLIENT_SECRET=$(just utils::random-password)
# Create 'querybook-admin' group if it doesn't exist
echo "Creating 'querybook-admin' group..."
just keycloak::create-group querybook-admin '' 'Querybook administrators' || echo "Group may already exist"
# Create confidential client with client secret
# Uses standard OIDC scopes: openid, email, profile (no custom scopes needed)
just keycloak::create-client \
realm=${KEYCLOAK_REALM} \
client_id=querybook \
redirect_url="https://${QUERYBOOK_HOST}/oauth2callback" \
client_secret="${CLIENT_SECRET}"
# Add groups mapper to include group membership in UserInfo
echo "Adding groups mapper to querybook client..."
just keycloak::add-groups-mapper querybook
# Store client secret temporarily in Kubernetes Secret (always created)
kubectl delete secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
kubectl create secret generic querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} \
--from-literal=client_secret="${CLIENT_SECRET}"
# Also store in Vault if available
if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then
echo "Storing OAuth client secret in Vault..."
just vault::put querybook/oauth client_secret="${CLIENT_SECRET}"
fi
echo "Keycloak client created successfully"
echo "Client ID: querybook"
echo "Scopes: openid, email, profile (standard OIDC scopes)"
echo "Redirect URI: https://${QUERYBOOK_HOST}/oauth2callback"
echo ""
echo "Admin Group: querybook-admin"
echo "To grant admin access, add users to 'querybook-admin' group:"
echo " just keycloak::add-user-to-group <username> querybook-admin"
# Delete Keycloak client
delete-keycloak-client:
#!/bin/bash
set -euo pipefail
echo "Deleting Keycloak client for Querybook..."
just keycloak::delete-client ${KEYCLOAK_REALM} querybook || true
echo "Deleting querybook-admin group..."
just keycloak::delete-group querybook-admin || true
kubectl delete secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
# Create Querybook secrets
create-secrets:
#!/bin/bash
set -euo pipefail
# Generate Flask secret key
flask_secret=$(just utils::random-password)
# Get PostgreSQL credentials
pg_host="postgres-cluster-rw.postgres"
pg_port="5432"
pg_user=$(just postgres::admin-username)
pg_password=$(just postgres::admin-password)
pg_database="querybook"
# Build database connection string
database_conn="postgresql://${pg_user}:${pg_password}@${pg_host}:${pg_port}/${pg_database}"
# Get OAuth client secret (created by create-keycloak-client)
# Try Vault first, fallback to Kubernetes Secret
if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null && \
just vault::get querybook/oauth client_secret &>/dev/null; then
oauth_client_secret=$(just vault::get querybook/oauth client_secret)
elif kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} &>/dev/null; then
oauth_client_secret=$(kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} \
-o jsonpath='{.data.client_secret}' | base64 -d)
else
echo "Error: Cannot retrieve OAuth client secret. Please run 'just querybook::create-keycloak-client' first."
exit 1
fi
if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then
echo "External Secrets Operator detected. Storing secrets in Vault..."
just vault::put querybook/config \
FLASK_SECRET_KEY="${flask_secret}" \
DATABASE_CONN="${database_conn}" \
REDIS_URL="redis://redis:6379/0" \
ELASTICSEARCH_HOST="elasticsearch:9200" \
OAUTH_CLIENT_SECRET="${oauth_client_secret}"
kubectl delete secret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
kubectl delete externalsecret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
gomplate -f querybook-config-external-secret.gomplate.yaml \
-o querybook-config-external-secret.yaml
kubectl apply -f querybook-config-external-secret.yaml
echo "Waiting for ExternalSecret to sync..."
kubectl wait --for=condition=Ready externalsecret/querybook-secret \
-n ${QUERYBOOK_NAMESPACE} --timeout=60s
else
echo "External Secrets Operator not found. Creating secret directly..."
kubectl delete secret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
kubectl create secret generic querybook-secret -n ${QUERYBOOK_NAMESPACE} \
--from-literal=FLASK_SECRET_KEY="${flask_secret}" \
--from-literal=DATABASE_CONN="${database_conn}" \
--from-literal=REDIS_URL="redis://redis:6379/0" \
--from-literal=ELASTICSEARCH_HOST="elasticsearch:9200" \
--from-literal=OAUTH_CLIENT_SECRET="${oauth_client_secret}"
if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then
just vault::put querybook/config \
FLASK_SECRET_KEY="${flask_secret}" \
DATABASE_CONN="${database_conn}" \
REDIS_URL="redis://redis:6379/0" \
ELASTICSEARCH_HOST="elasticsearch:9200" \
OAUTH_CLIENT_SECRET="${oauth_client_secret}"
fi
fi
# Delete Querybook secrets
delete-secrets:
@kubectl delete secret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
@kubectl delete externalsecret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
# Create Keycloak auth ConfigMap
create-auth-configmap:
#!/bin/bash
set -euo pipefail
echo "Creating Keycloak auth ConfigMap..."
gomplate -f keycloak-auth-configmap.gomplate.yaml -o keycloak-auth-configmap.yaml
kubectl apply -f keycloak-auth-configmap.yaml
# Create Traefik Middleware for WebSocket support
create-traefik-middleware:
#!/bin/bash
set -euo pipefail
echo "Creating Traefik Middleware for WebSocket support..."
gomplate -f traefik-middleware.gomplate.yaml -o traefik-middleware.yaml
kubectl apply -f traefik-middleware.yaml
# Install Querybook
install:
#!/bin/bash
set -euo pipefail
while [ -z "${QUERYBOOK_HOST}" ]; do
QUERYBOOK_HOST=$(
gum input --prompt="Querybook host (FQDN): " --width=100 \
--placeholder="e.g., querybook.example.com"
)
done
while [ -z "${KEYCLOAK_HOST}" ]; do
KEYCLOAK_HOST=$(
gum input --prompt="Keycloak host (FQDN): " --width=100 \
--placeholder="e.g., auth.example.com"
)
done
just create-namespace
just postgres::create-db querybook
just create-keycloak-client
just create-secrets
just clone-chart-repo
# Get OAuth client secret for gomplate template
# Try Vault first, fallback to Kubernetes Secret
if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null && \
just vault::get querybook/oauth client_secret &>/dev/null; then
export OAUTH_CLIENT_SECRET=$(just vault::get querybook/oauth client_secret)
elif kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} &>/dev/null; then
export OAUTH_CLIENT_SECRET=$(kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} \
-o jsonpath='{.data.client_secret}' | base64 -d)
else
echo "Error: Cannot retrieve OAuth client secret. Please run 'just querybook::create-keycloak-client' first."
exit 1
fi
# Create Traefik Middleware (must exist before Helm install)
just create-traefik-middleware
# Create Keycloak auth ConfigMap (must exist before Helm install)
just create-auth-configmap
gomplate -f querybook-values.gomplate.yaml -o querybook-values.yaml
helm upgrade --cleanup-on-fail --install querybook ./querybook-repo/${QUERYBOOK_CHART_PATH} \
-n ${QUERYBOOK_NAMESPACE} --wait \
-f querybook-values.yaml
echo ""
echo "Querybook installed successfully!"
echo "Access URL: https://${QUERYBOOK_HOST}"
echo ""
echo "OAuth Configuration:"
echo " Provider: Keycloak (custom OIDC backend)"
echo " Realm: ${KEYCLOAK_REALM}"
echo " Scopes: openid, email, profile"
echo " Authorization URL: https://${KEYCLOAK_HOST}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/auth"
echo ""
echo "Admin Access:"
echo " To grant admin access, add users to 'querybook-admin' group:"
echo " just keycloak::add-user-to-group <username> querybook-admin"
echo ""
# Upgrade Querybook
upgrade:
#!/bin/bash
set -euo pipefail
while [ -z "${QUERYBOOK_HOST}" ]; do
QUERYBOOK_HOST=$(
gum input --prompt="Querybook host (FQDN): " --width=100 \
--placeholder="e.g., querybook.example.com"
)
done
while [ -z "${KEYCLOAK_HOST}" ]; do
KEYCLOAK_HOST=$(
gum input --prompt="Keycloak host (FQDN): " --width=100 \
--placeholder="e.g., auth.example.com"
)
done
# Get OAuth client secret for gomplate template
# Try Vault first, fallback to Kubernetes Secret
if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null && \
just vault::get querybook/oauth client_secret &>/dev/null; then
export OAUTH_CLIENT_SECRET=$(just vault::get querybook/oauth client_secret)
elif kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} &>/dev/null; then
export OAUTH_CLIENT_SECRET=$(kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} \
-o jsonpath='{.data.client_secret}' | base64 -d)
else
echo "Error: Cannot retrieve OAuth client secret. Please run 'just querybook::create-keycloak-client' first."
exit 1
fi
echo "Upgrading Querybook..."
# Update Traefik Middleware (must exist before Helm upgrade)
just create-traefik-middleware
# Update Keycloak auth ConfigMap (must exist before Helm upgrade)
just create-auth-configmap
gomplate -f querybook-values.gomplate.yaml -o querybook-values.yaml
helm upgrade querybook ./querybook-repo/${QUERYBOOK_CHART_PATH} \
-n ${QUERYBOOK_NAMESPACE} --wait \
-f querybook-values.yaml
echo "Querybook upgraded successfully"
# Uninstall Querybook
uninstall delete-db='true':
#!/bin/bash
set -euo pipefail
helm uninstall querybook -n ${QUERYBOOK_NAMESPACE} --ignore-not-found --wait
kubectl delete configmap querybook-keycloak-auth -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
kubectl delete middleware querybook-headers -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
kubectl delete serverstransport querybook-transport -n ${QUERYBOOK_NAMESPACE} --ignore-not-found
just delete-secrets
just delete-keycloak-client
just delete-namespace
if [ "{{ delete-db }}" = "true" ]; then
just postgres::delete-db querybook
fi
# Clean up Vault entries if present
if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then
just vault::delete querybook/config || true
just vault::delete querybook/oauth || true
fi


@@ -0,0 +1,84 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: querybook-keycloak-auth
  namespace: {{ .Env.QUERYBOOK_NAMESPACE }}
data:
  keycloak_auth.py: |
    """
    Keycloak OIDC authentication backend for Querybook
    """

    from app.auth.oauth_auth import OAuthLoginManager, OAUTH_CALLBACK_PATH
    from env import QuerybookSettings
    from lib.logger import get_logger
    from logic.user import get_user_by_name, create_user

    LOG = get_logger(__file__)


    class KeycloakLoginManager(OAuthLoginManager):
        def __init__(self):
            super().__init__()
            self._current_user_groups = []

        @property
        def oauth_config(self):
            return {
                "callback_url": "{}{}".format(
                    QuerybookSettings.PUBLIC_URL, OAUTH_CALLBACK_PATH
                ),
                "client_id": QuerybookSettings.OAUTH_CLIENT_ID,
                "client_secret": QuerybookSettings.OAUTH_CLIENT_SECRET,
                "authorization_url": QuerybookSettings.OAUTH_AUTHORIZATION_URL,
                "token_url": QuerybookSettings.OAUTH_TOKEN_URL,
                "profile_url": QuerybookSettings.OAUTH_USER_PROFILE,
                "scope": ["openid", "email", "profile"],
            }

        def _parse_user_profile(self, resp):
            """Parse standard OIDC UserInfo response from Keycloak"""
            user = resp.json()

            username = user.get("preferred_username") or user.get("email", "").split("@")[0]
            email = user.get("email", "")

            # Store groups for role synchronization
            self._current_user_groups = user.get("groups", [])
            LOG.info(f"User {username} groups: {self._current_user_groups}")

            return username, email

        def login_user(self, username, email, session=None):
            """Override login_user - using default Querybook behavior

            Note: Querybook automatically makes the first user an admin via
            create_admin_when_no_admin() function. Additional users can be
            granted admin access through Querybook's UI or database.
            """
            from .utils import AuthenticationError

            if not username or not isinstance(username, str):
                raise AuthenticationError("Please provide a valid username")

            user = get_user_by_name(username, session=session)
            if not user:
                user = create_user(
                    username=username, fullname=username, email=email, session=session
                )

            # Log group membership for debugging
            LOG.info(f"User {username} Keycloak groups: {self._current_user_groups}")

            return user


    login_manager = KeycloakLoginManager()

    ignore_paths = [OAUTH_CALLBACK_PATH]


    def init_app(app):
        login_manager.init_app(app)


    def login(request):
        return login_manager.login(request)


@@ -0,0 +1,34 @@
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: querybook-secret
  namespace: {{ .Env.QUERYBOOK_NAMESPACE }}
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-secret-store
    kind: ClusterSecretStore
  target:
    name: querybook-secret
    creationPolicy: Owner
  data:
    - secretKey: FLASK_SECRET_KEY
      remoteRef:
        key: querybook/config
        property: FLASK_SECRET_KEY
    - secretKey: DATABASE_CONN
      remoteRef:
        key: querybook/config
        property: DATABASE_CONN
    - secretKey: REDIS_URL
      remoteRef:
        key: querybook/config
        property: REDIS_URL
    - secretKey: ELASTICSEARCH_HOST
      remoteRef:
        key: querybook/config
        property: ELASTICSEARCH_HOST
    - secretKey: OAUTH_CLIENT_SECRET
      remoteRef:
        key: querybook/config
        property: OAUTH_CLIENT_SECRET


@@ -0,0 +1,187 @@
# Querybook Helm Chart Values
# https://github.com/pinterest/querybook/tree/master/helm
# Worker configuration
worker:
replicaCount: 1
name: worker
image:
repository: querybook/querybook
pullPolicy: IfNotPresent
tag: latest
resources:
requests:
memory: 1Gi
cpu: 700m
limits:
memory: 2Gi
cpu: 1
# Scheduler configuration
scheduler:
replicaCount: 1
name: scheduler
image:
repository: querybook/querybook
pullPolicy: IfNotPresent
tag: latest
resources:
requests:
memory: 200Mi
cpu: 100m
limits:
memory: 300Mi
cpu: 200m
# Web server configuration
web:
replicaCount: 1
name: web
image:
repository: querybook/querybook
pullPolicy: IfNotPresent
tag: latest
service:
serviceType: ClusterIP
servicePort: 80
containerPort: 10001
resources:
requests:
memory: 1Gi
cpu: 500m
limits:
memory: 2Gi
cpu: 1
# Custom initContainer to inject Keycloak auth backend
initContainers:
- name: copy-keycloak-auth
image: busybox:latest
command:
- sh
- -c
- cp /config/keycloak_auth.py /auth/keycloak_auth.py && chmod 644 /auth/keycloak_auth.py
volumeMounts:
- name: keycloak-auth-config
mountPath: /config
- name: auth-volume
mountPath: /auth
# Volume mounts for main container
volumeMounts:
- name: auth-volume
mountPath: /opt/querybook/querybook/server/app/auth/keycloak_auth.py
subPath: keycloak_auth.py
# Volumes
volumes:
- name: keycloak-auth-config
configMap:
name: querybook-keycloak-auth
- name: auth-volume
emptyDir: {}
# Use external PostgreSQL (buun-stack PostgreSQL cluster)
mysql:
enabled: false
# Redis configuration (use Helm chart's embedded Redis)
redis:
enabled: true
replicaCount: 1
name: redis
image:
repository: redis
pullPolicy: IfNotPresent
tag: "7.2"
service:
serviceType: ClusterIP
servicePort: 6379
resources:
requests:
memory: 512Mi
cpu: 200m
limits:
memory: 1Gi
cpu: 500m
# Elasticsearch configuration (use Helm chart's embedded Elasticsearch)
elasticsearch:
enabled: true
replicaCount: 1
name: elasticsearch
image:
repository: docker.elastic.co/elasticsearch/elasticsearch
pullPolicy: IfNotPresent
tag: "7.17.16"
extraEnvs:
- name: ES_JAVA_OPTS
value: -Xms1g -Xmx1g
- name: bootstrap.memory_lock
value: 'false'
- name: cluster.name
value: querybook-cluster
- name: discovery.type
value: single-node
service:
serviceType: ClusterIP
servicePort: 9200
resources:
requests:
memory: 2Gi
cpu: 500m
limits:
memory: 3Gi
cpu: 1
# Ingress configuration
ingress:
enabled: true
ingressClassName: traefik
annotations:
kubernetes.io/ingress.class: traefik
traefik.ingress.kubernetes.io/router.entrypoints: websecure
# WebSocket support - apply middleware for X-Forwarded-Proto header
traefik.ingress.kubernetes.io/router.middlewares: querybook-querybook-headers@kubernetescrd
# Sticky sessions for WebSocket connections
traefik.ingress.kubernetes.io/service.sticky.cookie: "true"
traefik.ingress.kubernetes.io/service.sticky.cookie.name: querybook-session
# Increase timeouts for WebSocket connections (in seconds)
traefik.ingress.kubernetes.io/service.serversTransport: querybook-transport@kubernetescrd
path: /
pathType: Prefix
hosts:
- {{ .Env.QUERYBOOK_HOST }}
tls:
- hosts:
- {{ .Env.QUERYBOOK_HOST }}
# Querybook environment variables
extraEnv:
# Public URL (required for OAuth)
PUBLIC_URL: https://{{ .Env.QUERYBOOK_HOST }}
# WebSocket CORS origins (required for socket.io to accept connections)
WS_CORS_ALLOWED_ORIGINS: '["https://{{ .Env.QUERYBOOK_HOST }}"]'
# Authentication backend (custom Keycloak OIDC implementation)
AUTH_BACKEND: app.auth.keycloak_auth
# OAuth configuration for Keycloak
OAUTH_CLIENT_ID: querybook
OAUTH_CLIENT_SECRET: {{ .Env.OAUTH_CLIENT_SECRET }}
OAUTH_AUTHORIZATION_URL: https://{{ .Env.KEYCLOAK_HOST }}/realms/{{ .Env.KEYCLOAK_REALM }}/protocol/openid-connect/auth
OAUTH_TOKEN_URL: https://{{ .Env.KEYCLOAK_HOST }}/realms/{{ .Env.KEYCLOAK_REALM }}/protocol/openid-connect/token
OAUTH_USER_PROFILE: https://{{ .Env.KEYCLOAK_HOST }}/realms/{{ .Env.KEYCLOAK_REALM }}/protocol/openid-connect/userinfo
# Session configuration
LOGS_OUT_AFTER: "0" # Never expire (re-login on browser close)
# Use existing secret for Flask, database, Redis, and Elasticsearch configuration
existingSecret: querybook-secret
# Node selector, affinity, and tolerations
nodeSelector: {}
affinity: {}
tolerations: []
podAnnotations: {}


@@ -0,0 +1,25 @@
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: querybook-headers
  namespace: {{ .Env.QUERYBOOK_NAMESPACE }}
spec:
  headers:
    customRequestHeaders:
      X-Forwarded-Proto: "https"
    customResponseHeaders:
      X-Forwarded-Proto: "https"
---
apiVersion: traefik.io/v1alpha1
kind: ServersTransport
metadata:
  name: querybook-transport
  namespace: {{ .Env.QUERYBOOK_NAMESPACE }}
spec:
  serverName: ""
  insecureSkipVerify: false
  # Timeouts for WebSocket connections
  forwardingTimeouts:
    dialTimeout: 30s
    responseHeaderTimeout: 0s  # No timeout for response headers (needed for WebSocket)
    idleConnTimeout: 0s  # No timeout for idle connections (needed for WebSocket)