diff --git a/README.md b/README.md index 0cf7c4d..590aa65 100644 --- a/README.md +++ b/README.md @@ -36,6 +36,7 @@ A remotely accessible Kubernetes home lab with OIDC authentication. Build a mode - **[JupyterHub](https://jupyter.org/hub)**: Interactive computing with collaborative notebooks - **[Trino](https://trino.io/)**: Distributed SQL query engine for querying multiple data sources +- **[Querybook](https://www.querybook.org/)**: Big data querying UI with notebook interface - **[ClickHouse](https://clickhouse.com/)**: High-performance columnar analytics database - **[Qdrant](https://qdrant.tech/)**: Vector database for AI/ML applications - **[Lakekeeper](https://lakekeeper.io/)**: Apache Iceberg REST Catalog for data lake management @@ -152,6 +153,17 @@ Business intelligence and data visualization platform with PostgreSQL integratio [📖 See Metabase Documentation](./metabase/README.md) +### Querybook + +Pinterest's big data querying UI with notebook interface for collaborative data exploration: + +- **Trino Integration**: Execute SQL queries against multiple data sources with user impersonation +- **Notebook Interface**: Create shareable datadocs with queries, visualizations, and documentation +- **Keycloak Authentication**: OAuth2 integration with group-based admin access +- **Real-time Execution**: WebSocket-based query execution with live progress updates + +[📖 See Querybook Documentation](./querybook/README.md) + ### Trino Fast distributed SQL query engine for big data analytics with: @@ -299,6 +311,7 @@ kubectl --context yourpc-oidc get nodes # Vault: https://vault.yourdomain.com # Keycloak: https://auth.yourdomain.com # Trino: https://trino.yourdomain.com +# Querybook: https://querybook.yourdomain.com # Metabase: https://metabase.yourdomain.com # Airflow: https://airflow.yourdomain.com # JupyterHub: https://jupyter.yourdomain.com diff --git a/justfile b/justfile index db052ed..5e14ba0 100644 --- a/justfile +++ b/justfile @@ -23,6 +23,7 @@ mod minio mod 
oauth2-proxy mod postgres mod qdrant +mod querybook mod trino mod utils mod vault diff --git a/querybook/.gitignore b/querybook/.gitignore new file mode 100644 index 0000000..f4271d8 --- /dev/null +++ b/querybook/.gitignore @@ -0,0 +1,10 @@ +# Generated Helm values (contains OAuth client secret) +querybook-values.yaml + +# Generated Kubernetes manifests +querybook-config-external-secret.yaml +keycloak-auth-configmap.yaml +traefik-middleware.yaml + +# Cloned Helm chart repository +querybook-repo/ diff --git a/querybook/README.md b/querybook/README.md new file mode 100644 index 0000000..d34315c --- /dev/null +++ b/querybook/README.md @@ -0,0 +1,252 @@ +# Querybook + +Pinterest's big data querying UI with notebook interface, Keycloak OAuth authentication, and Trino integration. + +## Overview + +This module deploys Querybook using the official Helm chart from Pinterest with: + +- **Keycloak OAuth2 authentication** for user login +- **Trino integration** with user impersonation for query attribution +- **PostgreSQL backend** for metadata storage +- **Redis** for caching and session management +- **Traefik integration** with WebSocket support for real-time query execution +- **Group-based admin access** via Keycloak groups + +## Prerequisites + +- Kubernetes cluster (k3s) +- Keycloak installed and configured +- PostgreSQL cluster (CloudNativePG) +- Trino with access control configured +- External Secrets Operator (optional, for Vault integration) + +## Installation + +### Basic Installation + +```bash +just querybook::install +``` + +You will be prompted for: + +1. **Querybook host (FQDN)**: e.g., `querybook.example.com` +2. 
**Keycloak host (FQDN)**: e.g., `auth.example.com` + +### What Gets Installed + +- Querybook web service +- Querybook scheduler (background jobs) +- Querybook workers (query execution) +- PostgreSQL database for Querybook metadata +- Redis for caching and sessions +- Keycloak OAuth2 client (confidential client) +- `querybook-admin` group in Keycloak for admin access +- Traefik Middleware for WebSocket and header forwarding + +## Configuration + +Environment variables (set in `.env.local` or override): + +```bash +QUERYBOOK_NAMESPACE=querybook # Kubernetes namespace +QUERYBOOK_HOST=querybook.example.com # External hostname +KEYCLOAK_HOST=auth.example.com # Keycloak hostname +KEYCLOAK_REALM=buunstack # Keycloak realm name +``` + +## Usage + +### Access Querybook + +1. Navigate to `https://your-querybook-host/` +2. Click "Login with OAuth" to authenticate with Keycloak +3. Create datadocs (notebooks) and execute queries + +### Grant Admin Access + +Add users to the `querybook-admin` group: + +```bash +just keycloak::add-user-to-group querybook-admin +``` + +Admin users can: + +- Manage query engines +- Configure data sources +- Manage user permissions +- View all datadocs + +### Configure Trino Query Engine + +1. Log in as an admin user +2. Navigate to Admin → Query Engines +3. Click "Add Query Engine" +4. Configure: + + ```plain + Name: Trino + Language: Trino + Environment: production (or your preferred environment name) + ``` + +5. Navigate to Admin → Environments → [your environment] +6. Add new query engine connection: + + ```plain + Connection String: trino://trino.example.com:443?SSL=true + Username: admin + Password: [from just trino::admin-password] + ``` + +7. 
Optional: Configure additional connection parameters: + - **Catalog**: Specify default catalog (e.g., `postgresql` or `iceberg`) + - **Schema**: Specify default schema + - **Proxy_user_id**: Leave empty or set to enable user impersonation + +### User Impersonation + +Querybook connects to Trino as `admin` but executes queries as the logged-in user via Trino's impersonation feature. This provides: + +- **Query Attribution**: Queries are attributed to the actual user, not the admin account +- **Audit Logging**: Trino logs show the real user who executed each query +- **Access Control**: Future per-user access policies can be enforced + +**How it Works**: + +1. User logs into Querybook with Keycloak +2. Querybook connects to Trino using admin credentials +3. Querybook sends each query with the `X-Trino-User` header set to the logged-in user's username +4. Trino impersonates the user (allowed by access control rules) +5. Query runs as if executed by the actual user + +## Architecture + +``` +External Users + ↓ +Cloudflare Tunnel (HTTPS) + ↓ +Traefik Ingress (HTTPS) + ├─ Traefik Middleware (X-Forwarded-*, WebSocket upgrade) + └─ Backend: HTTP + ↓ +Querybook Web + ├─ OAuth2 → Keycloak (authentication) + ├─ PostgreSQL (metadata) + ├─ Redis (cache/sessions) + └─ WebSocket (real-time query updates) + ↓ +Querybook Workers + ↓ +Trino (HTTPS via external hostname) + └─ Password auth + User impersonation +``` + +**Key Components**: + +- **Traefik Middleware**: Handles WebSocket upgrade headers and X-Forwarded-* headers +- **OAuth2 Integration**: Uses standard OIDC scopes (openid, email, profile) with groups mapper +- **Trino Connection**: Must use external HTTPS hostname (not internal service name) +- **User Impersonation**: Admin credentials with X-Trino-User header for query attribution + +## Authentication + +### User Login (OAuth2) + +- Users authenticate via Keycloak +- Standard OIDC flow with Authorization Code grant +- Group membership included in UserInfo endpoint response +- Session stored in Redis + +### 
Admin Access + +- Controlled by Keycloak group membership +- Users in the `querybook-admin` group have full admin privileges +- Regular users can create and manage their own datadocs + +### Trino Connection + +- Uses password authentication (admin user) +- Connects via external HTTPS hostname (Traefik provides TLS) +- Python Trino client enforces HTTPS when authentication is used +- User impersonation via X-Trino-User header + +## Management + +### Upgrade Querybook + +```bash +just querybook::upgrade +``` + +Updates the Helm deployment with current configuration. + +### Uninstall + +```bash +# Keep PostgreSQL database +just querybook::uninstall false + +# Delete PostgreSQL database too +just querybook::uninstall true +``` + +## Troubleshooting + +### Check Pod Status + +```bash +kubectl get pods -n querybook +``` + +### WebSocket Connection Fails + +- Verify Traefik middleware exists: `kubectl get middleware querybook-headers -n querybook` +- Check WebSocket upgrade headers in middleware configuration +- Ensure Ingress annotation references middleware: `querybook-querybook-headers@kubernetescrd` + +### OAuth Login Fails + +- Verify Keycloak client exists: `just keycloak::list-clients` +- Check redirect URL: `https://your-querybook-host/oauth2callback` +- Verify client secret matches: Compare Vault/K8s secret with Keycloak +- Check that Keycloak is reachable from Querybook pods + +### Trino Connection Fails + +- **Error: "cannot use authentication with HTTP"** + - Must use external hostname with HTTPS: `trino://trino.example.com:443?SSL=true` + - Do NOT use internal service name (e.g., `trino.trino.svc.cluster.local:8080`) + - Python Trino client enforces HTTPS when authentication is used + +- **Error: "500 Internal Server Error"** + - Verify Trino is accessible via external hostname + - Check Trino admin password: `just trino::admin-password` + - Test Trino connection manually with curl + +- **Error: "Access Denied: User admin cannot impersonate user X"** + - Verify Trino access control is 
configured + - Check impersonation rules: `kubectl exec -n trino deployment/trino-coordinator -- cat /etc/trino/access-control/rules.json` + - Ensure admin can impersonate all users + +### Query Execution Stuck + +- Check worker pod logs: `just querybook::logs worker` +- Verify Redis is running: `kubectl get pods -n querybook | grep redis` +- Check Trino coordinator health: `kubectl get pods -n trino` + +### Database Connection Issues + +- Verify PostgreSQL cluster is running: `kubectl get cluster -n postgres` +- Check database exists: `just postgres::list-databases | grep querybook` +- Verify secret exists: `kubectl get secret querybook-config-secret -n querybook` + +## References + +- [Querybook Documentation](https://www.querybook.org/) +- [Querybook GitHub](https://github.com/pinterest/querybook) +- [Trino Integration](../trino/README.md) +- [Keycloak OAuth2](https://www.keycloak.org/docs/latest/securing_apps/#_oidc) diff --git a/querybook/custom-auth/keycloak_auth.py b/querybook/custom-auth/keycloak_auth.py new file mode 100644 index 0000000..1039003 --- /dev/null +++ b/querybook/custom-auth/keycloak_auth.py @@ -0,0 +1,46 @@ +""" +Keycloak OIDC authentication backend for Querybook +""" +from app.auth.oauth_auth import OAuthLoginManager, OAUTH_CALLBACK_PATH +from env import QuerybookSettings + + +class KeycloakLoginManager(OAuthLoginManager): + @property + def oauth_config(self): + return { + "callback_url": "{}{}".format( + QuerybookSettings.PUBLIC_URL, OAUTH_CALLBACK_PATH + ), + "client_id": QuerybookSettings.OAUTH_CLIENT_ID, + "client_secret": QuerybookSettings.OAUTH_CLIENT_SECRET, + "authorization_url": QuerybookSettings.OAUTH_AUTHORIZATION_URL, + "token_url": QuerybookSettings.OAUTH_TOKEN_URL, + "profile_url": QuerybookSettings.OAUTH_USER_PROFILE, + "scope": ["openid", "email", "profile"], + } + + def _parse_user_profile(self, resp): + """Parse standard OIDC UserInfo response from Keycloak""" + user = resp.json() + # Keycloak returns standard OIDC claims: 
+ # - preferred_username: username + # - email: email address + # - name: full name (optional) + username = user.get("preferred_username") or user.get("email", "").split("@")[0] + email = user.get("email", "") + fullname = user.get("name", username) + return username, email, fullname + + +login_manager = KeycloakLoginManager() + +ignore_paths = [OAUTH_CALLBACK_PATH] + + +def init_app(app): + login_manager.init_app(app) + + +def login(request): + return login_manager.login(request) diff --git a/querybook/justfile b/querybook/justfile new file mode 100644 index 0000000..86dc89d --- /dev/null +++ b/querybook/justfile @@ -0,0 +1,327 @@ +set fallback := true + +export QUERYBOOK_NAMESPACE := env("QUERYBOOK_NAMESPACE", "querybook") +export QUERYBOOK_HOST := env("QUERYBOOK_HOST", "") +export QUERYBOOK_CHART_REPO := env("QUERYBOOK_CHART_REPO", "https://github.com/pinterest/querybook") +export QUERYBOOK_CHART_PATH := env("QUERYBOOK_CHART_PATH", "helm") +export EXTERNAL_SECRETS_NAMESPACE := env("EXTERNAL_SECRETS_NAMESPACE", "external-secrets") +export K8S_VAULT_NAMESPACE := env("K8S_VAULT_NAMESPACE", "vault") +export KEYCLOAK_REALM := env("KEYCLOAK_REALM", "buunstack") +export KEYCLOAK_HOST := env("KEYCLOAK_HOST", "") + +[private] +default: + @just --list --unsorted --list-submodules + +# Create Querybook namespace +create-namespace: + @kubectl get namespace ${QUERYBOOK_NAMESPACE} &>/dev/null || \ + kubectl create namespace ${QUERYBOOK_NAMESPACE} + +# Delete Querybook namespace +delete-namespace: + @kubectl delete namespace ${QUERYBOOK_NAMESPACE} --ignore-not-found + +# Clone Querybook Helm chart repository +clone-chart-repo: + #!/bin/bash + set -euo pipefail + if [ ! -d "querybook-repo" ]; then + echo "Cloning Querybook Helm chart repository..." + git clone --depth 1 ${QUERYBOOK_CHART_REPO} querybook-repo + else + echo "Querybook repository already exists. Pulling latest changes..." 
+ cd querybook-repo && git pull + fi + +# Remove cloned chart repository +remove-chart-repo: + rm -rf querybook-repo + +# Create Keycloak client and OAuth secret for Querybook +create-keycloak-client: + #!/bin/bash + set -euo pipefail + while [ -z "${QUERYBOOK_HOST}" ]; do + QUERYBOOK_HOST=$( + gum input --prompt="Querybook host (FQDN): " --width=100 \ + --placeholder="e.g., querybook.example.com" + ) + done + + echo "Creating Keycloak client for Querybook..." + + # Delete existing client if present + just keycloak::delete-client ${KEYCLOAK_REALM} querybook || true + + # Generate client secret + CLIENT_SECRET=$(just utils::random-password) + + # Create 'querybook-admin' group if it doesn't exist + echo "Creating 'querybook-admin' group..." + just keycloak::create-group querybook-admin '' 'Querybook administrators' || echo "Group may already exist" + + # Create confidential client with client secret + # Uses standard OIDC scopes: openid, email, profile (no custom scopes needed) + just keycloak::create-client \ + realm=${KEYCLOAK_REALM} \ + client_id=querybook \ + redirect_url="https://${QUERYBOOK_HOST}/oauth2callback" \ + client_secret="${CLIENT_SECRET}" + + # Add groups mapper to include group membership in UserInfo + echo "Adding groups mapper to querybook client..." + just keycloak::add-groups-mapper querybook + + # Store client secret temporarily in Kubernetes Secret (always created) + kubectl delete secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + kubectl create secret generic querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} \ + --from-literal=client_secret="${CLIENT_SECRET}" + + # Also store in Vault if available + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then + echo "Storing OAuth client secret in Vault..." 
+ just vault::put querybook/oauth client_secret="${CLIENT_SECRET}" + fi + + echo "Keycloak client created successfully" + echo "Client ID: querybook" + echo "Scopes: openid, email, profile (standard OIDC scopes)" + echo "Redirect URI: https://${QUERYBOOK_HOST}/oauth2callback" + echo "" + echo "Admin Group: querybook-admin" + echo "To grant admin access, add users to 'querybook-admin' group:" + echo " just keycloak::add-user-to-group querybook-admin" + +# Delete Keycloak client +delete-keycloak-client: + #!/bin/bash + set -euo pipefail + echo "Deleting Keycloak client for Querybook..." + just keycloak::delete-client ${KEYCLOAK_REALM} querybook || true + echo "Deleting querybook-admin group..." + just keycloak::delete-group querybook-admin || true + kubectl delete secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + +# Create Querybook secrets +create-secrets: + #!/bin/bash + set -euo pipefail + + # Generate Flask secret key + flask_secret=$(just utils::random-password) + + # Get PostgreSQL credentials + pg_host="postgres-cluster-rw.postgres" + pg_port="5432" + pg_user=$(just postgres::admin-username) + pg_password=$(just postgres::admin-password) + pg_database="querybook" + + # Build database connection string + database_conn="postgresql://${pg_user}:${pg_password}@${pg_host}:${pg_port}/${pg_database}" + + # Get OAuth client secret (created by create-keycloak-client) + # Try Vault first, fallback to Kubernetes Secret + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null && \ + just vault::get querybook/oauth client_secret &>/dev/null; then + oauth_client_secret=$(just vault::get querybook/oauth client_secret) + elif kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} &>/dev/null; then + oauth_client_secret=$(kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} \ + -o jsonpath='{.data.client_secret}' | base64 -d) + else + echo "Error: Cannot retrieve OAuth client secret. 
Please run 'just querybook::create-keycloak-client' first." + exit 1 + fi + + if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then + echo "External Secrets Operator detected. Storing secrets in Vault..." + + just vault::put querybook/config \ + FLASK_SECRET_KEY="${flask_secret}" \ + DATABASE_CONN="${database_conn}" \ + REDIS_URL="redis://redis:6379/0" \ + ELASTICSEARCH_HOST="elasticsearch:9200" \ + OAUTH_CLIENT_SECRET="${oauth_client_secret}" + + kubectl delete secret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + kubectl delete externalsecret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + + gomplate -f querybook-config-external-secret.gomplate.yaml \ + -o querybook-config-external-secret.yaml + kubectl apply -f querybook-config-external-secret.yaml + + echo "Waiting for ExternalSecret to sync..." + kubectl wait --for=condition=Ready externalsecret/querybook-secret \ + -n ${QUERYBOOK_NAMESPACE} --timeout=60s + else + echo "External Secrets Operator not found. Creating secret directly..." 
+ kubectl delete secret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + kubectl create secret generic querybook-secret -n ${QUERYBOOK_NAMESPACE} \ + --from-literal=FLASK_SECRET_KEY="${flask_secret}" \ + --from-literal=DATABASE_CONN="${database_conn}" \ + --from-literal=REDIS_URL="redis://redis:6379/0" \ + --from-literal=ELASTICSEARCH_HOST="elasticsearch:9200" \ + --from-literal=OAUTH_CLIENT_SECRET="${oauth_client_secret}" + + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then + just vault::put querybook/config \ + FLASK_SECRET_KEY="${flask_secret}" \ + DATABASE_CONN="${database_conn}" \ + REDIS_URL="redis://redis:6379/0" \ + ELASTICSEARCH_HOST="elasticsearch:9200" \ + OAUTH_CLIENT_SECRET="${oauth_client_secret}" + fi + fi + +# Delete Querybook secrets +delete-secrets: + @kubectl delete secret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + @kubectl delete externalsecret querybook-secret -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + +# Create Keycloak auth ConfigMap +create-auth-configmap: + #!/bin/bash + set -euo pipefail + echo "Creating Keycloak auth ConfigMap..." + gomplate -f keycloak-auth-configmap.gomplate.yaml -o keycloak-auth-configmap.yaml + kubectl apply -f keycloak-auth-configmap.yaml + +# Create Traefik Middleware for WebSocket support +create-traefik-middleware: + #!/bin/bash + set -euo pipefail + echo "Creating Traefik Middleware for WebSocket support..." 
+ gomplate -f traefik-middleware.gomplate.yaml -o traefik-middleware.yaml + kubectl apply -f traefik-middleware.yaml + +# Install Querybook +install: + #!/bin/bash + set -euo pipefail + while [ -z "${QUERYBOOK_HOST}" ]; do + QUERYBOOK_HOST=$( + gum input --prompt="Querybook host (FQDN): " --width=100 \ + --placeholder="e.g., querybook.example.com" + ) + done + + while [ -z "${KEYCLOAK_HOST}" ]; do + KEYCLOAK_HOST=$( + gum input --prompt="Keycloak host (FQDN): " --width=100 \ + --placeholder="e.g., auth.example.com" + ) + done + + just create-namespace + just postgres::create-db querybook + just create-keycloak-client + just create-secrets + just clone-chart-repo + + # Get OAuth client secret for gomplate template + # Try Vault first, fallback to Kubernetes Secret + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null && \ + just vault::get querybook/oauth client_secret &>/dev/null; then + export OAUTH_CLIENT_SECRET=$(just vault::get querybook/oauth client_secret) + elif kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} &>/dev/null; then + export OAUTH_CLIENT_SECRET=$(kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} \ + -o jsonpath='{.data.client_secret}' | base64 -d) + else + echo "Error: Cannot retrieve OAuth client secret. Please run 'just querybook::create-keycloak-client' first." + exit 1 + fi + + # Create Traefik Middleware (must exist before Helm install) + just create-traefik-middleware + + # Create Keycloak auth ConfigMap (must exist before Helm install) + just create-auth-configmap + + gomplate -f querybook-values.gomplate.yaml -o querybook-values.yaml + + helm upgrade --cleanup-on-fail --install querybook ./querybook-repo/${QUERYBOOK_CHART_PATH} \ + -n ${QUERYBOOK_NAMESPACE} --wait \ + -f querybook-values.yaml + + echo "" + echo "Querybook installed successfully!" 
+ echo "Access URL: https://${QUERYBOOK_HOST}" + echo "" + echo "OAuth Configuration:" + echo " Provider: Keycloak (custom OIDC backend)" + echo " Realm: ${KEYCLOAK_REALM}" + echo " Scopes: openid, email, profile" + echo " Authorization URL: https://${KEYCLOAK_HOST}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/auth" + echo "" + echo "Admin Access:" + echo " To grant admin access, add users to 'querybook-admin' group:" + echo " just keycloak::add-user-to-group querybook-admin" + echo "" + +# Upgrade Querybook +upgrade: + #!/bin/bash + set -euo pipefail + while [ -z "${QUERYBOOK_HOST}" ]; do + QUERYBOOK_HOST=$( + gum input --prompt="Querybook host (FQDN): " --width=100 \ + --placeholder="e.g., querybook.example.com" + ) + done + + while [ -z "${KEYCLOAK_HOST}" ]; do + KEYCLOAK_HOST=$( + gum input --prompt="Keycloak host (FQDN): " --width=100 \ + --placeholder="e.g., auth.example.com" + ) + done + + # Get OAuth client secret for gomplate template + # Try Vault first, fallback to Kubernetes Secret + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null && \ + just vault::get querybook/oauth client_secret &>/dev/null; then + export OAUTH_CLIENT_SECRET=$(just vault::get querybook/oauth client_secret) + elif kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} &>/dev/null; then + export OAUTH_CLIENT_SECRET=$(kubectl get secret querybook-oauth-temp -n ${QUERYBOOK_NAMESPACE} \ + -o jsonpath='{.data.client_secret}' | base64 -d) + else + echo "Error: Cannot retrieve OAuth client secret. Please run 'just querybook::create-keycloak-client' first." + exit 1 + fi + + echo "Upgrading Querybook..." 
+ + # Update Traefik Middleware (must exist before Helm upgrade) + just create-traefik-middleware + + # Update Keycloak auth ConfigMap (must exist before Helm upgrade) + just create-auth-configmap + + gomplate -f querybook-values.gomplate.yaml -o querybook-values.yaml + helm upgrade querybook ./querybook-repo/${QUERYBOOK_CHART_PATH} \ + -n ${QUERYBOOK_NAMESPACE} --wait \ + -f querybook-values.yaml + + echo "Querybook upgraded successfully" + +# Uninstall Querybook +uninstall delete-db='true': + #!/bin/bash + set -euo pipefail + helm uninstall querybook -n ${QUERYBOOK_NAMESPACE} --ignore-not-found --wait + kubectl delete configmap querybook-keycloak-auth -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + kubectl delete middleware querybook-headers -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + kubectl delete serverstransport querybook-transport -n ${QUERYBOOK_NAMESPACE} --ignore-not-found + just delete-secrets + just delete-keycloak-client + just delete-namespace + if [ "{{ delete-db }}" = "true" ]; then + just postgres::delete-db querybook + fi + + # Clean up Vault entries if present + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then + just vault::delete querybook/config || true + just vault::delete querybook/oauth || true + fi diff --git a/querybook/keycloak-auth-configmap.gomplate.yaml b/querybook/keycloak-auth-configmap.gomplate.yaml new file mode 100644 index 0000000..ea7cc7f --- /dev/null +++ b/querybook/keycloak-auth-configmap.gomplate.yaml @@ -0,0 +1,84 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: querybook-keycloak-auth + namespace: {{ .Env.QUERYBOOK_NAMESPACE }} +data: + keycloak_auth.py: | + """ + Keycloak OIDC authentication backend for Querybook + """ + from app.auth.oauth_auth import OAuthLoginManager, OAUTH_CALLBACK_PATH + from env import QuerybookSettings + from lib.logger import get_logger + from logic.user import get_user_by_name, create_user + + LOG = get_logger(__file__) + + + class KeycloakLoginManager(OAuthLoginManager): + 
def __init__(self): + super().__init__() + self._current_user_groups = [] + + @property + def oauth_config(self): + return { + "callback_url": "{}{}".format( + QuerybookSettings.PUBLIC_URL, OAUTH_CALLBACK_PATH + ), + "client_id": QuerybookSettings.OAUTH_CLIENT_ID, + "client_secret": QuerybookSettings.OAUTH_CLIENT_SECRET, + "authorization_url": QuerybookSettings.OAUTH_AUTHORIZATION_URL, + "token_url": QuerybookSettings.OAUTH_TOKEN_URL, + "profile_url": QuerybookSettings.OAUTH_USER_PROFILE, + "scope": ["openid", "email", "profile"], + } + + def _parse_user_profile(self, resp): + """Parse standard OIDC UserInfo response from Keycloak""" + user = resp.json() + username = user.get("preferred_username") or user.get("email", "").split("@")[0] + email = user.get("email", "") + + # Store groups for role synchronization + self._current_user_groups = user.get("groups", []) + LOG.info(f"User {username} groups: {self._current_user_groups}") + + return username, email + + def login_user(self, username, email, session=None): + """Override login_user - using default Querybook behavior + + Note: Querybook automatically makes the first user an admin via + create_admin_when_no_admin() function. Additional users can be + granted admin access through Querybook's UI or database. 
+ """ + from .utils import AuthenticationError + + if not username or not isinstance(username, str): + raise AuthenticationError("Please provide a valid username") + + user = get_user_by_name(username, session=session) + if not user: + user = create_user( + username=username, fullname=username, email=email, session=session + ) + + # Log group membership for debugging + LOG.info(f"User {username} Keycloak groups: {self._current_user_groups}") + + return user + + + login_manager = KeycloakLoginManager() + + ignore_paths = [OAUTH_CALLBACK_PATH] + + + def init_app(app): + login_manager.init_app(app) + + + def login(request): + return login_manager.login(request) diff --git a/querybook/querybook-config-external-secret.gomplate.yaml b/querybook/querybook-config-external-secret.gomplate.yaml new file mode 100644 index 0000000..3c9f019 --- /dev/null +++ b/querybook/querybook-config-external-secret.gomplate.yaml @@ -0,0 +1,34 @@ +apiVersion: external-secrets.io/v1 +kind: ExternalSecret +metadata: + name: querybook-secret + namespace: {{ .Env.QUERYBOOK_NAMESPACE }} +spec: + refreshInterval: 1h + secretStoreRef: + name: vault-secret-store + kind: ClusterSecretStore + target: + name: querybook-secret + creationPolicy: Owner + data: + - secretKey: FLASK_SECRET_KEY + remoteRef: + key: querybook/config + property: FLASK_SECRET_KEY + - secretKey: DATABASE_CONN + remoteRef: + key: querybook/config + property: DATABASE_CONN + - secretKey: REDIS_URL + remoteRef: + key: querybook/config + property: REDIS_URL + - secretKey: ELASTICSEARCH_HOST + remoteRef: + key: querybook/config + property: ELASTICSEARCH_HOST + - secretKey: OAUTH_CLIENT_SECRET + remoteRef: + key: querybook/config + property: OAUTH_CLIENT_SECRET diff --git a/querybook/querybook-values.gomplate.yaml b/querybook/querybook-values.gomplate.yaml new file mode 100644 index 0000000..4d04bd3 --- /dev/null +++ b/querybook/querybook-values.gomplate.yaml @@ -0,0 +1,187 @@ +# Querybook Helm Chart Values +# 
https://github.com/pinterest/querybook/tree/master/helm + +# Worker configuration +worker: + replicaCount: 1 + name: worker + image: + repository: querybook/querybook + pullPolicy: IfNotPresent + tag: latest + resources: + requests: + memory: 1Gi + cpu: 700m + limits: + memory: 2Gi + cpu: 1 + +# Scheduler configuration +scheduler: + replicaCount: 1 + name: scheduler + image: + repository: querybook/querybook + pullPolicy: IfNotPresent + tag: latest + resources: + requests: + memory: 200Mi + cpu: 100m + limits: + memory: 300Mi + cpu: 200m + +# Web server configuration +web: + replicaCount: 1 + name: web + image: + repository: querybook/querybook + pullPolicy: IfNotPresent + tag: latest + service: + serviceType: ClusterIP + servicePort: 80 + containerPort: 10001 + resources: + requests: + memory: 1Gi + cpu: 500m + limits: + memory: 2Gi + cpu: 1 + + # Custom initContainer to inject Keycloak auth backend + initContainers: + - name: copy-keycloak-auth + image: busybox:latest + command: + - sh + - -c + - cp /config/keycloak_auth.py /auth/keycloak_auth.py && chmod 644 /auth/keycloak_auth.py + volumeMounts: + - name: keycloak-auth-config + mountPath: /config + - name: auth-volume + mountPath: /auth + + # Volume mounts for main container + volumeMounts: + - name: auth-volume + mountPath: /opt/querybook/querybook/server/app/auth/keycloak_auth.py + subPath: keycloak_auth.py + + # Volumes + volumes: + - name: keycloak-auth-config + configMap: + name: querybook-keycloak-auth + - name: auth-volume + emptyDir: {} + +# Use external PostgreSQL (buun-stack PostgreSQL cluster) +mysql: + enabled: false + +# Redis configuration (use Helm chart's embedded Redis) +redis: + enabled: true + replicaCount: 1 + name: redis + image: + repository: redis + pullPolicy: IfNotPresent + tag: "7.2" + service: + serviceType: ClusterIP + servicePort: 6379 + resources: + requests: + memory: 512Mi + cpu: 200m + limits: + memory: 1Gi + cpu: 500m + +# Elasticsearch configuration (use Helm chart's embedded 
Elasticsearch) +elasticsearch: + enabled: true + replicaCount: 1 + name: elasticsearch + image: + repository: docker.elastic.co/elasticsearch/elasticsearch + pullPolicy: IfNotPresent + tag: "7.17.16" + extraEnvs: + - name: ES_JAVA_OPTS + value: -Xms1g -Xmx1g + - name: bootstrap.memory_lock + value: 'false' + - name: cluster.name + value: querybook-cluster + - name: discovery.type + value: single-node + service: + serviceType: ClusterIP + servicePort: 9200 + resources: + requests: + memory: 2Gi + cpu: 500m + limits: + memory: 3Gi + cpu: 1 + +# Ingress configuration +ingress: + enabled: true + ingressClassName: traefik + annotations: + kubernetes.io/ingress.class: traefik + traefik.ingress.kubernetes.io/router.entrypoints: websecure + # WebSocket support - apply middleware for X-Forwarded-Proto header + traefik.ingress.kubernetes.io/router.middlewares: querybook-querybook-headers@kubernetescrd + # Sticky sessions for WebSocket connections + traefik.ingress.kubernetes.io/service.sticky.cookie: "true" + traefik.ingress.kubernetes.io/service.sticky.cookie.name: querybook-session + # Increase timeouts for WebSocket connections (in seconds) + traefik.ingress.kubernetes.io/service.serversTransport: querybook-transport@kubernetescrd + path: / + pathType: Prefix + hosts: + - {{ .Env.QUERYBOOK_HOST }} + tls: + - hosts: + - {{ .Env.QUERYBOOK_HOST }} + +# Querybook environment variables +extraEnv: + # Public URL (required for OAuth) + PUBLIC_URL: https://{{ .Env.QUERYBOOK_HOST }} + + # WebSocket CORS origins (required for socket.io to accept connections) + WS_CORS_ALLOWED_ORIGINS: '["https://{{ .Env.QUERYBOOK_HOST }}"]' + + # Authentication backend (custom Keycloak OIDC implementation) + AUTH_BACKEND: app.auth.keycloak_auth + + # OAuth configuration for Keycloak + OAUTH_CLIENT_ID: querybook + OAUTH_CLIENT_SECRET: {{ .Env.OAUTH_CLIENT_SECRET }} + OAUTH_AUTHORIZATION_URL: https://{{ .Env.KEYCLOAK_HOST }}/realms/{{ .Env.KEYCLOAK_REALM }}/protocol/openid-connect/auth + 
OAUTH_TOKEN_URL: https://{{ .Env.KEYCLOAK_HOST }}/realms/{{ .Env.KEYCLOAK_REALM }}/protocol/openid-connect/token + OAUTH_USER_PROFILE: https://{{ .Env.KEYCLOAK_HOST }}/realms/{{ .Env.KEYCLOAK_REALM }}/protocol/openid-connect/userinfo + + # Session configuration + LOGS_OUT_AFTER: "0" # Never expire (re-login on browser close) + +# Use existing secret for Flask, database, Redis, and Elasticsearch configuration +existingSecret: querybook-secret + +# Node selector, affinity, and tolerations +nodeSelector: {} +affinity: {} +tolerations: [] +podAnnotations: {} diff --git a/querybook/traefik-middleware.gomplate.yaml b/querybook/traefik-middleware.gomplate.yaml new file mode 100644 index 0000000..af558e5 --- /dev/null +++ b/querybook/traefik-middleware.gomplate.yaml @@ -0,0 +1,25 @@ +apiVersion: traefik.io/v1alpha1 +kind: Middleware +metadata: + name: querybook-headers + namespace: {{ .Env.QUERYBOOK_NAMESPACE }} +spec: + headers: + customRequestHeaders: + X-Forwarded-Proto: "https" + customResponseHeaders: + X-Forwarded-Proto: "https" +--- +apiVersion: traefik.io/v1alpha1 +kind: ServersTransport +metadata: + name: querybook-transport + namespace: {{ .Env.QUERYBOOK_NAMESPACE }} +spec: + serverName: "" + insecureSkipVerify: false + # Timeouts for WebSocket connections + forwardingTimeouts: + dialTimeout: 30s + responseHeaderTimeout: 0s # No timeout for response headers (needed for WebSocket) + idleConnTimeout: 0s # No timeout for idle connections (needed for WebSocket)