Querybook
Pinterest's big data querying UI with notebook interface, Keycloak OAuth authentication, and Trino integration.
Overview
This module deploys Querybook using the official Helm chart from Pinterest with:
- Keycloak OAuth2 authentication for user login
- Trino integration with user impersonation for query attribution
- PostgreSQL backend for metadata storage
- Redis for caching and session management
- Traefik integration with WebSocket support for real-time query execution
- Group-based admin access via Keycloak groups
Prerequisites
- Kubernetes cluster (k3s)
- Keycloak installed and configured
- PostgreSQL cluster (CloudNativePG)
- Trino with access control configured
- External Secrets Operator (optional, for Vault integration)
Installation
Basic Installation
just querybook::install
You will be prompted for:
- Querybook host (FQDN): e.g., querybook.example.com
- Keycloak host (FQDN): e.g., auth.example.com
What Gets Installed
- Querybook web service
- Querybook scheduler (background jobs)
- Querybook workers (query execution)
- PostgreSQL database for Querybook metadata
- Redis for caching and sessions
- Keycloak OAuth2 client (confidential client)
- querybook-admin group in Keycloak for admin access
- Traefik Middleware for WebSocket and header forwarding
Configuration
Environment variables (set in .env.local or override):
QUERYBOOK_NAMESPACE=querybook # Kubernetes namespace
QUERYBOOK_HOST=querybook.example.com # External hostname
KEYCLOAK_HOST=auth.example.com # Keycloak hostname
KEYCLOAK_REALM=buunstack # Keycloak realm name
Usage
Access Querybook
- Navigate to https://your-querybook-host/
- Click "Login with OAuth" to authenticate with Keycloak
- Create datadocs (notebooks) and execute queries
Grant Admin Access
Add users to the querybook-admin group:
just keycloak::add-user-to-group <username> querybook-admin
Admin users can:
- Manage query engines
- Configure data sources
- Manage user permissions
- View all datadocs
Configure Trino Query Engine
- Log in as an admin user
- Navigate to Admin → Query Engines
- Click "Add Query Engine"
- Configure:
  Name: Trino
  Language: Trino
  Environment: production (or your preferred environment name)
- Navigate to Admin → Environments → [your environment]
- Add new query engine connection:
  Connection String: trino://trino.example.com:443?SSL=true
  Username: admin
  Password: [from just trino::admin-password]
- Optional: Configure additional connection parameters:
  - Catalog: Specify default catalog (e.g., postgresql or iceberg)
  - Schema: Specify default schema
  - Proxy_user_id: Leave empty or set to enable user impersonation
User Impersonation
Querybook connects to Trino as admin but executes queries as the logged-in user via Trino's impersonation feature. This provides:
- Query Attribution: Queries are attributed to the actual user, not the admin account
- Audit Logging: Trino logs show the real user who executed each query
- Access Control: Future per-user access policies can be enforced
How it Works:
- User logs into Querybook with Keycloak
- Querybook connects to Trino using admin credentials
- Querybook sends queries with the X-Trino-User: <username> header
- Trino impersonates the user (allowed by access control rules)
- Query runs as if executed by the actual user
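The flow above can be sketched as the HTTP headers that end up on a Trino request. This is a minimal illustration, not Querybook's actual code: the function name build_trino_headers and the example password are hypothetical, and it assumes HTTP Basic credentials for the admin account alongside the X-Trino-User header named above.

```python
import base64


def build_trino_headers(admin_user: str, admin_password: str, run_as: str) -> dict:
    """Return headers for a request authenticated as the admin account
    but attributed to the logged-in user ``run_as`` (hypothetical helper)."""
    # HTTP Basic credentials identify the connecting principal (admin).
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return {
        "Authorization": f"Basic {token}",
        # X-Trino-User names the user the query should run as.
        "X-Trino-User": run_as,
    }


headers = build_trino_headers("admin", "example-password", "alice")
print(headers["X-Trino-User"])
```

Trino only honors the requested user if its access control rules allow the admin principal to impersonate that user.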
Architecture
External Users
↓
Cloudflare Tunnel (HTTPS)
↓
Traefik Ingress (HTTPS)
├─ Traefik Middleware (X-Forwarded-*, WebSocket upgrade)
└─ Backend: HTTP
↓
Querybook Web
├─ OAuth2 → Keycloak (authentication)
├─ PostgreSQL (metadata)
├─ Redis (cache/sessions)
└─ WebSocket (real-time query updates)
↓
Querybook Workers
↓
Trino (HTTPS via external hostname)
└─ Password auth + User impersonation
Key Components:
- Traefik Middleware: Handles WebSocket upgrade headers and X-Forwarded-* headers
- OAuth2 Integration: Uses standard OIDC scopes (openid, email, profile) with groups mapper
- Trino Connection: Must use external HTTPS hostname (not internal service name)
- User Impersonation: Admin credentials with X-Trino-User header for query attribution
Authentication
User Login (OAuth2)
- Users authenticate via Keycloak
- Standard OIDC flow with Authorization Code grant
- Group membership included in UserInfo endpoint response
- Session stored in Redis
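As a rough sketch of the first step of the Authorization Code flow, the snippet below builds the URL a browser would be redirected to. Several things are assumptions: the client ID querybook and the function name are hypothetical, and the endpoint path follows the layout of recent Keycloak releases (older releases prefix it with /auth). The scopes and the /oauth2callback redirect path are the ones mentioned elsewhere in this document.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_authorization_url(keycloak_host, realm, client_id, querybook_host, state):
    """Build an OIDC authorization request URL (illustrative helper)."""
    # Assumed endpoint layout for recent Keycloak versions.
    base = f"https://{keycloak_host}/realms/{realm}/protocol/openid-connect/auth"
    params = {
        "response_type": "code",  # Authorization Code grant
        "client_id": client_id,
        "redirect_uri": f"https://{querybook_host}/oauth2callback",
        "scope": "openid email profile",
        "state": state,  # opaque value echoed back to guard against CSRF
    }
    return f"{base}?{urlencode(params)}"


url = build_authorization_url(
    "auth.example.com", "buunstack", "querybook", "querybook.example.com", "xyz"
)
print(url)
```

Comparing the redirect_uri in such a URL against the redirect URL registered on the Keycloak client is a quick way to spot a mismatch when login fails.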
Admin Access
- Controlled by Keycloak group membership
- Users in the querybook-admin group have full admin privileges
- Regular users can create and manage their own datadocs
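The group check can be pictured roughly as follows. This is an illustrative sketch rather than the deployed logic: it assumes the UserInfo response carries a claim named groups, and it tolerates a leading slash because Keycloak's group mapper can be configured to emit full group paths. Nested paths such as /parent/querybook-admin are deliberately not matched here.

```python
ADMIN_GROUP = "querybook-admin"


def is_admin(userinfo: dict, claim: str = "groups") -> bool:
    """Return True if the UserInfo payload lists the admin group.

    The claim name "groups" is an assumption about the mapper configuration.
    """
    groups = userinfo.get(claim) or []
    # Accept both "querybook-admin" and "/querybook-admin".
    return any(g.lstrip("/") == ADMIN_GROUP for g in groups)


print(is_admin({"email": "alice@example.com", "groups": ["/querybook-admin"]}))
print(is_admin({"email": "bob@example.com", "groups": ["analysts"]}))
```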
Trino Connection
- Uses password authentication (admin user)
- Connects via external HTTPS hostname (Traefik provides TLS)
- Python Trino client enforces HTTPS when authentication is used
- User impersonation via X-Trino-User header
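Because the HTTPS requirement is easy to get wrong, a small pre-flight check on the connection string can help. The sketch below is a hypothetical helper (check_trino_connection_string is not part of Querybook or the Trino client); it only inspects the string for the two pitfalls described above and makes no network calls.

```python
from urllib.parse import urlparse, parse_qs


def check_trino_connection_string(conn: str) -> list:
    """Return a list of human-readable problems found in ``conn``."""
    problems = []
    parsed = urlparse(conn)
    if parsed.scheme != "trino":
        problems.append("scheme should be trino://")
    # SSL=true is what makes the connection use HTTPS.
    ssl_values = [v.lower() for v in parse_qs(parsed.query).get("SSL", [])]
    if "true" not in ssl_values:
        problems.append("SSL=true is missing, so plain HTTP would be used")
    # Cluster-internal service names do not terminate TLS via Traefik.
    host = parsed.hostname or ""
    if host.endswith(".svc.cluster.local") or host.endswith(".svc"):
        problems.append("internal service name used; use the external hostname")
    return problems


print(check_trino_connection_string("trino://trino.example.com:443?SSL=true"))
print(check_trino_connection_string("trino://trino.trino.svc.cluster.local:8080"))
```

An empty list means neither pitfall was detected; it does not guarantee that Trino is reachable.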
Management
Upgrade Querybook
just querybook::upgrade
Updates the Helm deployment with current configuration.
Uninstall
# Keep PostgreSQL database
just querybook::uninstall false
# Delete PostgreSQL database too
just querybook::uninstall true
Troubleshooting
Check Pod Status
kubectl get pods -n querybook
WebSocket Connection Fails
- Verify Traefik middleware exists: kubectl get middleware querybook-headers -n querybook
- Check WebSocket upgrade headers in middleware configuration
- Ensure Ingress annotation references middleware: querybook-querybook-headers@kubernetescrd
OAuth Login Fails
- Verify Keycloak client exists: just keycloak::list-clients
- Check redirect URL: https://<querybook-host>/oauth2callback
- Verify client secret matches: Compare Vault/K8s secret with Keycloak
- Check Keycloak is accessible from Querybook pods
Trino Connection Fails
- Error: "cannot use authentication with HTTP"
  - Must use external hostname with HTTPS: trino://trino.example.com:443?SSL=true
  - Do NOT use internal service name (e.g., trino.trino.svc.cluster.local:8080)
  - Python Trino client enforces HTTPS when authentication is used
- Error: "500 Internal Server Error"
  - Verify Trino is accessible via external hostname
  - Check Trino admin password: just trino::admin-password
  - Test Trino connection manually with curl
- Error: "Access Denied: User admin cannot impersonate user X"
  - Verify Trino access control is configured
  - Check impersonation rules: kubectl exec -n trino deployment/trino-coordinator -- cat /etc/trino/access-control/rules.json
  - Ensure admin can impersonate all users
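When reading rules.json, it can help to reason about how an impersonation rule is matched. The sketch below approximates Trino's file-based access control for illustration only: it assumes an impersonation section whose rules have original_user, new_user, and allow fields, with the first matching rule deciding the outcome. It uses Python regular expressions rather than Java's and ignores capture-group substitution in new_user. The example rule is hypothetical, not the contents of the deployed file.

```python
import json
import re


def can_impersonate(rules_json: str, original_user: str, new_user: str) -> bool:
    """Approximate check of whether original_user may run queries as new_user."""
    rules = json.loads(rules_json).get("impersonation", [])
    for rule in rules:
        # original_user is optional and defaults to matching everyone.
        if re.fullmatch(rule.get("original_user", ".*"), original_user) and \
                re.fullmatch(rule["new_user"], new_user):
            return rule.get("allow", True)
    # No matching rule: treat impersonation as denied.
    return False


# Hypothetical rule letting admin impersonate any user.
example = '{"impersonation": [{"original_user": "admin", "new_user": ".*"}]}'
print(can_impersonate(example, "admin", "alice"))
print(can_impersonate(example, "bob", "alice"))
```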
Query Execution Stuck
- Check worker pod logs: just querybook::logs worker
- Verify Redis is running: kubectl get pods -n querybook | grep redis
- Check Trino coordinator health: kubectl get pods -n trino
Database Connection Issues
- Verify PostgreSQL cluster is running: kubectl get cluster -n postgres
- Check database exists: just postgres::list-databases | grep querybook
- Verify secret exists: kubectl get secret querybook-config-secret -n querybook