Apache Superset
Modern, enterprise-ready business intelligence web application with Keycloak OAuth authentication and Trino integration.
Overview
This module deploys Apache Superset using the official Helm chart with:
- Keycloak OAuth authentication for user login
- Trino integration for data lake analytics
- PostgreSQL backend for metadata storage (dedicated user)
- Redis for caching and Celery task queue
- HTTPS reverse proxy support via Traefik
- Group-based access control via Keycloak groups
Prerequisites
- Kubernetes cluster (k3s)
- Keycloak installed and configured
- PostgreSQL cluster (CloudNativePG)
- Trino with password authentication
- External Secrets Operator (optional, for Vault integration)
Installation
Basic Installation
just superset::install
You will be prompted for:
- Superset host (FQDN): e.g., superset.example.com
- Keycloak host (FQDN): e.g., auth.example.com
What Gets Installed
- Superset web application
- Superset worker (Celery for async tasks)
- PostgreSQL database and user for Superset metadata
- Redis for caching and Celery broker
- Keycloak OAuth client (confidential client)
- superset-admin group in Keycloak for admin access
Configuration
Environment variables (set in .env.local or override):
SUPERSET_NAMESPACE=superset # Kubernetes namespace
SUPERSET_CHART_VERSION=0.15.0 # Helm chart version
SUPERSET_HOST=superset.example.com # External hostname
KEYCLOAK_HOST=auth.example.com # Keycloak hostname
KEYCLOAK_REALM=buunstack # Keycloak realm name
Architecture Notes
Superset 5.0+ Changes:
- Uses uv instead of pip for package management
- Lean base image without database drivers (installed via bootstrapScript)
- Required packages: psycopg2-binary, sqlalchemy-trino, authlib
Redis Image:
- Uses bitnami/redis:latest due to Bitnami's August 2025 strategy change
- Community users can only use latest tag (no version pinning)
- For production version pinning, consider using official Redis image separately
Usage
Access Superset
- Navigate to https://your-superset-host/
- Click "Sign in with Keycloak" to authenticate
- Create charts and dashboards
Grant Admin Access
Add users to the superset-admin group:
just keycloak::add-user-to-group <username> superset-admin
Admin users have full privileges including:
- Database connection management
- User and role management
- All chart and dashboard operations
Configure Database Connections
Prerequisites: User must be in superset-admin group
Trino Connection
- Log in as an admin user
- Navigate to Settings → Database Connections → + Database
- Select Trino from supported databases
- Configure connection:
  DISPLAY NAME: Trino Iceberg (or any name)
  SQLALCHEMY URI: trino://admin:<password>@trino.example.com/iceberg
  Important Notes:
  - Must use HTTPS hostname (e.g., trino.example.com)
  - Cannot use internal service (e.g., trino.trino:8080)
  - Trino password authentication requires HTTPS connection
  - Get admin password: just trino::admin-password
- Click TEST CONNECTION to verify
- Click CONNECT to save
Available Trino Catalogs:
- iceberg - Iceberg data lakehouse (Lakekeeper)
- postgresql - PostgreSQL connector
- tpch - TPC-H benchmark data
Example URIs:
trino://admin:<password>@trino.example.com/iceberg
trino://admin:<password>@trino.example.com/postgresql
trino://admin:<password>@trino.example.com/tpch
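Passwords that contain characters such as @ or / break the URI unless they are percent-encoded. The helper below is a minimal sketch of building these URIs safely; the function name trino_uri and its parameters are illustrative and not part of this module.

```python
from typing import Optional
from urllib.parse import quote


def trino_uri(user: str, password: str, host: str, catalog: str,
              port: Optional[int] = None) -> str:
    """Build a trino:// SQLAlchemy URI with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    netloc = host if port is None else f"{host}:{port}"
    return f"trino://{credentials}@{netloc}/{catalog}"


if __name__ == "__main__":
    # "p@ss/word" is a made-up password used only to show the encoding.
    print(trino_uri("admin", "p@ss/word", "trino.example.com", "iceberg"))
```

Paste the printed value into the SQLALCHEMY URI field, with the real password from just trino::admin-password in place of the made-up one.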
Other Database Connections
Superset supports many databases. Examples:
PostgreSQL:
postgresql://user:password@postgres-cluster-rw.postgres:5432/database
MySQL:
mysql://user:password@mysql-host:3306/database
Create Charts and Dashboards
- Navigate to Charts → + Chart
- Select dataset (from configured database)
- Choose visualization type
- Configure chart settings
- Save chart
- Add to dashboard
Features
- Rich Visualizations: 40+ chart types including tables, line charts, bar charts, maps, etc.
- SQL Lab: Interactive SQL editor with query history
- No-code Chart Builder: Drag-and-drop interface for creating charts
- Dashboard Composer: Create interactive dashboards with filters
- Row-level Security: Control data access per user/role
- Alerting & Reports: Schedule email reports and alerts
- Semantic Layer: Define metrics and dimensions for consistent analysis
Architecture
External Users
↓
Cloudflare Tunnel (HTTPS)
↓
Traefik Ingress (HTTPS)
↓
Superset Web (HTTP inside cluster)
├─ OAuth → Keycloak (authentication)
├─ PostgreSQL (metadata: charts, dashboards, users)
├─ Redis (cache, Celery broker)
└─ Celery Worker (async tasks)
↓
Data Sources (via HTTPS)
├─ Trino (analytics)
├─ PostgreSQL (operational data)
└─ Others
Key Components:
- Proxy Fix: ENABLE_PROXY_FIX = True for correct HTTPS redirect URLs behind Traefik
- OAuth Integration: Uses Keycloak OIDC discovery (.well-known/openid-configuration)
- Database Connections: Must use external HTTPS hostnames for authenticated connections
- Role Mapping: Keycloak groups map to Superset roles (Admin, Alpha, Gamma)
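To make the OAuth integration more concrete, here is a rough sketch of how a Keycloak entry for Flask-AppBuilder's OAUTH_PROVIDERS setting could be assembled. The helper names, the client ID "superset", the scope string, and the list of checked metadata keys are assumptions for illustration; the values actually rendered by this module may differ.

```python
from typing import Dict, List


def discovery_url(keycloak_host: str, realm: str) -> str:
    """Return the OIDC discovery URL for a Keycloak realm."""
    return f"https://{keycloak_host}/realms/{realm}/.well-known/openid-configuration"


def keycloak_provider(keycloak_host: str, realm: str,
                      client_id: str, client_secret: str) -> Dict:
    """Build one OAUTH_PROVIDERS entry that relies on OIDC discovery."""
    return {
        "name": "keycloak",
        "icon": "fa-key",
        "token_key": "access_token",
        "remote_app": {
            "client_id": client_id,          # assumed client name
            "client_secret": client_secret,  # confidential client secret
            "client_kwargs": {"scope": "openid email profile"},  # assumed scopes
            "server_metadata_url": discovery_url(keycloak_host, realm),
        },
    }


# Endpoints chosen for this sketch; authlib needs jwks_uri to verify ID tokens.
EXPECTED_KEYS = ("authorization_endpoint", "token_endpoint",
                 "userinfo_endpoint", "jwks_uri")


def missing_metadata_keys(metadata: Dict) -> List[str]:
    """List expected endpoints that are absent from a discovery document."""
    return [key for key in EXPECTED_KEYS if not metadata.get(key)]


if __name__ == "__main__":
    provider = keycloak_provider("auth.example.com", "buunstack", "superset", "change-me")
    print(provider["remote_app"]["server_metadata_url"])
```

Using server_metadata_url rather than hard-coded endpoints lets the client pick up every endpoint, including jwks_uri, from Keycloak itself.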
Security
Pod Security Standards
This deployment applies Kubernetes Pod Security Standards at the baseline level.
Security Configuration
Namespace Level:
pod-security.kubernetes.io/enforce=baseline
Container Security Context:
- runAsUser: 1000 (non-root user)
- runAsNonRoot: true
- allowPrivilegeEscalation: false
- capabilities: drop ALL
- seccompProfile: RuntimeDefault
- readOnlyRootFilesystem: false (required for Python package installation)
Init Container (copy-venv):
- Purpose: Copy Python virtual environment to writable emptyDir volume
- runAsUser: 0 (root) - required for chown operation
- Runs before main container to prepare writable .venv directory
Volume Configuration:
Two emptyDir volumes are mounted for write operations:
- /app/.venv - Python virtual environment (copied from image and made writable)
- /app/superset_home/.cache - uv package manager cache
Why Baseline Instead of Restricted?
The baseline level is required because:
- Init container needs root: The copy-venv initContainer must run as root (uid=0) to:
  - Copy Python virtual environment from read-only image layer
  - Change ownership to uid=1000 for main container
  - Enable bootstrap script to install additional packages
- Image architecture limitation: The official Apache Superset image:
  - Installs Python packages as root during build → /app/.venv owned by root
  - Runs application as uid=1000
  - Does not provide writable .venv for runtime package installation
- Restricted would require:
  - All containers (including init) to run as non-root
  - Custom Docker image with pre-chowned directories
  - Or forgoing bootstrap script package installation
Security Impact:
- Main application containers run as non-root (uid=1000) ✓
- Init container runs as root (uid=0) for ~2 seconds during pod startup
- Root is used only during initialization; the long-running application processes never run as root
- All other security controls (capabilities drop, seccomp, etc.) are applied
Achieving Restricted Level (Optional)
To deploy with restricted Pod Security Standards, create a custom Docker image:
FROM apachesuperset.docker.scarf.sh/apache/superset:5.0.0
# Switch to root to install packages and fix permissions
USER root
# Install required packages into the existing venv
RUN . /app/.venv/bin/activate && \
uv pip install psycopg2-binary sqlalchemy-trino authlib
# Change ownership to superset user (uid=1000)
RUN chown -R superset:superset /app/.venv
# Switch back to superset user
USER superset
Changes Required:
- Build and push custom image to your registry
- Update superset-values.gomplate.yaml:
  - Change image.repository to your custom image
  - Remove extraVolumes and extraVolumeMounts (emptyDir no longer needed)
  - Remove initContainers sections from init, supersetNode, supersetWorker
  - Add runAsNonRoot: true to Pod-level podSecurityContext
  - Remove bootstrapScript (packages already installed in image)
- Update namespace label to restricted:
  kubectl label namespace superset pod-security.kubernetes.io/enforce=restricted --overwrite
Trade-offs:
- Pros: Strictest security posture, all containers run as non-root
- Cons: Custom image maintenance required (rebuild on Superset version updates)
- Current approach: Uses official images with minimal customization via bootstrap script
Authentication
User Login (OAuth)
- Users authenticate via Keycloak
- Standard OIDC flow with Authorization Code grant
- Group membership included in UserInfo endpoint response
- Roles synced at each login (AUTH_ROLES_SYNC_AT_LOGIN = True)
Role Mapping
Keycloak groups automatically map to Superset roles:
AUTH_ROLES_MAPPING = {
"superset-admin": ["Admin"], # Full privileges
"Alpha": ["Alpha"], # Create charts/dashboards
"Gamma": ["Gamma"], # View only
}
Default Role: New users without a mapped group are assigned the Gamma role
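The mapping can be read as a lookup from Keycloak group names to Superset role names. The snippet below is a simplified model of that lookup, not Flask-AppBuilder's actual implementation: resolve_roles is a hypothetical helper, and treating Gamma purely as a fallback when no group matches is an assumption of this sketch.

```python
from typing import Dict, Iterable, List

AUTH_ROLES_MAPPING: Dict[str, List[str]] = {
    "superset-admin": ["Admin"],  # Full privileges
    "Alpha": ["Alpha"],           # Create charts/dashboards
    "Gamma": ["Gamma"],           # View only
}
DEFAULT_ROLE = "Gamma"  # assumed fallback, matching the default role above


def resolve_roles(groups: Iterable[str],
                  mapping: Dict[str, List[str]] = AUTH_ROLES_MAPPING,
                  default: str = DEFAULT_ROLE) -> List[str]:
    """Map Keycloak group names to a sorted list of Superset role names."""
    roles = set()
    for group in groups:
        # Groups without an entry in the mapping contribute no roles.
        roles.update(mapping.get(group, []))
    return sorted(roles) if roles else [default]


if __name__ == "__main__":
    print(resolve_roles(["superset-admin", "engineering"]))
    print(resolve_roles(["engineering"]))
```

Because roles are synced at each login, a change in Keycloak group membership only takes effect the next time the user signs in.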
Access Levels
- Admin: Full access to all features (requires superset-admin group)
- Alpha: Create and edit charts/dashboards
- Gamma: View charts and dashboards only
Management
Upgrade Superset
just superset::upgrade
Updates the Helm deployment with current configuration.
Uninstall
# Keep PostgreSQL database
just superset::uninstall false
# Delete PostgreSQL database and user
just superset::uninstall true
Troubleshooting
Check Pod Status
kubectl get pods -n superset
Expected pods:
- superset-* - Main application (1 replica)
- superset-worker-* - Celery worker (1 replica)
- superset-redis-master-* - Redis cache
- superset-init-db-* - Database initialization (Completed)
OAuth Login Fails with "Invalid parameter: redirect_uri"
Error: Redirect URI uses http:// instead of https://
Solution: Ensure proxy configuration is enabled in configOverrides:
ENABLE_PROXY_FIX = True
PREFERRED_URL_SCHEME = "https"
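The reason these settings matter: Traefik terminates TLS, so Superset itself receives plain HTTP and, without the proxy fix, builds an http:// redirect URI that Keycloak rejects. The sketch below is a simplified illustration of that idea, not Werkzeug's ProxyFix code. The function names are hypothetical, and the callback path /oauth-authorized/keycloak assumes the provider is named "keycloak".

```python
from typing import Dict


def effective_scheme(environ: Dict[str, str], trust_proxy: bool) -> str:
    """Return the URL scheme the application believes the client used."""
    forwarded = environ.get("HTTP_X_FORWARDED_PROTO", "")
    if trust_proxy and forwarded:
        # With one trusted proxy, use the last (closest) entry in the header.
        return forwarded.split(",")[-1].strip()
    return environ.get("wsgi.url_scheme", "http")


def redirect_uri(environ: Dict[str, str], trust_proxy: bool) -> str:
    """Build the OAuth callback URL the way a WSGI app would see it."""
    scheme = effective_scheme(environ, trust_proxy)
    return f"{scheme}://{environ['HTTP_HOST']}/oauth-authorized/keycloak"


if __name__ == "__main__":
    # Request as seen by Superset behind Traefik: HTTP inside the cluster,
    # with the original scheme preserved in X-Forwarded-Proto.
    environ = {
        "wsgi.url_scheme": "http",
        "HTTP_HOST": "superset.example.com",
        "HTTP_X_FORWARDED_PROTO": "https",
    }
    print(redirect_uri(environ, trust_proxy=False))
    print(redirect_uri(environ, trust_proxy=True))
```

With trust_proxy=False the URI starts with http://, which corresponds to the error above; with trust_proxy=True it starts with https://.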
OAuth Login Fails with "The request to sign in was denied"
Error: Missing "jwks_uri" in metadata
Solution: Ensure server_metadata_url is configured in OAuth provider:
"server_metadata_url": f"https://{KEYCLOAK_HOST}/realms/{REALM}/.well-known/openid-configuration"
Database Connection Test Fails
Trino: "Password not allowed for insecure authentication"
- Must use external HTTPS hostname (e.g., trino.example.com)
- Cannot use internal service name (e.g., trino.trino:8080)
- Trino enforces HTTPS for password authentication
Trino: "error 401: Basic authentication required"
- Missing username in SQLAlchemy URI
- Format: trino://username:password@host:port/catalog
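Both Trino errors can often be spotted before clicking TEST CONNECTION by inspecting the URI. The checker below is a heuristic sketch only: check_trino_uri is a hypothetical helper, and flagging port 8080 as the internal plain-HTTP service is an assumption based on the trino.trino:8080 example in this document.

```python
from typing import List
from urllib.parse import urlsplit


def check_trino_uri(uri: str) -> List[str]:
    """Return a list of likely problems in a Trino SQLAlchemy URI."""
    parts = urlsplit(uri)
    problems = []
    if parts.scheme != "trino":
        problems.append("scheme is not trino://")
    if not parts.username:
        # Leads to "error 401: Basic authentication required".
        problems.append("missing username")
    if not parts.password:
        problems.append("missing password")
    if parts.port == 8080:
        # Leads to "Password not allowed for insecure authentication".
        problems.append("port 8080 looks like the internal HTTP service")
    if not parts.path.strip("/"):
        problems.append("missing catalog")
    return problems


if __name__ == "__main__":
    for uri in ("trino://admin:secret@trino.example.com/iceberg",
                "trino://trino.trino:8080/iceberg"):
        print(uri, "->", check_trino_uri(uri) or "looks fine")
```

An empty list does not guarantee a working connection; it only rules out the two mistakes described above.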
Database Connection Not Available
- Only users in the superset-admin Keycloak group can add databases
- Add user to group: just keycloak::add-user-to-group <user> superset-admin
- Log out and log in again to sync roles
Worker Pod Crashes
Check worker logs:
kubectl logs -n superset deployment/superset-worker
Common issues:
- Redis connection failed (check Redis pod status)
- PostgreSQL connection failed (check database credentials)
- Missing Python packages (check bootstrapScript execution)
Package Installation Issues
Superset 5.0+ uses uv for package management. Check bootstrap logs:
kubectl logs -n superset deployment/superset -c superset | grep "uv pip install"
Expected packages:
- psycopg2-binary - PostgreSQL driver
- sqlalchemy-trino - Trino driver
- authlib - OAuth library
Chart/Dashboard Not Loading
- Check browser console for errors
- Verify database connection is active: Settings → Database Connections
- Test query in SQL Lab first
- Check Superset logs for errors
"Unable to migrate query editor state to backend" Error
Symptom: Repeated error message in SQL Lab:
Unable to migrate query editor state to backend. Superset will retry later.
Please contact your administrator if this problem persists.
Root Cause: Known Apache Superset bug (#30351, #33423) where /tabstateview/ endpoint returns HTTP 400 errors. Multiple underlying causes:
- Missing dbId in query editor state (KeyError)
- Foreign key constraint violations in tab_state table
- Missing PostgreSQL development tools in container images
Solution: Disable SQL Lab backend persistence in configOverrides:
# Disable SQL Lab backend persistence to avoid tab state migration errors
SQLLAB_BACKEND_PERSISTENCE = False
Impact:
- Query editor state stored in browser local storage only (not in database)
- Browser cache clear may lose unsaved queries
- Use "Saved Queries" feature for important queries
- This configuration is already applied in this deployment