# Apache Superset

Modern, enterprise-ready business intelligence web application with Keycloak OAuth authentication and Trino integration.

## Overview

This module deploys Apache Superset using the official Helm chart with:

- **Keycloak OAuth authentication** for user login
- **Trino integration** for data lake analytics
- **PostgreSQL backend** for metadata storage (dedicated user)
- **Redis** for caching and Celery task queue
- **HTTPS reverse proxy support** via Traefik
- **Group-based access control** via Keycloak groups

## Prerequisites

- Kubernetes cluster (k3s)
- Keycloak installed and configured
- PostgreSQL cluster (CloudNativePG)
- Trino with password authentication
- External Secrets Operator (optional, for Vault integration)

## Installation

### Basic Installation

```bash
just superset::install
```

You will be prompted for:

1. **Superset host (FQDN)**: e.g., `superset.example.com`
2. **Keycloak host (FQDN)**: e.g., `auth.example.com`

### What Gets Installed

- Superset web application
- Superset worker (Celery for async tasks)
- PostgreSQL database and user for Superset metadata
- Redis for caching and Celery broker
- Keycloak OAuth client (confidential client)
- `superset-admin` group in Keycloak for admin access

## Configuration

Environment variables (set in `.env.local` or override):

```bash
SUPERSET_NAMESPACE=superset          # Kubernetes namespace
SUPERSET_CHART_VERSION=0.15.0        # Helm chart version
SUPERSET_HOST=superset.example.com   # External hostname
KEYCLOAK_HOST=auth.example.com       # Keycloak hostname
KEYCLOAK_REALM=buunstack             # Keycloak realm name
```

### Architecture Notes

**Superset 5.0+ Changes**:

- Uses `uv` instead of `pip` for package management
- Lean base image without database drivers (installed via `bootstrapScript`)
- Required packages: `psycopg2-binary`, `sqlalchemy-trino`, `authlib`

**Redis Image**:

- Uses `bitnami/redis:latest` due to Bitnami's August 2025 strategy change
- Community users can only use the `latest` tag (no version pinning)
- For version pinning in production, consider deploying the official Redis image separately

## Usage

### Access Superset

1. Navigate to `https://your-superset-host/`
2. Click "Sign in with Keycloak" to authenticate
3. Create charts and dashboards

### Grant Admin Access

Add users to the `superset-admin` group:

```bash
just keycloak::add-user-to-group <username> superset-admin
```

Admin users have full privileges including:

- Database connection management
- User and role management
- All chart and dashboard operations

### Configure Database Connections

**Prerequisite**: The user must be in the `superset-admin` group.

#### Trino Connection

1. Log in as an admin user
2. Navigate to **Settings** → **Database Connections** → **+ Database**
3. Select **Trino** from supported databases
4. Configure connection:

```plain
DISPLAY NAME: Trino Iceberg (or any name)
SQLALCHEMY URI: trino://admin:<password>@trino.example.com/iceberg
```

**Important Notes**:

- **Must use HTTPS hostname** (e.g., `trino.example.com`)
- **Cannot use internal service** (e.g., `trino.trino:8080`)
- Trino password authentication requires an HTTPS connection
- Get admin password: `just trino::admin-password`

5. Click **TEST CONNECTION** to verify
6. Click **CONNECT** to save

**Available Trino Catalogs**:

- `iceberg` - Iceberg data lakehouse (Lakekeeper)
- `postgresql` - PostgreSQL connector
- `tpch` - TPC-H benchmark data

Example URIs:

```plain
trino://admin:<password>@trino.example.com/iceberg
trino://admin:<password>@trino.example.com/postgresql
trino://admin:<password>@trino.example.com/tpch
```

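
If the Trino password contains characters such as `@`, `/`, or `:`, it needs to be percent-encoded before it is placed in the SQLAlchemy URI. The following is a minimal sketch of building the example URIs in Python; the helper name `trino_uri` and the sample password are illustrative assumptions, not part of this module.

```python
from urllib.parse import quote


def trino_uri(user: str, password: str, host: str, catalog: str) -> str:
    """Build a Trino SQLAlchemy URI with percent-encoded credentials.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() also encode "/" so it cannot be mistaken
    # for the start of the catalog path; "@" and ":" are always encoded.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"trino://{encoded_user}:{encoded_password}@{host}/{catalog}"


# Sample password containing reserved characters (made up for this example)
print(trino_uri("admin", "p@ss/word", "trino.example.com", "iceberg"))
```

The resulting string can be pasted into the **SQLALCHEMY URI** field. Passwords made only of letters and digits pass through unchanged.
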
#### Other Database Connections

Superset supports many databases. Examples:

**PostgreSQL**:

```plain
postgresql://user:password@postgres-cluster-rw.postgres:5432/database
```

**MySQL**:

```plain
mysql://user:password@mysql-host:3306/database
```

### Create Charts and Dashboards

1. Navigate to **Charts** → **+ Chart**
2. Select dataset (from configured database)
3. Choose visualization type
4. Configure chart settings
5. Save chart
6. Add to dashboard

## Features

- **Rich Visualizations**: 40+ chart types including tables, line charts, bar charts, and maps
- **SQL Lab**: Interactive SQL editor with query history
- **No-code Chart Builder**: Drag-and-drop interface for creating charts
- **Dashboard Composer**: Create interactive dashboards with filters
- **Row-level Security**: Control data access per user/role
- **Alerting & Reports**: Schedule email reports and alerts
- **Semantic Layer**: Define metrics and dimensions for consistent analysis

## Architecture

```plain
External Users
      ↓
Cloudflare Tunnel (HTTPS)
      ↓
Traefik Ingress (HTTPS)
      ↓
Superset Web (HTTP inside cluster)
  ├─ OAuth → Keycloak (authentication)
  ├─ PostgreSQL (metadata: charts, dashboards, users)
  ├─ Redis (cache, Celery broker)
  └─ Celery Worker (async tasks)
      ↓
Data Sources (via HTTPS)
  ├─ Trino (analytics)
  ├─ PostgreSQL (operational data)
  └─ Others
```

**Key Components**:

- **Proxy Fix**: `ENABLE_PROXY_FIX = True` for correct HTTPS redirect URLs behind Traefik
- **OAuth Integration**: Uses Keycloak OIDC discovery (`.well-known/openid-configuration`)
- **Database Connections**: Must use external HTTPS hostnames for authenticated connections
- **Role Mapping**: Keycloak groups map to Superset roles (Admin, Alpha, Gamma)

## Security

### Pod Security Standards

This deployment applies Kubernetes Pod Security Standards at the **baseline** level.

#### Security Configuration

**Namespace Level**:

```bash
pod-security.kubernetes.io/enforce=baseline
```

**Container Security Context**:

- `runAsUser: 1000` (non-root user)
- `runAsNonRoot: true`
- `allowPrivilegeEscalation: false`
- `capabilities: drop ALL`
- `seccompProfile: RuntimeDefault`
- `readOnlyRootFilesystem: false` (required for Python package installation)

**Init Container (copy-venv)**:

- Purpose: Copy Python virtual environment to writable emptyDir volume
- `runAsUser: 0` (root) - required for `chown` operation
- Runs before main container to prepare writable `.venv` directory

**Volume Configuration**:

Two emptyDir volumes are mounted for write operations:

1. `/app/.venv` - Python virtual environment (copied from image and made writable)
2. `/app/superset_home/.cache` - uv package manager cache

#### Why Baseline Instead of Restricted?

The **baseline** level is required because:

1. **Init container needs root**: The `copy-venv` initContainer must run as root (uid=0) to:
   - Copy Python virtual environment from read-only image layer
   - Change ownership to uid=1000 for main container
   - Enable bootstrap script to install additional packages

2. **Image architecture limitation**: The official Apache Superset image:
   - Installs Python packages as root during build → `/app/.venv` owned by root
   - Runs application as uid=1000
   - Does not provide writable `.venv` for runtime package installation

3. **Restricted would require**:
   - All containers (including init) to run as non-root
   - Custom Docker image with pre-chowned directories
   - Or forgoing bootstrap script package installation

**Security Impact**:

- **Main application containers run as non-root** (uid=1000) ✓
- **Init container runs as root** (uid=0) for ~2 seconds during pod startup
- **Application runtime is non-root**, so root exposure is limited to pod startup
- All other security controls (capabilities drop, seccomp, etc.) are applied

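
The difference between the two containers can be made concrete with a small check. The sketch below is a simplified model of a few container-level rules from the restricted Pod Security Standard, not the Kubernetes admission logic itself; the function name is hypothetical, and the init container's fields other than `runAsUser: 0` are assumed for illustration.

```python
def restricted_violations(security_context: dict) -> list[str]:
    """Return the restricted-level rules a container securityContext breaks.

    Simplified: only checks the fields discussed in this README and ignores
    Pod-level defaults.
    """
    violations = []
    if security_context.get("runAsNonRoot") is not True:
        violations.append("runAsNonRoot must be true")
    if security_context.get("runAsUser") == 0:
        violations.append("runAsUser must not be 0")
    if security_context.get("allowPrivilegeEscalation") is not False:
        violations.append("allowPrivilegeEscalation must be false")
    if "ALL" not in security_context.get("capabilities", {}).get("drop", []):
        violations.append("capabilities.drop must include ALL")
    seccomp_type = security_context.get("seccompProfile", {}).get("type")
    if seccomp_type not in ("RuntimeDefault", "Localhost"):
        violations.append("seccompProfile.type must be RuntimeDefault or Localhost")
    return violations


# Main container settings as listed under "Container Security Context"
main_container = {
    "runAsUser": 1000,
    "runAsNonRoot": True,
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
}

# Assumed init container settings: same as above, but running as root
init_container = dict(main_container, runAsUser=0, runAsNonRoot=False)

print(restricted_violations(main_container))
print(restricted_violations(init_container))
```

Under these assumptions the main container passes the checked rules, while the root init container is what keeps the namespace at the baseline level.
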
#### Achieving Restricted Level (Optional)

To deploy with **restricted** Pod Security Standards, create a custom Docker image:

```dockerfile
FROM apachesuperset.docker.scarf.sh/apache/superset:5.0.0

# Switch to root to install packages and fix permissions
USER root

# Install required packages into the existing venv
RUN . /app/.venv/bin/activate && \
    uv pip install psycopg2-binary sqlalchemy-trino authlib

# Change ownership to superset user (uid=1000)
RUN chown -R superset:superset /app/.venv

# Switch back to superset user
USER superset
```

**Changes Required**:

1. Build and push custom image to your registry
2. Update `superset-values.gomplate.yaml`:
   - Change `image.repository` to your custom image
   - Remove `extraVolumes` and `extraVolumeMounts` (emptyDir no longer needed)
   - Remove `initContainers` sections from `init`, `supersetNode`, `supersetWorker`
   - Add `runAsNonRoot: true` to Pod-level `podSecurityContext`
   - Remove `bootstrapScript` (packages already installed in image)
3. Update namespace label to `restricted`:

```bash
kubectl label namespace superset pod-security.kubernetes.io/enforce=restricted --overwrite
```

**Trade-offs**:

- **Pros**: Strictest security posture, all containers run as non-root
- **Cons**: Custom image maintenance required (rebuild on Superset version updates)
- **Current approach**: Uses official images with minimal customization via bootstrap script

## Authentication

### User Login (OAuth)

- Users authenticate via Keycloak
- Standard OIDC flow with Authorization Code grant
- Group membership included in UserInfo endpoint response
- Roles synced at each login (`AUTH_ROLES_SYNC_AT_LOGIN = True`)

### Role Mapping

Keycloak groups automatically map to Superset roles:

```python
AUTH_ROLES_MAPPING = {
    "superset-admin": ["Admin"],  # Full privileges
    "Alpha": ["Alpha"],           # Create charts/dashboards
    "Gamma": ["Gamma"],           # View only
}
```

**Default Role**: New users are assigned `Gamma` role by default

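
As a rough illustration of how such a mapping turns group membership into roles, the sketch below collects the roles of every matching group and falls back to the default role when nothing matches. It is a simplified model written for this README; the function name `resolve_roles` is hypothetical, and the actual resolution performed by Flask-AppBuilder inside Superset may differ in detail.

```python
AUTH_ROLES_MAPPING = {
    "superset-admin": ["Admin"],
    "Alpha": ["Alpha"],
    "Gamma": ["Gamma"],
}
DEFAULT_ROLE = "Gamma"  # default role for new users, as stated above


def resolve_roles(groups, mapping=AUTH_ROLES_MAPPING, default_role=DEFAULT_ROLE):
    """Return the sorted Superset roles for a list of Keycloak group names."""
    # Unknown groups contribute no roles; duplicates collapse in the set.
    roles = {role for group in groups for role in mapping.get(group, [])}
    return sorted(roles) if roles else [default_role]


print(resolve_roles(["superset-admin", "some-other-group"]))
print(resolve_roles([]))
```

Because roles are synced at each login, a change in group membership only takes effect after the user signs in again.
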
### Access Levels

- **Admin**: Full access to all features (requires `superset-admin` group)
- **Alpha**: Create and edit charts/dashboards
- **Gamma**: View charts and dashboards only

## Management

### Upgrade Superset

```bash
just superset::upgrade
```

Updates the Helm deployment with the current configuration.

### Uninstall

```bash
# Keep PostgreSQL database
just superset::uninstall false

# Delete PostgreSQL database and user
just superset::uninstall true
```

## Troubleshooting

### Check Pod Status

```bash
kubectl get pods -n superset
```

Expected pods:

- `superset-*` - Main application (1 replica)
- `superset-worker-*` - Celery worker (1 replica)
- `superset-redis-master-*` - Redis cache
- `superset-init-db-*` - Database initialization (Completed)

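
For scripted checks, the same information is available as JSON via `kubectl get pods -n superset -o json`. The following sketch filters that output for pods that are neither running nor completed; the function name and the pod names in the sample data are made up for illustration.

```python
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list: dict) -> list[tuple[str, str]]:
    """Return (name, phase) for pods whose phase is not Running or Succeeded.

    Expects the parsed JSON of `kubectl get pods -o json`.
    """
    result = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            result.append((pod["metadata"]["name"], phase))
    return result


# Hand-written sample in the shape of the kubectl JSON output
sample = {
    "items": [
        {"metadata": {"name": "superset-abc"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "superset-worker-def"}, "status": {"phase": "Pending"}},
        {"metadata": {"name": "superset-init-db-ghi"}, "status": {"phase": "Succeeded"}},
    ]
}
print(unhealthy_pods(sample))
```

Note that `Running` only describes the pod phase. A pod whose container keeps restarting can still report `Running`, so the logs remain the authoritative source.
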
### OAuth Login Fails with "Invalid parameter: redirect_uri"

**Error**: Redirect URI uses `http://` instead of `https://`

**Solution**: Ensure proxy configuration is enabled in `configOverrides`:

```python
ENABLE_PROXY_FIX = True
PREFERRED_URL_SCHEME = "https"
```

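
To illustrate why the redirect URI ends up with `http://`: Superset Web serves plain HTTP inside the cluster, so unless it trusts the proxy's forwarded headers it builds callback URLs with the internal scheme. The sketch below is a simplified model of that behavior, not Superset's or Werkzeug's actual implementation; the function name and the callback path are assumptions for illustration.

```python
def external_redirect_uri(host: str, path: str, headers: dict, trust_proxy: bool) -> str:
    """Model how the callback URL scheme depends on trusting proxy headers."""
    scheme = "http"  # the scheme Superset Web sees inside the cluster
    if trust_proxy:
        # With proxy handling enabled, the scheme reported by the ingress wins.
        scheme = headers.get("X-Forwarded-Proto", scheme)
    return f"{scheme}://{host}{path}"


# Header as an HTTPS-terminating reverse proxy would typically set it
forwarded = {"X-Forwarded-Proto": "https"}
callback_path = "/oauth-authorized/keycloak"  # assumed callback path

print(external_redirect_uri("superset.example.com", callback_path, forwarded, trust_proxy=False))
print(external_redirect_uri("superset.example.com", callback_path, forwarded, trust_proxy=True))
```

In this model only the second call yields an `https://` URL, which is the form Keycloak expects for the registered redirect URI.
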
### OAuth Login Fails with "The request to sign in was denied"

**Error**: `Missing "jwks_uri" in metadata`

**Solution**: Ensure `server_metadata_url` is configured in OAuth provider:

```python
"server_metadata_url": f"https://{KEYCLOAK_HOST}/realms/{REALM}/.well-known/openid-configuration"
```

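
When debugging this error, it can help to confirm that the discovery document is reachable and actually contains `jwks_uri`. The sketch below builds the discovery URL and checks a metadata dictionary for a few expected keys; the helper names and the list of checked keys are assumptions for illustration, and `fetch_metadata` is defined but not called here because it needs network access to your Keycloak host.

```python
import json
from urllib.request import urlopen

# Keys this sketch checks for; chosen for illustration
REQUIRED_KEYS = ("authorization_endpoint", "token_endpoint", "jwks_uri")


def discovery_url(keycloak_host: str, realm: str) -> str:
    """Build the OIDC discovery URL in the format shown above."""
    return f"https://{keycloak_host}/realms/{realm}/.well-known/openid-configuration"


def fetch_metadata(url: str, timeout: float = 10.0) -> dict:
    """Download and parse the discovery document (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def missing_metadata_keys(metadata: dict) -> list[str]:
    """Return the expected keys that are absent from the metadata."""
    return [key for key in REQUIRED_KEYS if key not in metadata]


url = discovery_url("auth.example.com", "buunstack")
print(url)
# Example with a hand-written, incomplete document:
print(missing_metadata_keys({"authorization_endpoint": "a", "token_endpoint": "b"}))
```

Against a real deployment, `missing_metadata_keys(fetch_metadata(url))` should come back empty if the realm name and hostname are correct.
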
### Database Connection Test Fails

#### Trino: "Password not allowed for insecure authentication"

- Must use external HTTPS hostname (e.g., `trino.example.com`)
- Cannot use internal service name (e.g., `trino.trino:8080`)
- Trino enforces HTTPS for password authentication

#### Trino: "error 401: Basic authentication required"

- Missing username in SQLAlchemy URI
- Format: `trino://username:password@host:port/catalog`

### Database Connection Not Available

- Only users in the `superset-admin` Keycloak group can add databases
- Add user to group: `just keycloak::add-user-to-group <user> superset-admin`
- Log out and log in again to sync roles

### Worker Pod Crashes

Check worker logs:

```bash
kubectl logs -n superset deployment/superset-worker
```

Common issues:

- Redis connection failed (check Redis pod status)
- PostgreSQL connection failed (check database credentials)
- Missing Python packages (check `bootstrapScript` execution)

### Package Installation Issues

Superset 5.0+ uses `uv` for package management. Check bootstrap logs:

```bash
kubectl logs -n superset deployment/superset -c superset | grep "uv pip install"
```

Expected packages:

- `psycopg2-binary` - PostgreSQL driver
- `sqlalchemy-trino` - Trino driver
- `authlib` - OAuth library

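
If the logs are inconclusive, the installed distributions can also be checked from Python. The following sketch uses the standard library's `importlib.metadata` to report which of the expected packages are missing from the current environment; the function name is hypothetical, and running it with the interpreter of the Superset virtual environment inside the container is an assumption about how you would use it.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed above
REQUIRED_PACKAGES = ("psycopg2-binary", "sqlalchemy-trino", "authlib")


def missing_packages(names=REQUIRED_PACKAGES) -> list[str]:
    """Return the distributions that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


print(missing_packages())
```

An empty list means all three distributions are present. Any name that is printed points to a bootstrap step that did not complete.
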
### Chart/Dashboard Not Loading

- Check browser console for errors
- Verify database connection is active: Settings → Database Connections
- Test query in SQL Lab first
- Check Superset logs for errors

### "Unable to migrate query editor state to backend" Error

**Symptom**: Repeated error message in SQL Lab:

```plain
Unable to migrate query editor state to backend. Superset will retry later.
Please contact your administrator if this problem persists.
```

**Root Cause**: Known Apache Superset bug ([#30351](https://github.com/apache/superset/issues/30351), [#33423](https://github.com/apache/superset/issues/33423)) where the `/tabstateview/` endpoint returns HTTP 400 errors. There are multiple underlying causes:

- Missing `dbId` in query editor state (KeyError)
- Foreign key constraint violations in `tab_state` table
- Missing PostgreSQL development tools in container images

**Solution**: Disable SQL Lab backend persistence in `configOverrides`:

```python
# Disable SQL Lab backend persistence to avoid tab state migration errors
SQLLAB_BACKEND_PERSISTENCE = False
```

**Impact**:

- Query editor state is stored in browser local storage only (not in the database)
- Clearing the browser cache may lose unsaved queries
- Use the "Saved Queries" feature for important queries
- This configuration is already applied in this deployment

## References

- [Apache Superset Documentation](https://superset.apache.org/docs/)
- [Superset GitHub](https://github.com/apache/superset)
- [Superset Helm Chart](https://github.com/apache/superset/tree/master/helm/superset)
- [Trino Integration](../trino/README.md)
- [Keycloak OAuth](https://www.keycloak.org/docs/latest/securing_apps/#_oidc)