diff --git a/CLAUDE.md b/CLAUDE.md index 42f3d4d..7b266de 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -22,6 +22,27 @@ just # Show all available commands - **List All Recipes**: Run `just` to display all available recipes across modules - **Module-Specific Help**: Run `just <module>` (e.g., `just keycloak`) to show recipes for that module - **Execution Location**: ALWAYS run all recipes from the top directory (buun-stack root) +- **Recipe Parameters**: Recipe parameters are passed as **positional arguments**, not named arguments + +**Parameter Passing Examples:** + +```bash +# CORRECT: Positional arguments +just postgres::create-user-and-db superset superset "password123" + +# INCORRECT: Named arguments (will not work) +just postgres::create-user-and-db username=superset db_name=superset password="password123" + +# Recipe definition (for reference) +create-user-and-db username='' db_name='' password='': + just create-db "{{ db_name }}" + just create-user "{{ username }}" "{{ password }}" +``` + +**Important Notes:** +- Parameters must be passed in the exact order they appear in the recipe definition +- The `name=''` syntax in the recipe definition only declares a parameter with a default value; it does not enable named arguments on the command line +- Always quote parameters that contain special characters or spaces ### Core Installation Sequence @@ -92,53 +113,69 @@ All scripts in `/keycloak/scripts/` follow this pattern: ### Credential Storage Pattern -The credential storage approach depends on whether External Secrets Operator is available: - -**When External Secrets is available** (determined by `helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE}`): - -- Credentials are generated and stored in Vault using `just vault::put` commands -- Vault commands are used for secret management - -```bash -# Example: PostgreSQL superuser password (only when External Secrets is available) -just vault::get secret/postgres/superuser password -``` - -**When External Secrets is NOT available**: - -- Credentials are stored directly as Kubernetes Secrets 
-- Vault commands are NOT used +The credential storage approach depends on the type of secret and whether External Secrets Operator is available: #### Secret Management Rules 1. **Environment File**: Do NOT write to `.env.local` directly for secrets. Use it only for configuration values. -2. **Vault and External Secrets Integration**: - - When Vault and External Secrets are available, ALWAYS: - - Store secrets in Vault - - Create ExternalSecret resources to sync secrets from Vault to Kubernetes +2. **Two Types of Secrets**: + + **Application Secrets** (Metabase, Querybook, Superset, etc.): + - When External Secrets Operator is available: + - Store in Vault using `just vault::put` + - Create ExternalSecret resources to sync from Vault to Kubernetes - Let External Secrets Operator create the actual Secret resources - - Check availability with: + - When External Secrets Operator is NOT available: + - Create Kubernetes Secrets directly + - Do NOT store in Vault (even if Vault is available) - ```bash - if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then - # Use Vault + External Secrets pattern - fi - ``` + ```bash + if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then + # Store in Vault + create ExternalSecret + just vault::put app/config key="${value}" + gomplate -f app-external-secret.gomplate.yaml | kubectl apply -f - + else + # Create Kubernetes Secret directly (no Vault) + kubectl create secret generic app-secret --from-literal=key="${value}" + fi + ``` -3. **Fallback Pattern**: Only create Kubernetes Secrets directly when Vault/External Secrets are not available. 
+ **Core/Admin Credentials** (PostgreSQL superuser, Keycloak admin, MinIO root, etc.): + - When External Secrets Operator is available: + - Store in Vault using `just vault::put` or `just vault::put-root` + - Create ExternalSecret resources + - When External Secrets Operator is NOT available: + - Create Kubernetes Secrets directly + - ALSO store in Vault if Vault is available (as backup) -4. **Helm Values Secret References**: + ```bash + if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then + # Store in Vault + create ExternalSecret + just vault::put-root postgres/admin username=postgres password="${password}" + gomplate -f postgres-superuser-external-secret.gomplate.yaml | kubectl apply -f - + else + # Create Kubernetes Secret directly + kubectl create secret generic postgres-cluster-superuser \ + --from-literal=username=postgres --from-literal=password="${password}" + # ALSO store in Vault if available (backup for admin credentials) + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then + just vault::put-root postgres/admin username=postgres password="${password}" + fi + fi + ``` + +3. **Helm Values Secret References**: - When Helm charts support referencing external Secrets (via `existingSecret`, `secretName`, etc.), ALWAYS use this pattern - Create the Secret using External Secrets (preferred) or directly as Kubernetes Secret - Reference the Secret in Helm values instead of embedding credentials -5. **Keycloak Client Configuration**: +4. **Keycloak Client Configuration**: - Prefer creating Public clients (without client secret) when possible - Public clients are suitable for browser-based applications and native apps - Only use confidential clients (with secret) when required by the service -6. **Password Generation**: +5. 
**Password Generation**: - Use `just utils::random-password` whenever possible to generate random passwords - Avoid using `openssl rand -base64 32` or other direct methods - This ensures consistent password generation across all modules diff --git a/README.md b/README.md index 590aa65..d45b6a5 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,8 @@ A remotely accessible Kubernetes home lab with OIDC authentication. Build a mode - **[ClickHouse](https://clickhouse.com/)**: High-performance columnar analytics database - **[Qdrant](https://qdrant.tech/)**: Vector database for AI/ML applications - **[Lakekeeper](https://lakekeeper.io/)**: Apache Iceberg REST Catalog for data lake management -- **[Metabase](https://www.metabase.com/)**: Business intelligence and data visualization +- **[Apache Superset](https://superset.apache.org/)**: BI platform with rich chart types and high customizability +- **[Metabase](https://www.metabase.com/)**: Lightweight BI with simple configuration and clean, modern interface - **[DataHub](https://datahubproject.io/)**: Data catalog and metadata management ### Orchestration (Optional) @@ -147,6 +148,18 @@ Multi-user platform for interactive computing with Keycloak authentication and p [📖 See JupyterHub Documentation](./jupyterhub/README.md) +### Apache Superset + +Modern business intelligence platform with rich visualization capabilities: + +- **40+ Chart Types**: Mixed charts, treemaps, sunburst, heatmaps, and more +- **SQL Lab**: Powerful SQL editor for complex queries and dataset creation +- **Keycloak Authentication**: OAuth2 integration with group-based admin access +- **Trino Integration**: Connect to Iceberg data lake and multiple data sources +- **High Customizability**: Extensive chart configuration and dashboard design options + +[📖 See Superset Documentation](./superset/README.md) + ### Metabase Business intelligence and data visualization platform with PostgreSQL integration. 
@@ -312,6 +325,7 @@ kubectl --context yourpc-oidc get nodes # Keycloak: https://auth.yourdomain.com # Trino: https://trino.yourdomain.com # Querybook: https://querybook.yourdomain.com +# Superset: https://superset.yourdomain.com # Metabase: https://metabase.yourdomain.com # Airflow: https://airflow.yourdomain.com # JupyterHub: https://jupyter.yourdomain.com diff --git a/justfile b/justfile index 5e14ba0..ab7926b 100644 --- a/justfile +++ b/justfile @@ -24,6 +24,7 @@ mod oauth2-proxy mod postgres mod qdrant mod querybook +mod superset mod trino mod utils mod vault diff --git a/superset/.gitignore b/superset/.gitignore new file mode 100644 index 0000000..ad64c4e --- /dev/null +++ b/superset/.gitignore @@ -0,0 +1,3 @@ +# Generated files from gomplate templates +superset-values.yaml +superset-config-external-secret.yaml diff --git a/superset/README.md b/superset/README.md new file mode 100644 index 0000000..110a7cc --- /dev/null +++ b/superset/README.md @@ -0,0 +1,341 @@ +# Apache Superset + +Modern, enterprise-ready business intelligence web application with Keycloak OAuth authentication and Trino integration. + +## Overview + +This module deploys Apache Superset using the official Helm chart with: + +- **Keycloak OAuth authentication** for user login +- **Trino integration** for data lake analytics +- **PostgreSQL backend** for metadata storage (dedicated user) +- **Redis** for caching and Celery task queue +- **HTTPS reverse proxy support** via Traefik +- **Group-based access control** via Keycloak groups + +## Prerequisites + +- Kubernetes cluster (k3s) +- Keycloak installed and configured +- PostgreSQL cluster (CloudNativePG) +- Trino with password authentication +- External Secrets Operator (optional, for Vault integration) + +## Installation + +### Basic Installation + +```bash +just superset::install +``` + +You will be prompted for: + +1. **Superset host (FQDN)**: e.g., `superset.example.com` +2. 
**Keycloak host (FQDN)**: e.g., `auth.example.com` + +### What Gets Installed + +- Superset web application +- Superset worker (Celery for async tasks) +- PostgreSQL database and user for Superset metadata +- Redis for caching and Celery broker +- Keycloak OAuth client (confidential client) +- `superset-admin` group in Keycloak for admin access + +## Configuration + +Environment variables (set in `.env.local` or override): + +```bash +SUPERSET_NAMESPACE=superset # Kubernetes namespace +SUPERSET_CHART_VERSION=0.15.0 # Helm chart version +SUPERSET_HOST=superset.example.com # External hostname +KEYCLOAK_HOST=auth.example.com # Keycloak hostname +KEYCLOAK_REALM=buunstack # Keycloak realm name +``` + +### Architecture Notes + +**Superset 5.0+ Changes**: + +- Uses `uv` instead of `pip` for package management +- Lean base image without database drivers (installed via bootstrapScript) +- Required packages: `psycopg2-binary`, `sqlalchemy-trino`, `authlib` + +**Redis Image**: + +- Uses `bitnami/redis:latest` due to Bitnami's August 2025 strategy change +- Community users can only use `latest` tag (no version pinning) +- For production version pinning, consider using official Redis image separately + +## Usage + +### Access Superset + +1. Navigate to `https://your-superset-host/` +2. Click "Sign in with Keycloak" to authenticate +3. Create charts and dashboards + +### Grant Admin Access + +Add users to the `superset-admin` group: + +```bash +just keycloak::add-user-to-group superset-admin +``` + +Admin users have full privileges including: + +- Database connection management +- User and role management +- All chart and dashboard operations + +### Configure Database Connections + +**Prerequisites**: User must be in `superset-admin` group + +#### Trino Connection + +1. Log in as an admin user +2. Navigate to **Settings** → **Database Connections** → **+ Database** +3. Select **Trino** from supported databases +4. 
Configure connection: + + ```plain + DISPLAY NAME: Trino Iceberg (or any name) + SQLALCHEMY URI: trino://admin:<password>@trino.example.com/iceberg + ``` + + **Important Notes**: + - **Must use HTTPS hostname** (e.g., `trino.example.com`) + - **Cannot use internal service** (e.g., `trino.trino:8080`) + - Trino password authentication requires HTTPS connection + - Get admin password: `just trino::admin-password` + +5. Click **TEST CONNECTION** to verify +6. Click **CONNECT** to save + +**Available Trino Catalogs**: + +- `iceberg` - Iceberg data lakehouse (Lakekeeper) +- `postgresql` - PostgreSQL connector +- `tpch` - TPC-H benchmark data + +Example URIs: + +```plain +trino://admin:<password>@trino.example.com/iceberg +trino://admin:<password>@trino.example.com/postgresql +trino://admin:<password>@trino.example.com/tpch +``` + +#### Other Database Connections + +Superset supports many databases. Examples: + +**PostgreSQL**: + +```plain +postgresql://user:password@postgres-cluster-rw.postgres:5432/database +``` + +**MySQL**: + +```plain +mysql://user:password@mysql-host:3306/database +``` + +### Create Charts and Dashboards + +1. Navigate to **Charts** → **+ Chart** +2. Select dataset (from configured database) +3. Choose visualization type +4. Configure chart settings +5. Save chart +6. Add to dashboard + +## Features + +- **Rich Visualizations**: 40+ chart types including tables, line charts, bar charts, maps, etc. 
+- **SQL Lab**: Interactive SQL editor with query history +- **No-code Chart Builder**: Drag-and-drop interface for creating charts +- **Dashboard Composer**: Create interactive dashboards with filters +- **Row-level Security**: Control data access per user/role +- **Alerting & Reports**: Schedule email reports and alerts +- **Semantic Layer**: Define metrics and dimensions for consistent analysis + +## Architecture + +```plain +External Users + ↓ +Cloudflare Tunnel (HTTPS) + ↓ +Traefik Ingress (HTTPS) + ↓ +Superset Web (HTTP inside cluster) + ├─ OAuth → Keycloak (authentication) + ├─ PostgreSQL (metadata: charts, dashboards, users) + ├─ Redis (cache, Celery broker) + └─ Celery Worker (async tasks) + ↓ +Data Sources (via HTTPS) + ├─ Trino (analytics) + ├─ PostgreSQL (operational data) + └─ Others +``` + +**Key Components**: + +- **Proxy Fix**: `ENABLE_PROXY_FIX = True` for correct HTTPS redirect URLs behind Traefik +- **OAuth Integration**: Uses Keycloak OIDC discovery (`.well-known/openid-configuration`) +- **Database Connections**: Must use external HTTPS hostnames for authenticated connections +- **Role Mapping**: Keycloak groups map to Superset roles (Admin, Alpha, Gamma) + +## Authentication + +### User Login (OAuth) + +- Users authenticate via Keycloak +- Standard OIDC flow with Authorization Code grant +- Group membership included in UserInfo endpoint response +- Roles synced at each login (`AUTH_ROLES_SYNC_AT_LOGIN = True`) + +### Role Mapping + +Keycloak groups automatically map to Superset roles: + +```python +AUTH_ROLES_MAPPING = { + "superset-admin": ["Admin"], # Full privileges + "Alpha": ["Alpha"], # Create charts/dashboards + "Gamma": ["Gamma"], # View only +} +``` + +**Default Role**: New users are assigned `Gamma` role by default + +### Access Levels + +- **Admin**: Full access to all features (requires `superset-admin` group) +- **Alpha**: Create and edit charts/dashboards +- **Gamma**: View charts and dashboards only + +## Management + +### 
Upgrade Superset + +```bash +just superset::upgrade +``` + +Updates the Helm release with the current configuration. + +### Uninstall + +```bash +# Keep PostgreSQL database +just superset::uninstall false + +# Delete PostgreSQL database and user +just superset::uninstall true +``` + +## Troubleshooting + +### Check Pod Status + +```bash +kubectl get pods -n superset +``` + +Expected pods: + +- `superset-*` - Main application (1 replica) +- `superset-worker-*` - Celery worker (1 replica) +- `superset-redis-master-*` - Redis cache +- `superset-init-db-*` - Database initialization (Completed) + +### OAuth Login Fails with "Invalid parameter: redirect_uri" + +**Error**: Redirect URI uses `http://` instead of `https://` + +**Solution**: Ensure proxy configuration is enabled in `configOverrides`: + +```python +ENABLE_PROXY_FIX = True +PREFERRED_URL_SCHEME = "https" +``` + +### OAuth Login Fails with "The request to sign in was denied" + +**Error**: `Missing "jwks_uri" in metadata` + +**Solution**: Ensure `server_metadata_url` is configured in the OAuth provider: + +```python +"server_metadata_url": f"https://{KEYCLOAK_HOST}/realms/{REALM}/.well-known/openid-configuration" +``` + +### Database Connection Test Fails + +#### Trino: "Password not allowed for insecure authentication" + +- Must use external HTTPS hostname (e.g., `trino.example.com`) +- Cannot use internal service name (e.g., `trino.trino:8080`) +- Trino enforces HTTPS for password authentication + +#### Trino: "error 401: Basic authentication required" + +- Missing username in SQLAlchemy URI +- Format: `trino://username:password@host:port/catalog` + +### Database Connection Not Available + +- Only users in the `superset-admin` Keycloak group can add databases +- Add user to group: `just keycloak::add-user-to-group superset-admin` +- Log out and log in again to sync roles + +### Worker Pod Crashes + +Check worker logs: + +```bash +kubectl logs -n superset deployment/superset-worker +``` + +Common issues: + +- Redis 
connection failed (check Redis pod status) +- PostgreSQL connection failed (check database credentials) +- Missing Python packages (check bootstrapScript execution) + +### Package Installation Issues + +Superset 5.0+ uses `uv` for package management. Check bootstrap logs: + +```bash +kubectl logs -n superset deployment/superset -c superset | grep "uv pip install" +``` + +Expected packages: + +- `psycopg2-binary` - PostgreSQL driver +- `sqlalchemy-trino` - Trino driver +- `authlib` - OAuth library + +### Chart/Dashboard Not Loading + +- Check browser console for errors +- Verify database connection is active: Settings → Database Connections +- Test query in SQL Lab first +- Check Superset logs for errors + +## References + +- [Apache Superset Documentation](https://superset.apache.org/docs/) +- [Superset GitHub](https://github.com/apache/superset) +- [Superset Helm Chart](https://github.com/apache/superset/tree/master/helm/superset) +- [Trino Integration](../trino/README.md) +- [Keycloak OAuth](https://www.keycloak.org/docs/latest/securing_apps/#_oidc) diff --git a/superset/justfile b/superset/justfile new file mode 100644 index 0000000..5a46bb1 --- /dev/null +++ b/superset/justfile @@ -0,0 +1,259 @@ +set fallback := true + +export SUPERSET_NAMESPACE := env("SUPERSET_NAMESPACE", "superset") +export SUPERSET_CHART_VERSION := env("SUPERSET_CHART_VERSION", "0.15.0") +export SUPERSET_HOST := env("SUPERSET_HOST", "") +export EXTERNAL_SECRETS_NAMESPACE := env("EXTERNAL_SECRETS_NAMESPACE", "external-secrets") +export K8S_VAULT_NAMESPACE := env("K8S_VAULT_NAMESPACE", "vault") +export KEYCLOAK_REALM := env("KEYCLOAK_REALM", "buunstack") +export KEYCLOAK_HOST := env("KEYCLOAK_HOST", "") + +[private] +default: + @just --list --unsorted --list-submodules + +# Add Helm repository +add-helm-repo: + helm repo add superset https://apache.github.io/superset + helm repo update + +# Remove Helm repository +remove-helm-repo: + helm repo remove superset + +# Create Superset namespace 
+create-namespace: + @kubectl get namespace ${SUPERSET_NAMESPACE} &>/dev/null || \ + kubectl create namespace ${SUPERSET_NAMESPACE} + +# Delete Superset namespace +delete-namespace: + @kubectl delete namespace ${SUPERSET_NAMESPACE} --ignore-not-found + +# Create Keycloak client and OAuth secret for Superset +create-keycloak-client: + #!/bin/bash + set -euo pipefail + while [ -z "${SUPERSET_HOST}" ]; do + SUPERSET_HOST=$( + gum input --prompt="Superset host (FQDN): " --width=100 \ + --placeholder="e.g., superset.example.com" + ) + done + + echo "Creating Keycloak client for Superset..." + + just keycloak::delete-client ${KEYCLOAK_REALM} superset || true + + CLIENT_SECRET=$(just utils::random-password) + + just keycloak::create-group superset-admin '' 'Superset administrators' || echo "Group may already exist" + + just keycloak::create-client \ + realm=${KEYCLOAK_REALM} \ + client_id=superset \ + redirect_url="https://${SUPERSET_HOST}/oauth-authorized/keycloak" \ + client_secret="${CLIENT_SECRET}" + + just keycloak::add-groups-mapper superset + + kubectl delete secret superset-oauth-temp -n ${SUPERSET_NAMESPACE} --ignore-not-found + kubectl create secret generic superset-oauth-temp -n ${SUPERSET_NAMESPACE} \ + --from-literal=client_secret="${CLIENT_SECRET}" + + echo "Keycloak client created successfully" + echo "Client ID: superset" + echo "Redirect URI: https://${SUPERSET_HOST}/oauth-authorized/keycloak" + echo "" + echo "Admin Group: superset-admin" + echo "To grant admin access, add users to 'superset-admin' group:" + echo " just keycloak::add-user-to-group superset-admin" + +# Delete Keycloak client +delete-keycloak-client: + #!/bin/bash + set -euo pipefail + echo "Deleting Keycloak client for Superset..." + just keycloak::delete-client ${KEYCLOAK_REALM} superset || true + echo "Deleting superset-admin group..." 
+ just keycloak::delete-group superset-admin || true + kubectl delete secret superset-oauth-temp -n ${SUPERSET_NAMESPACE} --ignore-not-found + +# Create Superset secrets +create-secrets postgres_password='': + #!/bin/bash + set -euo pipefail + + secret_key=$(just utils::random-password) + + pg_host="postgres-cluster-rw.postgres" + pg_port="5432" + pg_user="superset" + pg_password="{{ postgres_password }}" + pg_database="superset" + + database_url="postgresql://${pg_user}:${pg_password}@${pg_host}:${pg_port}/${pg_database}" + + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null && \ + just vault::get superset/oauth client_secret &>/dev/null; then + oauth_client_secret=$(just vault::get superset/oauth client_secret) + elif kubectl get secret superset-oauth-temp -n ${SUPERSET_NAMESPACE} &>/dev/null; then + oauth_client_secret=$(kubectl get secret superset-oauth-temp -n ${SUPERSET_NAMESPACE} \ + -o jsonpath='{.data.client_secret}' | base64 -d) + else + echo "Error: Cannot retrieve OAuth client secret. Please run 'just superset::create-keycloak-client' first." + exit 1 + fi + + if helm status external-secrets -n ${EXTERNAL_SECRETS_NAMESPACE} &>/dev/null; then + echo "External Secrets Operator detected. Storing secrets in Vault..." + + just vault::put superset/config \ + SECRET_KEY="${secret_key}" \ + SQLALCHEMY_DATABASE_URI="${database_url}" \ + OAUTH_CLIENT_SECRET="${oauth_client_secret}" + + kubectl delete secret superset-secret -n ${SUPERSET_NAMESPACE} --ignore-not-found + kubectl delete externalsecret superset-secret -n ${SUPERSET_NAMESPACE} --ignore-not-found + + gomplate -f superset-config-external-secret.gomplate.yaml \ + -o superset-config-external-secret.yaml + kubectl apply -f superset-config-external-secret.yaml + + echo "Waiting for ExternalSecret to sync..." + kubectl wait --for=condition=Ready externalsecret/superset-secret \ + -n ${SUPERSET_NAMESPACE} --timeout=60s + else + echo "External Secrets Operator not found. Creating secret directly..." 
+ kubectl delete secret superset-secret -n ${SUPERSET_NAMESPACE} --ignore-not-found + kubectl create secret generic superset-secret -n ${SUPERSET_NAMESPACE} \ + --from-literal=SECRET_KEY="${secret_key}" \ + --from-literal=SQLALCHEMY_DATABASE_URI="${database_url}" \ + --from-literal=OAUTH_CLIENT_SECRET="${oauth_client_secret}" + fi + +# Delete Superset secrets +delete-secrets: + @kubectl delete secret superset-secret -n ${SUPERSET_NAMESPACE} --ignore-not-found + @kubectl delete externalsecret superset-secret -n ${SUPERSET_NAMESPACE} --ignore-not-found + +# Install Superset +install: + #!/bin/bash + set -euo pipefail + while [ -z "${SUPERSET_HOST}" ]; do + SUPERSET_HOST=$( + gum input --prompt="Superset host (FQDN): " --width=100 \ + --placeholder="e.g., superset.example.com" + ) + done + + while [ -z "${KEYCLOAK_HOST}" ]; do + KEYCLOAK_HOST=$( + gum input --prompt="Keycloak host (FQDN): " --width=100 \ + --placeholder="e.g., auth.example.com" + ) + done + + just create-namespace + + # Create Superset database and user + POSTGRES_PASSWORD=$(just utils::random-password) + just postgres::create-user-and-db superset superset "${POSTGRES_PASSWORD}" + + just create-keycloak-client + just create-secrets "${POSTGRES_PASSWORD}" + + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null && \ + just vault::get superset/oauth client_secret &>/dev/null; then + export OAUTH_CLIENT_SECRET=$(just vault::get superset/oauth client_secret) + elif kubectl get secret superset-oauth-temp -n ${SUPERSET_NAMESPACE} &>/dev/null; then + export OAUTH_CLIENT_SECRET=$(kubectl get secret superset-oauth-temp -n ${SUPERSET_NAMESPACE} \ + -o jsonpath='{.data.client_secret}' | base64 -d) + else + echo "Error: Cannot retrieve OAuth client secret. Please run 'just superset::create-keycloak-client' first." 
+ exit 1 + fi + + export SUPERSET_DB_PASSWORD="${POSTGRES_PASSWORD}" + + just add-helm-repo + gomplate -f superset-values.gomplate.yaml -o superset-values.yaml + + helm upgrade --cleanup-on-fail --install superset superset/superset \ + --version ${SUPERSET_CHART_VERSION} -n ${SUPERSET_NAMESPACE} --wait \ + -f superset-values.yaml + + echo "" + echo "Superset installed successfully!" + echo "Access URL: https://${SUPERSET_HOST}" + echo "" + echo "OAuth Configuration:" + echo " Provider: Keycloak" + echo " Realm: ${KEYCLOAK_REALM}" + echo " Authorization URL: https://${KEYCLOAK_HOST}/realms/${KEYCLOAK_REALM}/protocol/openid-connect/auth" + echo "" + echo "Admin Access:" + echo " To grant admin access, add users to 'superset-admin' group:" + echo " just keycloak::add-user-to-group superset-admin" + echo "" + +# Upgrade Superset +upgrade: + #!/bin/bash + set -euo pipefail + while [ -z "${SUPERSET_HOST}" ]; do + SUPERSET_HOST=$( + gum input --prompt="Superset host (FQDN): " --width=100 \ + --placeholder="e.g., superset.example.com" + ) + done + + while [ -z "${KEYCLOAK_HOST}" ]; do + KEYCLOAK_HOST=$( + gum input --prompt="Keycloak host (FQDN): " --width=100 \ + --placeholder="e.g., auth.example.com" + ) + done + + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null && \ + just vault::get superset/oauth client_secret &>/dev/null; then + export OAUTH_CLIENT_SECRET=$(just vault::get superset/oauth client_secret) + elif kubectl get secret superset-oauth-temp -n ${SUPERSET_NAMESPACE} &>/dev/null; then + export OAUTH_CLIENT_SECRET=$(kubectl get secret superset-oauth-temp -n ${SUPERSET_NAMESPACE} \ + -o jsonpath='{.data.client_secret}' | base64 -d) + else + echo "Error: Cannot retrieve OAuth client secret. Please run 'just superset::create-keycloak-client' first." 
+ exit 1 + fi + + # Extract database password from SQLALCHEMY_DATABASE_URI in existing secret + database_uri=$(kubectl get secret superset-secret -n ${SUPERSET_NAMESPACE} \ + -o jsonpath='{.data.SQLALCHEMY_DATABASE_URI}' | base64 -d) + export SUPERSET_DB_PASSWORD=$(echo "$database_uri" | sed -n 's|.*://[^:]*:\([^@]*\)@.*|\1|p') + + echo "Upgrading Superset..." + + gomplate -f superset-values.gomplate.yaml -o superset-values.yaml + helm upgrade superset superset/superset \ + --version ${SUPERSET_CHART_VERSION} -n ${SUPERSET_NAMESPACE} --wait \ + -f superset-values.yaml + + echo "Superset upgraded successfully" + +# Uninstall Superset +uninstall delete-db='true': + #!/bin/bash + set -euo pipefail + helm uninstall superset -n ${SUPERSET_NAMESPACE} --ignore-not-found --wait + just delete-secrets + just delete-keycloak-client + just delete-namespace + if [ "{{ delete-db }}" = "true" ]; then + just postgres::delete-user-and-db superset superset + fi + + if helm status vault -n ${K8S_VAULT_NAMESPACE} &>/dev/null; then + just vault::delete superset/config || true + just vault::delete superset/oauth || true + fi diff --git a/superset/superset-config-external-secret.gomplate.yaml b/superset/superset-config-external-secret.gomplate.yaml new file mode 100644 index 0000000..17b725f --- /dev/null +++ b/superset/superset-config-external-secret.gomplate.yaml @@ -0,0 +1,26 @@ +apiVersion: external-secrets.io/v1 +kind: ExternalSecret +metadata: + name: superset-secret + namespace: {{ .Env.SUPERSET_NAMESPACE }} +spec: + refreshInterval: 1h + secretStoreRef: + name: vault-secret-store + kind: ClusterSecretStore + target: + name: superset-secret + creationPolicy: Owner + data: + - secretKey: SECRET_KEY + remoteRef: + key: superset/config + property: SECRET_KEY + - secretKey: SQLALCHEMY_DATABASE_URI + remoteRef: + key: superset/config + property: SQLALCHEMY_DATABASE_URI + - secretKey: OAUTH_CLIENT_SECRET + remoteRef: + key: superset/config + property: OAUTH_CLIENT_SECRET diff --git 
a/superset/superset-values.gomplate.yaml b/superset/superset-values.gomplate.yaml new file mode 100644 index 0000000..cd5ab35 --- /dev/null +++ b/superset/superset-values.gomplate.yaml @@ -0,0 +1,168 @@ +# Apache Superset Helm values +# Generated by gomplate + +# Service configuration +service: + type: ClusterIP + port: 8088 + +# Ingress configuration +ingress: + enabled: true + ingressClassName: traefik + annotations: + kubernetes.io/ingress.class: traefik + traefik.ingress.kubernetes.io/router.entrypoints: websecure + hosts: + - {{ env.Getenv "SUPERSET_HOST" }} + tls: + - secretName: superset-tls + hosts: + - {{ env.Getenv "SUPERSET_HOST" }} + +# Init job settings (disable to use external database initialization) +init: + enabled: true + loadExamples: false + +# Superset node configuration +supersetNode: + replicaCount: 1 + connections: + # Redis configuration + redis_host: superset-redis-headless + redis_port: "6379" + redis_cache_db: "1" + redis_celery_db: "0" + # PostgreSQL configuration for initContainer (wait-for-postgres) + # The actual database connection uses SQLALCHEMY_DATABASE_URI from extraEnvRaw + db_host: postgres-cluster-rw.postgres + db_port: "5432" + db_user: superset + db_pass: {{ env.Getenv "SUPERSET_DB_PASSWORD" }} + db_name: superset + +# Superset worker (Celery) configuration +supersetWorker: + replicaCount: 1 + +# Database configuration (use existing PostgreSQL) +postgresql: + enabled: false + +# Redis configuration (embedded) +redis: + enabled: true + image: + registry: docker.io + repository: bitnami/redis + # Since August 2025, Bitnami changed its strategy: + # - Community users can only use 'latest' tag (no version pinning) + # - Versioned tags moved to 'bitnamilegacy' repository (deprecated, no updates) + # - For production with version pinning, consider using official redis image separately + tag: latest + master: + persistence: + enabled: false + +# Extra environment variables +extraEnv: + KEYCLOAK_HOST: {{ env.Getenv "KEYCLOAK_HOST" 
}} + KEYCLOAK_REALM: {{ env.Getenv "KEYCLOAK_REALM" }} + +# Extra environment variables from existing secrets +extraEnvRaw: + - name: SUPERSET_SECRET_KEY + valueFrom: + secretKeyRef: + name: superset-secret + key: SECRET_KEY + - name: SQLALCHEMY_DATABASE_URI + valueFrom: + secretKeyRef: + name: superset-secret + key: SQLALCHEMY_DATABASE_URI + - name: OAUTH_CLIENT_SECRET + valueFrom: + secretKeyRef: + name: superset-secret + key: OAUTH_CLIENT_SECRET + +# Configuration overrides for superset_config.py +configOverrides: + keycloak_oauth: | + import os + from flask_appbuilder.security.manager import AUTH_OAUTH + from superset.security import SupersetSecurityManager + + + class CustomSsoSecurityManager(SupersetSecurityManager): + def oauth_user_info(self, provider, response=None): + """Get user information from OAuth provider.""" + if provider == "keycloak": + me = self.appbuilder.sm.oauth_remotes[provider].get( + "protocol/openid-connect/userinfo" + ) + data = me.json() + return { + "username": data.get("preferred_username"), + "name": data.get("name"), + "email": data.get("email"), + "first_name": data.get("given_name", ""), + "last_name": data.get("family_name", ""), + "role_keys": data.get("groups", []), + } + return {} + + + # Authentication type + AUTH_TYPE = AUTH_OAUTH + + # Auto-registration for new users + AUTH_USER_REGISTRATION = True + AUTH_USER_REGISTRATION_ROLE = "Gamma" + + # Custom security manager + CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager + + # OAuth configuration + OAUTH_PROVIDERS = [ + { + "name": "keycloak", + "icon": "fa-key", + "token_key": "access_token", + "remote_app": { + "client_id": "superset", + "client_secret": os.environ.get("OAUTH_CLIENT_SECRET"), + "server_metadata_url": f"https://{os.environ.get('KEYCLOAK_HOST')}/realms/{os.environ.get('KEYCLOAK_REALM')}/.well-known/openid-configuration", + "api_base_url": f"https://{os.environ.get('KEYCLOAK_HOST')}/realms/{os.environ.get('KEYCLOAK_REALM')}/", + "client_kwargs": { + "scope": 
"openid email profile" + }, + } + } + ] + + # Role mapping + AUTH_ROLES_MAPPING = { + "superset-admin": ["Admin"], + "Alpha": ["Alpha"], + "Gamma": ["Gamma"], + } + + # Sync roles at each login + AUTH_ROLES_SYNC_AT_LOGIN = True + + # Enable Trino database support + PREVENT_UNSAFE_DB_CONNECTIONS = False + + # Proxy configuration (for HTTPS behind Traefik) + ENABLE_PROXY_FIX = True + PREFERRED_URL_SCHEME = "https" + +# Bootstrap script for initial setup +# Note: Superset 5.0+ uses 'uv' instead of 'pip' for package management +bootstrapScript: | + #!/bin/bash + uv pip install psycopg2-binary sqlalchemy-trino authlib + if [ ! -f ~/bootstrap ]; then echo "Bootstrap complete" > ~/bootstrap; fi diff --git a/superset/superset_config.py.template b/superset/superset_config.py.template new file mode 100644 index 0000000..829f5ea --- /dev/null +++ b/superset/superset_config.py.template @@ -0,0 +1,66 @@ +import os +from flask_appbuilder.security.manager import AUTH_OAUTH +from superset.security import SupersetSecurityManager + + +class CustomSsoSecurityManager(SupersetSecurityManager): + def oauth_user_info(self, provider, response=None): + """Get user information from OAuth provider.""" + if provider == "keycloak": + me = self.appbuilder.sm.oauth_remotes[provider].get( + "protocol/openid-connect/userinfo" + ) + data = me.json() + return { + "username": data.get("preferred_username"), + "name": data.get("name"), + "email": data.get("email"), + "first_name": data.get("given_name", ""), + "last_name": data.get("family_name", ""), + "role_keys": data.get("groups", []), + } + return {} + + +# Authentication type +AUTH_TYPE = AUTH_OAUTH + +# Auto-registration for new users +AUTH_USER_REGISTRATION = True +AUTH_USER_REGISTRATION_ROLE = "Gamma" + +# Custom security manager +CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager + +# OAuth configuration +OAUTH_PROVIDERS = [ + { + "name": "keycloak", + "icon": "fa-key", + "token_key": "access_token", + "remote_app": { + "client_id": 
"superset", + "client_secret": os.environ.get("OAUTH_CLIENT_SECRET"), + "api_base_url": "https://{{ env.Getenv "KEYCLOAK_HOST" }}/realms/{{ env.Getenv "KEYCLOAK_REALM" }}/", + "client_kwargs": { + "scope": "openid email profile" + }, + "access_token_url": "https://{{ env.Getenv "KEYCLOAK_HOST" }}/realms/{{ env.Getenv "KEYCLOAK_REALM" }}/protocol/openid-connect/token", + "authorize_url": "https://{{ env.Getenv "KEYCLOAK_HOST" }}/realms/{{ env.Getenv "KEYCLOAK_REALM" }}/protocol/openid-connect/auth", + "request_token_url": None, + } + } +] + +# Role mapping +AUTH_ROLES_MAPPING = { + "superset-admin": ["Admin"], + "Alpha": ["Alpha"], + "Gamma": ["Gamma"], +} + +# Sync roles at each login +AUTH_ROLES_SYNC_AT_LOGIN = True + +# Enable Trino database support +PREVENT_UNSAFE_DB_CONNECTIONS = False