fix(trino): fix auth and ingress

This commit is contained in:
Masaki Yatsu
2025-10-18 11:46:46 +09:00
parent 18c8ebf789
commit 26ad27d943
5 changed files with 164 additions and 11 deletions

View File

@@ -7,10 +7,12 @@ Fast distributed SQL query engine for big data analytics with Keycloak authentic
This module deploys Trino using the official Helm chart with:
- **Keycloak OAuth2 authentication** for Web UI access
- **Password authentication** for JDBC clients (Metabase, etc.)
- **Password authentication** for JDBC clients (Metabase, Querybook, etc.)
- **Access control with user impersonation** for multi-user query attribution
- **PostgreSQL catalog** for querying PostgreSQL databases
- **Iceberg catalog** with Lakekeeper (optional)
- **TPCH catalog** with sample data for testing
- **Traefik integration** with proper header forwarding
## Prerequisites
@@ -39,7 +41,9 @@ You will be prompted for:
- Trino coordinator (1 instance)
- Trino workers (2 instances by default)
- OAuth2 client in Keycloak
- Password authentication for JDBC access
- Password authentication for JDBC/Querybook access
- Traefik Middleware for X-Forwarded-* header injection
- Access control with impersonation rules (admin can impersonate any user)
- PostgreSQL catalog (if selected)
- Iceberg catalog with Lakekeeper (if MinIO selected)
- Keycloak service account enabled for OAuth2 client credentials flow
@@ -47,6 +51,8 @@ You will be prompted for:
- `lakekeeper` audience mapper configured
- TPCH catalog with sample data
**Note**: Trino runs HTTP-only internally. HTTPS is provided by Traefik Ingress, which handles TLS termination.
## Configuration
Environment variables (set in `.env.local` or override):
@@ -70,6 +76,8 @@ TRINO_WORKER_COUNT=2 # Number of workers
2. Click "Sign in" to authenticate with Keycloak
3. Execute queries in the Web UI
**Note**: OAuth2 authentication works over HTTP internally. Traefik Ingress provides HTTPS to external users, and Trino processes X-Forwarded headers to generate correct HTTPS redirect URLs for OAuth2 flow.
### Get Admin Password
For JDBC/Metabase connections:
@@ -82,7 +90,7 @@ Returns the password for username `admin`.
### Metabase Integration
**Important**: Trino requires TLS/SSL for password authentication. You must use the external hostname (not the internal Kubernetes service name).
**Important**: The Python Trino client (used by Metabase) requires HTTPS when using authentication. You must use the external hostname which has TLS provided by Traefik Ingress.
1. In Metabase, go to Admin → Databases → Add database
2. Select **Database type**: Starburst
@@ -103,7 +111,52 @@ Returns the password for username `admin`.
- Use `iceberg` to query Iceberg tables via Lakekeeper
- You can create multiple Metabase connections, one for each catalog
**Note**: Do NOT use internal Kubernetes hostnames like `trino.trino.svc.cluster.local` as they do not have valid TLS certificates for password authentication.
**Note**: Do NOT use internal Kubernetes hostnames like `trino.trino.svc.cluster.local:8080`. Internal services do not have TLS, and the Python Trino client enforces HTTPS when authentication is used. Always use the external hostname with port 443.
### Querybook Integration
**Connection Configuration**:
1. In Querybook, create a new Environment and Query Engine
2. Configure the Trino connection:
```plain
Connection String: trino://your-trino-host:443?SSL=true
Username: admin
Password: [from just trino::admin-password]
Catalog: postgresql (or iceberg for Iceberg tables)
```
3. Optional: Configure `Proxy_user_id` to enable user impersonation
**User Impersonation**:
Trino is configured with file-based access control that allows the `admin` user to impersonate any user. This enables:
- Querybook to connect as `admin` but execute queries as the logged-in Querybook user
- Proper query attribution and audit logging
- User-specific access control (when configured)
The impersonation rules are defined in `trino-values.gomplate.yaml`:
```json
{
"catalogs": [{"allow": "all"}],
"impersonation": [
{
"original_user": "admin",
"new_user": ".*"
}
]
}
```
**Why External Hostname is Required**:
- The Python Trino client enforces HTTPS when authentication is used (client-side requirement)
- Trino runs HTTP-only internally; TLS is provided by Traefik Ingress
- Internal service names (e.g., `trino.trino.svc.cluster.local:8080`) do not have TLS termination
- Therefore, you must use the external hostname (e.g., `trino.example.com:443`) which has TLS from Traefik
### Example Queries
@@ -253,24 +306,45 @@ Removes:
- Requires valid user in the configured realm
- Automatic redirect to Keycloak login
### JDBC/Metabase (Password)
### JDBC/Metabase/Querybook (Password)
- Username: `admin`
- Password: Retrieved via `just trino::admin-password`
- Stored in Vault at `trino/password`
- Requires external hostname with SSL/TLS
### Access Control
Trino uses file-based system access control with the following configuration:
**Catalogs**: All users can access all catalogs
**Impersonation**: The `admin` user can impersonate any user
This configuration enables:
- **Querybook Integration**: Admin user connects and executes queries as logged-in users
- **Audit Logging**: Queries are attributed to the actual user, not the admin account
- **Future Access Control**: Can be extended to add user-specific catalog/schema restrictions
The access control rules are defined in `/etc/trino/access-control/rules.json` (automatically generated from Helm values).
## Architecture
```
External Users
External Users / Querybook
Cloudflare Tunnel (HTTPS)
Traefik Ingress
Traefik Ingress (HTTPS → HTTP)
├─ TLS Termination
├─ Traefik Middleware (X-Forwarded-* headers)
└─ Backend: HTTP (port 8080)
Trino Coordinator (HTTP:8080)
├─ OAuth2 → Keycloak (Web UI auth)
─ Password file (JDBC auth)
─ Password file (JDBC/Querybook auth)
└─ Access Control (file-based, impersonation rules)
Trino Workers (HTTP:8080)
@@ -285,6 +359,14 @@ Data Sources:
└─ Static credentials
```
**Key Components**:
- **TLS Termination**: Traefik Ingress handles HTTPS, Trino runs HTTP-only internally
- **Traefik Middleware**: Injects X-Forwarded-Proto, X-Forwarded-Host, X-Forwarded-Port headers for correct URL generation
- **Single Port Configuration**: HTTP (8080) for all communication (Ingress backend, internal clients, Querybook)
- **Access Control**: File-based system with impersonation rules allowing `admin` to impersonate any user
- **Querybook Integration**: Connects via external HTTPS hostname (Traefik provides TLS), uses admin credentials with user impersonation
## Troubleshooting
### Check Pod Status
@@ -315,6 +397,26 @@ kubectl exec -n trino deployment/trino-coordinator -- \
### Common Issues
#### Querybook Connection Fails
- **Error: "cannot use authentication with HTTP"**
- Python Trino client requires HTTPS when using authentication (client-side enforcement)
- Trino runs HTTP-only internally; TLS is provided by Traefik Ingress at the external hostname
- Solution: Use external hostname with SSL: `trino://your-trino-host:443?SSL=true`
- Do NOT use internal service names (e.g., `trino.trino.svc.cluster.local:8080`) as they lack TLS
- **Error: "Access Denied: User admin cannot impersonate user X"**
- Access control rules may not be properly configured
- Verify rules exist: `kubectl exec -n trino deployment/trino-coordinator -- cat /etc/trino/access-control/rules.json`
- Check for impersonation section in rules
- **Error: "500 Internal Server Error"**
- Check Traefik middleware exists: `kubectl get middleware trino-headers -n trino`
- Verify Ingress annotation references correct middleware: `trino-trino-headers@kubernetescrd`
- Check Traefik logs: `kubectl logs -n kube-system -l app.kubernetes.io/name=traefik`
- Verify Ingress backend port is 8080: `kubectl get ingress trino-coordinator -n trino -o yaml | grep "number:"`
- If Ingress shows port 8443, ensure `server.config.https.enabled: false` in values (Helm chart v1.41.0 auto-selects HTTPS port when enabled)
#### Metabase Sync Fails
- Ensure catalog is specified in connection settings (e.g., `postgresql` or `iceberg`)
@@ -332,8 +434,15 @@ kubectl exec -n trino deployment/trino-coordinator -- \
#### Password Authentication Fails
- Retrieve current password: `just trino::admin-password`
- Ensure SSL/TLS is enabled in JDBC URL
- For internal testing, HTTP is supported via `http-server.authentication.allow-insecure-over-http=true`
- Ensure SSL/TLS is enabled in JDBC URL (use external hostname with port 443)
- Python Trino client requires HTTPS for authentication (client-side enforcement)
- Trino coordinator allows password auth over HTTP via `http-server.authentication.allow-insecure-over-http=true`, but clients may refuse to connect without TLS
#### URLs Contain "localhost" Instead of Actual Hostname
- Traefik middleware is not injecting X-Forwarded headers
- Verify middleware exists and is referenced in Ingress annotations
- Check `http-server.process-forwarded=true` is set in Trino configuration
## References