# Trino

Fast distributed SQL query engine for big data analytics with Keycloak authentication.

## Overview

This module deploys Trino using the official Helm chart with:

- **Keycloak OAuth2 authentication** for Web UI access
- **Password authentication** for JDBC clients (Metabase, etc.)
- **PostgreSQL catalog** for querying PostgreSQL databases
- **Iceberg catalog** with Lakekeeper (optional)
- **TPCH catalog** with sample data for testing

## Prerequisites

- Kubernetes cluster (k3s)
- Keycloak installed and configured
- PostgreSQL cluster (CloudNativePG)
- MinIO (optional, for Iceberg catalog)
- External Secrets Operator (optional, for Vault integration)

## Installation

### Basic Installation

```bash
just trino::install
```

You will be prompted for:

1. **Trino host (FQDN)**: e.g., `trino.example.com`
2. **PostgreSQL catalog setup**: Recommended for production use
3. **MinIO storage setup**: Optional, for Iceberg/Hive catalogs

### What Gets Installed

- Trino coordinator (1 instance)
- Trino workers (2 instances by default)
- OAuth2 client in Keycloak
- Password authentication for JDBC access
- PostgreSQL catalog (if selected)
- Iceberg catalog with Lakekeeper (if MinIO selected)
    - Keycloak service account enabled for OAuth2 client credentials flow
    - `lakekeeper` client scope added
    - `lakekeeper` audience mapper configured
- TPCH catalog with sample data

## Configuration

Environment variables (set in `.env.local` or override):

```bash
TRINO_NAMESPACE=trino              # Kubernetes namespace
TRINO_CHART_VERSION=1.41.0         # Helm chart version
TRINO_IMAGE_TAG=477                # Trino version
TRINO_COORDINATOR_MEMORY=4Gi       # Coordinator memory
TRINO_COORDINATOR_CPU=2            # Coordinator CPU
TRINO_WORKER_MEMORY=4Gi            # Worker memory
TRINO_WORKER_CPU=2                 # Worker CPU
TRINO_WORKER_COUNT=2               # Number of workers
```

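As a rough sizing aid, the sketch below estimates the total memory and CPU these settings would ask of the cluster. It assumes the coordinator and every worker request exactly the configured values, that memory is given in whole `Gi`, and that CPU is a whole number of cores. The function name `trino_resource_totals` is hypothetical and not part of this module.

```bash
# Hypothetical helper, not part of the trino module.
# Assumes memory values carry a whole-number "Gi" suffix and CPU values are whole cores.
trino_resource_totals() {
    local coord_mem="${TRINO_COORDINATOR_MEMORY:-4Gi}"
    local coord_cpu="${TRINO_COORDINATOR_CPU:-2}"
    local worker_mem="${TRINO_WORKER_MEMORY:-4Gi}"
    local worker_cpu="${TRINO_WORKER_CPU:-2}"
    local workers="${TRINO_WORKER_COUNT:-2}"

    # Strip the "Gi" suffix so the values can be used in integer arithmetic.
    local total_mem=$(( ${coord_mem%Gi} + ${worker_mem%Gi} * workers ))
    local total_cpu=$(( coord_cpu + worker_cpu * workers ))

    echo "memory=${total_mem}Gi cpu=${total_cpu}"
}
```

With the defaults listed above, `trino_resource_totals` prints `memory=12Gi cpu=6` (one coordinator plus two workers).
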
## Usage

### Web UI Access

1. Navigate to `https://your-trino-host/`
2. Click "Sign in" to authenticate with Keycloak
3. Execute queries in the Web UI

### Get Admin Password

For JDBC/Metabase connections:

```bash
just trino::admin-password
```

Returns the password for username `admin`.

### Metabase Integration

**Important**: Trino requires TLS/SSL for password authentication. You must use the external hostname (not the internal Kubernetes service name).

1. In Metabase, go to Admin → Databases → Add database
2. Select **Database type**: Starburst
3. Configure connection:

   ```plain
   Host: your-trino-host (e.g., trino.example.com)
   Port: 443
   Username: admin
   Password: [from just trino::admin-password]
   Catalog: postgresql (or iceberg for Iceberg tables)
   SSL: Yes
   ```

**Catalog Selection**:

- Use `postgresql` to query PostgreSQL database tables
- Use `iceberg` to query Iceberg tables via Lakekeeper
- You can create multiple Metabase connections, one for each catalog

**Note**: Do NOT use internal Kubernetes hostnames like `trino.trino.svc.cluster.local` as they do not have valid TLS certificates for password authentication.

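For JDBC clients other than Metabase, the same settings map onto a JDBC URL. The sketch below builds one and refuses in-cluster service names, in line with the note above. It assumes the Trino JDBC driver's `jdbc:trino://host:port/catalog` URL form and its `SSL` connection property; the function name `trino_jdbc_url` is hypothetical.

```bash
# Hypothetical helper: builds a Trino JDBC URL for an external hostname.
# Rejects in-cluster service names, which do not have a valid TLS certificate.
trino_jdbc_url() {
    local host="$1"
    local catalog="${2:-postgresql}"

    case "$host" in
        ""|*.svc|*.svc.cluster.local)
            echo "error: use the external Trino hostname, not '$host'" >&2
            return 1
            ;;
    esac

    # Port 443 and SSL=true mirror the Metabase connection settings.
    echo "jdbc:trino://${host}:443/${catalog}?SSL=true"
}
```

For example, `trino_jdbc_url trino.example.com iceberg` prints `jdbc:trino://trino.example.com:443/iceberg?SSL=true`, while an internal name such as `trino.trino.svc.cluster.local` produces an error on stderr and a non-zero exit status.
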
### Example Queries

**Query TPCH sample data:**

```sql
SELECT * FROM tpch.tiny.customer LIMIT 10;
```

**Query PostgreSQL:**

```sql
-- List tables in the public schema
SELECT table_name FROM postgresql.information_schema.tables WHERE table_schema = 'public';
```

**Query Iceberg tables:**

```sql
-- Show schemas in Iceberg catalog
SHOW SCHEMAS FROM iceberg;

-- Show tables in a namespace
SHOW TABLES FROM iceberg.ecommerce;

-- Query Iceberg table
SELECT * FROM iceberg.ecommerce.products LIMIT 10;
```

**Show all catalogs:**

```sql
SHOW CATALOGS;
```

**Show schemas in a catalog:**

```sql
SHOW SCHEMAS FROM postgresql;
SHOW SCHEMAS FROM iceberg;
```

## Catalogs

### TPCH (Always Available)

Sample TPC-H benchmark data for testing:

- `tpch.tiny.*` - Small dataset
- `tpch.sf1.*` - Approximately 1 GB dataset (scale factor 1)

Tables: customer, orders, lineitem, part, partsupp, supplier, nation, region

### PostgreSQL

Queries your CloudNativePG cluster:

- Catalog: `postgresql`
- Default schema: `public`
- Database: `trino`

### Iceberg (Optional)

Queries Iceberg tables via Lakekeeper REST Catalog:

- **Catalog**: `iceberg`
- **Storage**: MinIO S3-compatible object storage
- **REST Catalog**: Lakekeeper (Apache Iceberg REST Catalog implementation)
- **Authentication**: OAuth2 client credentials flow with Keycloak

**How It Works**:

1. Trino authenticates to Lakekeeper using OAuth2 (client credentials flow)
2. Lakekeeper provides Iceberg table metadata from its catalog
3. Trino reads actual data files directly from MinIO using static S3 credentials
4. Vended credentials are disabled; Trino uses pre-configured MinIO access keys

**Configuration**:

The following settings are automatically configured during installation when MinIO storage is enabled:

- Service account enabled on Trino Keycloak client
- `lakekeeper` client scope added to Trino client
- Audience mapper configured to include `aud: lakekeeper` in JWT tokens
- S3 file system factory enabled (`fs.native-s3.enabled=true`)
- Static MinIO credentials provided via Kubernetes secrets

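To make these settings concrete, the sketch below prints what an equivalent Trino catalog properties file might look like. It is an illustration, not the file this module generates: the function name, every environment variable, and all default URLs and values are placeholders. The property keys are taken from Trino's Iceberg REST catalog and native S3 file system options.

```bash
# Hypothetical sketch: prints an Iceberg catalog properties file to stdout.
# All variable names and default values are placeholders, not the module's own.
render_iceberg_catalog() {
    local lakekeeper_uri="${LAKEKEEPER_URI:-https://lakekeeper.example.com/catalog}"
    local token_uri="${KEYCLOAK_TOKEN_URI:-https://keycloak.example.com/realms/example/protocol/openid-connect/token}"
    local client_id="${TRINO_CLIENT_ID:-trino}"
    local client_secret="${TRINO_CLIENT_SECRET:-CHANGE_ME}"
    local warehouse="${ICEBERG_WAREHOUSE:-default}"
    local s3_endpoint="${MINIO_ENDPOINT:-https://minio.example.com}"
    local s3_access_key="${MINIO_ACCESS_KEY:-CHANGE_ME}"
    local s3_secret_key="${MINIO_SECRET_KEY:-CHANGE_ME}"

    cat <<EOF
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=${lakekeeper_uri}
iceberg.rest-catalog.warehouse=${warehouse}
# OAuth2 client credentials flow against Keycloak
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.server-uri=${token_uri}
iceberg.rest-catalog.oauth2.credential=${client_id}:${client_secret}
iceberg.rest-catalog.oauth2.scope=lakekeeper
# Data files are read with static MinIO keys, not vended credentials
iceberg.rest-catalog.vended-credentials-enabled=false
fs.native-s3.enabled=true
s3.endpoint=${s3_endpoint}
s3.region=us-east-1
s3.path-style-access=true
s3.aws-access-key=${s3_access_key}
s3.aws-secret-key=${s3_secret_key}
EOF
}
```

The `lakekeeper` scope is what causes Keycloak to add the `lakekeeper` audience to the token, and disabling vended credentials is what makes the static `s3.*` keys necessary.
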
**Example Usage**:

```sql
-- List all namespaces (schemas)
SHOW SCHEMAS FROM iceberg;

-- Create a namespace
CREATE SCHEMA iceberg.analytics;

-- List tables in a namespace
SHOW TABLES FROM iceberg.ecommerce;

-- Query table
SELECT * FROM iceberg.ecommerce.products LIMIT 10;

-- Create table
CREATE TABLE iceberg.analytics.sales (
    date DATE,
    product VARCHAR,
    amount DECIMAL(10,2)
);
```

## Management

### Upgrade Trino

```bash
just trino::upgrade
```

Updates the Helm deployment with current configuration.

### Uninstall

```bash
# Keep PostgreSQL database
just trino::uninstall false

# Delete PostgreSQL database too
just trino::uninstall true
```

### Cleanup All Resources

```bash
just trino::cleanup
```

Removes:

- PostgreSQL database
- Vault secrets
- Keycloak OAuth client

## Authentication

### Web UI (OAuth2)

- Uses Keycloak for authentication
- Requires valid user in the configured realm
- Automatic redirect to Keycloak login

### JDBC/Metabase (Password)

- Username: `admin`
- Password: Retrieved via `just trino::admin-password`
- Stored in Vault at `trino/password`

## Architecture

```
External Users
      ↓
Cloudflare Tunnel (HTTPS)
      ↓
Traefik Ingress
      ↓
Trino Coordinator (HTTP:8080)
  ├─ OAuth2 → Keycloak (Web UI auth)
  └─ Password file (JDBC auth)
      ↓
Trino Workers (HTTP:8080)
      ↓
Data Sources:
  - PostgreSQL (CloudNativePG)
    └─ Direct SQL connection

  - Iceberg Tables
    ├─ Metadata: Lakekeeper (REST Catalog)
    │   └─ OAuth2 → Keycloak (client credentials)
    └─ Data: MinIO (S3)
        └─ Static credentials
```

## Troubleshooting

### Check Pod Status

```bash
kubectl get pods -n trino
```

### View Coordinator Logs

```bash
kubectl logs -n trino -l app.kubernetes.io/component=coordinator --tail=100
```

### View Worker Logs

```bash
kubectl logs -n trino -l app.kubernetes.io/component=worker --tail=100
```

### Test Authentication

```bash
# From inside coordinator pod
kubectl exec -n trino deployment/trino-coordinator -- \
  curl -u admin:PASSWORD http://localhost:8080/v1/info
```

### Common Issues

#### Metabase Sync Fails

- Ensure catalog is specified in connection settings (e.g., `postgresql` or `iceberg`)
- For Iceberg catalog, verify Lakekeeper is running: `kubectl get pods -n lakekeeper`
- Check Trino coordinator logs for errors
- Verify PostgreSQL/Iceberg connectivity
- For Iceberg issues, check OAuth2 token: Service account should be enabled on Trino client

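When checking the OAuth2 token, it can help to look at the claims it actually carries, in particular whether `aud` includes `lakekeeper`. The sketch below decodes the payload segment of a JWT for inspection only; it does not verify the signature. The function name `jwt_payload` is hypothetical, and obtaining the token from Keycloak is outside the scope of the sketch.

```bash
# Hypothetical helper: prints the decoded payload (second segment) of a JWT.
# It does NOT verify the signature; use it for inspection only.
jwt_payload() {
    local segment
    # JWT segments are base64url-encoded: map the URL-safe alphabet back to standard base64.
    segment="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
    # Restore the '=' padding that base64url omits.
    while [ $(( ${#segment} % 4 )) -ne 0 ]; do
        segment="${segment}="
    done
    printf '%s' "$segment" | base64 -d
}
```

Given a token in a variable such as `$TOKEN`, `jwt_payload "$TOKEN"` prints the JSON claims; if `jq` is installed, piping the output to `jq .aud` shows the audience claim on its own.
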
#### OAuth2 Login Fails

- Verify Keycloak OAuth client exists: `just keycloak::list-clients`
- Check redirect URL matches Trino host
- Ensure Keycloak is accessible from Trino pods

#### Password Authentication Fails

- Retrieve current password: `just trino::admin-password`
- Ensure SSL/TLS is enabled in JDBC URL
- For internal testing, HTTP is supported via `http-server.authentication.allow-insecure-over-http=true`

## References

- [Trino Documentation](https://trino.io/docs/current/)
- [Trino Helm Chart](https://github.com/trinodb/charts)
- [OAuth2 Authentication](https://trino.io/docs/current/security/oauth2.html)
- [Password Authentication](https://trino.io/docs/current/security/password-file.html)
- [PostgreSQL Connector](https://trino.io/docs/current/connector/postgresql.html)
- [Iceberg Connector](https://trino.io/docs/current/connector/iceberg.html)
- [Lakekeeper (Iceberg REST Catalog)](https://lakekeeper.io/)
- [Apache Iceberg](https://iceberg.apache.org/)