docs(trino): add Trino doc
13
README.md
@@ -32,6 +32,7 @@ A remotely accessible Kubernetes home lab with OIDC authentication. Build a mode
### Data & Analytics (Optional)

- **Interactive Computing**: [JupyterHub](https://jupyter.org/hub) for collaborative notebooks
- **SQL Query Engine**: [Trino](https://trino.io/) for distributed SQL queries across multiple data sources
- **Analytics Database**: [ClickHouse](https://clickhouse.com/) for high-performance analytics
- **Vector Database**: [Qdrant](https://qdrant.tech/) for vector search and AI/ML applications
- **Iceberg REST Catalog**: [Lakekeeper](https://lakekeeper.io/) for Apache Iceberg table management
@@ -148,6 +149,17 @@ Business intelligence and data visualization platform with PostgreSQL integratio

[📖 See Metabase Documentation](./metabase/README.md)

### Trino

Fast distributed SQL query engine for big data analytics with:

- **Multi-Source Queries**: Query PostgreSQL, Iceberg, and other data sources in a single query
- **Keycloak Authentication**: OAuth2 for Web UI and password authentication for JDBC clients
- **Metabase Integration**: Connect via Starburst driver for data visualization
- **Sample Data**: TPCH catalog with benchmark data for testing

[📖 See Trino Documentation](./trino/README.md)

### DataHub

Modern data catalog and metadata management platform with OIDC integration.
@@ -283,6 +295,7 @@ kubectl --context yourpc-oidc get nodes
# Web interfaces
# Vault: https://vault.yourdomain.com
# Keycloak: https://auth.yourdomain.com
# Trino: https://trino.yourdomain.com
# Metabase: https://metabase.yourdomain.com
# Airflow: https://airflow.yourdomain.com
# JupyterHub: https://jupyter.yourdomain.com

271
trino/README.md
Normal file
@@ -0,0 +1,271 @@
# Trino

Fast distributed SQL query engine for big data analytics with Keycloak authentication.

## Overview

This module deploys Trino using the official Helm chart with:

- **Keycloak OAuth2 authentication** for Web UI access
- **Password authentication** for JDBC clients (Metabase, etc.)
- **PostgreSQL catalog** for querying PostgreSQL databases
- **Iceberg catalog** with Lakekeeper (optional)
- **TPCH catalog** with sample data for testing

## Prerequisites

- Kubernetes cluster (k3s)
- Keycloak installed and configured
- PostgreSQL cluster (CloudNativePG)
- MinIO (optional, for Iceberg catalog)
- External Secrets Operator (optional, for Vault integration)

## Installation

### Basic Installation

```bash
just trino::install
```

You will be prompted for:

1. **Trino host (FQDN)**: e.g., `trino.example.com`
2. **PostgreSQL catalog setup**: Recommended for production use
3. **MinIO storage setup**: Optional, for Iceberg/Hive catalogs

### What Gets Installed

- Trino coordinator (1 instance)
- Trino workers (2 instances by default)
- OAuth2 client in Keycloak
- Password authentication for JDBC access
- PostgreSQL catalog (if selected)
- Iceberg catalog with Lakekeeper (if MinIO selected)
- TPCH catalog with sample data

## Configuration

Environment variables (set in `.env.local` or override):

```bash
TRINO_NAMESPACE=trino          # Kubernetes namespace
TRINO_CHART_VERSION=1.41.0     # Helm chart version
TRINO_IMAGE_TAG=477            # Trino version
TRINO_COORDINATOR_MEMORY=4Gi   # Coordinator memory
TRINO_COORDINATOR_CPU=2        # Coordinator CPU
TRINO_WORKER_MEMORY=4Gi        # Worker memory
TRINO_WORKER_CPU=2             # Worker CPU
TRINO_WORKER_COUNT=2           # Number of workers
```

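As a minimal sketch of changing one of these defaults, the helper below sets or replaces a `KEY=VALUE` line in an env file. The function name `set_trino_env` is hypothetical and not part of this repository; it only assumes that the recipes read `.env.local`, as stated above.

```bash
# Hypothetical helper (not part of this repo): set or replace KEY=VALUE in an env file.
# Assumes the just recipes pick up values from .env.local.
set_trino_env() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  # Drop any existing assignment for the key; grep exits 1 when nothing is left.
  grep -v "^${key}=" "$file" > "${file}.tmp" || true
  mv "${file}.tmp" "$file"
  # Append the new assignment.
  printf '%s=%s\n' "$key" "$value" >> "$file"
}
```

For example, `set_trino_env .env.local TRINO_WORKER_COUNT 3` followed by `just trino::upgrade` would be one way to scale the workers, if the upgrade recipe re-reads the file.
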
## Usage

### Web UI Access

1. Navigate to `https://your-trino-host/`
2. Click "Sign in" to authenticate with Keycloak
3. Execute queries in the Web UI

### Get Admin Password

For JDBC/Metabase connections:

```bash
just trino::admin-password
```

Returns the password for username `admin`.

### Metabase Integration

**Important**: Trino requires TLS/SSL for password authentication. You must use the external hostname (not the internal Kubernetes service name).

1. In Metabase, go to Admin → Databases → Add database
2. Select **Database type**: Starburst
3. Configure connection:

   ```plain
   Host: your-trino-host (e.g., trino.example.com)
   Port: 443
   Username: admin
   Password: [from just trino::admin-password]
   Catalog: postgresql
   SSL: Yes
   ```

**Note**: Do NOT use internal Kubernetes hostnames like `trino.trino.svc.cluster.local` as they do not have valid TLS certificates for password authentication.

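For JDBC clients other than Metabase, the same settings can be expressed as a Trino JDBC URL. The sketch below builds one from the external host. The function name `trino_jdbc_url` is hypothetical; the `jdbc:trino://host:port/catalog` form and the `SSL` parameter come from the Trino JDBC driver.

```bash
# Hypothetical helper: build a Trino JDBC URL for the external, TLS-terminated host.
# Defaults mirror the Metabase settings above (catalog postgresql, port 443).
trino_jdbc_url() {
  local host="$1" catalog="${2:-postgresql}" port="${3:-443}"
  printf 'jdbc:trino://%s:%s/%s?SSL=true\n' "$host" "$port" "$catalog"
}
```

Calling `trino_jdbc_url trino.example.com` prints a URL that targets the `postgresql` catalog over TLS on port 443.
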
### Example Queries

**Query TPCH sample data:**

```sql
SELECT * FROM tpch.tiny.customer LIMIT 10;
```

**List tables in the PostgreSQL catalog:**

```sql
-- pg_tables lives in PostgreSQL's pg_catalog schema, not in public,
-- so use the information_schema that Trino exposes for every catalog.
SELECT table_schema, table_name FROM postgresql.information_schema.tables;
```

**Show all catalogs:**

```sql
SHOW CATALOGS;
```

**Show schemas in a catalog:**

```sql
SHOW SCHEMAS FROM postgresql;
```

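The queries above can also be run from a terminal. The wrapper below is a sketch that assumes the `trino` CLI is installed locally and that `TRINO_PASSWORD` is exported, so that `--password` does not prompt. The function name `trino_query` is hypothetical.

```bash
# Hypothetical wrapper: run one statement through the Trino CLI against the external host.
# Assumes the trino CLI is on PATH and TRINO_PASSWORD holds the admin password.
trino_query() {
  local host="$1" sql="$2"
  trino --server "https://${host}" --user admin --password --execute "$sql"
}
```

If `just trino::admin-password` prints only the password (an assumption about its output), then `export TRINO_PASSWORD="$(just trino::admin-password)"` followed by `trino_query trino.example.com 'SHOW CATALOGS'` lists the catalogs. A join such as `SELECT n.name, count(*) FROM tpch.tiny.customer c JOIN tpch.tiny.nation n ON c.nationkey = n.nationkey GROUP BY n.name` works the same way.
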
## Catalogs

### TPCH (Always Available)

Sample TPC-H benchmark data for testing:

- `tpch.tiny.*` - Small dataset (scale factor 0.01)
- `tpch.sf1.*` - Approximately 1 GB dataset (scale factor 1)

Tables: customer, orders, lineitem, part, partsupp, supplier, nation, region

### PostgreSQL

Queries your CloudNativePG cluster:

- Catalog: `postgresql`
- Default schema: `public`
- Database: `trino`

### Iceberg (Optional)

Queries Iceberg tables via Lakekeeper:

- Catalog: `iceberg`
- Storage: MinIO S3-compatible

## Management

### Upgrade Trino

```bash
just trino::upgrade
```

Updates the Helm deployment with current configuration.

### Uninstall

```bash
# Keep PostgreSQL database
just trino::uninstall false

# Delete PostgreSQL database too
just trino::uninstall true
```

### Cleanup All Resources

```bash
just trino::cleanup
```

Removes:

- PostgreSQL database
- Vault secrets
- Keycloak OAuth client

## Authentication

### Web UI (OAuth2)

- Uses Keycloak for authentication
- Requires valid user in the configured realm
- Automatic redirect to Keycloak login

### JDBC/Metabase (Password)

- Username: `admin`
- Password: Retrieved via `just trino::admin-password`
- Stored in Vault at `trino/password`

## Architecture

```
External Users
      ↓
Cloudflare Tunnel (HTTPS)
      ↓
Traefik Ingress
      ↓
Trino Coordinator (HTTP:8080)
      ↓
Trino Workers (HTTP:8080)
      ↓
Data Sources:
  - PostgreSQL (CloudNativePG)
  - MinIO (S3)
  - Iceberg (Lakekeeper)
```

## Troubleshooting

### Check Pod Status

```bash
kubectl get pods -n trino
```

### View Coordinator Logs

```bash
kubectl logs -n trino -l app.kubernetes.io/component=coordinator --tail=100
```

### View Worker Logs

```bash
kubectl logs -n trino -l app.kubernetes.io/component=worker --tail=100
```

### Test Authentication

```bash
# From inside coordinator pod
kubectl exec -n trino deployment/trino-coordinator -- \
  curl -u admin:PASSWORD http://localhost:8080/v1/info
```

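To exercise the same path that JDBC clients use, one option is to submit a trivial statement over HTTPS with basic authentication and inspect the HTTP status. This is a sketch, not part of the module: the function name `trino_auth_status` is hypothetical, and it assumes the external host is reachable from where it runs.

```bash
# Hypothetical check: POST `SELECT 1` to the statement endpoint with basic auth
# and print only the HTTP status code (200 if accepted, 401 if rejected).
trino_auth_status() {
  local host="$1" password="$2"
  curl -s -o /dev/null -w '%{http_code}' \
    -u "admin:${password}" \
    -X POST --data 'SELECT 1' \
    "https://${host}/v1/statement"
}
```

For example, `trino_auth_status trino.example.com "$(just trino::admin-password)"` should print `200` when the password is accepted, assuming the recipe prints only the password.
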
### Common Issues

#### Metabase Sync Fails

- Ensure catalog is specified in connection settings (e.g., `postgresql`)
- Check Trino coordinator logs for errors
- Verify PostgreSQL/Iceberg connectivity

#### OAuth2 Login Fails

- Verify Keycloak OAuth client exists: `just keycloak::list-clients`
- Check redirect URL matches Trino host
- Ensure Keycloak is accessible from Trino pods

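One way to check the last point is to fetch Keycloak's OIDC discovery document from inside the coordinator pod. The sketch below assumes a Keycloak version that serves realms under `/realms/<realm>` (older releases use an `/auth` prefix). The function names, host, and realm are placeholders rather than values from this module.

```bash
# Build the OIDC discovery URL for a Keycloak realm (host and realm are placeholders).
keycloak_discovery_url() {
  local keycloak_host="$1" realm="$2"
  printf 'https://%s/realms/%s/.well-known/openid-configuration\n' "$keycloak_host" "$realm"
}

# Hypothetical check: fetch the discovery document from inside the coordinator pod.
# curl -f makes the command fail on HTTP errors, so a non-zero exit means Keycloak
# is not reachable (or the realm does not exist) from the Trino namespace.
check_keycloak_from_trino() {
  kubectl exec -n trino deployment/trino-coordinator -- \
    curl -fsS "$(keycloak_discovery_url "$1" "$2")"
}
```

For example, `check_keycloak_from_trino auth.yourdomain.com your-realm` prints the discovery JSON when the pod can reach Keycloak.
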
#### Password Authentication Fails

- Retrieve current password: `just trino::admin-password`
- Ensure SSL/TLS is enabled in JDBC URL
- For internal testing, HTTP is supported via `http-server.authentication.allow-insecure-over-http=true`

## References

- [Trino Documentation](https://trino.io/docs/current/)
- [Trino Helm Chart](https://github.com/trinodb/charts)
- [OAuth2 Authentication](https://trino.io/docs/current/security/oauth2.html)
- [Password Authentication](https://trino.io/docs/current/security/password-file.html)
- [PostgreSQL Connector](https://trino.io/docs/current/connector/postgresql.html)
- [Iceberg Connector](https://trino.io/docs/current/connector/iceberg.html)