From f6093ea386f21876de75841cae68f97e27fb04ee Mon Sep 17 00:00:00 2001 From: Masaki Yatsu Date: Wed, 15 Oct 2025 20:42:51 +0900 Subject: [PATCH] docs(trino): add Trino doc --- README.md | 13 +++ trino/README.md | 271 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 284 insertions(+) create mode 100644 trino/README.md diff --git a/README.md b/README.md index 71950c2..aaf1593 100644 --- a/README.md +++ b/README.md @@ -32,6 +32,7 @@ A remotely accessible Kubernetes home lab with OIDC authentication. Build a mode ### Data & Analytics (Optional) - **Interactive Computing**: [JupyterHub](https://jupyter.org/hub) for collaborative notebooks +- **SQL Query Engine**: [Trino](https://trino.io/) for distributed SQL queries across multiple data sources - **Analytics Database**: [ClickHouse](https://clickhouse.com/) for high-performance analytics - **Vector Database**: [Qdrant](https://qdrant.tech/) for vector search and AI/ML applications - **Iceberg REST Catalog**: [Lakekeeper](https://lakekeeper.io/) for Apache Iceberg table management @@ -148,6 +149,17 @@ Business intelligence and data visualization platform with PostgreSQL integratio [📖 See Metabase Documentation](./metabase/README.md) +### Trino + +Fast distributed SQL query engine for big data analytics with: + +- **Multi-Source Queries**: Query PostgreSQL, Iceberg, and other data sources in a single query +- **Keycloak Authentication**: OAuth2 for Web UI and password authentication for JDBC clients +- **Metabase Integration**: Connect via Starburst driver for data visualization +- **Sample Data**: TPCH catalog with benchmark data for testing + +[📖 See Trino Documentation](./trino/README.md) + ### DataHub Modern data catalog and metadata management platform with OIDC integration. @@ -283,6 +295,7 @@ kubectl --context yourpc-oidc get nodes # Web interfaces # Vault: https://vault.yourdomain.com # Keycloak: https://auth.yourdomain.com +# Trino: https://trino.yourdomain.com # Metabase: https://metabase.yourdomain.com # Airflow: https://airflow.yourdomain.com # JupyterHub: https://jupyter.yourdomain.com diff --git a/trino/README.md b/trino/README.md new file mode 100644 index 0000000..7163525 --- /dev/null +++ b/trino/README.md @@ -0,0 +1,271 @@ +# Trino + +Fast distributed SQL query engine for big data analytics with Keycloak authentication. + +## Overview + +This module deploys Trino using the official Helm chart with: + +- **Keycloak OAuth2 authentication** for Web UI access +- **Password authentication** for JDBC clients (Metabase, etc.) +- **PostgreSQL catalog** for querying PostgreSQL databases +- **Iceberg catalog** with Lakekeeper (optional) +- **TPCH catalog** with sample data for testing + +## Prerequisites + +- Kubernetes cluster (k3s) +- Keycloak installed and configured +- PostgreSQL cluster (CloudNativePG) +- MinIO (optional, for Iceberg catalog) +- External Secrets Operator (optional, for Vault integration) + +## Installation + +### Basic Installation + +```bash +just trino::install +``` + +You will be prompted for: + +1. **Trino host (FQDN)**: e.g., `trino.example.com` +2. **PostgreSQL catalog setup**: Recommended for production use +3. **MinIO storage setup**: Optional, for Iceberg/Hive catalogs + +### What Gets Installed + +- Trino coordinator (1 instance) +- Trino workers (2 instances by default) +- OAuth2 client in Keycloak +- Password authentication for JDBC access +- PostgreSQL catalog (if selected) +- Iceberg catalog with Lakekeeper (if MinIO selected) +- TPCH catalog with sample data + +## Configuration + +Environment variables (set in `.env.local` or override): + +```bash +TRINO_NAMESPACE=trino # Kubernetes namespace +TRINO_CHART_VERSION=1.41.0 # Helm chart version +TRINO_IMAGE_TAG=477 # Trino version +TRINO_COORDINATOR_MEMORY=4Gi # Coordinator memory +TRINO_COORDINATOR_CPU=2 # Coordinator CPU +TRINO_WORKER_MEMORY=4Gi # Worker memory +TRINO_WORKER_CPU=2 # Worker CPU +TRINO_WORKER_COUNT=2 # Number of workers +``` + +## Usage + +### Web UI Access + +1. Navigate to `https://your-trino-host/` +2. Click "Sign in" to authenticate with Keycloak +3. Execute queries in the Web UI + +### Get Admin Password + +For JDBC/Metabase connections: + +```bash +just trino::admin-password +``` + +Returns the password for username `admin`. + +### Metabase Integration + +**Important**: Trino requires TLS/SSL for password authentication. You must use the external hostname (not the internal Kubernetes service name). + +1. In Metabase, go to Admin → Databases → Add database +2. Select **Database type**: Starburst +3. Configure connection: + + ```plain + Host: your-trino-host (e.g., trino.example.com) + Port: 443 + Username: admin + Password: [from just trino::admin-password] + Catalog: postgresql + SSL: Yes + ``` + +**Note**: Do NOT use internal Kubernetes hostnames like `trino.trino.svc.cluster.local` as they do not have valid TLS certificates for password authentication. + +### Example Queries + +**Query TPCH sample data:** + +```sql +SELECT * FROM tpch.tiny.customer LIMIT 10; +``` + +**Query PostgreSQL:** + +```sql +SELECT * FROM postgresql.public.pg_tables; +``` + +**Show all catalogs:** + +```sql +SHOW CATALOGS; +``` + +**Show schemas in a catalog:** + +```sql +SHOW SCHEMAS FROM postgresql; +``` + +## Catalogs + +### TPCH (Always Available) + +Sample TPC-H benchmark data for testing: + +- `tpch.tiny.*` - Small dataset +- `tpch.sf1.*` - 1GB dataset + +Tables: customer, orders, lineitem, part, supplier, nation, region + +### PostgreSQL + +Queries your CloudNativePG cluster: + +- Catalog: `postgresql` +- Default schema: `public` +- Database: `trino` + +### Iceberg (Optional) + +Queries Iceberg tables via Lakekeeper: + +- Catalog: `iceberg` +- Storage: MinIO S3-compatible + +## Management + +### Upgrade Trino + +```bash +just trino::upgrade +``` + +Updates the Helm deployment with current configuration. + +### Uninstall + +```bash +# Keep PostgreSQL database +just trino::uninstall false + +# Delete PostgreSQL database too +just trino::uninstall true +``` + +### Cleanup All Resources + +```bash +just trino::cleanup +``` + +Removes: + +- PostgreSQL database +- Vault secrets +- Keycloak OAuth client + +## Authentication + +### Web UI (OAuth2) + +- Uses Keycloak for authentication +- Requires valid user in the configured realm +- Automatic redirect to Keycloak login + +### JDBC/Metabase (Password) + +- Username: `admin` +- Password: Retrieved via `just trino::admin-password` +- Stored in Vault at `trino/password` + +## Architecture + +``` +External Users + ↓ +Cloudflare Tunnel (HTTPS) + ↓ +Traefik Ingress + ↓ +Trino Coordinator (HTTP:8080) + ↓ +Trino Workers (HTTP:8080) + ↓ +Data Sources: + - PostgreSQL (CloudNativePG) + - MinIO (S3) + - Iceberg (Lakekeeper) +``` + +## Troubleshooting + +### Check Pod Status + +```bash +kubectl get pods -n trino +``` + +### View Coordinator Logs + +```bash +kubectl logs -n trino -l app.kubernetes.io/component=coordinator --tail=100 +``` + +### View Worker Logs + +```bash +kubectl logs -n trino -l app.kubernetes.io/component=worker --tail=100 +``` + +### Test Authentication + +```bash +# From inside coordinator pod +kubectl exec -n trino deployment/trino-coordinator -- \ + curl -u admin:PASSWORD http://localhost:8080/v1/info +``` + +### Common Issues + +#### Metabase Sync Fails + +- Ensure catalog is specified in connection settings (e.g., `postgresql`) +- Check Trino coordinator logs for errors +- Verify PostgreSQL/Iceberg connectivity + +#### OAuth2 Login Fails + +- Verify Keycloak OAuth client exists: `just keycloak::list-clients` +- Check redirect URL matches Trino host +- Ensure Keycloak is accessible from Trino pods + +#### Password Authentication Fails + +- Retrieve current password: `just trino::admin-password` +- Ensure SSL/TLS is enabled in JDBC URL +- For internal testing, HTTP is supported via `http-server.authentication.allow-insecure-over-http=true` + +## References + +- [Trino Documentation](https://trino.io/docs/current/) +- [Trino Helm Chart](https://github.com/trinodb/charts) +- [OAuth2 Authentication](https://trino.io/docs/current/security/oauth2.html) +- [Password Authentication](https://trino.io/docs/current/security/password-file.html) +- [PostgreSQL Connector](https://trino.io/docs/current/connector/postgresql.html) +- [Iceberg Connector](https://trino.io/docs/current/connector/iceberg.html)