# Langfuse Open source LLM observability and analytics platform with Keycloak OIDC authentication. ## Overview This module deploys Langfuse using the official Helm chart with: - **Keycloak OIDC authentication** for user login - **PostgreSQL backend** for application data - **ClickHouse database** for analytics and traces - **Redis (Valkey)** for caching and queues - **MinIO/S3 storage** for event uploads and batch exports - **Traefik ingress** for HTTPS access - **External Secrets Operator integration** for secure credential management ## Prerequisites - Kubernetes cluster (k3s) - Keycloak installed and configured - PostgreSQL cluster (CloudNativePG) - ClickHouse cluster - MinIO object storage - External Secrets Operator (optional, for Vault integration) ## Installation ### Basic Installation ```bash just langfuse::install ``` You will be prompted for: - **Langfuse host (FQDN)**: e.g., `langfuse.example.com` ### What Gets Installed - Langfuse web application (1 replica) - Langfuse worker (background job processor) - Redis (Valkey) for caching and queues - PostgreSQL database `langfuse` with dedicated user - ClickHouse database `langfuse` with dedicated user - MinIO bucket `langfuse` for storage - Keycloak OAuth client (confidential client) - Keycloak user `langfuse` for system access - Vault secrets (if External Secrets Operator is available) ## Configuration Environment variables (set in `.env.local` or override): ```bash LANGFUSE_NAMESPACE=langfuse # Kubernetes namespace LANGFUSE_CHART_VERSION= # Helm chart version LANGFUSE_HOST=langfuse.example.com # External hostname LANGFUSE_OIDC_CLIENT_ID=langfuse # Keycloak client ID ``` ### Architecture Notes **Langfuse**: - Next.js application with FastAPI backend - Redis/Valkey for session management and job queues - ClickHouse for analytics queries - PostgreSQL for application metadata - S3-compatible storage for file uploads **Authentication Flow**: - OIDC via Keycloak with Authorization Code flow - Username/password authentication disabled (`AUTH_DISABLE_USERNAME_PASSWORD=true`) - Account linking enabled (`AUTH_KEYCLOAK_ALLOW_ACCOUNT_LINKING=true`) - New users automatically provisioned on first SSO login - Sign-up disabled for anonymous users **Database Structure**: - `langfuse` PostgreSQL database: Application data, experiments, projects - `langfuse` ClickHouse database: Traces, observations, scores for analytics - Redis: Session storage, job queues, caching ## Usage ### Access Langfuse 1. Navigate to `https://your-langfuse-host/` 2. Click "Keycloak" button to authenticate via SSO 3. On first login, your account will be automatically created 4. Access the dashboard and start tracking LLM applications ### Create API Keys 1. Log in to Langfuse UI 2. Navigate to **Settings** → **API Keys** 3. Click **Create new API key** 4. Copy the public and secret keys 5. Use these keys in your LLM applications ## Architecture ```plain External Users ↓ Cloudflare Tunnel (HTTPS) ↓ Traefik Ingress (HTTPS) ↓ Langfuse Web (HTTP inside cluster) ├─ Next.js ├─ OAuth → Keycloak (authentication) ├─ PostgreSQL (metadata) ├─ ClickHouse (analytics) ├─ Redis/Valkey (cache & queues) └─ MinIO (file storage) ↓ Langfuse Worker (background jobs) ├─ Job queues (Redis) ├─ Data processing └─ Analytics aggregation ``` **Key Components**: - **Web UI**: Next.js application for dashboard and API - **Worker**: Background job processor for async tasks - **Redis**: Session management, job queues, caching - **PostgreSQL**: Application data (projects, users, API keys) - **ClickHouse**: Analytics data (traces, observations, scores) - **MinIO**: S3-compatible storage for event uploads and batch exports ## Authentication ### User Login (OIDC) - Users authenticate via Keycloak - Standard OIDC flow with Authorization Code grant - Users automatically created on first login - Username/password authentication is disabled - Account linking enabled for users with same email ### API Authentication - Public/Secret key pairs for programmatic access - API keys are created per user in the Langfuse UI - Keys are stored securely and can be rotated - Each key is associated with a specific project ### Access Control - Project-based access control - Users can be invited to specific projects - Role-based permissions (Owner, Admin, Member, Viewer) - API keys are scoped to specific projects ## Management ### Upgrade Langfuse To upgrade Langfuse to a new version: ```bash just langfuse::upgrade ``` ### Uninstall ```bash just langfuse::uninstall ``` This removes: - Helm release and all Kubernetes resources - Namespace - Keycloak client and Vault secrets **Note**: The following resources are NOT deleted and must be removed manually if needed: - PostgreSQL user and database - ClickHouse user and database - MinIO user and bucket - Keycloak user ### Clean Up Specific Resources ```bash # Delete PostgreSQL user and database just langfuse::delete-postgres-user-and-db # Delete ClickHouse user and database just langfuse::delete-clickhouse-user # Delete MinIO user and bucket just langfuse::delete-minio-user # Delete Keycloak user just langfuse::delete-keycloak-user ``` ## Troubleshooting ### Check Pod Status ```bash kubectl get pods -n langfuse ``` Expected pods: - `langfuse-web-*` - Web application (1 replica) - `langfuse-worker-*` - Background worker (1 replica) - `langfuse-redis-primary-0` - Redis/Valkey instance ### OAuth Login Fails **Error**: `OAuthCallback: Invalid client or Invalid client credentials` **Cause**: Client secret mismatch between Keycloak and Langfuse **Solution**: Verify client secret is synchronized: ```bash # Get secret from Keycloak just keycloak::get-client-secret langfuse # Compare with Vault just vault::get keycloak/client/langfuse client_secret # If mismatched, update Vault and restart pods just vault::put keycloak/client/langfuse client_id=langfuse client_secret= kubectl rollout restart deployment/langfuse-web -n langfuse ``` **Error**: `Sign up is disabled` **Cause**: New SSO users cannot be created due to configuration **Solution**: This should not occur with the current configuration (`signUpDisabled: false`). If it does, verify Helm values: ```bash helm get values langfuse -n langfuse | grep signUpDisabled # Should show: signUpDisabled: false ``` ### Redis Connection Errors (Startup Only) **Symptoms**: Logs show `Redis error connect ECONNREFUSED` during pod startup **Cause**: Timing issue where web/worker pods start before Redis is ready **Impact**: None - these are transient errors during startup. Once Redis is ready, connections succeed and the application functions normally. **Solution**: No action needed. If you want to eliminate these startup errors, Redis pod can be deployed with a headstart, or init containers can be added to wait for Redis readiness. ### Database Connection Issues Check PostgreSQL connectivity: ```bash kubectl exec -n langfuse deployment/langfuse-web -- \ psql -h postgres-cluster-rw.postgres -U langfuse -d langfuse -c "SELECT 1" ``` Check ClickHouse connectivity: ```bash kubectl exec -n clickhouse clickhouse-clickhouse-0 -- \ clickhouse-client --user=langfuse --password=$(just vault::get clickhouse/user/langfuse password) \ --query "SELECT 1" ``` ### Storage Issues Check MinIO credentials: ```bash kubectl get secret minio-auth -n langfuse -o yaml ``` Verify bucket exists: ```bash just minio::bucket-exists langfuse ``` ### Check Logs ```bash # Web application logs kubectl logs -n langfuse deployment/langfuse-web --tail=100 # Worker logs kubectl logs -n langfuse deployment/langfuse-worker --tail=100 # Redis logs kubectl logs -n langfuse langfuse-redis-primary-0 --tail=100 # Real-time logs kubectl logs -n langfuse deployment/langfuse-web -f ``` ### Common Issues **Blank page after login**: Check browser console for errors. Ensure `NEXTAUTH_URL` matches the actual hostname. **API requests fail**: Verify API keys are correct and associated with the correct project. **Slow dashboard**: Check ClickHouse query performance. Large trace volumes may require index optimization. **Missing traces**: Ensure SDK is configured with correct host and API keys. Check network connectivity from application to Langfuse. ## Configuration Files Key configuration files: - `langfuse-values.gomplate.yaml` - Helm values template - `keycloak-auth-external-secret.yaml` - Keycloak credentials - `postgres-auth-external-secret.gomplate.yaml` - PostgreSQL credentials - `clickhouse-auth-external-secret.gomplate.yaml` - ClickHouse credentials - `redis-auth-external-secret.yaml` - Redis password - `minio-auth-external-secret.yaml` - MinIO credentials ## Security Considerations - **Pod Security Standards**: Namespace configured with **restricted** enforcement - **Secrets Management**: All credentials stored in Vault and synced via External Secrets Operator - **OIDC Authentication**: No local password storage, authentication delegated to Keycloak - **API Key Security**: Keys are hashed and stored securely in PostgreSQL - **TLS/HTTPS**: All external traffic encrypted via Traefik Ingress - **Network Isolation**: Internal services communicate via cluster network - **Database Credentials**: Unique user per application with minimal privileges ### Pod Security Standards The Langfuse namespace is configured with **restricted** Pod Security Standards: - `pod-security.kubernetes.io/enforce=restricted` - `pod-security.kubernetes.io/warn=restricted` All pods (Langfuse web, worker, and Valkey) run with restricted-compliant security contexts: - `runAsNonRoot: true` - Prevents containers from running as root - `allowPrivilegeEscalation: false` - Blocks privilege escalation - `seccompProfile.type: RuntimeDefault` - Enables seccomp filtering - `capabilities.drop: [ALL]` - Drops all Linux capabilities ## References - [Langfuse Documentation](https://langfuse.com/docs) - [Langfuse GitHub](https://github.com/langfuse/langfuse) - [Langfuse Helm Chart](https://github.com/langfuse/langfuse-k8s) - [Langfuse Python SDK](https://langfuse.com/docs/sdk/python) - [Langfuse OpenAI Integration](https://langfuse.com/docs/integrations/openai) - [Keycloak OIDC](https://www.keycloak.org/docs/latest/securing_apps/#_oidc)