buun-stack
A remotely accessible Kubernetes home lab with OIDC authentication. Build a modern development environment with integrated data analytics and AI capabilities. Includes an open data stack for data ingestion, transformation, serving, and orchestration—built on open-source components you can run locally and port to any cloud.
- 📺 Remote-Accessible Kubernetes Home Lab (YouTube playlist)
- 📝 Building a Remote-Accessible Kubernetes Home Lab with k3s (Dev.to article)
Architecture
Foundation
- k3s: Lightweight Kubernetes distribution
- Just: Task runner with templated configurations
- Cloudflare Tunnel: Secure internet connectivity
Core Components (Required)
- PostgreSQL: Database cluster with pgvector extension
- Keycloak: Identity and access management with OIDC authentication
Recommended Components
- HashiCorp Vault: Centralized secrets management
- Used by most stack modules for secure credential storage
- The stack can run without it, but installing it is strongly recommended
- External Secrets Operator: Kubernetes secret synchronization from Vault
- Automatically syncs secrets from Vault to Kubernetes Secrets
- Provides secure secret rotation and lifecycle management
Observability (Optional)
- Prometheus: Metrics collection and alerting
- Grafana: Metrics visualization and dashboards
- Goldilocks: Resource recommendation dashboard powered by VPA
Storage (Optional)
- Longhorn: Distributed block storage with backup and disaster recovery
- MinIO: S3-compatible object storage
Data & Analytics (Optional)
- JupyterHub: Interactive computing with collaborative notebooks
- MLflow: Machine learning lifecycle management with experiment tracking and model registry
- Trino: Distributed SQL query engine for querying multiple data sources
- Querybook: Big data querying UI with notebook interface
- ClickHouse: High-performance columnar analytics database
- Qdrant: Vector database for AI/ML applications
- Lakekeeper: Apache Iceberg REST Catalog for data lake management
- Apache Superset: BI platform with rich chart types and high customizability
- Metabase: Lightweight BI with simple configuration and clean, modern interface
- DataHub: Data catalog and metadata management
Orchestration (Optional)
- Dagster: Modern data orchestration platform
- Apache Airflow: Workflow orchestration and task scheduling
Security & Compliance (Optional)
- OAuth2 Proxy: Adds Keycloak authentication in front of applications without native OIDC support
- Fairwinds Polaris: Kubernetes configuration validation and security auditing
Quick Start
For detailed step-by-step instructions, see the Installation Guide.
1. Clone and configure:

   git clone https://github.com/buun-ch/buun-stack
   cd buun-stack
   mise install
   just env::setup

2. Deploy cluster and services:

   just k8s::install
   just longhorn::install
   just vault::install
   just postgres::install
   just keycloak::install

3. Configure authentication:

   just keycloak::create-realm
   just vault::setup-oidc-auth
   just keycloak::create-user
   just k8s::setup-oidc-auth
Component Details
k3s
Lightweight Kubernetes distribution optimized for edge computing:
- Resource Efficient: Runs on resource-constrained environments
- Production Ready: Full Kubernetes functionality with minimal overhead
- Easy Deployment: Single binary installation with built-in ingress
Longhorn
Enterprise-grade distributed storage system:
- Highly Available: Block storage with no single point of failure
- Backup & Recovery: Built-in disaster recovery capabilities
- NFS Support: Persistent volumes with NFS compatibility
HashiCorp Vault
Centralized secrets management:
- Secure Storage: Encrypted secret storage with access control
- Dynamic Secrets: Automatic credential generation and rotation
- External Secrets Integration: Syncs with Kubernetes via External Secrets Operator
Keycloak
Open-source identity and access management:
- Single Sign-On: OIDC/OAuth2 authentication across all services
- User Federation: Identity brokering and external provider integration
- Group-Based Access: Role and permission management
PostgreSQL
Production-ready relational database:
- High Availability: Clustered deployment with CloudNativePG
- pgvector Extension: Vector similarity search for AI/ML workloads
- Multi-Tenant: Shared database for Keycloak and applications
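To illustrate what pgvector's similarity search computes, here is a minimal sketch of cosine distance, the quantity behind pgvector's `<=>` operator. The vectors and document names are invented for illustration; in practice pgvector evaluates this inside PostgreSQL over indexed embedding columns.

```python
# Conceptual sketch of pgvector's cosine-distance operator (<=>).
# Example embeddings are hypothetical two-dimensional vectors.
import math

def cosine_distance(a, b):
    """1 - cosine similarity, as returned by pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

query = [1.0, 0.0]
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}

# Rank documents by distance to the query embedding (ascending = most similar first)
ranked = sorted(docs, key=lambda k: cosine_distance(query, docs[k]))
print(ranked)  # "a" is identical, "c" is 45 degrees away, "b" is orthogonal
```

The SQL equivalent would order rows by `embedding <=> :query_embedding` and take the top k.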
Prometheus and Grafana
Comprehensive monitoring and observability stack:
- Metrics Collection: Prometheus server with Prometheus Operator
- Visualization: Grafana with customizable dashboards
- Alerting: Alertmanager for alert routing and management
- Namespace-Based Control: Explicit monitoring via labels
- OIDC Integration: Optional Keycloak authentication for Grafana
📖 See Prometheus Documentation
External Secrets Operator
Kubernetes operator for secret synchronization:
- Vault Integration: Automatically syncs secrets from Vault to Kubernetes
- Multiple Backends: Supports various secret management systems
- Secure Rotation: Automatic secret lifecycle management
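As an illustration of the Vault-to-Kubernetes sync, a minimal ExternalSecret manifest might look like the sketch below. The names, namespace, Vault KV path, and the secret store reference are all hypothetical; the store itself would be a separate SecretStore/ClusterSecretStore resource pointing at Vault.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-db-credentials      # hypothetical name
  namespace: my-app             # hypothetical namespace
spec:
  refreshInterval: 1h           # re-sync periodically to pick up rotations
  secretStoreRef:
    name: vault-backend         # a (Cluster)SecretStore configured for Vault
    kind: ClusterSecretStore
  target:
    name: app-db-credentials    # resulting Kubernetes Secret
  data:
    - secretKey: password
      remoteRef:
        key: my-app/db          # Vault KV path (hypothetical)
        property: password
```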
MinIO
S3-compatible object storage:
- S3 API: Drop-in replacement for AWS S3
- High Performance: Distributed object storage with erasure coding
- Multi-Tenancy: Isolated storage buckets per application
JupyterHub
Multi-user platform for interactive computing:
- Keycloak Authentication: OAuth2 integration with SSO
- Persistent Storage: User notebooks stored in Longhorn volumes
- Collaborative: Shared computing environment for teams
📖 See JupyterHub Documentation
MLflow
Machine learning lifecycle management platform:
- Experiment Tracking: Log parameters, metrics, and artifacts for ML experiments
- Model Registry: Version and manage ML models with deployment lifecycle
- Keycloak Authentication: OAuth2 integration with group-based access control
Apache Superset
Modern business intelligence platform:
- Rich Visualizations: 40+ chart types including mixed charts, treemaps, and heatmaps
- SQL Lab: Powerful editor for complex queries and dataset creation
- Keycloak & Trino: OAuth2 authentication and Iceberg data lake integration
Metabase
Lightweight business intelligence:
- Simple Setup: Quick configuration with clean, modern UI
- Multiple Databases: Connect to PostgreSQL, Trino, and more
- Keycloak Authentication: OAuth2 integration for user management
Querybook
Big data querying UI with notebook interface:
- Trino Integration: SQL queries against multiple data sources with user impersonation
- Notebook Interface: Shareable datadocs with queries and visualizations
- Real-time Execution: WebSocket-based query progress updates
Trino
Fast distributed SQL query engine:
- Multi-Source Queries: Query PostgreSQL, Iceberg, and other sources in a single query
- Keycloak Authentication: OAuth2 for Web UI, password auth for JDBC clients
- Sample Data: TPCH catalog with benchmark data for testing
DataHub
Modern data catalog and metadata management:
- OIDC Integration: Keycloak authentication for unified access
- Metadata Discovery: Search and browse data assets across platforms
- Lineage Tracking: Visualize data flow and dependencies
ClickHouse
High-performance columnar OLAP database:
- Fast Analytics: Optimized for analytical queries on large datasets
- Compression: Efficient storage with columnar format
- Real-time Ingestion: Stream data from Kafka and other sources
📖 See ClickHouse Documentation
Qdrant
High-performance vector database:
- Similarity Search: Fast vector search for AI/ML applications
- Rich Filtering: Combine vector search with structured filters
- Scalable: Distributed deployment for large-scale embeddings
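Conceptually, Qdrant combines nearest-neighbour ranking with structured payload filters. The toy sketch below shows that combination with invented data; Qdrant does the same at scale with approximate-nearest-neighbour indexes rather than a linear scan.

```python
# Sketch of filtered vector search: apply a structured payload filter,
# then rank the remaining points by similarity. All data is hypothetical.
import math

points = [
    {"id": 1, "vector": [0.9, 0.1], "payload": {"lang": "en"}},
    {"id": 2, "vector": [0.8, 0.2], "payload": {"lang": "de"}},
    {"id": 3, "vector": [0.1, 0.9], "payload": {"lang": "en"}},
]

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query, flt, top_k=2):
    # Structured filter first, then similarity ranking
    candidates = [p for p in points
                  if all(p["payload"].get(k) == v for k, v in flt.items())]
    candidates.sort(key=lambda p: cosine_sim(query, p["vector"]), reverse=True)
    return [p["id"] for p in candidates[:top_k]]

print(search([1.0, 0.0], {"lang": "en"}))  # → [1, 3]
```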
Lakekeeper
Apache Iceberg REST Catalog:
- OIDC Authentication: Keycloak integration for secure access
- Table Management: Manages Iceberg tables with ACID transactions
- Multi-Engine: Compatible with Trino, Spark, and other query engines
📖 See Lakekeeper Documentation
Apache Airflow
Workflow orchestration platform:
- DAG-Based: Define data pipelines as code with Python
- JupyterHub Integration: Develop and test workflows in notebooks
- Keycloak Authentication: OAuth2 for user management
Dagster
Modern data orchestration platform:
- Asset-Centric: Define data assets and their dependencies
- Integrated Development: Built-in UI for development and monitoring
- Testing & Validation: Data quality checks and pipeline testing
Fairwinds Polaris
Kubernetes configuration validation and best practices auditing:
- Security Checks: Validates security configurations against best practices
- Efficiency Analysis: Identifies missing resource requests and limits
- Real-time Auditing: Continuous cluster configuration scanning
- Dashboard Interface: Visual reporting of issues by severity
📖 See Fairwinds Polaris Documentation
Goldilocks
Resource recommendation dashboard for right-sizing workloads:
- VPA Integration: Powered by Vertical Pod Autoscaler for metrics-based recommendations
- Visual Dashboard: User-friendly interface for viewing resource recommendations
- QoS Guidance: Recommendations for Guaranteed, Burstable, and BestEffort classes
- Monitoring-Only Mode: Observes workloads without automatic scaling
- Namespace-Based: Enable monitoring per namespace with labels
📖 See Goldilocks Documentation
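For context on the QoS guidance above, here is a rough sketch of how Kubernetes classifies a pod's QoS from its containers' requests and limits. The container specs are hypothetical, and the logic is simplified relative to the full Kubernetes rules.

```python
# Simplified sketch of Kubernetes QoS classification:
# Guaranteed (limits set and requests == limits for cpu+memory in every
# container), BestEffort (nothing set anywhere), Burstable (everything else).

def qos_class(containers):
    reqs = [c.get("requests", {}) for c in containers]
    lims = [c.get("limits", {}) for c in containers]
    if all(not r and not l for r, l in zip(reqs, lims)):
        return "BestEffort"
    guaranteed = all(
        l.get(res) is not None and r.get(res, l[res]) == l[res]
        for r, l in zip(reqs, lims)
        for res in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"

pod = [{"requests": {"cpu": "100m", "memory": "128Mi"},
        "limits":   {"cpu": "100m", "memory": "128Mi"}}]
print(qos_class(pod))  # Guaranteed
```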
Common Operations
User Management
Create additional users:
just keycloak::create-user
Add user to group:
just keycloak::add-user-to-group <username> <group>
Database Management
Create database:
just postgres::create-db <dbname>
Create database user:
just postgres::create-user <username>
Grant privileges:
just postgres::grant <dbname> <username>
Secret Management
Store secrets in Vault:
just vault::put <path> <key>=<value>
Retrieve secrets:
just vault::get <path> <field>
Security & Authentication
OAuth2 Proxy Integration
For applications without native Keycloak/OIDC support, buun-stack provides OAuth2 Proxy integration that puts Keycloak authentication in front of them:
- Universal Authentication: Add Keycloak SSO to any web application
- Automatic Setup: Configures Keycloak client, secrets, and proxy deployment
- Security: Prevents unauthorized access by routing all traffic through authentication
- Easy Management: Simple recipes for setup and removal
Setup OAuth2 authentication for any application:
# For CH-UI (included in installation prompt)
just ch-ui::setup-oauth2-proxy
# For any custom application
just oauth2-proxy::setup-for-app <app-name> <app-host> [namespace] [upstream-service]
Remove OAuth2 authentication:
just ch-ui::remove-oauth2-proxy
just oauth2-proxy::remove-for-app <app-name> [namespace]
The OAuth2 Proxy automatically:
- Creates a Keycloak client with proper audience mapping
- Generates secure secrets and stores them in Vault
- Deploys proxy with Traefik ingress routing
- Disables direct application access to ensure security
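To illustrate the audience mapping mentioned above: after login, the proxy validates that the token's `aud` claim names the Keycloak client it was issued for. The sketch below decodes a hand-built, unsigned token to show just that check; real tokens are signed JWTs whose signatures must also be verified.

```python
# Sketch of a JWT audience check. The token here is constructed by hand
# for illustration (alg "none", no signature); do not skip signature
# verification in real systems.
import base64, json

def b64url(data: dict) -> str:
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

token = ".".join([
    b64url({"alg": "none"}),                    # header
    b64url({"sub": "alice", "aud": "my-app"}),  # payload (hypothetical client)
    "",                                         # signature (omitted here)
])

def audience_ok(jwt: str, expected: str) -> bool:
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    aud = claims.get("aud")
    return expected == aud or (isinstance(aud, list) and expected in aud)

print(audience_ok(token, "my-app"))  # True
print(audience_ok(token, "other"))   # False
```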
Remote Access
Once configured, you can access your cluster from anywhere:
# SSH access
ssh ssh.yourdomain.com
# Kubernetes API
kubectl --context yourpc-oidc get nodes
# Web interfaces
# Vault: https://vault.yourdomain.com
# Keycloak: https://auth.yourdomain.com
# Grafana: https://grafana.yourdomain.com
# Trino: https://trino.yourdomain.com
# Querybook: https://querybook.yourdomain.com
# Superset: https://superset.yourdomain.com
# Metabase: https://metabase.yourdomain.com
# Airflow: https://airflow.yourdomain.com
# JupyterHub: https://jupyter.yourdomain.com
# MLflow: https://mlflow.yourdomain.com
Customization
Adding Custom Recipes
You can extend buun-stack with your own Just recipes and services:
1. Copy the example files:

   cp custom-example.just custom.just
   cp -r custom-example custom

2. Use the custom recipes:

   # Install reddit-rss
   just custom::reddit-rss::install

   # Install Miniflux feed reader
   just custom::miniflux::install

3. Create your own recipes:
Add new modules to the custom/ directory following the same pattern as the examples. Each module should have its own justfile with install, uninstall, and other relevant recipes.
The custom.just file is automatically imported by the main Justfile if it exists, allowing you to maintain your custom workflows separately from the core stack.
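As a sketch of that pattern, a hypothetical module at custom/myapp/justfile (the module name and resources are invented here, not part of the stack) could look like:

```just
# custom/myapp/justfile (hypothetical example module)

install:
    kubectl create namespace myapp --dry-run=client -o yaml | kubectl apply -f -
    kubectl apply -n myapp -f ./custom/myapp/manifests

uninstall:
    kubectl delete namespace myapp
```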
Demo Projects
The following demo projects showcase end-to-end data workflows using buun-stack:
Salesforce to Iceberg REST Catalog
dlt-salesforce-iceberg-rest-demo
Demonstrates Salesforce data ingestion into an Iceberg data lake:
- dlt extracts data from Salesforce API (Account, Contact, Opportunity, etc.)
- Custom Iceberg destination loads data into Lakekeeper REST Catalog
- Automatic schema conversion from dlt to Iceberg with PyArrow
- Orchestration with Dagster or Apache Airflow
- Query with Trino and visualize in Superset/Metabase
Key technologies: dlt, Iceberg, Lakekeeper, Trino, MinIO
E-commerce Lakehouse Analytics
payload-ecommerce-lakehouse-demo
Full-stack e-commerce application with integrated lakehouse analytics:
- Next.js + Payload CMS for e-commerce application
- dlt ingests data incrementally from Payload API to Iceberg
- dbt transforms raw data into analytics-ready star schema
- Trino queries across all data layers (raw, staging, marts)
- Superset/Metabase for dashboards and business intelligence
Key technologies: Next.js, Payload CMS, dlt, dbt, Iceberg, Lakekeeper, Trino, Superset, Metabase
Both projects demonstrate the medallion architecture (raw → staging → marts) and showcase how buun-stack components work together for production data workflows.
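The medallion flow both demos follow can be sketched in miniature as below. The data and field names are invented; in the demos, dlt lands the raw layer and dbt builds the staging and marts layers on Iceberg tables.

```python
# Toy raw → staging → marts flow with hypothetical order data.

raw_orders = [  # raw: as ingested, stringly typed, with duplicates
    {"id": "1", "amount": "19.90", "status": "PAID"},
    {"id": "1", "amount": "19.90", "status": "PAID"},   # duplicate row
    {"id": "2", "amount": "5.00", "status": "refunded"},
]

# staging: deduplicated (keyed by id), typed, normalized
staging = {o["id"]: {"amount": float(o["amount"]), "status": o["status"].lower()}
           for o in raw_orders}

# marts: aggregated, analytics-ready
revenue_by_status = {}
for o in staging.values():
    revenue_by_status[o["status"]] = (
        revenue_by_status.get(o["status"], 0.0) + o["amount"]
    )

print(revenue_by_status)  # {'paid': 19.9, 'refunded': 5.0}
```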
License
MIT License. See the LICENSE file for details.