# buun-stack
A remotely accessible Kubernetes home lab with OIDC authentication. Build a modern development environment with integrated data analytics and AI capabilities. Includes an open data stack for data ingestion, transformation, serving, and orchestration—built on open-source components you can run locally and port to any cloud.
- 📺 [Remote-Accessible Kubernetes Home Lab](https://www.youtube.com/playlist?list=PLbAvvJK22Y6vJPrUC6GrfNMXneYspckAo) (YouTube playlist)
- 📝 [Building a Remote-Accessible Kubernetes Home Lab with k3s](https://dev.to/buun-ch/building-a-remote-accessible-kubernetes-home-lab-with-k3s-5g05) (Dev.to article)
## Architecture
### Foundation
- **[k3s](https://k3s.io/)**: Lightweight Kubernetes distribution
- **[Just](https://just.systems/)**: Task runner with templated configurations
- **[Cloudflare Tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/)**: Secure internet connectivity
### Core Components (Required)
- **[PostgreSQL](https://www.postgresql.org/)**: Database cluster with pgvector extension
- **[Keycloak](https://www.keycloak.org/)**: Identity and access management with OIDC authentication
### Recommended Components
- **[HashiCorp Vault](https://www.vaultproject.io/)**: Centralized secrets management
  - Used by most stack modules for secure credential storage
  - The stack can be deployed without it, but it is highly recommended
- **[External Secrets Operator](https://external-secrets.io/)**: Kubernetes secret synchronization from Vault
  - Automatically syncs secrets from Vault to Kubernetes Secrets
  - Provides secure secret rotation and lifecycle management
### Observability (Optional)
- **[Prometheus](https://prometheus.io/)**: Metrics collection and alerting
- **[Grafana](https://grafana.com/)**: Metrics visualization and dashboards
- **[Goldilocks](https://goldilocks.docs.fairwinds.com/)**: Resource recommendation dashboard powered by VPA
### Storage (Optional)
- **[Longhorn](https://longhorn.io/)**: Distributed block storage
- **[MinIO](https://min.io/)**: S3-compatible object storage
### GPU Support (Optional)
- **[NVIDIA Device Plugin](https://github.com/NVIDIA/k8s-device-plugin)**: NVIDIA GPU support for Kubernetes
### Data & Analytics (Optional)
- **[JupyterHub](https://jupyter.org/hub)**: Interactive computing with collaborative notebooks
- **[Trino](https://trino.io/)**: Distributed SQL query engine for querying multiple data sources
- **[Querybook](https://www.querybook.org/)**: Big data querying UI with notebook interface
- **[ClickHouse](https://clickhouse.com/)**: High-performance columnar analytics database
- **[Qdrant](https://qdrant.tech/)**: Vector database for AI/ML applications
- **[Lakekeeper](https://lakekeeper.io/)**: Apache Iceberg REST Catalog for data lake management
- **[Apache Superset](https://superset.apache.org/)**: BI platform with rich chart types and high customizability
- **[Metabase](https://www.metabase.com/)**: Lightweight BI with simple configuration and a clean, modern interface
- **[DataHub](https://datahubproject.io/)**: Data catalog and metadata management
### Machine Learning (Optional)
- **[MLflow](https://mlflow.org/)**: Machine learning lifecycle management with experiment tracking and model registry
- **[KServe](https://kserve.github.io/)**: Model serving platform for deploying ML models on Kubernetes
### LLM & AI Applications (Optional)
- **[Langfuse](https://langfuse.com/)**: LLM observability and analytics platform for tracking and debugging AI applications
### Orchestration (Optional)
- **[Dagster](https://dagster.io/)**: Modern data orchestration platform
- **[Apache Airflow](https://airflow.apache.org/)**: Workflow orchestration and task scheduling
### Security & Compliance (Optional)
- **[OAuth2 Proxy](https://oauth2-proxy.github.io/oauth2-proxy/)**: Authentication proxy for adding Keycloak authentication to applications without native OIDC support
- **[Fairwinds Polaris](https://polaris.docs.fairwinds.com/)**: Kubernetes configuration validation and security auditing
## Quick Start
For detailed step-by-step instructions, see the [Installation Guide](./INSTALLATION.md).
1. **Clone and configure**
```bash
git clone https://github.com/buun-ch/buun-stack
cd buun-stack
mise install
just env::setup
```
2. **Deploy cluster and services**
```bash
just k8s::install
just longhorn::install
just vault::install
just postgres::install
just keycloak::install
```
3. **Configure authentication**
```bash
just keycloak::create-realm
just vault::setup-oidc-auth
just keycloak::create-user
just k8s::setup-oidc-auth
```
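The commands above assume that `git` and `mise` are already on your `PATH`. Whether `just` is provisioned by `mise install` or must be installed separately is not stated here, so treat the tool list below as an assumption and adjust it. A minimal preflight sketch (the `missing_tools` helper is a hypothetical name, not part of buun-stack):

```bash
#!/usr/bin/env bash
# Print the name of every given tool that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed prerequisites for step 1; extend the list as needed.
missing=$(missing_tools git mise)
if [ -n "$missing" ]; then
  printf 'Install these first:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, the listed tools were all found and you can proceed with step 1.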
## Component Details
### k3s
Lightweight Kubernetes distribution optimized for edge computing:
- **Resource Efficient**: Runs on resource-constrained environments
- **Production Ready**: Full Kubernetes functionality with minimal overhead
- **Easy Deployment**: Single binary installation with built-in ingress
### Longhorn
Enterprise-grade distributed storage system:
- **Highly Available**: Block storage with no single point of failure
- **Backup & Recovery**: Built-in disaster recovery capabilities
- **NFS Support**: Persistent volumes with NFS compatibility
### HashiCorp Vault
Centralized secrets management:
- **Secure Storage**: Encrypted secret storage with access control
- **Dynamic Secrets**: Automatic credential generation and rotation
- **External Secrets Integration**: Syncs with Kubernetes via External Secrets Operator
### Keycloak
Open-source identity and access management:
- **Single Sign-On**: OIDC/OAuth2 authentication across all services
- **User Federation**: Identity brokering and external provider integration
- **Group-Based Access**: Role and permission management
### PostgreSQL
Production-ready relational database:
- **High Availability**: Clustered deployment with CloudNativePG
- **pgvector Extension**: Vector similarity search for AI/ML workloads
- **Multi-Tenant**: Shared database for Keycloak and applications
### Prometheus and Grafana
Comprehensive monitoring and observability stack:
- **Metrics Collection**: Prometheus server with Prometheus Operator
- **Visualization**: Grafana with customizable dashboards
- **Alerting**: Alertmanager for alert routing and management
- **Namespace-Based Control**: Monitoring is enabled explicitly per namespace via labels
- **OIDC Integration**: Optional Keycloak authentication for Grafana

[📖 See Prometheus Documentation](./prometheus/README.md)
### External Secrets Operator
Kubernetes operator for secret synchronization:
- **Vault Integration**: Automatically syncs secrets from Vault to Kubernetes
- **Multiple Backends**: Supports various secret management systems
- **Secure Rotation**: Automatic secret lifecycle management
### MinIO
S3-compatible object storage:
- **S3 API**: Drop-in replacement for AWS S3
- **High Performance**: Distributed object storage with erasure coding
- **Multi-Tenancy**: Isolated storage buckets per application
### JupyterHub
Multi-user platform for interactive computing:
- **Keycloak Authentication**: OAuth2 integration with SSO
- **Persistent Storage**: User notebooks stored in Longhorn volumes
- **Collaborative**: Shared computing environment for teams
- **GPU Support**: CUDA-enabled notebooks with nvidia-device-plugin integration

[📖 See JupyterHub Documentation](./jupyterhub/README.md)
### MLflow
Machine learning lifecycle management platform:
- **Experiment Tracking**: Log parameters, metrics, and artifacts for ML experiments
- **Model Registry**: Version and manage ML models with deployment lifecycle
- **Keycloak Authentication**: OAuth2 integration with group-based access control

[📖 See MLflow Documentation](./mlflow/README.md)
### KServe
Model serving platform for deploying ML models on Kubernetes:
- **Multi-Framework Support**: TensorFlow, PyTorch, scikit-learn, XGBoost, MLflow, and more
- **MLflow Integration**: Deploy models directly from MLflow Model Registry
- **Inference Protocols**: REST and gRPC with v2 Open Inference Protocol
- **RawDeployment Mode**: Uses native Kubernetes Deployments without Knative dependency

[📖 See KServe Documentation](./kserve/README.md)
### Langfuse
LLM observability and analytics platform:
- **Trace Tracking**: Monitor LLM calls, chains, and agent executions with detailed traces
- **Prompt Management**: Version and test prompts with playground interface
- **Analytics**: Track costs, latency, and token usage across all LLM applications
- **Keycloak Authentication**: OAuth2 integration with automatic user provisioning

[📖 See Langfuse Documentation](./langfuse/README.md)
### Apache Superset
Modern business intelligence platform:
- **Rich Visualizations**: 40+ chart types including mixed charts, treemaps, and heatmaps
- **SQL Lab**: Powerful editor for complex queries and dataset creation
- **Keycloak & Trino**: OAuth2 authentication and Iceberg data lake integration

[📖 See Superset Documentation](./superset/README.md)
### Metabase
Lightweight business intelligence:
- **Simple Setup**: Quick configuration with a clean, modern UI
- **Multiple Databases**: Connect to PostgreSQL, Trino, and more
- **Keycloak Authentication**: OAuth2 integration for user management

[📖 See Metabase Documentation](./metabase/README.md)
### Querybook
Big data querying UI with notebook interface:
- **Trino Integration**: SQL queries against multiple data sources with user impersonation
- **Notebook Interface**: Shareable datadocs with queries and visualizations
- **Real-time Execution**: WebSocket-based query progress updates

[📖 See Querybook Documentation](./querybook/README.md)
### Trino
Fast distributed SQL query engine:
- **Multi-Source Queries**: Query PostgreSQL, Iceberg, and other sources in a single query
- **Keycloak Authentication**: OAuth2 for Web UI, password auth for JDBC clients
- **Sample Data**: TPCH catalog with benchmark data for testing

[📖 See Trino Documentation](./trino/README.md)
### DataHub
Modern data catalog and metadata management:
- **OIDC Integration**: Keycloak authentication for unified access
- **Metadata Discovery**: Search and browse data assets across platforms
- **Lineage Tracking**: Visualize data flow and dependencies

[📖 See DataHub Documentation](./datahub/README.md)
### ClickHouse
High-performance columnar OLAP database:
- **Fast Analytics**: Optimized for analytical queries on large datasets
- **Compression**: Efficient storage with columnar format
- **Real-time Ingestion**: Stream data from Kafka and other sources

[📖 See ClickHouse Documentation](./clickhouse/README.md)
### Qdrant
High-performance vector database:
- **Similarity Search**: Fast vector search for AI/ML applications
- **Rich Filtering**: Combine vector search with structured filters
- **Scalable**: Distributed deployment for large-scale embeddings

[📖 See Qdrant Documentation](./qdrant/README.md)
### Lakekeeper
Apache Iceberg REST Catalog:
- **OIDC Authentication**: Keycloak integration for secure access
- **Table Management**: Manages Iceberg tables with ACID transactions
- **Multi-Engine**: Compatible with Trino, Spark, and other query engines

[📖 See Lakekeeper Documentation](./lakekeeper/README.md)
### Apache Airflow
Workflow orchestration platform:
- **DAG-Based**: Define data pipelines as code with Python
- **JupyterHub Integration**: Develop and test workflows in notebooks
- **Keycloak Authentication**: OAuth2 for user management

[📖 See Airflow Documentation](./airflow/README.md)
### Dagster
Modern data orchestration platform:
- **Asset-Centric**: Define data assets and their dependencies
- **Integrated Development**: Built-in UI for development and monitoring
- **Testing & Validation**: Data quality checks and pipeline testing

[📖 See Dagster Documentation](./dagster/README.md)
### Fairwinds Polaris
Kubernetes configuration validation and best practices auditing:
- **Security Checks**: Validates security configurations against best practices
- **Efficiency Analysis**: Identifies missing resource requests and limits
- **Real-time Auditing**: Continuous cluster configuration scanning
- **Dashboard Interface**: Visual reporting of issues by severity

[📖 See Fairwinds Polaris Documentation](./fairwinds-polaris/README.md)
### Goldilocks
Resource recommendation dashboard for right-sizing workloads:
- **VPA Integration**: Powered by Vertical Pod Autoscaler for metrics-based recommendations
- **Visual Dashboard**: User-friendly interface for viewing resource recommendations
- **QoS Guidance**: Recommendations for Guaranteed, Burstable, and BestEffort classes
- **Monitoring-Only Mode**: Observes workloads without automatic scaling
- **Namespace-Based**: Enable monitoring per namespace with labels

[📖 See Goldilocks Documentation](./goldilocks/README.md)

[📖 See VPA Documentation](./vpa/README.md)
## Common Operations
### User Management
Create additional users:
```bash
just keycloak::create-user
```
Add user to group:
```bash
just keycloak::add-user-to-group <username> <group>
```
### Database Management
Create database:
```bash
just postgres::create-db <dbname>
```
Create database user:
```bash
just postgres::create-user <username>
```
Grant privileges:
```bash
just postgres::grant <dbname> <username>
```
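When provisioning a database for a new application, the three recipes above are usually run in sequence. The sketch below chains them and stops at the first failure. It is only an illustration: `run_just`, `provision_db`, and the `DRY_RUN` switch are hypothetical helpers, and `myapp` / `myapp_user` are placeholder names. Only the recipe names and argument order come from this README.

```bash
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN=${DRY_RUN:-1}

# Run a just recipe, or print it when DRY_RUN is 1.
run_just() {
  if [ "$DRY_RUN" = 1 ]; then
    printf 'just %s\n' "$*"
  else
    just "$@"
  fi
}

# Create a database, create a user, and grant that user privileges on it.
# Usage: provision_db <dbname> <username>
provision_db() {
  run_just postgres::create-db "$1" &&
    run_just postgres::create-user "$2" &&
    run_just postgres::grant "$1" "$2"
}

provision_db myapp myapp_user
```

Review the printed commands first, then rerun with `DRY_RUN=0` once they look right. If a recipe prompts for input, it will still do so in that mode.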
### Secret Management
Store secrets in Vault:
```bash
just vault::put <path> <key>=<value>
```
Retrieve secrets:
```bash
just vault::get <path> <field>
```
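A common pattern is to generate a random credential and store it immediately, so it never has to be chosen by hand. The sketch below shows only the generation step as runnable code. The Vault path `myapp/db` and the key `password` are hypothetical placeholders, and the store/retrieve lines are left commented out because the path layout expected by the recipes is not described here.

```bash
#!/usr/bin/env bash
# Print a random alphanumeric string of the requested length (default 32).
gen_password() {
  LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "${1:-32}"
}

password=$(gen_password 32)
printf 'generated %s characters\n' "${#password}"

# Store and read back using the recipes above (hypothetical path and key):
# just vault::put myapp/db "password=$password"
# just vault::get myapp/db password
```

Note that a value passed on the command line may end up in shell history, so prefer running this from a script rather than typing the secret interactively.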
## Security & Authentication
### OAuth2 Proxy Integration
For applications that don't natively support Keycloak/OIDC authentication, buun-stack provides an OAuth2 Proxy integration that places Keycloak authentication in front of them:
- **Universal Authentication**: Add Keycloak SSO to any web application
- **Automatic Setup**: Configures Keycloak client, secrets, and proxy deployment
- **Security**: Prevents unauthorized access by routing all traffic through authentication
- **Easy Management**: Simple recipes for setup and removal
**Set up OAuth2 authentication for any application**:
```bash
# For CH-UI (included in installation prompt)
just ch-ui::setup-oauth2-proxy
# For any custom application
just oauth2-proxy::setup-for-app <app-name> <app-host> [namespace] [upstream-service]
```
**Remove OAuth2 authentication**:
```bash
just ch-ui::remove-oauth2-proxy
just oauth2-proxy::remove-for-app <app-name> [namespace]
```
The OAuth2 Proxy setup recipe automatically:
- Creates a Keycloak client with proper audience mapping
- Generates secure secrets and stores them in Vault
- Deploys the proxy with Traefik ingress routing
- Disables direct application access to ensure security
## Remote Access
Once configured, you can access your cluster from anywhere:
```bash
# SSH access
ssh ssh.yourdomain.com
# Kubernetes API
kubectl --context yourpc-oidc get nodes
# Web interfaces
# Vault: https://vault.yourdomain.com
# Keycloak: https://auth.yourdomain.com
# Grafana: https://grafana.yourdomain.com
# Trino: https://trino.yourdomain.com
# Querybook: https://querybook.yourdomain.com
# Superset: https://superset.yourdomain.com
# Metabase: https://metabase.yourdomain.com
# Airflow: https://airflow.yourdomain.com
# JupyterHub: https://jupyter.yourdomain.com
# MLflow: https://mlflow.yourdomain.com
# Langfuse: https://langfuse.yourdomain.com
```
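To confirm that the web interfaces are reachable after setup, you can probe each hostname in turn. This is a rough sketch, not a buun-stack recipe: `service_url` and `check_services` are hypothetical helpers, the subdomains are copied from the list above, and the base domain is whatever you configured. An unauthenticated request will often return a redirect or an error status rather than 200, so treat any response other than `000` as "reachable".

```bash
#!/usr/bin/env bash
# Subdomains taken from the list above.
SERVICES="vault auth grafana trino querybook superset metabase airflow jupyter mlflow langfuse"

# Build the HTTPS URL for one service under a base domain.
service_url() {
  printf 'https://%s.%s\n' "$1" "$2"
}

# Print "<http status> <url>" for every service; 000 means no response.
check_services() {
  domain=$1
  for svc in $SERVICES; do
    url=$(service_url "$svc" "$domain")
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    printf '%s %s\n' "${code:-000}" "$url"
  done
}

# Usage: check_services yourdomain.com
```

Only services you actually installed will respond, so trim `SERVICES` to match your deployment.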
## Customization
### Adding Custom Recipes
You can extend buun-stack with your own Just recipes and services:
1. Copy the example files:
```bash
cp custom-example.just custom.just
cp -r custom-example custom
```
2. Use the custom recipes:
```bash
# Install reddit-rss
just custom::reddit-rss::install
# Install Miniflux feed reader
just custom::miniflux::install
```
3. Create your own recipes:

   Add new modules to the `custom/` directory following the same pattern as the examples. Each module should have its own `justfile` with install, uninstall, and other relevant recipes.

The `custom.just` file is automatically imported by the main Justfile if it exists, allowing you to maintain your custom workflows separately from the core stack.
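As a starting point for a new module, the sketch below writes a skeleton `justfile` with `install` and `uninstall` recipes into `custom/<name>/`. It is an assumption-laden illustration: `scaffold_module` is a hypothetical helper, the recipe bodies are placeholders, and how modules are registered in `custom.just` is not shown here, so mirror what `custom-example.just` does for that part.

```bash
#!/usr/bin/env bash
# Create <root>/<name>/justfile with placeholder install/uninstall recipes.
# Usage: scaffold_module <name> [root]   (root defaults to "custom")
scaffold_module() {
  name=$1
  root=${2:-custom}
  target="$root/$name/justfile"

  # Refuse to overwrite an existing module.
  if [ -e "$target" ]; then
    printf '%s already exists\n' "$target" >&2
    return 1
  fi

  mkdir -p "$root/$name"
  cat >"$target" <<EOF
# $name: skeleton module, replace the recipe bodies with real steps
install:
    echo "TODO: install $name"

uninstall:
    echo "TODO: uninstall $name"
EOF
}

# Usage: scaffold_module my-service
```

After scaffolding, replace the `echo` lines with the actual deployment commands for your service.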
## Demo Projects
The following demo projects showcase end-to-end data workflows using buun-stack:
### ML Model Serving with MLflow and KServe
[**examples/kserve-mlflow-iris**](./examples/kserve-mlflow-iris/README.md)

End-to-end machine learning workflow demonstrating JupyterHub, MLflow, and KServe integration:
- **JupyterHub** for model training and testing
- **MLflow** for experiment tracking and model registry
- **KServe** for model deployment and inference

Key technologies: MLflow, KServe, MinIO, JupyterHub
### Salesforce to Iceberg REST Catalog
[**dlt-salesforce-iceberg-rest-demo**](https://github.com/buun-ch/dlt-salesforce-iceberg-rest-demo)

Demonstrates Salesforce data ingestion into an Iceberg data lake:
- **dlt** extracts data from Salesforce API (Account, Contact, Opportunity, etc.)
- **Custom Iceberg destination** loads data into Lakekeeper REST Catalog
- **Automatic schema conversion** from dlt to Iceberg with PyArrow
- **Orchestration** with Dagster or Apache Airflow

Key technologies: dlt, Iceberg, Lakekeeper, MinIO
### E-commerce Lakehouse Analytics
[**payload-ecommerce-lakehouse-demo**](https://github.com/buun-ch/payload-ecommerce-lakehouse-demo)

Full-stack e-commerce application with integrated lakehouse analytics:
- **Next.js + Payload CMS** for e-commerce application
- **dlt** ingests data incrementally from Payload API to Iceberg
- **dbt** transforms raw data into analytics-ready star schema
- **Trino** queries across all data layers (raw, staging, marts)
- **Superset/Metabase** for dashboards and business intelligence

Key technologies: Next.js, Payload CMS, dlt, dbt, Iceberg, Lakekeeper, Trino, Superset, Metabase

The Salesforce and e-commerce projects both demonstrate the medallion architecture (raw → staging → marts) and show how buun-stack components work together for production data workflows.
## License
MIT License. See the LICENSE file for details.