# buun-stack
A Kubernetes development stack for self-hosted environments, designed to run on a Linux machine in your home or office that you can access from anywhere via the internet.
📺 [Watch the setup tutorial on YouTube](https://youtu.be/Ezv4dEjLeKo) | 📝 [Read the detailed guide on Dev.to](https://dev.to/buun-ch/building-a-remote-accessible-kubernetes-home-lab-with-k3s-5g05)
## Features
- **Kubernetes Distribution**: [k3s](https://k3s.io/) lightweight Kubernetes
- **Block Storage**: [Longhorn](https://longhorn.io/) distributed block storage
- **Object Storage**: [MinIO](https://min.io/) S3-compatible storage
- **Identity & Access**: [Keycloak](https://www.keycloak.org/) for OIDC authentication
- **Secrets Management**: [HashiCorp Vault](https://www.vaultproject.io/) with [External Secrets Operator](https://external-secrets.io/)
- **Interactive Computing**: [JupyterHub](https://jupyter.org/hub) for collaborative notebooks
- **Business Intelligence**: [Metabase](https://www.metabase.com/) for dashboards and data visualization
- **Data Catalog**: [DataHub](https://datahubproject.io/) for metadata management and data discovery
- **Database**: [PostgreSQL](https://www.postgresql.org/) cluster
- **Analytics Engine/Database**: [ClickHouse](https://clickhouse.com/) for high-performance analytics and data warehousing
- **Data Integration**: [Airbyte](https://airbyte.com/) for ELT data pipelines and ingestion
- **Workflow Orchestration**: [Apache Airflow](https://airflow.apache.org/) for data pipeline automation and task scheduling
- **Authentication Proxy**: [OAuth2 Proxy](https://oauth2-proxy.github.io/oauth2-proxy/) for adding Keycloak authentication to any application
- **Remote Access**: [Cloudflare Tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/) for secure internet connectivity
- **Automation**: [Just](https://just.systems/) task runner with templated configurations
## Quick Start
For detailed step-by-step instructions, see the [Installation Guide](./INSTALLATION.md).
1. **Clone and configure**
```bash
git clone https://github.com/buun-ch/buun-stack
cd buun-stack
mise install
just env::setup
```
2. **Deploy cluster and services**
```bash
just k8s::install
just longhorn::install
just vault::install
just postgres::install
just keycloak::install
```
3. **Configure authentication**
```bash
just keycloak::create-realm
just vault::setup-oidc-auth
just keycloak::create-user
just k8s::setup-oidc-auth
```
## Core Components
### k3s
Lightweight Kubernetes distribution optimized for edge computing and resource-constrained environments.
### Longhorn
Enterprise-grade distributed storage system providing:
- Highly available block storage
- Backup and disaster recovery
- No single point of failure
- Support for NFS persistent volumes
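For illustration, workloads request Longhorn-backed volumes through ordinary PersistentVolumeClaims. A minimal sketch, assuming the chart's default StorageClass is named `longhorn`:
```bash
# Minimal sketch: request a 1Gi Longhorn-backed volume
# (assumes the default StorageClass name "longhorn").
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
EOF
```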
### HashiCorp Vault
Centralized secrets management offering:
- Secure secret storage
- Dynamic secrets generation
- Encryption as a service
- Integration with External Secrets Operator for automatic Kubernetes Secret synchronization
### Keycloak
Open-source identity and access management providing:
- Single Sign-On (SSO)
- OIDC/OAuth2 authentication
- User federation and identity brokering
### PostgreSQL
Production-ready relational database for:
- Keycloak data storage
- Application databases
- Vector similarity search with [pgvector](https://github.com/pgvector/pgvector) extension for AI/ML workloads
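As a quick illustration of pgvector, the sketch below assumes psql access to a database where the extension is installed; the connection string and table are placeholders:
```bash
# pgvector sketch: store small embeddings and run a nearest-neighbour query
# (DATABASE_URL and the "items" table are hypothetical).
psql "$DATABASE_URL" <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
-- Order by Euclidean distance to a query vector
SELECT id FROM items ORDER BY embedding <-> '[2,3,4]' LIMIT 5;
SQL
```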
### External Secrets Operator
Kubernetes operator for syncing secrets from external systems:
- Automatically syncs secrets from Vault to Kubernetes Secrets
- Supports multiple secret backends
- Provides secure secret rotation and lifecycle management
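As an illustration of how an application consumes a Vault secret, the sketch below declares an `ExternalSecret` that materializes a Kubernetes Secret; the store name, Vault path, and key names are placeholders, and the API version may differ depending on your ESO release:
```bash
# Sketch: sync one field from Vault into a Kubernetes Secret
# (store name "vault", path "myapp/config", and key names are placeholders).
kubectl apply -f - <<'EOF'
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: myapp-credentials
  data:
    - secretKey: password
      remoteRef:
        key: myapp/config
        property: password
EOF
```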
### MinIO
S3-compatible object storage system providing:
- High-performance distributed object storage
- AWS S3 API compatibility
- Erasure coding for data protection
- Multi-tenancy support
### JupyterHub
Multi-user platform for interactive computing:
- Collaborative Jupyter notebook environment
- Integrated with Keycloak for OIDC authentication
- Persistent storage for user workspaces
- Support for multiple kernels and environments
- Vault integration for secure secrets management
See [JupyterHub Documentation](./docs/jupyterhub.md) for detailed setup and configuration.
### Metabase
Business intelligence and data visualization platform:
- Open-source analytics and dashboards
- Interactive data exploration
- PostgreSQL integration for data storage
- Automated setup with Helm
- Session management through Vault/External Secrets
- Simplified deployment (no OIDC dependency)
Installation:
```bash
just metabase::install
```
Access Metabase at `https://metabase.yourdomain.com` and complete the initial setup wizard to create an admin account.
### DataHub
Modern data catalog and metadata management platform:
- Centralized data discovery and documentation
- Data lineage tracking and impact analysis
- Schema evolution monitoring
- OIDC integration with Keycloak for secure access
- Elasticsearch-powered search and indexing
- Kafka-based real-time metadata streaming
- PostgreSQL backend for metadata storage
Installation:
```bash
just datahub::install
```
> **⚠️ Resource Requirements:** DataHub is resource-intensive, requiring approximately **4-5GB of RAM** and 1+ CPU cores across multiple components (Elasticsearch, Kafka, Zookeeper, and DataHub services). Deployment typically takes 15-20 minutes to complete. Ensure your cluster has sufficient resources before installation.
Access DataHub at `https://datahub.yourdomain.com` and use "Sign in with SSO" to authenticate via Keycloak.
### ClickHouse
High-performance columnar OLAP database for analytics and data warehousing:
- Columnar storage for fast analytical queries
- Real-time data ingestion and processing
- Horizontal scaling for large datasets
- SQL interface with advanced analytics functions
- Integration with External Secrets for secure credential management
- Support for various data formats (CSV, JSON, Parquet, etc.)
Installation:
```bash
just clickhouse::install
```
Access ClickHouse at `https://clickhouse.yourdomain.com` using the admin credentials stored in Vault.
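For a quick connectivity check, ClickHouse's HTTP interface can be queried directly; the sketch below assumes the ingress fronts the HTTP port and that the admin password is read from Vault (the Vault path and field are placeholders):
```bash
# Sketch: run a query over the ClickHouse HTTP interface
# (the Vault path/field below are hypothetical; adjust to your setup).
PASSWORD=$(just vault::get clickhouse admin-password)
curl -u "admin:${PASSWORD}" 'https://clickhouse.yourdomain.com/' \
  --data-binary 'SELECT version()'
```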
**CH-UI Web Interface**: An optional web-based query interface for ClickHouse is available:
```bash
just ch-ui::install
```
### Apache Airflow
Modern workflow orchestration platform for data pipelines and task automation:
- Airflow 3 with modern SDK components and FastAPI integration
- DAG Development: Integrated with JupyterHub for seamless workflow creation and editing
- OIDC Authentication: Secure access through Keycloak integration
- Shared Storage: DAG files shared between JupyterHub and Airflow for direct editing
- Role-based Access Control: Multiple user roles (Admin, Operator, User, Viewer)
- REST API: Full API access for programmatic DAG management
Installation:
```bash
just airflow::install
```
**JupyterHub Integration**: After installing both JupyterHub and Airflow, DAG files are automatically shared:
- Edit DAG files directly in JupyterHub: `~/airflow-dags/*.py`
- Changes appear in Airflow UI within 1-2 minutes
- Full Python development environment with syntax checking
- Template files available for quick DAG creation
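A minimal DAG sketch, written from a JupyterHub terminal and assuming the Airflow 3 Task SDK import path (file and task names are placeholders):
```bash
# Create the DAG with a .tmp extension, then rename to .py once it imports cleanly.
cat > ~/airflow-dags/hello_dag.tmp <<'EOF'
from airflow.sdk import dag, task

@dag(schedule=None, catchup=False, tags=["example"])
def hello_dag():
    @task
    def say_hello():
        print("Hello from buun-stack")

    say_hello()

hello_dag()
EOF
mv ~/airflow-dags/hello_dag.tmp ~/airflow-dags/hello_dag.py
```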
**User Management**:
```bash
# Assign roles to users
just airflow::assign-role <username> <role>
# Available roles: airflow_admin, airflow_op, airflow_user, airflow_viewer
just airflow::assign-role myuser airflow_admin
```
**API Access**: Create API users for programmatic access:
```bash
just airflow::create-api-user <username> <role>
```
> **💡 Development Workflow**: Create DAGs in JupyterHub using `~/airflow-dags/dag_template.py` as a starting point. Use `.tmp` extension during development to avoid import errors, then rename to `.py` when ready.
Access Airflow at `https://airflow.yourdomain.com` and authenticate via Keycloak.
### Airbyte
Open-source data integration platform for building ELT pipelines:
- **600+ Connectors**: Pre-built connectors for databases, APIs, files, and SaaS applications
- **Change Data Capture (CDC)**: Real-time data replication with PostgreSQL logical replication
- **Schema Management**: Automatic schema detection and evolution handling
- **Incremental Sync**: Efficient data synchronization with deduplication
- **Storage Options**: Flexible storage with MinIO (S3-compatible) or local persistent volumes
- **OAuth2 Authentication**: Secure access through Keycloak via OAuth2 Proxy
Installation:
```bash
just airbyte::install
```
**PostgreSQL CDC Setup**: Enable Change Data Capture for real-time data replication:
```bash
# Setup CDC with user tables only (recommended)
just postgres::setup-cdc <database> <slot_name> <publication_name> <username>
# Example for database 'mydb' with user 'etl_user'
just postgres::setup-cdc mydb airbyte_slot airbyte_pub etl_user
```
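For reference, logical-replication CDC needs roughly the following on the PostgreSQL side; the recipe automates steps along these lines (illustrative SQL only, with placeholder table names, and `wal_level` must be set to `logical` in the server configuration):
```bash
# Illustrative SQL only; run with sufficient privileges. The actual recipe
# may differ in details such as publication scope.
psql "$DATABASE_URL" <<'SQL'
ALTER USER etl_user WITH REPLICATION;
CREATE PUBLICATION airbyte_pub FOR TABLE public.orders, public.customers;
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
SQL
```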
**Storage Configuration**:
- **MinIO**: S3-compatible object storage for scalable data staging
- **Local**: Persistent volumes with automatic Longhorn RWX detection
**Authentication**: Airbyte OSS uses OAuth2 Proxy for Keycloak integration:
- During installation, optionally enable OAuth2 authentication
- Access control through Keycloak groups and roles
- **Note**: All authenticated users share the same internal Airbyte account (OSS limitation)
> **⚠️ Multi-user Limitation**: Airbyte OSS does not support individual user accounts or role-based permissions within the application. All users authenticated through Keycloak will share the same internal workspace and have access to all connections and configurations. Use naming conventions and team coordination for shared usage.
Access Airbyte at `https://airbyte.yourdomain.com` and authenticate via Keycloak (if OAuth2 is enabled).
## Common Operations
### User Management
Create additional users:
```bash
just keycloak::create-user
```
Add user to group:
```bash
just keycloak::add-user-to-group <username> <group>
```
### Database Management
Create database:
```bash
just postgres::create-db <dbname>
```
Create database user:
```bash
just postgres::create-user <username>
```
Grant privileges:
```bash
just postgres::grant <dbname> <username>
```
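A typical sequence for provisioning a database for an application (names are placeholders):
```bash
just postgres::create-db appdb
just postgres::create-user appuser
just postgres::grant appdb appuser
```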
### Secret Management
Store secrets in Vault:
```bash
just vault::put <path> <key>=<value>
```
Retrieve secrets:
```bash
just vault::get <path> <field>
```
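For example, an application credential can be stored and read back like this (the path and key are placeholders); secrets stored this way can then be synced into Kubernetes via External Secrets:
```bash
just vault::put myapp/config api_key=s3cr3t
just vault::get myapp/config api_key
```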
## Security & Authentication
### OAuth2 Proxy Integration
For applications without native Keycloak/OIDC support, buun-stack provides OAuth2 Proxy integration that adds Keycloak authentication in front of any web application:
- **Universal Authentication**: Add Keycloak SSO to any web application
- **Automatic Setup**: Configures Keycloak client, secrets, and proxy deployment
- **Security**: Prevents unauthorized access by routing all traffic through authentication
- **Easy Management**: Simple recipes for setup and removal
**Setup OAuth2 authentication for any application**:
```bash
# For CH-UI (included in installation prompt)
just ch-ui::setup-oauth2-proxy
# For any custom application
just oauth2-proxy::setup-for-app <app-name> <app-host> [namespace] [upstream-service]
```
**Remove OAuth2 authentication**:
```bash
just ch-ui::remove-oauth2-proxy
just oauth2-proxy::remove-for-app <app-name> [namespace]
```
The OAuth2 Proxy automatically:
- Creates a Keycloak client with proper audience mapping
- Generates secure secrets and stores them in Vault
- Deploys proxy with Traefik ingress routing
- Disables direct application access to ensure security
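For example, fronting a hypothetical application with Keycloak authentication (all arguments below are placeholders):
```bash
# Protect the app "wiki" served at wiki.yourdomain.com by Service "wiki-server"
# in namespace "wiki" (illustrative arguments only).
just oauth2-proxy::setup-for-app wiki wiki.yourdomain.com wiki wiki-server
```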
## Remote Access
Once configured, you can access your cluster from anywhere:
```bash
# SSH access
ssh ssh.yourdomain.com
# Kubernetes API
kubectl --context yourpc-oidc get nodes
# Web interfaces
# Vault: https://vault.yourdomain.com
# Keycloak: https://auth.yourdomain.com
# Metabase: https://metabase.yourdomain.com
# Airflow: https://airflow.yourdomain.com
# JupyterHub: https://jupyter.yourdomain.com
```
## Customization
### Adding Custom Recipes
You can extend buun-stack with your own Just recipes and services:
1. Copy the example files:
```bash
cp custom-example.just custom.just
cp -r custom-example custom
```
2. Use the custom recipes:
```bash
# Install reddit-rss
just custom::reddit-rss::install
# Install Miniflux feed reader
just custom::miniflux::install
```
3. Create your own recipes:
Add new modules to the `custom/` directory following the same pattern as the examples. Each module should have its own `justfile` with install, uninstall, and other relevant recipes.
The `custom.just` file is automatically imported by the main Justfile if it exists, allowing you to maintain your custom workflows separately from the core stack.
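As a rough sketch, a custom module might look like the following (the chart, names, and recipe bodies are placeholders; wire the module into `custom.just` following the existing examples):
```bash
mkdir -p custom/myapp
cat > custom/myapp/justfile <<'EOF'
namespace := "myapp"

install:
    helm upgrade --install myapp oci://example.com/charts/myapp \
        --namespace {{namespace}} --create-namespace

uninstall:
    helm uninstall myapp --namespace {{namespace}}
EOF
```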
## Troubleshooting
- Check logs: `kubectl logs -n <namespace> <pod-name>`
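A few generic Kubernetes checks that often help narrow problems down:
```bash
# Pods that are not in the Running phase, across all namespaces
kubectl get pods -A --field-selector=status.phase!=Running
# Recent events in a namespace, newest last
kubectl get events -n <namespace> --sort-by=.lastTimestamp
# Detailed state of a pod (scheduling, image pulls, probes)
kubectl describe pod -n <namespace> <pod-name>
```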
## License
MIT License - See LICENSE file for details