docs: write about Airbyte
39
README.md
@@ -16,6 +16,7 @@ A Kubernetes development stack for self-hosted environments, designed to run on
 - **Data Catalog**: [DataHub](https://datahubproject.io/) for metadata management and data discovery
 - **Database**: [PostgreSQL](https://www.postgresql.org/) cluster
 - **Analytics Engine/Database**: [ClickHouse](https://clickhouse.com/) for high-performance analytics and data warehousing
+- **Data Integration**: [Airbyte](https://airbyte.com/) for ELT data pipelines and ingestion
 - **Workflow Orchestration**: [Apache Airflow](https://airflow.apache.org/) for data pipeline automation and task scheduling
 - **Authentication Proxy**: [OAuth2 Proxy](https://oauth2-proxy.github.io/oauth2-proxy/) for adding Keycloak authentication to any application
 - **Remote Access**: [Cloudflare Tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/) for secure internet connectivity
@@ -232,6 +233,44 @@ just airflow::create-api-user <username> <role>
 
 Access Airflow at `https://airflow.yourdomain.com` and authenticate via Keycloak.
 
+### Airbyte
+
+Open-source data integration platform for building ELT pipelines:
+
+- **600+ Connectors**: Pre-built connectors for databases, APIs, files, and SaaS applications
+- **Change Data Capture (CDC)**: Real-time data replication with PostgreSQL logical replication
+- **Schema Management**: Automatic schema detection and evolution handling
+- **Incremental Sync**: Efficient data synchronization with deduplication
+- **Storage Options**: Flexible storage with MinIO (S3-compatible) or local persistent volumes
+- **OAuth2 Authentication**: Secure access through Keycloak via OAuth2 Proxy
+
+Installation:
+
+```bash
+just airbyte::install
+```
+
+**PostgreSQL CDC Setup**: Enable Change Data Capture for real-time data replication:
+
+```bash
+# Setup CDC with user tables only (recommended)
+just postgres::setup-cdc <database> <slot_name> <publication_name> <username>
+
+# Example for database 'mydb' with user 'etl_user'
+just postgres::setup-cdc mydb airbyte_slot airbyte_pub etl_user
+```
+
+**Storage Configuration**:
+- **MinIO**: S3-compatible object storage for scalable data staging
+- **Local**: Persistent volumes with automatic Longhorn RWX detection
+
+**Authentication**: Airbyte OSS uses OAuth2 Proxy for Keycloak integration:
+- During installation, optionally enable OAuth2 authentication
+- Access control through Keycloak groups and roles
+- Shared internal account for all authenticated users
+
+Access Airbyte at `https://airbyte.yourdomain.com` and authenticate via Keycloak (if OAuth2 is enabled).
+
 ## Common Operations
 
 ### User Management
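
Note on the PostgreSQL CDC setup added above: the argument names of `just postgres::setup-cdc` suggest that it creates a logical replication slot and a publication, but the diff does not show the recipe itself, so treat that as an assumption. Under that assumption, a minimal sketch of a read-only check might look like the following. The helper name `cdc_check_sql` is hypothetical and not part of the repository; the catalog views it queries (`pg_replication_slots`, `pg_publication`, `pg_publication_tables`) are standard PostgreSQL.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): prints read-only SQL that
# inspects the objects a CDC setup is assumed to create.
cdc_check_sql() {
  local slot="$1" publication="$2"
  cat <<SQL
SHOW wal_level;
SELECT slot_name, plugin, active FROM pg_replication_slots WHERE slot_name = '${slot}';
SELECT pubname, puballtables FROM pg_publication WHERE pubname = '${publication}';
SELECT schemaname, tablename FROM pg_publication_tables WHERE pubname = '${publication}';
SQL
}

# Print the checks for the example names used in the README.
cdc_check_sql airbyte_slot airbyte_pub
```

The output can be piped into `psql` against the target database (for example `cdc_check_sql airbyte_slot airbyte_pub | psql -d mydb`, with connection options depending on the cluster). Logical replication requires `wal_level` to be `logical`, so the first statement is a quick way to spot a misconfigured server.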