Dagster Documentation
Overview
This document covers Dagster installation, deployment, and debugging in the buun-stack environment.
Installation
Prerequisites
- Kubernetes cluster with buun-stack components
- PostgreSQL database cluster
- MinIO object storage (optional, for MinIO-based storage)
- External Secrets Operator (optional, for Vault integration)
- Keycloak (for authentication)
Installation Steps
1. Setup Environment Secrets (if needed):
   - See the Environment Variables Setup section below for configuration options
   - Create the ExternalSecret or Secret before installation if you want environment variables available immediately

2. Install Dagster:

   ```shell
   # Interactive installation with configuration prompts
   just dagster::install
   ```

3. Access Dagster Web UI:
   - Navigate to your Dagster instance (e.g., https://dagster.buun.dev)
   - Log in with your Keycloak credentials
Uninstalling
```shell
# Remove Dagster (keeps database by default)
just dagster::uninstall false

# Remove Dagster and delete database
just dagster::uninstall true
```
Project Deployment
Deploy Projects to Shared PVC
Dagster supports deploying Python projects to a shared PVC with ReadWriteMany access, backed by Longhorn storage.
1. Prepare Project Directory:
   - Ensure your project has a `definitions.py` file in the main module
   - The project name must not contain hyphens (use underscores instead)

2. Deploy Project:

   ```shell
   # Deploy a local project directory
   just dagster::deploy-project /path/to/your/project

   # Interactive deployment (will prompt for project path)
   just dagster::deploy-project
   ```

3. Verify Deployment:
   - Access the Dagster Web UI
   - Check that your assets appear in the Asset Catalog
   - The project will be automatically added to the workspace configuration
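The naming rule above can be checked before deploying. A minimal sketch in plain Python (no Dagster dependency; the helper name is illustrative):

```python
def valid_project_name(name: str) -> bool:
    """Deployable project names must be importable as Python modules,
    so hyphens and other punctuation are not allowed."""
    return name.isidentifier()

print(valid_project_name("csv_to_postgres"))  # True
print(valid_project_name("csv-to-postgres"))  # False
```

Replacing hyphens with underscores (`name.replace("-", "_")`) is usually enough to make a directory name importable.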
Remove Projects
```shell
# Remove a deployed project
just dagster::remove-project project_name

# Interactive removal (will prompt for project name)
just dagster::remove-project
```
Storage Configuration
Local PVC Storage (Default)
Uses Kubernetes PersistentVolumeClaims for storage:
- dagster-storage-pvc: Main Dagster storage (ReadWriteOnce)
- dagster-user-code-pvc: Shared user code storage (ReadWriteMany with Longhorn)
MinIO Storage (Optional)
When MinIO is available, Dagster can use S3-compatible storage:
- dagster-data: Data files bucket
- dagster-logs: Compute logs bucket
The storage type is selected during installation via interactive prompt.
Environment Variables Setup
Environment variables are provided to Dagster through Kubernetes Secrets. You have several options:
Option 1: Customize the Example Template
1. Create the example environment secrets template:

   ```shell
   just dagster::create-env-secrets-example
   ```

2. Important: This creates a template with sample values. You must customize it:
   - If using External Secrets: edit `dagster-env-external-secret.gomplate.yaml` to reference your actual Vault paths
   - If using Direct Secrets: update the created `dagster-env-secret` with your actual credentials
Option 2: Create ExternalSecret Manually
Create an ExternalSecret that references your Vault credentials:
```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: dagster-env-external-secret
  namespace: dagster
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-secret-store
    kind: ClusterSecretStore
  target:
    name: dagster-env-secret
  data:
    - secretKey: AWS_ACCESS_KEY_ID
      remoteRef:
        key: minio/credentials
        property: access_key
    - secretKey: AWS_SECRET_ACCESS_KEY
      remoteRef:
        key: minio/credentials
        property: secret_key
    - secretKey: POSTGRES_URL
      remoteRef:
        key: postgres/admin
        property: connection_string
    # Add more variables as needed
```
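Once the ExternalSecret syncs, the resulting `dagster-env-secret` stores values base64-encoded (visible via `kubectl get secret dagster-env-secret -n dagster -o json`). A quick way to spot-check one value; the encoded string below is a made-up example, not a real credential:

```python
import base64

# Example value copied from the secret's .data field (illustrative only)
encoded = "bWluaW8tYWNjZXNzLWtleQ=="
print(base64.b64decode(encoded).decode())  # minio-access-key
```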
Option 3: Create Kubernetes Secret Directly
```shell
kubectl create secret generic dagster-env-secret -n dagster \
  --from-literal=AWS_ACCESS_KEY_ID="your-access-key" \
  --from-literal=AWS_SECRET_ACCESS_KEY="your-secret-key" \
  --from-literal=AWS_ENDPOINT_URL="http://minio.minio.svc.cluster.local:9000" \
  --from-literal=POSTGRES_URL="postgresql://user:pass@postgres-cluster-rw.postgres:5432"
```
After creating the environment secrets, redeploy Dagster to pick up the new configuration.
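Before redeploying, it can help to sanity-check the connection string stored in the secret. A standard-library sketch; the URL below is a placeholder, not a real credential:

```python
from urllib.parse import urlparse

url = "postgresql://user:pass@postgres-cluster-rw.postgres:5432/dagster"
parts = urlparse(url)
print(parts.hostname)  # postgres-cluster-rw.postgres
print(parts.port)      # 5432
print(parts.path.lstrip("/") or "(no database in URL)")  # dagster
```

If `hostname` or `port` comes back `None`, the URL is malformed (for example, a missing `//` after the scheme).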
Example Projects
CSV to PostgreSQL Project
The examples/csv_to_postgres project demonstrates a complete ETL pipeline that loads data from MinIO object storage into PostgreSQL using dlt (data load tool).
Dataset Information
MovieLens 20M Dataset
This project processes the MovieLens 20M dataset from GroupLens Research. The dataset contains:
- 27,278 movies with metadata
- 20 million ratings from 138,493 users
- 465,564 tags applied by users
- Additional genome data for content-based filtering
MinIO Storage Structure
The dataset files are stored in MinIO under the movie-lens bucket:
```shell
mc alias set buun https://minio.your-domain.com access-key secret-key
mc ls buun/movie-lens
[2025-09-14 12:13:09 JST] 309MiB STANDARD genome-scores.csv
[2025-09-14 12:12:37 JST] 18KiB STANDARD genome-tags.csv
[2025-09-14 12:12:38 JST] 557KiB STANDARD links.csv
[2025-09-14 12:12:38 JST] 1.3MiB STANDARD movies.csv
[2025-09-14 12:13:15 JST] 509MiB STANDARD ratings.csv
[2025-09-14 12:12:42 JST] 16MiB STANDARD tags.csv
```
The project processes:
- movies.csv (1.3MiB) - Movie metadata
- tags.csv (16MiB) - User-generated tags
- ratings.csv (509MiB) - User ratings
Project Features
Assets Processed
- movies_pipeline: MovieLens movies data with primary key `movieId`
- ratings_pipeline: User ratings with composite primary key `[userId, movieId]`
- tags_pipeline: User tags with composite primary key `[userId, movieId, timestamp]`
- movielens_summary: Generates a metadata summary of all processed assets
Smart Processing
- Table Existence Check: Uses DuckDB PostgreSQL scanner to check if tables already exist
- Skip Logic: If a table already contains data, the asset will skip processing to avoid reprocessing large files
- Write Disposition: Uses `replace` mode for initial loads
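The skip logic amounts to a row-count guard before each load. The project does this through DuckDB's PostgreSQL scanner; this pure-Python sketch shows only the control flow, with stand-in callables for counting rows and running the load:

```python
def maybe_load(table, row_count, load):
    """Run load() only when the target table is absent or empty.

    row_count(table) returns an int, or None when the table does not exist;
    load() would perform the dlt pipeline run with write_disposition='replace'.
    """
    existing = row_count(table)
    if existing:
        print(f"skipping {table}: {existing} rows already loaded")
        return False
    load()
    return True

# Demo with in-memory stand-ins
counts = {"movies": 27278}  # pretend 'movies' was loaded earlier
skipped = maybe_load("movies", counts.get, lambda: None)
loaded = maybe_load("ratings", counts.get, lambda: None)
print(skipped, loaded)  # False True
```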
Dependencies
- `dlt[duckdb,filesystem,postgres,s3]>=1.12.1`
- `dagster` and related libraries
- `duckdb` (for table existence checking)
Environment Variables Required
The project expects the following environment variables to be set:
- `POSTGRES_URL`: PostgreSQL connection string (format: `postgresql://user:password@host:port/database`)
- `AWS_ACCESS_KEY_ID`: MinIO/S3 access key
- `AWS_SECRET_ACCESS_KEY`: MinIO/S3 secret key
- `AWS_ENDPOINT_URL`: MinIO endpoint URL
- Additional dlt-specific environment variables for advanced configuration
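A quick preflight, runnable inside a pod with `python3 -c '...'`, that reports which of the variables listed above are missing:

```python
import os

REQUIRED = ["POSTGRES_URL", "AWS_ACCESS_KEY_ID",
            "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL"]
missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    print("missing environment variables:", ", ".join(missing))
else:
    print("all required environment variables are set")
```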
Debugging and Troubleshooting
Debug Commands
Check Dagster component logs using kubectl:
Pod Status and Logs
```shell
# Check Dagster pods status
kubectl get pods -n dagster

# View webserver logs
kubectl logs -n dagster deployment/dagster-dagster-webserver -c dagster-webserver --tail=100

# View daemon logs
kubectl logs -n dagster deployment/dagster-daemon -c dagster-daemon --tail=100

# View user code deployment logs (if using code servers)
kubectl logs -n dagster deployment/dagster-user-code -c dagster --tail=100
```
Configuration and Secrets
```shell
# Check workspace configuration
kubectl get configmap dagster-workspace-yaml -n dagster -o yaml

# Check database secret
kubectl describe secret dagster-database-secret -n dagster

# Check environment secrets (if configured)
kubectl describe secret dagster-env-secret -n dagster

# Check OAuth secrets
kubectl describe secret dagster-oauth-secret -n dagster
```
Common Issues
Assets Not Appearing
Symptoms: Project deployed but assets not visible in Dagster UI
Debugging Steps:
1. Check webserver logs for import errors:

   ```shell
   kubectl logs -n dagster deployment/dagster-dagster-webserver -c dagster-webserver --tail=100 | grep -i error
   ```

2. Verify the workspace configuration:

   ```shell
   kubectl get configmap dagster-workspace-yaml -n dagster -o jsonpath='{.data.workspace\.yaml}'
   ```

3. Check project files in the PVC:

   ```shell
   WEBSERVER_POD=$(kubectl get pods -n dagster -l component=dagster-webserver -o jsonpath='{.items[0].metadata.name}')
   kubectl exec $WEBSERVER_POD -n dagster -- ls -la /opt/dagster/user-code/
   ```
Common Causes:
- Python syntax errors in project files
- Missing `definitions.py` file
- Incorrect module structure
- Project name contains hyphens
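Most of these causes can be detected mechanically before deploying. A stdlib-only sketch; the layout it assumes (`project/project/definitions.py`) follows the deployment steps above, and the demo builds a throwaway project in a temp directory:

```python
import ast
import pathlib
import tempfile

def check_project(path: pathlib.Path) -> list:
    """Return a list of problems mirroring the common causes above."""
    problems = []
    if "-" in path.name:
        problems.append("project name contains hyphens")
    defs = path / path.name / "definitions.py"
    if not defs.is_file():
        problems.append("missing definitions.py in the main module")
    else:
        try:
            ast.parse(defs.read_text())  # catches Python syntax errors
        except SyntaxError as exc:
            problems.append(f"syntax error in definitions.py: {exc}")
    return problems

# Demo: a minimal, well-formed layout passes with no findings
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp) / "demo_project"
    (root / "demo_project").mkdir(parents=True)
    (root / "demo_project" / "definitions.py").write_text("defs = None\n")
    result = check_project(root)
print(result)  # []
```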
Asset Execution Failures
Symptoms: Assets appear but fail during materialization
Debugging Steps:
1. Check daemon logs for execution errors:

   ```shell
   kubectl logs -n dagster deployment/dagster-daemon -c dagster-daemon --tail=100
   ```

2. Check environment variables in the webserver:

   ```shell
   WEBSERVER_POD=$(kubectl get pods -n dagster -l component=dagster-webserver -o jsonpath='{.items[0].metadata.name}')
   kubectl exec $WEBSERVER_POD -n dagster -- env | grep -E "(AWS|POSTGRES|DLT)"
   ```

3. Test connectivity from pods:

   ```shell
   # Test MinIO connectivity
   kubectl exec $WEBSERVER_POD -n dagster -- ping minio.minio.svc.cluster.local

   # Test PostgreSQL connectivity
   kubectl exec $WEBSERVER_POD -n dagster -- nc -zv postgres-cluster-rw.postgres 5432
   ```
Environment Variables Issues
Symptoms: Assets fail with authentication or connection errors
Debugging Steps:
1. Verify the secret exists and contains data:

   ```shell
   kubectl describe secret dagster-env-secret -n dagster
   ```

2. Check whether the ExternalSecret is syncing (if using External Secrets):

   ```shell
   kubectl get externalsecret dagster-env-external-secret -n dagster
   kubectl describe externalsecret dagster-env-external-secret -n dagster
   ```

3. Verify environment variables are loaded in the pods:

   ```shell
   WEBSERVER_POD=$(kubectl get pods -n dagster -l component=dagster-webserver -o jsonpath='{.items[0].metadata.name}')
   kubectl exec $WEBSERVER_POD -n dagster -- printenv | grep -E "(AWS|POSTGRES|DLT)"
   ```
Authentication Issues
Symptoms: Cannot access Dagster UI or authentication failures
Debugging Steps:
1. Check OAuth2 proxy status:

   ```shell
   kubectl get pods -n dagster -l app=oauth2-proxy
   kubectl logs -n dagster deployment/oauth2-proxy-dagster --tail=100
   ```

2. Verify the OAuth client configuration in Keycloak:
   - Ensure the client `dagster` exists in the realm
   - Check that redirect URIs are correctly configured
   - Verify the client secret matches

3. Check the OAuth secret:

   ```shell
   kubectl describe secret dagster-oauth-secret -n dagster
   ```
Database Connection Issues
Symptoms: Database-related errors or connection failures
Debugging Steps:
1. Test database connectivity:

   ```shell
   WEBSERVER_POD=$(kubectl get pods -n dagster -l component=dagster-webserver -o jsonpath='{.items[0].metadata.name}')
   kubectl exec $WEBSERVER_POD -n dagster -- python3 -c "
   import os
   import psycopg2
   conn = psycopg2.connect(
       host='postgres-cluster-rw.postgres',
       port=5432,
       database='dagster',
       user=os.getenv('POSTGRES_USER', 'dagster'),
       password=os.getenv('POSTGRES_PASSWORD', '')
   )
   print('Database connection successful')
   conn.close()
   "
   ```

2. Check the database secret:

   ```shell
   kubectl describe secret dagster-database-secret -n dagster
   ```

3. Verify the database exists:

   ```shell
   just postgres::psql -c "\l" | grep dagster
   ```
Connection Testing
MinIO Connectivity
```shell
# Test MinIO access from Dagster pod
WEBSERVER_POD=$(kubectl get pods -n dagster -l component=dagster-webserver -o jsonpath='{.items[0].metadata.name}')
kubectl exec $WEBSERVER_POD -n dagster -- python3 -c "
import boto3
import os
client = boto3.client('s3',
    endpoint_url=os.getenv('AWS_ENDPOINT_URL'),
    aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY')
)
print('Buckets:', [b['Name'] for b in client.list_buckets()['Buckets']])
"
```
Log Analysis Tips
1. Filter logs by timestamp:

   ```shell
   kubectl logs -n dagster deployment/dagster-dagster-webserver --since=10m
   ```

2. Search for specific errors:

   ```shell
   kubectl logs -n dagster deployment/dagster-daemon | grep -i "error\|exception\|failed"
   ```

3. Monitor logs in real time:

   ```shell
   kubectl logs -n dagster deployment/dagster-dagster-webserver -f
   ```

4. Check resource usage:

   ```shell
   kubectl top pods -n dagster
   ```