From 9871b84d66edefb04b2736419bd68d147be55bb6 Mon Sep 17 00:00:00 2001
From: Masaki Yatsu
Date: Sat, 8 Nov 2025 23:26:00 +0900
Subject: [PATCH] docs(prometheus): write docs about Prometheus and Grafana

---
 README.md            |  18 ++
 prometheus/README.md | 439 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 457 insertions(+)
 create mode 100644 prometheus/README.md

diff --git a/README.md b/README.md
index 849c7d9..99806a1 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,11 @@ A remotely accessible Kubernetes home lab with OIDC authentication. Build a mode
 - Automatically syncs secrets from Vault to Kubernetes Secrets
 - Provides secure secret rotation and lifecycle management
 
+### Observability (Optional)
+
+- **[Prometheus](https://prometheus.io/)**: Metrics collection and alerting
+- **[Grafana](https://grafana.com/)**: Metrics visualization and dashboards
+
 ### Storage (Optional)
 
 - **[Longhorn](https://longhorn.io/)**: Distributed block storage
@@ -127,6 +132,18 @@ Production-ready relational database:
 - **pgvector Extension**: Vector similarity search for AI/ML workloads
 - **Multi-Tenant**: Shared database for Keycloak and applications
 
+### Prometheus and Grafana
+
+Comprehensive monitoring and observability stack:
+
+- **Metrics Collection**: Prometheus server with Prometheus Operator
+- **Visualization**: Grafana with customizable dashboards
+- **Alerting**: Alertmanager for alert routing and management
+- **Namespace-Based Control**: Explicit monitoring via labels
+- **OIDC Integration**: Optional Keycloak authentication for Grafana
+
+[📖 See Prometheus Documentation](./prometheus/README.md)
+
 ### External Secrets Operator
 
 Kubernetes operator for secret synchronization:
@@ -352,6 +369,7 @@ kubectl --context yourpc-oidc get nodes
 # Web interfaces
 # Vault: https://vault.yourdomain.com
 # Keycloak: https://auth.yourdomain.com
+# Grafana: https://grafana.yourdomain.com
 # Trino: https://trino.yourdomain.com
 # Querybook: https://querybook.yourdomain.com
 # Superset: https://superset.yourdomain.com
diff --git a/prometheus/README.md b/prometheus/README.md
new file mode 100644
index 0000000..d3b8e1f
--- /dev/null
+++ b/prometheus/README.md
@@ -0,0 +1,439 @@
+# Prometheus
+
+Comprehensive monitoring and observability stack for Kubernetes:
+
+- **Prometheus Operator**: Manages Prometheus instances via CRDs
+- **Prometheus**: Time-series database and metrics collection
+- **Grafana**: Visualization and dashboarding
+- **Alertmanager**: Alert routing and management
+- **Node Exporter**: Hardware and OS metrics
+- **Kube State Metrics**: Kubernetes cluster state metrics
+- **Namespace-based monitoring**: Explicit control via labels
+- **OIDC authentication**: Optional Keycloak integration for Grafana
+
+## Prerequisites
+
+- Kubernetes cluster (k3s)
+- External Secrets Operator (optional, for Vault integration)
+- Vault (optional, for credential storage)
+- Keycloak (optional, for Grafana OIDC authentication)
+
+## Installation
+
+```bash
+just prometheus::install
+```
+
+You will be prompted for:
+
+1. **Grafana host (FQDN)**: e.g., `grafana.example.com`
+2. **Grafana admin password**: Auto-generated if not provided
+
+### What Gets Installed
+
+- Prometheus Operator and CRDs
+- Prometheus server with namespace selector
+- Grafana with ingress
+- Alertmanager
+- Node Exporter (DaemonSet)
+- Kube State Metrics
+- Default ServiceMonitors for Kubernetes components
+
+The stack uses the official [kube-prometheus-stack Helm chart](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack).
+
+## Access
+
+### Grafana
+
+Access Grafana at `https://your-grafana-host/`
+
+**Default Credentials**:
+
+- Username: `admin`
+- Password: Retrieved via `just prometheus::admin-password`
+
+### Prometheus
+
+Prometheus Web UI is accessible internally within the cluster. For external access, set up port forwarding:
+
+```bash
+kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
+```
+
+Then access at `http://localhost:9090`
+
+### Alertmanager
+
+Alertmanager is accessible internally within the cluster. For external access, set up port forwarding:
+
+```bash
+kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093
+```
+
+Then access at `http://localhost:9093`
+
+## Configuration
+
+Environment variables (set in `.env.local` or override):
+
+```bash
+PROMETHEUS_NAMESPACE=monitoring # Kubernetes namespace
+PROMETHEUS_CHART_VERSION=79.4.0 # Helm chart version
+GRAFANA_HOST=grafana.example.com # Grafana FQDN
+PROMETHEUS_HOST=prometheus.example.com # Prometheus FQDN (optional)
+ALERTMANAGER_HOST=alertmanager.example.com # Alertmanager FQDN (optional)
+GRAFANA_ADMIN_PASSWORD= # Grafana admin password
+GRAFANA_OIDC_ENABLED=false # Enable Keycloak OIDC
+GRAFANA_OIDC_CLIENT_SECRET= # Keycloak client secret
+KEYCLOAK_NAMESPACE=keycloak # Keycloak namespace
+KEYCLOAK_REALM= # Keycloak realm
+KEYCLOAK_HOST= # Keycloak host
+```
+
+## Features
+
+### Namespace-Based Monitoring Control
+
+By default, Prometheus only monitors namespaces with the label `buun.channel/enable-monitoring=true`. This provides explicit control over which resources are monitored.
+
+**Enable monitoring for a namespace**:
+
+```bash
+kubectl label namespace buun.channel/enable-monitoring=true
+```
+
+**Disable monitoring for a namespace**:
+
+```bash
+kubectl label namespace buun.channel/enable-monitoring-
+```
+
+The monitoring namespace is automatically labeled during installation.
+
+### ServiceMonitor and PodMonitor
+
+Prometheus Operator uses `ServiceMonitor` and `PodMonitor` CRDs to configure metric scraping.
+
+**Requirements for automatic discovery**:
+
+1. ServiceMonitor/PodMonitor must be in a namespace with label `buun.channel/enable-monitoring=true`
+2. ServiceMonitor/PodMonitor must have label `release=kube-prometheus-stack`
+
+**Example ServiceMonitor**:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: my-service
+  namespace: my-namespace
+  labels:
+    release: kube-prometheus-stack
+spec:
+  selector:
+    matchLabels:
+      app: my-service
+  endpoints:
+    - port: metrics
+      path: /metrics
+      interval: 30s
+```
+
+**Example PodMonitor**:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: my-pods
+  namespace: my-namespace
+  labels:
+    release: kube-prometheus-stack
+spec:
+  selector:
+    matchLabels:
+      app: my-app
+  podMetricsEndpoints:
+    - port: metrics
+      path: /metrics
+      interval: 30s
+```
+
+### Metric Relabeling
+
+Use `metricRelabelings` to transform metric names and labels before storing in Prometheus.
+
+**Example: Rename metrics**:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: keycloak
+  namespace: keycloak
+  labels:
+    release: kube-prometheus-stack
+spec:
+  selector:
+    matchLabels:
+      app: keycloak
+  endpoints:
+    - port: management
+      path: /metrics
+      interval: 30s
+      metricRelabelings:
+        - sourceLabels: [__name__]
+          regex: 'vendor_(.*)'
+          targetLabel: __name__
+          replacement: 'keycloak_$1'
+```
+
+This configuration converts `vendor_*` metrics to `keycloak_*` for better discoverability.
+
+## OIDC Authentication
+
+### Setup Keycloak OIDC for Grafana
+
+```bash
+just prometheus::setup-oidc
+```
+
+This will:
+
+1. Create Keycloak client `grafana`
+2. Create `grafana-admins` group in Keycloak
+3. Update Grafana configuration to use Keycloak OIDC
+4. Restart Grafana with new settings
+
+**Grant admin access to a user**:
+
+```bash
+just keycloak::add-user-to-group grafana-admins
+```
+
+Users in the `grafana-admins` group will have Grafana Admin role.
+
+### Disable OIDC
+
+```bash
+just prometheus::disable-oidc
+```
+
+This will revert Grafana to local authentication.
+
+## Management
+
+### Get Grafana Admin Password
+
+```bash
+just prometheus::admin-password
+```
+
+### Upgrade Stack
+
+```bash
+# Update Helm values and upgrade
+gomplate -f prometheus/values.gomplate.yaml -o prometheus/values.yaml
+helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
+  --version 79.4.0 \
+  -n monitoring \
+  -f prometheus/values.yaml
+```
+
+### Uninstall
+
+```bash
+just prometheus::uninstall
+```
+
+This will remove:
+
+- Helm release
+- All Prometheus Operator CRDs
+- Namespace
+
+## Monitoring Examples
+
+### PostgreSQL (CloudNativePG)
+
+Enable monitoring for PostgreSQL cluster:
+
+```bash
+just postgres::enable-monitoring
+```
+
+This creates a PodMonitor for the PostgreSQL cluster with proper labels.
+
+### Keycloak
+
+Enable monitoring for Keycloak:
+
+```bash
+just keycloak::enable-monitoring
+```
+
+This creates a ServiceMonitor that:
+
+- Scrapes metrics from Keycloak management port (9000)
+- Converts `vendor_*` metrics to `keycloak_*` for better discoverability
+
+### Custom Services
+
+For services not managed by buun-stack justfiles:
+
+1. **Label the namespace**:
+
+   ```bash
+   kubectl label namespace buun.channel/enable-monitoring=true
+   ```
+
+2. **Create ServiceMonitor with proper labels**:
+
+   ```yaml
+   apiVersion: monitoring.coreos.com/v1
+   kind: ServiceMonitor
+   metadata:
+     name: my-service
+     namespace: my-namespace
+     labels:
+       release: kube-prometheus-stack
+   spec:
+     selector:
+       matchLabels:
+         app: my-service
+     endpoints:
+       - port: metrics
+         path: /metrics
+         interval: 30s
+   ```
+
+3. **Verify target is discovered**:
+
+   ```bash
+   kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
+   # Open http://localhost:9090/targets in browser
+   ```
+
+## Grafana Dashboards
+
+The stack includes default dashboards for:
+
+- Kubernetes cluster overview
+- Node metrics
+- Pod metrics
+- Persistent volumes
+- StatefulSets
+
+**Import additional dashboards**:
+
+1. Go to Grafana → Dashboards → Import
+2. Enter dashboard ID from [Grafana Dashboard Library](https://grafana.com/grafana/dashboards/)
+3. Select Prometheus data source
+4. Click Import
+
+**Popular dashboard IDs**:
+
+- `15757` - Kubernetes / Views / Global
+- `15758` - Kubernetes / Views / Namespaces
+- `15759` - Kubernetes / Views / Pods
+- `3662` - Prometheus 2.0 Stats
+- `12006` - Kubernetes API Server
+
+## Troubleshooting
+
+### ServiceMonitor Not Discovered
+
+**Check namespace label**:
+
+```bash
+kubectl get namespace --show-labels
+```
+
+Should have `buun.channel/enable-monitoring=true`.
+
+**Check ServiceMonitor labels**:
+
+```bash
+kubectl get servicemonitor -n --show-labels
+```
+
+Should have `release=kube-prometheus-stack`.
+
+**Check Prometheus targets**:
+
+```bash
+kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
+# Open http://localhost:9090/targets
+```
+
+### Metrics Not Appearing in Grafana
+
+**Refresh Grafana metrics list**:
+
+1. Hard refresh browser: Cmd+Shift+R (Mac) or Ctrl+Shift+R (Windows/Linux)
+2. Wait a few minutes for Grafana's metric cache to update
+3. Query metrics directly in Explore tab
+
+**Verify metrics in Prometheus**:
+
+```bash
+kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
+# Open http://localhost:9090/graph
+# Query your metrics directly
+```
+
+**Check metricRelabelings**:
+
+```bash
+# View Prometheus scrape config
+kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -- \
+  cat /etc/prometheus/config_out/prometheus.env.yaml | grep -A 20 "job_name: serviceMonitor//"
+```
+
+### OIDC Authentication Issues
+
+**Verify Keycloak client exists**:
+
+```bash
+just keycloak::list-clients
+```
+
+Should show `grafana` client.
+
+**Check redirect URL**:
+
+The redirect URL should be `https://your-grafana-host/login/generic_oauth`.
+
+**Verify user is in grafana-admins group**:
+
+```bash
+just keycloak::add-user-to-group grafana-admins
+```
+
+### Check Pod Status
+
+```bash
+kubectl get pods -n monitoring
+```
+
+### View Prometheus Logs
+
+```bash
+kubectl logs -n monitoring prometheus-kube-prometheus-stack-prometheus-0
+```
+
+### View Grafana Logs
+
+```bash
+kubectl logs -n monitoring deployment/kube-prometheus-stack-grafana
+```
+
+## References
+
+- [kube-prometheus-stack Helm Chart](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)
+- [Prometheus Operator Documentation](https://prometheus-operator.dev/)
+- [Prometheus Documentation](https://prometheus.io/docs/)
+- [Grafana Documentation](https://grafana.com/docs/)
+- [ServiceMonitor CRD](https://prometheus-operator.dev/docs/operator/api/#servicemonitor)
+- [PodMonitor CRD](https://prometheus-operator.dev/docs/operator/api/#podmonitor)
+- [Alertmanager Documentation](https://prometheus.io/docs/alerting/latest/alertmanager/)
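
Note on the Metric Relabeling section of the patch: the `metricRelabelings` rule rewrites `__name__` with the regex `vendor_(.*)` and the replacement `keycloak_$1`. The sketch below approximates that rename in shell, so the effect on a metric name can be checked by hand. It is an illustration only: `sed` stands in for the relabeling engine, `rename_metric` is a hypothetical helper that is not part of the patch, and the sample metric names are made up. Prometheus anchors relabel regexes to the whole value, which the explicit `^` and `$` imitate here, and `sed` writes the capture group as `\1` where the relabel rule uses `$1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch): approximates the rename
# performed by the metricRelabelings rule in the Keycloak ServiceMonitor.
rename_metric() {
  # ^ and $ imitate the full-string anchoring of relabel regexes;
  # names that do not start with "vendor_" pass through unchanged.
  printf '%s\n' "$1" | sed -E 's/^vendor_(.*)$/keycloak_\1/'
}

# Made-up sample names, for illustration only.
rename_metric "vendor_statistics_hit_ratio"
rename_metric "jvm_memory_used_bytes"
```

Under these assumptions, only names with the `vendor_` prefix are rewritten, so the rule leaves other metrics scraped from the same endpoint untouched.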