docs(prometheus): write docs about Prometheus and Grafana
This commit is contained in:
18
README.md
18
README.md
@@ -27,6 +27,11 @@ A remotely accessible Kubernetes home lab with OIDC authentication. Build a mode
|
|||||||
- Automatically syncs secrets from Vault to Kubernetes Secrets
|
- Automatically syncs secrets from Vault to Kubernetes Secrets
|
||||||
- Provides secure secret rotation and lifecycle management
|
- Provides secure secret rotation and lifecycle management
|
||||||
|
|
||||||
|
### Observability (Optional)
|
||||||
|
|
||||||
|
- **[Prometheus](https://prometheus.io/)**: Metrics collection and alerting
|
||||||
|
- **[Grafana](https://grafana.com/)**: Metrics visualization and dashboards
|
||||||
|
|
||||||
### Storage (Optional)
|
### Storage (Optional)
|
||||||
|
|
||||||
- **[Longhorn](https://longhorn.io/)**: Distributed block storage
|
- **[Longhorn](https://longhorn.io/)**: Distributed block storage
|
||||||
@@ -127,6 +132,18 @@ Production-ready relational database:
|
|||||||
- **pgvector Extension**: Vector similarity search for AI/ML workloads
|
- **pgvector Extension**: Vector similarity search for AI/ML workloads
|
||||||
- **Multi-Tenant**: Shared database for Keycloak and applications
|
- **Multi-Tenant**: Shared database for Keycloak and applications
|
||||||
|
|
||||||
|
### Prometheus and Grafana
|
||||||
|
|
||||||
|
Comprehensive monitoring and observability stack:
|
||||||
|
|
||||||
|
- **Metrics Collection**: Prometheus server with Prometheus Operator
|
||||||
|
- **Visualization**: Grafana with customizable dashboards
|
||||||
|
- **Alerting**: Alertmanager for alert routing and management
|
||||||
|
- **Namespace-Based Control**: Explicit monitoring via labels
|
||||||
|
- **OIDC Integration**: Optional Keycloak authentication for Grafana
|
||||||
|
|
||||||
|
[📖 See Prometheus Documentation](./prometheus/README.md)
|
||||||
|
|
||||||
### External Secrets Operator
|
### External Secrets Operator
|
||||||
|
|
||||||
Kubernetes operator for secret synchronization:
|
Kubernetes operator for secret synchronization:
|
||||||
@@ -352,6 +369,7 @@ kubectl --context yourpc-oidc get nodes
|
|||||||
# Web interfaces
|
# Web interfaces
|
||||||
# Vault: https://vault.yourdomain.com
|
# Vault: https://vault.yourdomain.com
|
||||||
# Keycloak: https://auth.yourdomain.com
|
# Keycloak: https://auth.yourdomain.com
|
||||||
|
# Grafana: https://grafana.yourdomain.com
|
||||||
# Trino: https://trino.yourdomain.com
|
# Trino: https://trino.yourdomain.com
|
||||||
# Querybook: https://querybook.yourdomain.com
|
# Querybook: https://querybook.yourdomain.com
|
||||||
# Superset: https://superset.yourdomain.com
|
# Superset: https://superset.yourdomain.com
|
||||||
|
|||||||
439
prometheus/README.md
Normal file
439
prometheus/README.md
Normal file
@@ -0,0 +1,439 @@
|
|||||||
|
# Prometheus
|
||||||
|
|
||||||
|
Comprehensive monitoring and observability stack for Kubernetes:
|
||||||
|
|
||||||
|
- **Prometheus Operator**: Manages Prometheus instances via CRDs
|
||||||
|
- **Prometheus**: Time-series database and metrics collection
|
||||||
|
- **Grafana**: Visualization and dashboarding
|
||||||
|
- **Alertmanager**: Alert routing and management
|
||||||
|
- **Node Exporter**: Hardware and OS metrics
|
||||||
|
- **Kube State Metrics**: Kubernetes cluster state metrics
|
||||||
|
- **Namespace-based monitoring**: Explicit control via labels
|
||||||
|
- **OIDC authentication**: Optional Keycloak integration for Grafana
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
- Kubernetes cluster (k3s)
|
||||||
|
- External Secrets Operator (optional, for Vault integration)
|
||||||
|
- Vault (optional, for credential storage)
|
||||||
|
- Keycloak (optional, for Grafana OIDC authentication)
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just prometheus::install
|
||||||
|
```
|
||||||
|
|
||||||
|
You will be prompted for:
|
||||||
|
|
||||||
|
1. **Grafana host (FQDN)**: e.g., `grafana.example.com`
|
||||||
|
2. **Grafana admin password**: Auto-generated if not provided
|
||||||
|
|
||||||
|
### What Gets Installed
|
||||||
|
|
||||||
|
- Prometheus Operator and CRDs
|
||||||
|
- Prometheus server with namespace selector
|
||||||
|
- Grafana with ingress
|
||||||
|
- Alertmanager
|
||||||
|
- Node Exporter (DaemonSet)
|
||||||
|
- Kube State Metrics
|
||||||
|
- Default ServiceMonitors for Kubernetes components
|
||||||
|
|
||||||
|
The stack uses the official [kube-prometheus-stack Helm chart](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack).
|
||||||
|
|
||||||
|
## Access
|
||||||
|
|
||||||
|
### Grafana
|
||||||
|
|
||||||
|
Access Grafana at `https://your-grafana-host/`
|
||||||
|
|
||||||
|
**Default Credentials**:
|
||||||
|
|
||||||
|
- Username: `admin`
|
||||||
|
- Password: Retrieved via `just prometheus::admin-password`
|
||||||
|
|
||||||
|
### Prometheus
|
||||||
|
|
||||||
|
Prometheus Web UI is accessible internally within the cluster. For external access, set up port forwarding:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
|
||||||
|
```
|
||||||
|
|
||||||
|
Then access at `http://localhost:9090`
|
||||||
|
|
||||||
|
### Alertmanager
|
||||||
|
|
||||||
|
Alertmanager is accessible internally within the cluster. For external access, set up port forwarding:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093
|
||||||
|
```
|
||||||
|
|
||||||
|
Then access at `http://localhost:9093`
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
Environment variables (set in `.env.local` or override):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
PROMETHEUS_NAMESPACE=monitoring # Kubernetes namespace
|
||||||
|
PROMETHEUS_CHART_VERSION=79.4.0 # Helm chart version
|
||||||
|
GRAFANA_HOST=grafana.example.com # Grafana FQDN
|
||||||
|
PROMETHEUS_HOST=prometheus.example.com # Prometheus FQDN (optional)
|
||||||
|
ALERTMANAGER_HOST=alertmanager.example.com # Alertmanager FQDN (optional)
|
||||||
|
GRAFANA_ADMIN_PASSWORD= # Grafana admin password
|
||||||
|
GRAFANA_OIDC_ENABLED=false # Enable Keycloak OIDC
|
||||||
|
GRAFANA_OIDC_CLIENT_SECRET= # Keycloak client secret
|
||||||
|
KEYCLOAK_NAMESPACE=keycloak # Keycloak namespace
|
||||||
|
KEYCLOAK_REALM= # Keycloak realm
|
||||||
|
KEYCLOAK_HOST= # Keycloak host
|
||||||
|
```
|
||||||
|
|
||||||
|
## Features
|
||||||
|
|
||||||
|
### Namespace-Based Monitoring Control
|
||||||
|
|
||||||
|
By default, Prometheus only monitors namespaces with the label `buun.channel/enable-monitoring=true`. This provides explicit control over which resources are monitored.
|
||||||
|
|
||||||
|
**Enable monitoring for a namespace**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl label namespace <namespace> buun.channel/enable-monitoring=true
|
||||||
|
```
|
||||||
|
|
||||||
|
**Disable monitoring for a namespace**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl label namespace <namespace> buun.channel/enable-monitoring-
|
||||||
|
```
|
||||||
|
|
||||||
|
The monitoring namespace is automatically labeled during installation.
|
||||||
|
|
||||||
|
### ServiceMonitor and PodMonitor
|
||||||
|
|
||||||
|
Prometheus Operator uses `ServiceMonitor` and `PodMonitor` CRDs to configure metric scraping.
|
||||||
|
|
||||||
|
**Requirements for automatic discovery**:
|
||||||
|
|
||||||
|
1. ServiceMonitor/PodMonitor must be in a namespace with label `buun.channel/enable-monitoring=true`
|
||||||
|
2. ServiceMonitor/PodMonitor must have label `release=kube-prometheus-stack`
|
||||||
|
|
||||||
|
**Example ServiceMonitor**:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: monitoring.coreos.com/v1
|
||||||
|
kind: ServiceMonitor
|
||||||
|
metadata:
|
||||||
|
name: my-service
|
||||||
|
namespace: my-namespace
|
||||||
|
labels:
|
||||||
|
release: kube-prometheus-stack
|
||||||
|
spec:
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: my-service
|
||||||
|
endpoints:
|
||||||
|
- port: metrics
|
||||||
|
path: /metrics
|
||||||
|
interval: 30s
|
||||||
|
```
|
||||||
|
|
||||||
|
**Example PodMonitor**:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: monitoring.coreos.com/v1
|
||||||
|
kind: PodMonitor
|
||||||
|
metadata:
|
||||||
|
name: my-pods
|
||||||
|
namespace: my-namespace
|
||||||
|
labels:
|
||||||
|
release: kube-prometheus-stack
|
||||||
|
spec:
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: my-app
|
||||||
|
podMetricsEndpoints:
|
||||||
|
- port: metrics
|
||||||
|
path: /metrics
|
||||||
|
interval: 30s
|
||||||
|
```
|
||||||
|
|
||||||
|
### Metric Relabeling
|
||||||
|
|
||||||
|
Use `metricRelabelings` to transform metric names and labels before storing in Prometheus.
|
||||||
|
|
||||||
|
**Example: Rename metrics**:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: monitoring.coreos.com/v1
|
||||||
|
kind: ServiceMonitor
|
||||||
|
metadata:
|
||||||
|
name: keycloak
|
||||||
|
namespace: keycloak
|
||||||
|
labels:
|
||||||
|
release: kube-prometheus-stack
|
||||||
|
spec:
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: keycloak
|
||||||
|
endpoints:
|
||||||
|
- port: management
|
||||||
|
path: /metrics
|
||||||
|
interval: 30s
|
||||||
|
metricRelabelings:
|
||||||
|
- sourceLabels: [__name__]
|
||||||
|
regex: 'vendor_(.*)'
|
||||||
|
targetLabel: __name__
|
||||||
|
replacement: 'keycloak_$1'
|
||||||
|
```
|
||||||
|
|
||||||
|
This configuration converts `vendor_*` metrics to `keycloak_*` for better discoverability.
|
||||||
|
|
||||||
|
## OIDC Authentication
|
||||||
|
|
||||||
|
### Setup Keycloak OIDC for Grafana
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just prometheus::setup-oidc
|
||||||
|
```
|
||||||
|
|
||||||
|
This will:
|
||||||
|
|
||||||
|
1. Create Keycloak client `grafana`
|
||||||
|
2. Create `grafana-admins` group in Keycloak
|
||||||
|
3. Update Grafana configuration to use Keycloak OIDC
|
||||||
|
4. Restart Grafana with new settings
|
||||||
|
|
||||||
|
**Grant admin access to a user**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just keycloak::add-user-to-group <username> grafana-admins
|
||||||
|
```
|
||||||
|
|
||||||
|
Users in the `grafana-admins` group will have Grafana Admin role.
|
||||||
|
|
||||||
|
### Disable OIDC
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just prometheus::disable-oidc
|
||||||
|
```
|
||||||
|
|
||||||
|
This will revert Grafana to local authentication.
|
||||||
|
|
||||||
|
## Management
|
||||||
|
|
||||||
|
### Get Grafana Admin Password
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just prometheus::admin-password
|
||||||
|
```
|
||||||
|
|
||||||
|
### Upgrade Stack
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Update Helm values and upgrade
|
||||||
|
gomplate -f prometheus/values.gomplate.yaml -o prometheus/values.yaml
|
||||||
|
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
|
||||||
|
--version 79.4.0 \
|
||||||
|
-n monitoring \
|
||||||
|
-f prometheus/values.yaml
|
||||||
|
```
|
||||||
|
|
||||||
|
### Uninstall
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just prometheus::uninstall
|
||||||
|
```
|
||||||
|
|
||||||
|
This will remove:
|
||||||
|
|
||||||
|
- Helm release
|
||||||
|
- All Prometheus Operator CRDs
|
||||||
|
- Namespace
|
||||||
|
|
||||||
|
## Monitoring Examples
|
||||||
|
|
||||||
|
### PostgreSQL (CloudNativePG)
|
||||||
|
|
||||||
|
Enable monitoring for PostgreSQL cluster:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just postgres::enable-monitoring
|
||||||
|
```
|
||||||
|
|
||||||
|
This creates a PodMonitor for the PostgreSQL cluster with proper labels.
|
||||||
|
|
||||||
|
### Keycloak
|
||||||
|
|
||||||
|
Enable monitoring for Keycloak:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just keycloak::enable-monitoring
|
||||||
|
```
|
||||||
|
|
||||||
|
This creates a ServiceMonitor that:
|
||||||
|
|
||||||
|
- Scrapes metrics from Keycloak management port (9000)
|
||||||
|
- Converts `vendor_*` metrics to `keycloak_*` for better discoverability
|
||||||
|
|
||||||
|
### Custom Services
|
||||||
|
|
||||||
|
For services not managed by buun-stack justfiles:
|
||||||
|
|
||||||
|
1. **Label the namespace**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl label namespace <namespace> buun.channel/enable-monitoring=true
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Create ServiceMonitor with proper labels**:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: monitoring.coreos.com/v1
|
||||||
|
kind: ServiceMonitor
|
||||||
|
metadata:
|
||||||
|
name: my-service
|
||||||
|
namespace: my-namespace
|
||||||
|
labels:
|
||||||
|
release: kube-prometheus-stack
|
||||||
|
spec:
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: my-service
|
||||||
|
endpoints:
|
||||||
|
- port: metrics
|
||||||
|
path: /metrics
|
||||||
|
interval: 30s
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Verify target is discovered**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
|
||||||
|
# Open http://localhost:9090/targets in browser
|
||||||
|
```
|
||||||
|
|
||||||
|
## Grafana Dashboards
|
||||||
|
|
||||||
|
The stack includes default dashboards for:
|
||||||
|
|
||||||
|
- Kubernetes cluster overview
|
||||||
|
- Node metrics
|
||||||
|
- Pod metrics
|
||||||
|
- Persistent volumes
|
||||||
|
- StatefulSets
|
||||||
|
|
||||||
|
**Import additional dashboards**:
|
||||||
|
|
||||||
|
1. Go to Grafana → Dashboards → Import
|
||||||
|
2. Enter dashboard ID from [Grafana Dashboard Library](https://grafana.com/grafana/dashboards/)
|
||||||
|
3. Select Prometheus data source
|
||||||
|
4. Click Import
|
||||||
|
|
||||||
|
**Popular dashboard IDs**:
|
||||||
|
|
||||||
|
- `15757` - Kubernetes / Views / Global
|
||||||
|
- `15758` - Kubernetes / Views / Namespaces
|
||||||
|
- `15759` - Kubernetes / Views / Pods
|
||||||
|
- `3662` - Prometheus 2.0 Stats
|
||||||
|
- `12006` - Kubernetes API Server
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### ServiceMonitor Not Discovered
|
||||||
|
|
||||||
|
**Check namespace label**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get namespace <namespace> --show-labels
|
||||||
|
```
|
||||||
|
|
||||||
|
Should have `buun.channel/enable-monitoring=true`.
|
||||||
|
|
||||||
|
**Check ServiceMonitor labels**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get servicemonitor <name> -n <namespace> --show-labels
|
||||||
|
```
|
||||||
|
|
||||||
|
Should have `release=kube-prometheus-stack`.
|
||||||
|
|
||||||
|
**Check Prometheus targets**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
|
||||||
|
# Open http://localhost:9090/targets
|
||||||
|
```
|
||||||
|
|
||||||
|
### Metrics Not Appearing in Grafana
|
||||||
|
|
||||||
|
**Refresh Grafana metrics list**:
|
||||||
|
|
||||||
|
1. Hard refresh browser: Cmd+Shift+R (Mac) or Ctrl+Shift+R (Windows/Linux)
|
||||||
|
2. Wait a few minutes for Grafana's metric cache to update
|
||||||
|
3. Query metrics directly in Explore tab
|
||||||
|
|
||||||
|
**Verify metrics in Prometheus**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
|
||||||
|
# Open http://localhost:9090/graph
|
||||||
|
# Query your metrics directly
|
||||||
|
```
|
||||||
|
|
||||||
|
**Check metricRelabelings**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# View Prometheus scrape config
|
||||||
|
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -- \
|
||||||
|
cat /etc/prometheus/config_out/prometheus.env.yaml | grep -A 20 "job_name: serviceMonitor/<namespace>/<name>"
|
||||||
|
```
|
||||||
|
|
||||||
|
### OIDC Authentication Issues
|
||||||
|
|
||||||
|
**Verify Keycloak client exists**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just keycloak::list-clients
|
||||||
|
```
|
||||||
|
|
||||||
|
Should show `grafana` client.
|
||||||
|
|
||||||
|
**Check redirect URL**:
|
||||||
|
|
||||||
|
The redirect URL should be `https://your-grafana-host/login/generic_oauth`.
|
||||||
|
|
||||||
|
**Verify user is in grafana-admins group**:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
just keycloak::add-user-to-group <username> grafana-admins
|
||||||
|
```
|
||||||
|
|
||||||
|
### Check Pod Status
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get pods -n monitoring
|
||||||
|
```
|
||||||
|
|
||||||
|
### View Prometheus Logs
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl logs -n monitoring prometheus-kube-prometheus-stack-prometheus-0
|
||||||
|
```
|
||||||
|
|
||||||
|
### View Grafana Logs
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl logs -n monitoring deployment/kube-prometheus-stack-grafana
|
||||||
|
```
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- [kube-prometheus-stack Helm Chart](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack)
|
||||||
|
- [Prometheus Operator Documentation](https://prometheus-operator.dev/)
|
||||||
|
- [Prometheus Documentation](https://prometheus.io/docs/)
|
||||||
|
- [Grafana Documentation](https://grafana.com/docs/)
|
||||||
|
- [ServiceMonitor CRD](https://prometheus-operator.dev/docs/operator/api/#servicemonitor)
|
||||||
|
- [PodMonitor CRD](https://prometheus-operator.dev/docs/operator/api/#podmonitor)
|
||||||
|
- [Alertmanager Documentation](https://prometheus.io/docs/alerting/latest/alertmanager/)
|
||||||
Reference in New Issue
Block a user