feat(goldilocks): install Goldilocks and VPA
goldilocks/README.md (Normal file, 289 lines changed)
@@ -0,0 +1,289 @@
# Goldilocks

Kubernetes resource recommendation dashboard powered by VPA:

- **Visual dashboard**: User-friendly web interface for VPA recommendations
- **Automatic VPA management**: Creates and manages VPA resources per namespace
- **Multi-workload view**: See all workload recommendations in one place
- **Quality of Service guidance**: Recommendations for the Guaranteed and Burstable QoS classes
- **OAuth2 authentication**: Keycloak integration for secure access
- **Namespace-based monitoring**: Explicit opt-in via labels

## Prerequisites

- Kubernetes cluster (k3s)
- VPA (Vertical Pod Autoscaler) installed
- Keycloak (for OAuth2 authentication)

Install VPA first:

```bash
just vpa::install
```

## Installation

```bash
just goldilocks::install
```

You will be prompted for:

1. **Goldilocks host (FQDN)**: e.g., `goldilocks.example.com`
2. **OAuth2 Proxy setup**: Optional Keycloak authentication

### What Gets Installed

- Goldilocks controller (creates VPA resources)
- Goldilocks dashboard (web UI)
- OAuth2 Proxy (optional, for authentication)
- IngressRoute (Traefik ingress)

## Access

Access the dashboard at `https://your-goldilocks-host/`

When OAuth2 Proxy is set up, authentication is handled by OAuth2 Proxy with Keycloak.

## Configuration

Environment variables (set in `.env.local` or override):

```bash
GOLDILOCKS_NAMESPACE=goldilocks         # Goldilocks namespace
GOLDILOCKS_HOST=goldilocks.example.com  # Dashboard FQDN
VPA_NAMESPACE=vpa                       # VPA namespace
KEYCLOAK_REALM=buunstack                # Keycloak realm
```

## Usage

### Enable Monitoring for a Namespace

Goldilocks uses namespace labels to determine which namespaces to monitor:

```bash
just goldilocks::enable-namespace <namespace>
```

Or directly with kubectl:

```bash
kubectl label namespace <namespace> goldilocks.fairwinds.com/enabled=true
```

### Disable Monitoring for a Namespace

```bash
just goldilocks::disable-namespace <namespace>
```

### View Monitored Namespaces

```bash
kubectl get namespaces -l goldilocks.fairwinds.com/enabled=true
```

### Check Status

```bash
just goldilocks::status
```

### View Logs

```bash
just goldilocks::logs-controller  # Controller logs
just goldilocks::logs-dashboard   # Dashboard logs
```

## Dashboard Features

### Recommendation Views

The dashboard presents recommendations in terms of Kubernetes Quality of Service (QoS) classes. Goldilocks generates values for the Guaranteed and Burstable classes; BestEffort is listed for comparison:

1. **Guaranteed QoS**: Requests = Limits (highest priority, no overcommit)
2. **Burstable QoS**: Requests < Limits (allows bursting, some overcommit)
3. **BestEffort QoS**: No requests/limits (lowest priority, full overcommit)
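
The three classes above can be told apart from a container's requests and limits alone. The following is a minimal sketch, not part of Goldilocks: the function name `qos_class` is hypothetical, it looks at a single container only (Kubernetes classifies a pod across all of its containers), and it compares quantities as plain strings, so `100m` and `0.1` would be treated as different values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify one container's QoS class.
# Arguments: cpu_request cpu_limit memory_request memory_limit
# An empty string means "not set".
qos_class() {
    local cpu_req="$1" cpu_lim="$2" mem_req="$3" mem_lim="$4"

    # BestEffort: no requests and no limits at all.
    if [ -z "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
        echo "BestEffort"
        return
    fi

    # Guaranteed: CPU and memory limits are both set and requests equal them.
    # A missing request defaults to the limit, as Kubernetes does.
    if [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] &&
       [ "${cpu_req:-$cpu_lim}" = "$cpu_lim" ] &&
       [ "${mem_req:-$mem_lim}" = "$mem_lim" ]; then
        echo "Guaranteed"
        return
    fi

    # Everything else is Burstable.
    echo "Burstable"
}
```

For example, `qos_class 25m 100m 32Mi 128Mi` uses the controller settings from this commit's `values.yaml` and falls into the Burstable class because requests are lower than limits.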

### Workload Information

For each workload, the dashboard displays:

- Current resource settings (requests and limits)
- VPA recommendations (based on actual usage)
- Quality of Service class
- Container-level breakdowns

### Applying Recommendations

Goldilocks is read-only and does not modify workloads. To apply recommendations:

1. Review recommendations in the dashboard
2. Manually update your Deployment/StatefulSet manifests
3. Apply changes via kubectl or your CI/CD pipeline
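
Step 2 usually means copying the recommended values into the `resources` section of a container spec. As a rough sketch, the hypothetical helper `render_resources` below prints such a fragment from four values read off the dashboard; the function name and argument order are assumptions made for illustration, and the output still has to be pasted into the right container of your manifest by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a container "resources" fragment.
# Arguments: cpu_request memory_request cpu_limit memory_limit
render_resources() {
    local cpu_req="$1" mem_req="$2" cpu_lim="$3" mem_lim="$4"
    cat <<EOF
resources:
  requests:
    cpu: ${cpu_req}
    memory: ${mem_req}
  limits:
    cpu: ${cpu_lim}
    memory: ${mem_lim}
EOF
}

# Example: render_resources 25m 32Mi 100m 128Mi
```

Keeping the change in the manifest (rather than patching the live object) means the new values survive the next deployment from version control.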

## OAuth2 Proxy Management

### Setup OAuth2 Proxy

If not configured during installation:

```bash
just goldilocks::setup-oauth2-proxy
```

### Remove OAuth2 Proxy

To disable authentication:

```bash
just goldilocks::remove-oauth2-proxy
```

Note: This will make the dashboard publicly accessible. Consider using port-forward instead:

```bash
just goldilocks::port-forward
```

Then access at `http://localhost:8080`

## Examples

### Monitor Application Namespaces

Enable monitoring for common application namespaces:

```bash
just goldilocks::enable-namespace dagster
just goldilocks::enable-namespace trino
just goldilocks::enable-namespace metabase
just goldilocks::enable-namespace jupyterhub
```

### Bulk Enable Monitoring

Enable monitoring for multiple namespaces at once:

```bash
for ns in dagster trino metabase superset querybook; do
  kubectl label namespace "$ns" goldilocks.fairwinds.com/enabled=true --overwrite
done
```

### Verify VPA Resources Created

After enabling a namespace, Goldilocks automatically creates VPA resources:

```bash
kubectl get vpa -n <namespace>
```

Each workload (Deployment, StatefulSet, etc.) should have a corresponding VPA resource.

## Management

### Port Forward to Dashboard

For local access without OAuth2 Proxy:

```bash
just goldilocks::port-forward
```

Access at `http://localhost:8080`

### Uninstall

```bash
just goldilocks::uninstall
```

This removes:

- OAuth2 Proxy
- Helm release
- VPA resources created by Goldilocks
- Namespace

Note: The VPA installation itself is not removed. Uninstall it separately if needed:

```bash
just vpa::uninstall
```

## Troubleshooting

### Dashboard Shows "No namespaces are labelled"

No namespaces have monitoring enabled. Label at least one namespace:

```bash
just goldilocks::enable-namespace default
```

### No Recommendations Displayed

VPA needs time to collect metrics and generate recommendations:

- Wait 5-10 minutes after enabling a namespace
- Ensure workloads are actively running
- Verify VPA resources exist: `kubectl get vpa -n <namespace>`
- Check VPA recommender logs: `just vpa::logs-recommender`
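
Instead of re-running the checks above by hand, the wait can be scripted. This is only a sketch under stated assumptions: the function name `wait_for_recommendations` and its default timeout (600 seconds) and polling interval (15 seconds) are made up for illustration, and it assumes `kubectl` is on the `PATH` with access to the cluster. It polls the VPA objects in a namespace until at least one reports a container recommendation in its status.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until some VPA in a namespace has a recommendation.
# Arguments: namespace [timeout_seconds] [interval_seconds]
wait_for_recommendations() {
    local namespace="$1" timeout="${2:-600}" interval="${3:-15}"
    local waited=0 recs

    while [ "$waited" -lt "$timeout" ]; do
        # Collect the container names that already have a recommendation.
        recs="$(kubectl get vpa -n "$namespace" \
            -o jsonpath='{.items[*].status.recommendation.containerRecommendations[*].containerName}' \
            2>/dev/null || true)"
        if [ -n "$recs" ]; then
            echo "Recommendations available in ${namespace}: ${recs}"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done

    echo "No recommendations in ${namespace} after ${timeout}s" >&2
    return 1
}

# Example: wait_for_recommendations dagster 900 30
```

A non-zero exit after the timeout is a hint to look at the recommender logs rather than proof that something is broken; sparse workloads may simply need more time.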

### VPA Resources Not Created

Verify Goldilocks controller is running:

```bash
kubectl get pods -n goldilocks -l app.kubernetes.io/component=controller
```

Check controller logs:

```bash
just goldilocks::logs-controller
```

Ensure namespace has the correct label:

```bash
kubectl get namespace <namespace> --show-labels
```

### OAuth2 Proxy Authentication Issues

Verify Keycloak client exists:

```bash
just keycloak::list-clients | grep goldilocks
```

Check OAuth2 Proxy logs:

```bash
kubectl logs -n goldilocks -l app=goldilocks-oauth2-proxy
```

### Dashboard Not Accessible

Check IngressRoute:

```bash
kubectl get ingressroute -n goldilocks
```

Verify OAuth2 Proxy is running:

```bash
kubectl get pods -n goldilocks -l app=goldilocks-oauth2-proxy
```

## References

- [Goldilocks Documentation](https://goldilocks.docs.fairwinds.com/)
- [Goldilocks GitHub Repository](https://github.com/FairwindsOps/goldilocks)
- [Fairwinds Goldilocks Helm Chart](https://github.com/FairwindsOps/charts/tree/master/stable/goldilocks)
- [Kubernetes VPA Documentation](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler)
goldilocks/justfile (Normal file, 161 lines changed)
@@ -0,0 +1,161 @@
set fallback := true

export GOLDILOCKS_NAMESPACE := env("GOLDILOCKS_NAMESPACE", "goldilocks")
export VPA_NAMESPACE := env("VPA_NAMESPACE", "vpa")
export GOLDILOCKS_HOST := env("GOLDILOCKS_HOST", "")
export KEYCLOAK_REALM := env("KEYCLOAK_REALM", "buunstack")

[private]
default:
    @just --list --unsorted --list-submodules

# Add Helm repository
add-helm-repo:
    helm repo add fairwinds-stable https://charts.fairwinds.com/stable
    helm repo update

# Remove Helm repository
remove-helm-repo:
    helm repo remove fairwinds-stable

# Create namespace
create-namespace:
    @kubectl get namespace ${GOLDILOCKS_NAMESPACE} >/dev/null 2>&1 || \
        kubectl create namespace ${GOLDILOCKS_NAMESPACE}

# Delete namespace
delete-namespace:
    @kubectl delete namespace ${GOLDILOCKS_NAMESPACE} --ignore-not-found

# Setup OAuth2 Proxy for Goldilocks authentication
setup-oauth2-proxy:
    #!/bin/bash
    set -euo pipefail
    export GOLDILOCKS_HOST=${GOLDILOCKS_HOST:-}
    while [ -z "${GOLDILOCKS_HOST}" ]; do
        GOLDILOCKS_HOST=$(
            gum input --prompt="Goldilocks host (FQDN): " --width=100 \
                --placeholder="e.g., goldilocks.example.com"
        )
    done
    echo "Setting up OAuth2 Proxy for Goldilocks..."
    just oauth2-proxy::setup-for-app goldilocks "${GOLDILOCKS_HOST}" "${GOLDILOCKS_NAMESPACE}" "goldilocks-dashboard:80"
    echo "OAuth2 Proxy setup completed"

# Install OAuth2 Proxy for Goldilocks authentication
install-oauth2-proxy:
    just setup-oauth2-proxy

# Remove OAuth2 Proxy
remove-oauth2-proxy:
    just oauth2-proxy::remove-for-app goldilocks ${GOLDILOCKS_NAMESPACE}

# Install Goldilocks
install:
    #!/bin/bash
    set -euo pipefail

    # Check if VPA is installed
    if ! helm status vpa -n ${VPA_NAMESPACE} &>/dev/null; then
        echo "Error: VPA is not installed."
        echo "Please install VPA first using: just vpa::install"
        exit 1
    fi

    if [ -z "${GOLDILOCKS_HOST}" ]; then
        while [ -z "${GOLDILOCKS_HOST}" ]; do
            GOLDILOCKS_HOST=$(
                gum input --prompt="Goldilocks host (FQDN): " --width=100 \
                    --placeholder="e.g., goldilocks.example.com"
            )
        done
        just env::set GOLDILOCKS_HOST="${GOLDILOCKS_HOST}"
    fi

    just add-helm-repo
    just create-namespace

    # Generate values.yaml from template
    gomplate -f values.gomplate.yaml -o values.yaml

    # Install Goldilocks with Helm
    helm upgrade --install goldilocks fairwinds-stable/goldilocks \
        --namespace ${GOLDILOCKS_NAMESPACE} \
        --values values.yaml \
        --wait

    echo "Goldilocks installed successfully in namespace: ${GOLDILOCKS_NAMESPACE}"
    echo ""
    echo "To enable monitoring for a namespace, add a label:"
    echo "  kubectl label namespace <namespace> goldilocks.fairwinds.com/enabled=true"
    echo ""

    if gum confirm "Set up Keycloak authentication with OAuth2 proxy?"; then
        export GOLDILOCKS_HOST="${GOLDILOCKS_HOST}"
        just setup-oauth2-proxy
    else
        echo "Access Goldilocks at: https://${GOLDILOCKS_HOST}"
        echo "Post-installation notes:"
        echo "  • Run 'just goldilocks::setup-oauth2-proxy' later to enable Keycloak authentication"
    fi

# Uninstall Goldilocks
uninstall:
    #!/bin/bash
    set -euo pipefail

    if ! helm status goldilocks -n ${GOLDILOCKS_NAMESPACE} &>/dev/null; then
        echo "Goldilocks is not installed."
        exit 0
    fi

    # Ask for confirmation, falling back to a plain prompt when gum is missing
    if command -v gum &>/dev/null; then
        if ! gum confirm "Are you sure you want to uninstall Goldilocks?"; then
            echo "Uninstall cancelled."
            exit 0
        fi
    else
        read -p "Are you sure you want to uninstall Goldilocks? (y/N) " -n 1 -r
        echo
        if [[ ! $REPLY =~ ^[Yy]$ ]]; then
            echo "Uninstall cancelled."
            exit 0
        fi
    fi

    echo "Uninstalling Goldilocks..."
    just remove-oauth2-proxy
    helm uninstall goldilocks -n ${GOLDILOCKS_NAMESPACE}
    just delete-namespace

    echo "Goldilocks uninstalled successfully."

# Show Goldilocks status
status:
    @echo "=== Goldilocks Components ==="
    @kubectl get pods -n ${GOLDILOCKS_NAMESPACE} 2>/dev/null || echo "Goldilocks not installed"
    @echo ""
    @echo "=== Monitored Namespaces ==="
    @kubectl get namespaces -l goldilocks.fairwinds.com/enabled=true 2>/dev/null || echo "No namespaces labeled for monitoring"

# Enable monitoring for a namespace
enable-namespace namespace:
    @kubectl label namespace {{ namespace }} goldilocks.fairwinds.com/enabled=true --overwrite
    @echo "Monitoring enabled for namespace: {{ namespace }}"

# Disable monitoring for a namespace
disable-namespace namespace:
    @kubectl label namespace {{ namespace }} goldilocks.fairwinds.com/enabled-
    @echo "Monitoring disabled for namespace: {{ namespace }}"

# Show controller logs
logs-controller:
    kubectl logs -n ${GOLDILOCKS_NAMESPACE} -l app.kubernetes.io/component=controller -f

# Show dashboard logs
logs-dashboard:
    kubectl logs -n ${GOLDILOCKS_NAMESPACE} -l app.kubernetes.io/component=dashboard -f

# Port-forward to dashboard
port-forward:
    kubectl -n ${GOLDILOCKS_NAMESPACE} port-forward svc/goldilocks-dashboard 8080:80
goldilocks/values.gomplate.yaml (Normal file, 48 lines changed)
@@ -0,0 +1,48 @@
controller:
  enabled: true

  resources:
    requests:
      cpu: 25m
      memory: 32Mi
    limits:
      cpu: 100m
      memory: 128Mi

dashboard:
  enabled: true

  resources:
    requests:
      cpu: 25m
      memory: 32Mi
    limits:
      cpu: 100m
      memory: 128Mi

  service:
    type: ClusterIP
    port: 80

  # Ingress is managed by oauth2-proxy
  ingress:
    enabled: false

rbac:
  create: true

serviceAccount:
  create: true

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 65534
  seccompProfile:
    type: RuntimeDefault

securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
goldilocks/values.yaml (Normal file, 48 lines changed)
@@ -0,0 +1,48 @@
controller:
  enabled: true

  resources:
    requests:
      cpu: 25m
      memory: 32Mi
    limits:
      cpu: 100m
      memory: 128Mi

dashboard:
  enabled: true

  resources:
    requests:
      cpu: 25m
      memory: 32Mi
    limits:
      cpu: 100m
      memory: 128Mi

  service:
    type: ClusterIP
    port: 80

  # Ingress is managed by oauth2-proxy
  ingress:
    enabled: false

rbac:
  create: true

serviceAccount:
  create: true

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 65534
  seccompProfile:
    type: RuntimeDefault

securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
justfile (2 lines changed)
@@ -13,6 +13,7 @@ mod dagster
 mod datahub
 mod env
 mod external-secrets
+mod goldilocks
 mod keycloak
 mod jupyterhub
 mod k8s
@@ -31,5 +32,6 @@ mod superset
 mod trino
 mod utils
 mod vault
+mod vpa

 import? "custom.just"
vpa/README.md (Normal file, 238 lines changed)
@@ -0,0 +1,238 @@
# Vertical Pod Autoscaler (VPA)

Kubernetes resource monitoring and recommendation system:

- **Monitoring-only mode**: Observes workloads without automatic scaling
- **Prometheus integration**: Uses Prometheus as the source of historical metrics (the chart's bundled metrics-server is disabled)
- **Resource recommendations**: Generates CPU and memory suggestions based on actual usage
- **Goldilocks integration**: Works with Goldilocks dashboard for visualization
- **Non-intrusive**: Does not modify running workloads

## Important Note

**This VPA installation is configured for monitoring and recommendation only**:

- ✅ **Recommender**: Enabled - Analyzes workload metrics and generates recommendations
- ❌ **Updater**: Disabled - Does NOT automatically apply recommendations to pods
- ❌ **Admission Controller**: Disabled - Does NOT modify pod resources at creation time

This configuration ensures VPA observes your workloads without affecting them. You can review recommendations and manually adjust resource settings.

## Prerequisites

- Kubernetes cluster (k3s)
- Prometheus (kube-prometheus-stack) installed

VPA requires Prometheus to collect historical metrics data. Install Prometheus first:

```bash
just prometheus::install
```

## Installation

```bash
just vpa::install
```

The installation checks that the `kube-prometheus-stack` Helm release exists and configures VPA to use Prometheus as the metrics source.

## Configuration

Environment variables (set in `.env.local` or override):

```bash
VPA_NAMESPACE=vpa                # VPA namespace
PROMETHEUS_NAMESPACE=monitoring  # Prometheus namespace
PROMETHEUS_ADDRESS=http://kube-prometheus-stack-prometheus.monitoring.svc:9090  # Prometheus URL
```
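
In `vpa/justfile`, the default for `PROMETHEUS_ADDRESS` is built from `PROMETHEUS_NAMESPACE`, so changing only the namespace also changes the URL. The sketch below mirrors that fallback logic in plain shell so the resolved value can be checked before installing; the function name `prometheus_address` is hypothetical and not part of the justfile.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve the Prometheus URL the same way the justfile
# defaults do. An explicit PROMETHEUS_ADDRESS wins; otherwise the URL is
# derived from PROMETHEUS_NAMESPACE (default: monitoring).
prometheus_address() {
    local namespace="${PROMETHEUS_NAMESPACE:-monitoring}"
    echo "${PROMETHEUS_ADDRESS:-http://kube-prometheus-stack-prometheus.${namespace}.svc:9090}"
}

# Example: PROMETHEUS_NAMESPACE=observability prometheus_address
```

If Prometheus runs under a different Helm release or service name, the derived URL will not match, and `PROMETHEUS_ADDRESS` should be set explicitly.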

## Usage

### View VPA Status

```bash
just vpa::status
```

### View Recommender Logs

```bash
just vpa::logs-recommender
```

### View VPA Resources

List all VPA resources across namespaces:

```bash
kubectl get vpa -A
```

View specific VPA recommendations:

```bash
kubectl describe vpa <vpa-name> -n <namespace>
```

Get recommendation in JSON format:

```bash
kubectl get vpa <vpa-name> -n <namespace> -o jsonpath='{.status.recommendation}' | jq
```
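
When only the target values matter, a narrower JSONPath expression can print one line per container without needing `jq`. This is a sketch, assuming the VPA object already has a populated `status.recommendation`; the wrapper name `vpa_targets` is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "container<TAB>cpu<TAB>memory" for each
# container recommendation of one VPA object.
# Arguments: vpa_name namespace
vpa_targets() {
    local name="$1" namespace="$2"
    kubectl get vpa "$name" -n "$namespace" \
        -o jsonpath='{range .status.recommendation.containerRecommendations[*]}{.containerName}{"\t"}{.target.cpu}{"\t"}{.target.memory}{"\n"}{end}'
}

# Example: vpa_targets my-app-vpa default
```

The output is empty until the recommender has produced a recommendation for the workload.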

### Manual VPA Resource Creation

Create a VPA resource for monitoring:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off" # Monitoring only
```

Apply with:

```bash
kubectl apply -f vpa-resource.yaml
```
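
For more than a couple of workloads, the manifest above can be generated rather than copied. The following is a minimal sketch: `render_vpa` is a hypothetical helper, it only targets Deployments, and the `<deployment>-vpa` naming simply follows the `my-app-vpa` example in this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a monitoring-only VPA manifest for a Deployment.
# Arguments: deployment_name [namespace]
render_vpa() {
    local deployment="$1" namespace="${2:-default}"
    cat <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ${deployment}-vpa
  namespace: ${namespace}
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${deployment}
  updatePolicy:
    updateMode: "Off"
EOF
}

# Example (requires cluster access):
#   render_vpa my-app default | kubectl apply -f -
```

Keeping `updateMode: "Off"` in the template matches the monitoring-only setup described in this README.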

## Integration with Goldilocks

VPA alone provides raw recommendations through kubectl commands. For a user-friendly dashboard experience, use Goldilocks:

```bash
# Install Goldilocks
just goldilocks::install

# Enable monitoring for specific namespaces
just goldilocks::enable-namespace <namespace>
```

Goldilocks automatically creates VPA resources for all workloads in labeled namespaces and presents recommendations in a web dashboard.

## Enabling Automatic Scaling

If you want to enable automatic pod resource updates, modify `values.gomplate.yaml`:

```yaml
updater:
  enabled: true
  replicaCount: 1
  resources:
    requests:
      cpu: 50m
      memory: 500Mi
    limits:
      cpu: 200m
      memory: 1Gi
  podMonitor:
    enabled: true

admissionController:
  enabled: true
  replicaCount: 1
  generateCertificate: true
  mutatingWebhookConfiguration:
    failurePolicy: Ignore
  resources:
    requests:
      cpu: 50m
      memory: 200Mi
    limits:
      cpu: 200m
      memory: 500Mi
```

Then reinstall:

```bash
just vpa::install
```

⚠️ **Warning**: Enabling the updater and admission controller will cause VPA to automatically modify pod resources. Test thoroughly before enabling in production.

## VPA Update Modes

VPA supports several update modes (configured in the VPA resource), including:

- **Off** (monitoring only, the current configuration): Generates recommendations but does not apply them
- **Initial**: Applies recommendations only when pods are created
- **Auto**: Automatically applies recommendations by evicting and recreating pods

## Management

### Uninstall

```bash
just vpa::uninstall
```

This removes:

- Helm release
- VPA CRDs
- Namespace

## Troubleshooting

### Recommender Not Starting

Check Prometheus connectivity:

```bash
just vpa::logs-recommender
```

Verify Prometheus is running:

```bash
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus
```

### No Recommendations Generated

VPA requires workload metrics over time:

- Minimum: A few minutes of runtime
- Recommended: 24+ hours for accurate recommendations

Verify workload is running and generating metrics:

```bash
kubectl get pods -n <namespace>
kubectl top pods -n <namespace>
```

### VPA Resource Not Created

For Goldilocks-managed VPA resources, ensure:

1. Namespace has label: `goldilocks.fairwinds.com/enabled=true`
2. Workload is managed by a controller (Deployment, StatefulSet, etc.)
3. Goldilocks controller is running: `kubectl get pods -n goldilocks`

### Check VPA Components

```bash
kubectl get pods -n vpa
```

Should show:

- `vpa-recommender-*`: Running

## References

- [Kubernetes VPA Documentation](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler)
- [Fairwinds VPA Helm Chart](https://github.com/FairwindsOps/charts/tree/master/stable/vpa)
- [VPA Design Proposals](https://github.com/kubernetes/design-proposals-archive/blob/main/autoscaling/vertical-pod-autoscaler.md)
vpa/justfile (Normal file, 89 lines changed)
@@ -0,0 +1,89 @@
set fallback := true

export VPA_NAMESPACE := env("VPA_NAMESPACE", "vpa")
export PROMETHEUS_NAMESPACE := env("PROMETHEUS_NAMESPACE", "monitoring")
export PROMETHEUS_ADDRESS := env("PROMETHEUS_ADDRESS", "http://kube-prometheus-stack-prometheus." + PROMETHEUS_NAMESPACE + ".svc:9090")

[private]
default:
    @just --list --unsorted --list-submodules

# Add Helm repository
add-helm-repo:
    helm repo add fairwinds-stable https://charts.fairwinds.com/stable
    helm repo update

# Remove Helm repository
remove-helm-repo:
    helm repo remove fairwinds-stable

# Create namespace
create-namespace:
    @kubectl get namespace ${VPA_NAMESPACE} >/dev/null 2>&1 || \
        kubectl create namespace ${VPA_NAMESPACE}

# Delete namespace
delete-namespace:
    @kubectl delete namespace ${VPA_NAMESPACE} --ignore-not-found

# Install Vertical Pod Autoscaler
install:
    #!/bin/bash
    set -euo pipefail

    # Check if Prometheus is installed
    if ! helm status kube-prometheus-stack -n ${PROMETHEUS_NAMESPACE} &>/dev/null; then
        echo "Error: Prometheus (kube-prometheus-stack) is not installed."
        echo "Please install Prometheus first using: just prometheus::install"
        exit 1
    fi

    just add-helm-repo
    just create-namespace

    # Generate values.yaml from template
    gomplate -f values.gomplate.yaml -o values.yaml

    # Install VPA with Helm
    helm upgrade --install vpa fairwinds-stable/vpa --namespace ${VPA_NAMESPACE} \
        --values values.yaml --wait

    echo "VPA installed successfully in namespace: ${VPA_NAMESPACE}"
    echo ""
    echo "To verify installation:"
    echo "  kubectl get pods -n ${VPA_NAMESPACE}"
    echo "  kubectl get vpa -A"

# Uninstall Vertical Pod Autoscaler
uninstall:
    #!/bin/bash
    set -euo pipefail

    if ! helm status vpa -n ${VPA_NAMESPACE} &>/dev/null; then
        echo "VPA is not installed."
        exit 0
    fi

    # Ask for confirmation, falling back to a plain prompt when gum is missing
    if command -v gum &>/dev/null; then
        if ! gum confirm "Are you sure you want to uninstall VPA?"; then
            echo "Uninstall cancelled."
            exit 0
        fi
    else
        read -p "Are you sure you want to uninstall VPA? (y/N) " -n 1 -r
        echo
        if [[ ! $REPLY =~ ^[Yy]$ ]]; then
            echo "Uninstall cancelled."
            exit 0
        fi
    fi

    helm uninstall vpa -n ${VPA_NAMESPACE}

    # Delete VPA CRDs
    kubectl delete crd verticalpodautoscalercheckpoints.autoscaling.k8s.io --ignore-not-found
    kubectl delete crd verticalpodautoscalers.autoscaling.k8s.io --ignore-not-found

    just delete-namespace

    echo "VPA uninstalled successfully."
vpa/values.gomplate.yaml (Normal file, 59 lines changed)
@@ -0,0 +1,59 @@
# Fairwinds VPA Helm chart values
# Optimized for resource monitoring with Prometheus + Goldilocks

rbac:
  create: true

serviceAccount:
  create: true
  automountServiceAccountToken: true

recommender:
  enabled: true
  replicaCount: 1

  resources:
    requests:
      cpu: 50m
      memory: 500Mi
    limits:
      cpu: 200m
      memory: 1Gi

  extraArgs:
    v: '4' # Verbose logging level
    pod-recommendation-min-cpu-millicores: 15
    pod-recommendation-min-memory-mb: 100
    storage: prometheus
    prometheus-address: '{{ .Env.PROMETHEUS_ADDRESS }}'

  podMonitor:
    enabled: true

updater:
  enabled: false
  # Disabled for monitoring-only mode
  # The updater component automatically applies VPA recommendations to pods
  # Enable this only if you want automatic pod resource updates

admissionController:
  enabled: false
  # Disabled for monitoring-only mode
  # The admission controller validates and mutates pod resources at creation time
  # Enable this only if you want automatic resource enforcement

metrics-server:
  enabled: false

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 65534
  seccompProfile:
    type: RuntimeDefault

securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
vpa/values.yaml (Normal file, 59 lines changed)
@@ -0,0 +1,59 @@
# Fairwinds VPA Helm chart values
# Optimized for resource monitoring with Prometheus + Goldilocks

rbac:
  create: true

serviceAccount:
  create: true
  automountServiceAccountToken: true

recommender:
  enabled: true
  replicaCount: 1

  resources:
    requests:
      cpu: 50m
      memory: 500Mi
    limits:
      cpu: 200m
      memory: 1Gi

  extraArgs:
    v: '4' # Verbose logging level
    pod-recommendation-min-cpu-millicores: 15
    pod-recommendation-min-memory-mb: 100
    storage: prometheus
    prometheus-address: 'http://kube-prometheus-stack-prometheus.monitoring.svc:9090'

  podMonitor:
    enabled: true

updater:
  enabled: false
  # Disabled for monitoring-only mode
  # The updater component automatically applies VPA recommendations to pods
  # Enable this only if you want automatic pod resource updates

admissionController:
  enabled: false
  # Disabled for monitoring-only mode
  # The admission controller validates and mutates pod resources at creation time
  # Enable this only if you want automatic resource enforcement

metrics-server:
  enabled: false

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 65534
  seccompProfile:
    type: RuntimeDefault

securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL