Monitoring OpenShift with Prometheus & Grafana: Metrics That Matter

Introduction

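Effective monitoring is critical for maintaining OpenShift cluster health and performance. OpenShift ships with a Prometheus-based monitoring stack for metrics collection and alerting, and Grafana can be connected to it for visualization. Together they provide real-time insight into the cluster and alert you to problems before they escalate.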

📊 Prometheus in OpenShift

Prometheus is deployed and managed by the Cluster Monitoring Operator in the openshift-monitoring namespace.

Key Features:

- Scrapes metrics from nodes, pods, and services
- Stores them as time-series data
- Evaluates alerting rules against that data
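As an illustration of how the Cluster Monitoring Operator is configured, the sketch below shows a ConfigMap that changes how long the platform Prometheus keeps its time-series data. The 15d retention value is an assumed example, not a recommendation from this article.

```yaml
# Sketch only: the retention value below is an illustrative assumption.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      # How long Prometheus keeps samples before deleting them
      retention: 15d
```

The operator watches this ConfigMap and reconciles the Prometheus deployment when it changes, so the settings should be edited here rather than on the Prometheus pods directly.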

Look up the route to the Prometheus UI:

bash

oc get route prometheus-k8s -n openshift-monitoring

📈 Grafana Dashboards

Grafana connects to Prometheus and visualizes metrics through customizable dashboards.

Steps to Use:

1. Deploy Grafana in your namespace
2. Add Prometheus as a data source
3. Import OpenShift dashboard templates
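The data source step can be sketched as a Grafana provisioning file. This is a minimal example under stated assumptions: it points at the in-cluster Thanos Querier service, which is a common query endpoint for OpenShift monitoring, and it reads a bearer token from a hypothetical PROMETHEUS_TOKEN environment variable. The data source name is also made up for illustration.

```yaml
# Sketch only: URL, data source name, and token variable are assumptions.
apiVersion: 1
datasources:
  - name: OpenShift Prometheus          # hypothetical display name
    type: prometheus
    access: proxy                       # Grafana's backend makes the requests
    url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
    isDefault: true
    jsonData:
      httpHeaderName1: Authorization
      tlsSkipVerify: true               # demo shortcut; prefer trusting the service CA
    secureJsonData:
      # Token of a service account allowed to read cluster metrics
      httpHeaderValue1: "Bearer ${PROMETHEUS_TOKEN}"
```

The token would typically belong to a service account that has been granted the cluster-monitoring-view cluster role, so that Grafana has read-only access to metrics.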

Example Dashboard Panels:

- Node CPU & memory usage
- Pod restarts and uptime
- Network throughput
- API server latency

🚨 Alerting with Prometheus

Define alerting rules that fire when a metric stays beyond a threshold for a set duration.

Sample Rule:

yaml

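groups:
  - name: node.rules
    rules:
      - alert: HighCPUUsage
        # Assumes a recorded series named instance:node_cpu:rate5m exists
        # and reports CPU usage as a percentage (0-100).
        expr: instance:node_cpu:rate5m > 80
        # The condition must hold for 5 minutes before the alert fires.
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"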

Prometheus sends firing alerts to Alertmanager, which routes them to receivers such as email, Slack, or webhooks.
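A minimal Alertmanager configuration for the Slack case might look like the sketch below. The receiver names, the channel, and the webhook URL are placeholders invented for illustration; only the severity label value comes from the sample rule above.

```yaml
# Sketch only: receiver names, channel, and webhook URL are placeholders.
route:
  receiver: default                 # fallback when no child route matches
  group_by: [alertname, severity]
  routes:
    - matchers:
        - severity = "warning"      # matches the label set by HighCPUUsage
      receiver: slack-warnings
receivers:
  - name: default
  - name: slack-warnings
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/WITH/YOURS
        channel: "#cluster-alerts"
        send_resolved: true         # also notify when the alert clears
```

On OpenShift, the platform Alertmanager typically reads its configuration from the alertmanager-main secret in the openshift-monitoring namespace, so a file like this would be applied by updating that secret.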

🧪 Troubleshooting Tips

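- Run oc logs against the Prometheus pods in openshift-monitoring to inspect scrape errors
- Test the Grafana data source connection to confirm it can reach Prometheus
- Check alert rule syntax and whether each alert is inactive, pending, or firing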

✅ Best Practices

- Monitor control plane and worker nodes separately
- Set up dashboards for developers and SREs
- Use recording rules to precompute expensive queries and speed up dashboards
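To illustrate the recording rule practice, here is a sketch of a rule that would produce the instance:node_cpu:rate5m series read by the sample alert. It assumes that series is meant to be per-node CPU usage as a percentage and that node-exporter's node_cpu_seconds_total metric is being scraped. The group name is hypothetical.

```yaml
# Sketch only: assumes the series should be CPU usage in percent per node.
groups:
  - name: node.recording.rules      # hypothetical group name
    rules:
      - record: instance:node_cpu:rate5m
        # 100 minus the average idle share across all CPUs of each instance
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```

With the result stored as its own series, alerts and dashboard panels can query instance:node_cpu:rate5m directly instead of re-evaluating the rate expression on every refresh. On OpenShift, rule groups like this are usually wrapped in a PrometheusRule resource under spec.groups rather than loaded as a raw rule file.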
