Monitoring a k8s cluster with kube-prometheus
- prometheus
- 2025-04-02
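1. What is kube-prometheus?
kube-prometheus is a complete monitoring solution for Kubernetes clusters. It provides Kubernetes manifests, Grafana dashboards, and Prometheus rules, together with documentation and scripts, so that end-to-end cluster monitoring with Prometheus is easy to set up and operate. kube-prometheus builds on the Prometheus Operator and extends it, so it can be understood as an optimized Prometheus deployment that makes full use of the operator pattern.
It includes the following components: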
- The Prometheus Operator
- Highly available Prometheus
- Highly available Alertmanager
- Prometheus node-exporter
- Prometheus blackbox-exporter
- Prometheus Adapter for Kubernetes Metrics APIs
- kube-state-metrics
- Grafana
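2. Community activity
As of 2024/1/23, the Prometheus project has 51.5k stars on GitHub, Prometheus Operator has 8.5k, and kube-prometheus has 6k. Prometheus is the de facto standard for metrics, Prometheus Operator is a benchmark in the K8S operator ecosystem, and kube-prometheus, as a sub-project of Prometheus Operator, is also very popular.
3. Installation and deployment
Download it from GitHub: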
git clone https://github.com/prometheus-operator/kube-prometheus.git
First, choose a branch based on the official compatibility matrix and the version of your own K8S cluster:
kube-prometheus stack | Kubernetes 1.23 | Kubernetes 1.24 | Kubernetes 1.25 | Kubernetes 1.26 | Kubernetes 1.27 | Kubernetes 1.28 | Kubernetes 1.29 | Kubernetes 1.30 | Kubernetes 1.31 |
---|---|---|---|---|---|---|---|---|---|
release-0.11 |
✔ | ✔ | ✗ | x | x | x | x | x | x |
release-0.12 |
✗ | ✔ | ✔ | x | x | x | x | x | x |
release-0.13 |
✗ | ✗ | x | ✔ | ✔ | ✔ | x | x | x |
release-0.14 |
✗ | ✗ | x | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
main |
✗ | ✗ | x | x | ✔ | ✔ | ✔ | ✔ | ✔ |
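As a rough illustration of reading the matrix, the lookup can be written as a small shell function. `pick_branch` is a hypothetical helper, not part of kube-prometheus; it simply returns the newest release branch that the table above marks as compatible with a given Kubernetes minor version.

```shell
#!/bin/sh
# pick_branch is a hypothetical helper: it maps a Kubernetes minor version
# to the newest release branch marked as compatible in the matrix above.
pick_branch() {
  case "$1" in
    1.23)                          echo "release-0.11" ;;
    1.24|1.25)                     echo "release-0.12" ;;
    1.26|1.27|1.28|1.29|1.30|1.31) echo "release-0.14" ;;
    *) echo "no release branch listed for Kubernetes $1" >&2; return 1 ;;
  esac
}

# Usage (inside the cloned repository):
#   git checkout "$(pick_branch 1.28)"
```

The cluster version itself can be read from the "Server Version" line of `kubectl version`.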
Enter the cloned directory and run:
kubectl apply --server-side -f manifests/setup
This mainly creates the monitoring namespace and the required CRDs, and usually finishes quickly.
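Before moving on, it can help to wait until the CRDs are actually registered, otherwise the next apply may fail on resource kinds the API server does not know yet. The sketch below wraps the `kubectl wait` call in a function; the `KUBECTL` variable and the function name `wait_for_crds` are assumptions added here so the command can be previewed with `KUBECTL=echo`.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with "echo") to preview the command
# instead of running it against a cluster. This variable is an assumption
# of this sketch, not something kube-prometheus defines.
KUBECTL="${KUBECTL:-kubectl}"

# Block until every CustomResourceDefinition reports the Established condition.
wait_for_crds() {
  $KUBECTL wait \
    --for condition=Established \
    --all CustomResourceDefinition \
    --namespace=monitoring
}

# Usage:
#   wait_for_crds
```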
Then install kube-prometheus itself:
kubectl apply -f manifests/
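Note the "Data persistence" section below.
4. Granting additional permissions
By default, Prometheus can only use ServiceMonitor and similar CRDs in a few namespaces (monitoring, default, and kube-system). This is inconvenient for applications installed by operators or Helm that create their own exporters or ServiceMonitors in other namespaces, and it can lead to permission errors. To avoid this, grant Prometheus cluster-wide read access to services, endpoints, and pods with the following ClusterRole and ClusterRoleBinding: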
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.44.0
  name: prometheus-k8s-cluster-wide
rules:
- apiGroups: [""]
  resources:
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.44.0
  name: prometheus-k8s-cluster-wide-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-k8s-cluster-wide
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
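Once the ClusterRole and ClusterRoleBinding are applied, one way to check the result is `kubectl auth can-i` with impersonation of the prometheus-k8s service account. This is only a sketch: the function name `check_prometheus_access` and the `KUBECTL` override are assumptions, and the namespace passed in is whatever namespace your application lives in.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with "echo") to preview commands.
KUBECTL="${KUBECTL:-kubectl}"
# Service account taken from the ClusterRoleBinding subject.
SA="system:serviceaccount:monitoring:prometheus-k8s"

# Ask the API server whether Prometheus may get/list/watch the three
# resource types covered by the ClusterRole, in the given namespace.
check_prometheus_access() {
  ns="$1"
  for resource in services endpoints pods; do
    for verb in get list watch; do
      printf '%s %s in %s: ' "$verb" "$resource" "$ns"
      $KUBECTL auth can-i "$verb" "$resource" --as="$SA" --namespace="$ns"
    done
  done
}

# Usage (the namespace name is a placeholder):
#   check_prometheus_access my-app-namespace
```

Each line should end in "yes" if the binding took effect; running the check impersonated requires that your own account is allowed to impersonate service accounts.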
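5. Data persistence
In manifests/prometheus-prometheus.yaml, the default Prometheus CR has no persistent storage. Following the official documentation, we declare a storage class for it here. If you need to adjust the number of replicas, you can do that in this file as well.
At the end of manifests/prometheus-prometheus.yaml, add the following: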
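  ### This is the main addition
  ### Note: retention and storage are not flush left; both are fields under spec
  ### retention defaults to 1d; changed to 30d here, adjust as needed
  retention: 30d
  storage:
    volumeClaimTemplate:
      spec:
        ### Storage class, set to csi-nfs
        storageClassName: csi-nfs
        resources:
          requests:
            storage: 300Gi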
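After re-applying the manifest, a quick check can confirm that the operator picked up the change. The sketch below assumes the Prometheus CR is named k8s (matching the app.kubernetes.io/instance label used elsewhere in this post); the function name and the `KUBECTL` override are also assumptions.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with "echo") to preview commands.
KUBECTL="${KUBECTL:-kubectl}"

# Print the retention configured on the Prometheus CR (assumed name: k8s),
# then list the PersistentVolumeClaims in the monitoring namespace.
check_prometheus_storage() {
  $KUBECTL -n monitoring get prometheus k8s -o "jsonpath={.spec.retention}" &&
  echo &&
  $KUBECTL -n monitoring get pvc
}

# Usage:
#   check_prometheus_storage
```

If the volumeClaimTemplate is in effect, the PVC list should contain one claim per Prometheus replica bound through the csi-nfs storage class.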
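In manifests/grafana-deployment.yaml, Grafana is also deployed without persistent storage by default, so we declare persistent storage for it as well. The default volume configuration uses an emptyDir:
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
          readOnly: false
      volumes:
      - emptyDir: {}
        name: grafana-storage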
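Change the grafana-storage volume so that it uses a PersistentVolumeClaim backed by the csi-nfs storage class (storageClassName: csi-nfs). Note that claimName refers to a PersistentVolumeClaim, not to the storage class itself; a claim with this name must exist in the monitoring namespace:
      volumes:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: csi-nfs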
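The claim referenced by claimName has to be created separately. A minimal sketch of such a PersistentVolumeClaim is shown below, emitted from a shell function so it can be piped into kubectl. The 10Gi size and the ReadWriteOnce access mode are assumptions chosen for illustration, as is the function name `grafana_pvc_manifest`; adjust them to your environment.

```shell
#!/bin/sh
# Print a PersistentVolumeClaim manifest for Grafana.
# Assumptions of this sketch: 10Gi capacity and ReadWriteOnce access mode.
grafana_pvc_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-nfs            # must match claimName in the Grafana deployment
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: csi-nfs
  resources:
    requests:
      storage: 10Gi
EOF
}

# Usage:
#   grafana_pvc_manifest | kubectl apply -f -
```

Create the claim before re-applying manifests/grafana-deployment.yaml, otherwise the Grafana pod stays Pending until the claim exists.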