1 What is Prometheus
Quoted from the official Prometheus website:
From metrics to insight
Power your metrics and alerting with a leading open-source monitoring solution.
In other words: Prometheus is a leading open-source monitoring solution.
2 Prometheus Features
2.1 Highly Dimensional Data Model and Time Series
Prometheus stores all data as time series: every sample collected is timestamped and flows in chronological order, and streams of values that share the same metric name and the same set of labeled dimensions belong to the same series.
2.1.1 Time Series
A time series is a series of data points indexed (or listed or graphed) in time order.
2.1.2 Metric Names and Labels
Every time series is uniquely identified by its metric name and a set of key-value pairs; these key-value pairs are what we call labels.
The metric name specifies a general feature of the system being measured, for example http_requests_total - the total number of HTTP requests received. Metric names must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*.
Labels are what enable Prometheus's dimensional data model: any given combination of labels for the same metric name identifies a particular dimensional instantiation of that metric, for example all HTTP requests that used the POST method against the /api/tracks handler. Label names must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*.
2.1.3 Notation
Given a metric name and a set of labels, a time series is written using the following notation:
<metric name>{<label name>=<label value>, ...}
For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" is written like this:
api_http_requests_total{method="POST", handler="/messages"}
2.2 PromQL
A query language built on top of the dimensional data model; it is not covered in depth in this article.
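As a small taste, the following sketch sends a PromQL expression to the Prometheus HTTP query API. The server address localhost:9090 and the metric api_http_requests_total from the notation example above are assumptions for illustration only:
# Ask the Prometheus HTTP API (/api/v1/query) to evaluate a PromQL expression:
# the per-second rate of POST requests over the last 5 minutes.
# Assumes a Prometheus server at localhost:9090 and a metric named api_http_requests_total.
curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(api_http_requests_total{method="POST"}[5m])'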
2.3 Efficient Storage
Prometheus stores time series data in memory and on local disk. It does not rely on distributed storage; single server nodes are autonomous. Scaling is achieved through functional sharding and federation.
2.4 Excellent Visualization
Through its integration with Grafana, Prometheus can provide users with intuitive and attractive visualizations.
2.5 Collecting Data via Pull, or via Push Through an Intermediary Gateway
2.6 Discovering Monitoring Targets via Service Discovery or Static Configuration
3 Architecture and Components
3.1 Architecture Diagram
3.2 Components
3.2.1 Prometheus Server
Responsible for scraping and storing metric data, and provides support for the PromQL query language.
3.2.2 Push gateway
Supports short-lived jobs: the push gateway allows ephemeral or batch jobs to expose their metrics to Prometheus. Since such jobs may not live long enough for Prometheus to scrape them, they instead push their metrics to the push gateway, and the push gateway then exposes those metrics to Prometheus. The push gateway acts purely as a metrics cache; it does not do any computation.
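A minimal sketch of how a batch job could hand a metric to the push gateway, assuming a Pushgateway reachable at pushgateway.example.org:9091; the hostname, metric name, and job label are placeholders:
# Push a single sample to the Pushgateway so that Prometheus can scrape it later.
# The metric is exposed under the job label "some_batch_job".
echo "some_batch_job_duration_seconds 42" | \
  curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_batch_job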
3.2.3 Exporters
Exporters expose existing metrics from third-party systems as Prometheus metrics.
A list of exporters is available on the exporter default port wiki, as well as on the EXPORTERS AND INTEGRATIONS page of the official documentation.
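As an example, node_exporter (the exporter for machine-level metrics) serves its metrics over plain HTTP; the sketch below assumes it is running locally on its default port 9100:
# Fetch the plain-text metrics that node_exporter exposes for Prometheus to scrape.
# Assumes node_exporter is running on localhost with its default port 9100.
curl -s http://localhost:9100/metrics | head -n 20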
3.2.4 Alertmanager
The Alertmanager handles alerts sent by client applications such as the Prometheus server. It deduplicates and groups the alerts it receives and routes them to the correct receiver, such as PagerDuty or OpsGenie. The Alertmanager also takes care of silencing and inhibition of alerts.
4 Strengths and Weaknesses of Prometheus
Prometheus is very good at recording purely numeric time series, so it fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In the world of microservices, its support for multi-dimensional data collection and querying is a particular strength.
Prometheus's greatest value lies in reliability: you can look at statistics about the monitored system at any time, even while the system itself is failing. It cannot, however, guarantee 100% accuracy; if you need to bill by request count, for example, Prometheus may not capture every single request, and in that case it is not the right choice.
5 Prometheus and Kubernetes
Prometheus is a close relative of Kubernetes: Google has disclosed that Kubernetes is derived from its internal Borg cluster system, and Prometheus shares fundamental design concepts with Borgmon, the monitoring system built for Borg. Both Prometheus and Kubernetes are now governed by the Cloud Native Computing Foundation (CNCF). At the technical level, Kubernetes exposes its internal metrics in a format that Prometheus can consume.
5.1 Ways to Integrate Prometheus with Kubernetes
- Prometheus Operator
- kube-prometheus
- kubernetes addon
5.1.1 Prometheus Operator
An Operator, a concept introduced by CoreOS, is software that operates other software, baking the operational knowledge collected from humans into the deployment process.
The Prometheus Operator makes it easy to install Prometheus and to manage and configure Prometheus instances with simple declarative configuration. Its core idea is to decouple the deployment of Prometheus instances from the configuration of the entities being monitored, so that running Prometheus on Kubernetes is as simple as possible.
The Prometheus Operator introduces additional resources into Kubernetes that declare the desired state of Prometheus and Alertmanager clusters. These resources include:
- Prometheus
- Alertmanager
- ServiceMonitor
- PrometheusRule
The Alertmanager is out of scope for this article and will be covered in a separate post.
The Prometheus resource declaratively describes the desired state of a Prometheus deployment, while a ServiceMonitor describes a set of targets to be monitored by Prometheus.
The Operator ensures that, at any point in time, for every Prometheus resource in the Kubernetes cluster there is a set of Prometheus servers running with the desired configuration. Each Prometheus instance is bound to its own configuration, which specifies which targets to monitor and scrape metrics from.
Users can write these configurations by hand, or let the Operator generate them from ServiceMonitors. A ServiceMonitor resource specifies how to retrieve metrics from a set of services, and a Prometheus resource object can dynamically include ServiceMonitor objects by label; the Operator configures the Prometheus instances to monitor all services covered by those ServiceMonitors and keeps the configuration in sync with changes in the cluster, as sketched below.
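A minimal sketch of this relationship, using a hypothetical Service labeled app: example-app that exposes a port named web; the names, labels, and namespace here are illustrative and not part of kube-prometheus:
# Hypothetical ServiceMonitor: tells the Operator to scrape any Service labeled
# app=example-app on its port named "web" every 30 seconds. A Prometheus resource
# can then select this ServiceMonitor via its serviceMonitorSelector labels
# (here, team: frontend). All names are placeholders.
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
    interval: 30s
EOF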
5.1.2 kube-prometheus
kube-prometheus combines the Prometheus Operator with a collection of manifests to provide a full monitoring stack configuration, helping users get started with monitoring Kubernetes itself as well as the applications running on top of it.
6 Deploying Prometheus
6.1 Deployment Environment
No. | Node Name | Role | Memory | IP | Version |
---|---|---|---|---|---|
1 | master.mike.com | master | 2GB | 192.168.56.101 | v1.13.4 |
2 | node1.mike.com | node | 2GB | 192.168.56.102 | v1.13.4 |
3 | node2.mike.com | node | 2GB | 192.168.56.103 | v1.13.4 |
6.2 Quickly Deploying Prometheus with kube-prometheus
6.2.1 Prerequisites
- Have a Kubernetes cluster ready; see section 6.1.
- Make sure the following flags are set in the cluster so that the kubelet authenticates and authorizes requests with tokens, which allows finer-grained and simpler access control:
- --authentication-token-webhook=true
- --authorization-mode=Webhook
- kube-prometheus already includes a resource metrics API server, which provides the same functionality as metrics-server. If metrics-server is already deployed in the cluster, uninstall it first (a quick check is sketched below this list); otherwise skip this item.
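A quick way to check whether a metrics API is already being served in the cluster (commands only; the output depends on your environment):
# List the API groups served by the cluster and look for the metrics API.
kubectl api-versions | grep metrics.k8s.io
# Or inspect the registered APIService object directly.
kubectl get apiservice v1beta1.metrics.k8s.io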
6.2.2 Clone the kube-prometheus Repository
[root@master ~]# git clone https://github.com/coreos/kube-prometheus.git
Cloning into 'kube-prometheus'...
remote: Enumerating objects: 49, done.
remote: Counting objects: 100% (49/49), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 5763 (delta 19), reused 19 (delta 2), pack-reused 5714
Receiving objects: 100% (5763/5763), 3.72 MiB | 714.00 KiB/s, done.
Resolving deltas: 100% (3396/3396), done.
6.2.3 Quickly Deploy the Monitoring Stack
The author's network sits outside the Great Firewall, so when the Kubernetes resources were created from the manifest files, Kubernetes pulled all of the required Docker images without any obstruction. If you are deploying from inside the firewall, pre-pull all of the images referenced in the manifests onto the cluster nodes first.
Create the resources:
[root@master kube-prometheus]# kubectl create -f manifests/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-k8s-cluster-rsrc-use created
configmap/grafana-dashboard-k8s-node-rsrc-use created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
Check that the resources are ready:
[root@master kube-prometheus]# until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done
NAME CREATED AT
servicemonitors.monitoring.coreos.com 2019-05-13T05:38:41Z
[root@master kube-prometheus]# until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
NAMESPACE NAME AGE
monitoring alertmanager 28s
monitoring coredns 25s
monitoring grafana 27s
monitoring kube-apiserver 25s
monitoring kube-controller-manager 25s
monitoring kube-scheduler 25s
monitoring kube-state-metrics 27s
monitoring kubelet 25s
monitoring node-exporter 27s
monitoring prometheus 25s
monitoring prometheus-operator 28s
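Once the custom resource definitions and ServiceMonitors exist, it is also worth confirming that every workload in the monitoring namespace comes up (the exact pod list depends on your cluster):
# Watch the pods of the monitoring stack until they are all Running and Ready.
kubectl get pods -n monitoring -w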
7 Accessing Prometheus
By default, the Prometheus deployed into the Kubernetes cluster by kube-prometheus can only be reached from inside the cluster: every Service it creates is of type ClusterIP:
[root@master kube-prometheus]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.102.26.246 <none> 9093/TCP 84m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 83m
grafana ClusterIP 10.111.252.205 <none> 3000/TCP 84m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 84m
node-exporter ClusterIP None <none> 9100/TCP 84m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 84m
prometheus-k8s ClusterIP 10.108.202.31 <none> 9090/TCP 83m
prometheus-operated ClusterIP None <none> 9090/TCP 83m
prometheus-operator ClusterIP None <none> 8080/TCP 84m
To reach Prometheus from outside the cluster, the services need to be exposed. The following sections show how.
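For a quick, temporary look without modifying any Service, kubectl port-forward also works (kube-prometheus documents this approach for local access); the rest of this article instead exposes the services via NodePort:
# Temporarily forward the Prometheus web UI to the local machine on port 9090.
# The forwarding stops as soon as the command is interrupted.
kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090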
7.1 Accessing Prometheus from Outside via NodePort
7.1.1 Change the Type of the prometheus-k8s Service
Edit prometheus-service.yaml so that it contains the following (note the nodePort and type fields):
apiVersion: v1
kind: Service
metadata:
labels:
prometheus: k8s
name: prometheus-k8s
namespace: monitoring
spec:
ports:
- name: web
nodePort: 32090
port: 9090
targetPort: web
selector:
app: prometheus
prometheus: k8s
sessionAffinity: ClientIP
type: NodePort
Apply prometheus-service.yaml:
[root@master manifests]# kubectl apply -f prometheus-service.yaml --force
service/prometheus-k8s created
Check the new type of the prometheus-k8s service and its nodePort:
[root@master kube-prometheus]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.102.26.246 <none> 9093/TCP 139m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 139m
grafana ClusterIP 10.111.252.205 <none> 3000/TCP 139m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 139m
node-exporter ClusterIP None <none> 9100/TCP 139m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 139m
prometheus-k8s NodePort 10.96.91.11 <none> 9090:32090/TCP 19s
prometheus-operated ClusterIP None <none> 9090/TCP 138m
prometheus-operator ClusterIP None <none> 8080/TCP 139m
7.1.2 Access Prometheus via the nodePort and a Node IP
We know that the master node's IP is 192.168.56.101 and we set the nodePort to 32090 above, so from outside the cluster Prometheus is reachable at http://192.168.56.101:32090
The home page looks like this:
At this point we can already use the Prometheus UI and PromQL to query all of the metrics that have been collected. The built-in UI is fairly plain and limited, though, so we can continue with the steps below and use the Grafana instance that has already been deployed to present the data from Prometheus with much nicer dashboards.
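The exposed endpoint can also be verified from the command line against Prometheus's built-in health endpoint (using the node IP and nodePort configured above):
# Prometheus answers HTTP 200 with a short health message on /-/healthy.
curl http://192.168.56.101:32090/-/healthy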
7.2 Accessing Grafana via the nodePort and a Node IP
7.2.1 Change the Type of the grafana Service
Edit grafana-service.yaml so that it contains the following (note the nodePort and type fields):
apiVersion: v1
kind: Service
metadata:
labels:
app: grafana
name: grafana
namespace: monitoring
spec:
ports:
- name: http
port: 3000
targetPort: http
nodePort: 32030
selector:
app: grafana
type: NodePort
Apply grafana-service.yaml:
[root@master manifests]# kubectl apply -f grafana-service.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service/grafana configured
Check the new type of the grafana service and its nodePort:
[root@master manifests]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.102.26.246 <none> 9093/TCP 174m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 173m
grafana NodePort 10.111.252.205 <none> 3000:32030/TCP 174m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 174m
node-exporter ClusterIP None <none> 9100/TCP 174m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 174m
prometheus-k8s NodePort 10.96.91.11 <none> 9090:32090/TCP 35m
prometheus-operated ClusterIP None <none> 9090/TCP 173m
prometheus-operator ClusterIP None <none> 8080/TCP 174m
7.2.2 Access Grafana via the nodePort and a Node IP
We know that the master node's IP is 192.168.56.101 and we set the nodePort to 32030 above, so from outside the cluster Grafana is reachable at http://192.168.56.101:32030 (default username/password: admin/admin)
The home page looks like this:
Note that the Grafana deployed by kube-prometheus already has its data source configured to point at the Prometheus in the same cluster, and it ships with a large number of predefined dashboards, so it is very easy to use.
7.3 Accessing Alertmanager via the nodePort and a Node IP
7.3.1 Change the Type of the alertmanager-main Service
Edit alertmanager-service.yaml so that it contains the following (note the nodePort and type fields):
apiVersion: v1
kind: Service
metadata:
labels:
alertmanager: main
name: alertmanager-main
namespace: monitoring
spec:
ports:
- name: web
port: 9093
targetPort: web
nodePort: 30093
selector:
alertmanager: main
app: alertmanager
sessionAffinity: ClientIP
type: NodePort
Apply alertmanager-service.yaml:
[root@master manifests]# kubectl apply -f alertmanager-service.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service/alertmanager-main configured
Check the new type of the alertmanager-main service and its nodePort:
[root@master kube-prometheus]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main NodePort 10.102.26.246 <none> 9093:30093/TCP 3h10m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 3h9m
grafana NodePort 10.111.252.205 <none> 3000:32030/TCP 3h10m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 3h10m
node-exporter ClusterIP None <none> 9100/TCP 3h10m
prometheus-adapter ClusterIP 10.100.63.156 <none> 443/TCP 3h10m
prometheus-k8s NodePort 10.96.91.11 <none> 9090:32090/TCP 51m
prometheus-operated ClusterIP None <none> 9090/TCP 3h9m
prometheus-operator ClusterIP None <none> 8080/TCP 3h10m
7.3.2 Access alertmanager-main via the nodePort and a Node IP
We know that the master node's IP is 192.168.56.101 and we set the nodePort to 30093 above, so from outside the cluster the Alertmanager is reachable at http://192.168.56.101:30093
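As with Prometheus, the endpoint can be verified from the command line; Alertmanager exposes a similar health endpoint:
# Alertmanager answers HTTP 200 on its health endpoint when it is up.
curl http://192.168.56.101:30093/-/healthy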