kube-prometheus support for Kubernetes versions

kube-prometheus version | Kubernetes version |
---|---|
release-0.4 | 1.16, 1.17 |
release-0.5 | 1.18 |
release-0.6 | 1.18, 1.19 |
release-0.7 | 1.19, 1.20 |
release-0.8 | 1.20, 1.21 |
release-0.9 | 1.21, 1.22 |
release-0.10 | 1.22, 1.23 |
release-0.11 | 1.23, 1.24 |
main | 1.24 |

Reference: https://github.com/prometheus-operator/kube-prometheus#compatibility
Installing kube-prometheus

kube-prometheus repository
https://github.com/prometheus-operator/kube-prometheus

kube-prometheus components
- prometheus-operator
- prometheus
- alertmanager
- prometheus-adapter
- node-exporter
- kube-state-metrics
- grafana
- blackbox-exporter

All of the above components come from the kube-prometheus release-0.8 branch.

Download kube-prometheus
git clone -b release-0.8 https://github.com/prometheus-operator/kube-prometheus.git
[root@k8s-master kube-prometheus-release-0.8]# ll
total 184
-rwxr-xr-x 1 root root 679 Mar 21 2022 build.sh
-rw-r--r-- 1 root root 3039 Mar 21 2022 code-of-conduct.md
-rw-r--r-- 1 root root 1422 Mar 21 2022 DCO
drwxr-xr-x 2 root root 4096 Mar 21 2022 docs
-rw-r--r-- 1 root root 2051 Mar 21 2022 example.jsonnet
drwxr-xr-x 7 root root 4096 Mar 21 2022 examples
drwxr-xr-x 3 root root 28 Mar 21 2022 experimental
-rw-r--r-- 1 root root 237 Mar 21 2022 go.mod
-rw-r--r-- 1 root root 59996 Mar 21 2022 go.sum
drwxr-xr-x 3 root root 68 Mar 21 2022 hack
drwxr-xr-x 3 root root 29 Mar 21 2022 jsonnet
-rw-r--r-- 1 root root 206 Mar 21 2022 jsonnetfile.json
-rw-r--r-- 1 root root 4857 Mar 21 2022 jsonnetfile.lock.json
-rw-r--r-- 1 root root 4495 Mar 21 2022 kustomization.yaml
-rw-r--r-- 1 root root 11325 Mar 21 2022 LICENSE
-rw-r--r-- 1 root root 2153 Mar 21 2022 Makefile
drwxr-xr-x 3 root root 4096 Sep 19 19:39 manifests
-rw-r--r-- 1 root root 126 Mar 21 2022 NOTICE
-rw-r--r-- 1 root root 38246 Mar 21 2022 README.md
drwxr-xr-x 2 root root 187 Mar 21 2022 scripts
-rw-r--r-- 1 root root 928 Mar 21 2022 sync-to-internal-registry.jsonnet
drwxr-xr-x 3 root root 17 Mar 21 2022 tests
-rwxr-xr-x 1 root root 808 Mar 21 2022 test.sh
Manifest listing
[root@k8s-master kube-prometheus-release-0.8]# tree manifests/
manifests/
├── alertmanager-alertmanager.yaml
├── alertmanager-podDisruptionBudget.yaml
├── alertmanager-prometheusRule.yaml
├── alertmanager-secret.yaml
├── alertmanager-serviceAccount.yaml
├── alertmanager-serviceMonitor.yaml
├── alertmanager-service.yaml
├── blackbox-exporter-clusterRoleBinding.yaml
├── blackbox-exporter-clusterRole.yaml
├── blackbox-exporter-configuration.yaml
├── blackbox-exporter-deployment.yaml
├── blackbox-exporter-serviceAccount.yaml
├── blackbox-exporter-serviceMonitor.yaml
├── blackbox-exporter-service.yaml
├── grafana-dashboardDatasources.yaml
├── grafana-dashboardDefinitions.yaml
├── grafana-dashboardSources.yaml
├── grafana-deployment.yaml
├── grafana-serviceAccount.yaml
├── grafana-serviceMonitor.yaml
├── grafana-service.yaml
├── istio-servicemonitor.yaml
├── kube-prometheus-prometheusRule.yaml
├── kubernetes-prometheusRule.yaml
├── kubernetes-serviceMonitorApiserver.yaml
├── kubernetes-serviceMonitorCoreDNS.yaml
├── kubernetes-serviceMonitorKubeControllerManager.yaml
├── kubernetes-serviceMonitorKubelet.yaml
├── kubernetes-serviceMonitorKubeScheduler.yaml
├── kube-state-metrics-clusterRoleBinding.yaml
├── kube-state-metrics-clusterRole.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-prometheusRule.yaml
├── kube-state-metrics-serviceAccount.yaml
├── kube-state-metrics-serviceMonitor.yaml
├── kube-state-metrics-service.yaml
├── node-exporter-clusterRoleBinding.yaml
├── node-exporter-clusterRole.yaml
├── node-exporter-daemonset.yaml
├── node-exporter-prometheusRule.yaml
├── node-exporter-serviceAccount.yaml
├── node-exporter-serviceMonitor.yaml
├── node-exporter-service.yaml
├── prometheus-adapter-apiService.yaml
├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
├── prometheus-adapter-clusterRoleBindingDelegator.yaml
├── prometheus-adapter-clusterRoleBinding.yaml
├── prometheus-adapter-clusterRoleServerResources.yaml
├── prometheus-adapter-clusterRole.yaml
├── prometheus-adapter-configMap.yaml
├── prometheus-adapter-deployment.yaml
├── prometheus-adapter-podDisruptionBudget.yaml
├── prometheus-adapter-roleBindingAuthReader.yaml
├── prometheus-adapter-serviceAccount.yaml
├── prometheus-adapter-serviceMonitor.yaml
├── prometheus-adapter-service.yaml
├── prometheus-clusterRoleBinding.yaml
├── prometheus-clusterRole.yaml
├── prometheus-operator-prometheusRule.yaml
├── prometheus-operator-serviceMonitor.yaml
├── prometheus-operator.yaml
├── prometheus-podDisruptionBudget.yaml
├── prometheus-prometheusRule.yaml
├── prometheus-prometheus.yaml
├── prometheus-roleBindingConfig.yaml
├── prometheus-roleBindingSpecificNamespaces.yaml
├── prometheus-roleConfig.yaml
├── prometheus-roleSpecificNamespaces.yaml
├── prometheus-serviceAccount.yaml
├── prometheus-serviceMonitor.yaml
├── prometheus-service.yaml
└── setup
├── 0namespace-namespace.yaml
├── prometheus-operator-0alertmanagerConfigCustomResourceDefinition.yaml
├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
├── prometheus-operator-0probeCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
├── prometheus-operator-clusterRoleBinding.yaml
├── prometheus-operator-clusterRole.yaml
├── prometheus-operator-deployment.yaml
├── prometheus-operator-serviceAccount.yaml
└── prometheus-operator-service.yaml
Deploy
kubectl apply -f manifests/setup/
kubectl apply -f manifests/
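If the second apply runs before the CRDs from manifests/setup/ are registered, it can fail with "no matches for kind" errors. The upstream README suggests waiting for the CRDs between the two applies, roughly like this (a hedged sketch):

```sh
# Run between the two applies: wait until the ServiceMonitor CRD created by manifests/setup/ is served
until kubectl get servicemonitors --all-namespaces; do date; sleep 1; echo ""; done
```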
Verify
[root@k8s-master kube-prometheus-release-0.8]# kubectl get all -n monitoring
NAME READY STATUS RESTARTS AGE
pod/alertmanager-main-0 2/2 Running 6 115d
pod/alertmanager-main-1 2/2 Running 4 115d
pod/alertmanager-main-2 2/2 Running 6 115d
pod/blackbox-exporter-55c457d5fb-swjqc 3/3 Running 6 115d
pod/grafana-9df57cdc4-wc9hn 1/1 Running 2 115d
pod/kube-state-metrics-76f6cb7996-8x7pj 2/3 ImagePullBackOff 4 115d
pod/kube-state-metrics-7749b7b647-4mzsq 2/3 ImagePullBackOff 2 10d
pod/node-exporter-9tj5z 2/2 Running 4 113d
pod/node-exporter-hsxf7 2/2 Running 4 115d
pod/node-exporter-q8g6m 2/2 Running 4 115d
pod/node-exporter-zngtl 2/2 Running 4 115d
pod/prometheus-adapter-59df95d9f5-hjb7l 1/1 Running 3 115d
pod/prometheus-adapter-59df95d9f5-kdx7n 1/1 Running 4 115d
pod/prometheus-k8s-0 2/2 Running 5 115d
pod/prometheus-k8s-1 2/2 Running 5 115d
pod/prometheus-operator-7775c66ccf-2r99w 2/2 Running 5 115d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-main ClusterIP 100.101.252.112 <none> 9093/TCP 115d
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 115d
service/blackbox-exporter ClusterIP 100.111.30.55 <none> 9115/TCP,19115/TCP 115d
service/grafana ClusterIP 100.97.190.206 <none> 3000/TCP 115d
service/kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 115d
service/node-exporter ClusterIP None <none> 9100/TCP 115d
service/prometheus-adapter ClusterIP 100.109.111.30 <none> 443/TCP 115d
service/prometheus-k8s NodePort 100.101.111.146 <none> 9090:32101/TCP 115d
service/prometheus-operated ClusterIP None <none> 9090/TCP 115d
service/prometheus-operator ClusterIP None <none> 8443/TCP 115d
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/node-exporter 4 4 4 4 4 kubernetes.io/os=linux 115d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/blackbox-exporter 1/1 1 1 115d
deployment.apps/grafana 1/1 1 1 115d
deployment.apps/kube-state-metrics 0/1 1 0 115d
deployment.apps/prometheus-adapter 2/2 2 2 115d
deployment.apps/prometheus-operator 1/1 1 1 115d
NAME DESIRED CURRENT READY AGE
replicaset.apps/blackbox-exporter-55c457d5fb 1 1 1 115d
replicaset.apps/grafana-9df57cdc4 1 1 1 115d
replicaset.apps/kube-state-metrics-76f6cb7996 1 1 0 115d
replicaset.apps/kube-state-metrics-7749b7b647 1 1 0 67d
replicaset.apps/prometheus-adapter-59df95d9f5 2 2 2 115d
replicaset.apps/prometheus-operator-7775c66ccf 1 1 1 115d
NAME READY AGE
statefulset.apps/alertmanager-main 3/3 115d
statefulset.apps/prometheus-k8s 2/2 115d
Note: the prometheus Service type shown in the output above has already been changed to NodePort.
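For reference, a minimal sketch of the edit that produces that NodePort (assuming the nodePort 32101 seen in the output above; metadata and selector are abridged from the release-0.8 prometheus-service.yaml):

```yaml
# prometheus-service.yaml (abridged sketch)
apiVersion: v1
kind: Service
metadata:
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort            # changed from the default ClusterIP
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 32101         # matches 9090:32101/TCP in the output above
  selector:
    app: prometheus
    prometheus: k8s
```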
Change the grafana Service type to NodePort
- grafana-service.yaml before the change
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 7.5.4
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 3000
    targetPort: http
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
```
- grafana-service.yaml after the change
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 7.5.4
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32009
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
```
- Check the result
[root@k8s-master manifests]# kubectl get svc -n monitoring | grep grafana
grafana NodePort 100.97.190.206 <none> 3000:32009/TCP 115d
Persisting Prometheus data
- prometheus-prometheus.yaml
```yaml
......
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.26.0
  storage:                      # persistence lives under spec.storage in the Prometheus CRD
    volumeClaimTemplate:
      metadata:
        name: db-volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: mysql
        resources:
          requests:
            storage: 8Gi
```
For creating the StorageClass itself, see: https://note.youdao.com/s/T71lEiDh
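Once this is applied, the Operator should create one PVC per Prometheus replica from the claim template. A quick sanity check:

```sh
kubectl apply -f prometheus-prometheus.yaml
kubectl get pvc -n monitoring    # expect one claim per prometheus-k8s replica
```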
kube-prometheus architecture
- Operator
  The operator runs in the cluster as a Deployment. It owns the custom resources (CRDs), uses them to deploy and manage Prometheus servers, and watches those CRD resources for changes so it can react to them. It is the control centre of the whole architecture.
- Prometheus
  - This CRD declares the desired Prometheus deployment in the Kubernetes cluster and provides options for replicas, persistence, alerting and so on.
  - For every Prometheus resource, the Operator deploys a correspondingly configured StatefulSet in the same namespace; the Prometheus pods mount their configuration from a Secret named prometheus-<name> that contains the Prometheus configuration.
  - The CRD uses label selectors to specify which ServiceMonitors the Prometheus instance should cover; the Operator then generates the scrape configuration from the selected ServiceMonitors and keeps the Secret holding that configuration up to date.
- Alertmanager
  - This CRD defines an Alertmanager deployment running in the cluster and likewise provides many options, including persistent storage.
  - For every Alertmanager resource, the Operator deploys a correspondingly configured StatefulSet in the same namespace; the Alertmanager pods are configured with a Secret named alertmanager-<name> that stores the configuration file under the key alertmanager.yaml.
- ThanosRuler
  - This CRD defines the configuration of a Thanos Ruler component so it can run in the Kubernetes cluster; with Thanos Ruler, recording and alerting rules can be processed across multiple Prometheus instances.
  - A ThanosRuler instance needs at least one queryEndpoint pointing to a Thanos Querier or a Prometheus instance; the queryEndpoints are used to configure the --query argument passed to Thanos at runtime.
- ServiceMonitor
  - This CRD defines how to monitor a dynamic set of services, using labels to select which Services are monitored.
  - For Prometheus to monitor any application inside Kubernetes, an Endpoints object must exist. An Endpoints object is essentially a list of IP addresses, and it is usually populated automatically by a Service: the Service matches Pods with a label selector and adds them to the Endpoints object. A Service can expose one or more ports, backed by one or more Endpoints lists, and these endpoints generally point to a Pod.
  - Note: endpoints (lowercase) is a field of the ServiceMonitor CRD, while Endpoints is a Kubernetes object (a quick way to look at one is shown below).
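To make that Service/Endpoints relationship concrete, you can inspect the Endpoints object behind any Service already running in this stack (a hedged example; grafana is used only because it exists in the monitoring namespace):

```sh
kubectl get endpoints grafana -n monitoring -o wide
```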
- PodMonitor
  - This CRD defines how to monitor a dynamic set of Pods, using labels to select which Pods are monitored.
- Probe
  - This CRD defines how to monitor a set of Ingresses and static targets. Besides the targets, a Probe object also needs a prober, the service that probes the targets and exposes metrics for Prometheus; blackbox-exporter, for example, can provide this service.
- PrometheusRule
  - Configures Prometheus rule files, including recording and alerting rules, which Prometheus loads automatically.
- AlertmanagerConfig
  - Declarative Alertmanager routing and receiver configuration; it is used in the custom alert channel section later in this document.
kube-prometheus custom monitoring

- Create the Pod for the service to be monitored (via a Deployment)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: check-secret-tls
  namespace: kube-ops
  labels:
    app: check-secret-tls
    release: prod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: check-secret-tls
      release: prod
  strategy:
    rollingUpdate:
      maxSurge: 70%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: check-secret-tls
        release: prod
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - image: xxxxxx/kube-ops/check-secret-tls:v1.0
        imagePullPolicy: Always
        name: check-secret-tls
        readinessProbe:
          httpGet:
            port: 8090
            path: /health
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 30
          failureThreshold: 10
        livenessProbe:
          httpGet:
            port: 8090
            path: /health
          initialDelaySeconds: 330
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        resources:
          requests:
            cpu: 0.5
            memory: 500Mi
          limits:
            cpu: 0.5
            memory: 500Mi
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "echo 1"]
      imagePullSecrets:
      - name: cn-beijing-ali-tope365
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: check-secret-tls
    release: prod
  name: check-secret-tls
  namespace: kube-ops
spec:
  ports:
  - name: check-secret-tls
    port: 8090
    protocol: TCP
    targetPort: 8090
  selector:
    app: check-secret-tls
    release: prod
  type: ClusterIP
```
- Create the ServiceMonitor
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: chekc-secret-tls
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    path: /metrics
    port: check-secret-tls
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app: 'check-secret-tls'
```
- metadata.name: the name of this ServiceMonitor
- metadata.namespace: the namespace this ServiceMonitor lives in
- spec.endpoints: the scrape configuration used by Prometheus; endpoints is an array, so several can be defined, each with the fields interval, path and port
- spec.endpoints.interval: the scrape interval
- spec.endpoints.path: the path to scrape metrics from
- spec.endpoints.port: the port to scrape; this is a port name, matched against the Service selected via spec.selector
- spec.namespaceSelector: the namespaces in which to discover Services
- spec.namespaceSelector.any: the only allowed value is true; when set, Services matching the selector are picked up in all namespaces
- When using matchNames instead:
```yaml
......
  namespaceSelector:
    matchNames:
    - default
    - kube-ops
......
```
matchNames is an array listing the namespaces to watch; the YAML above monitors the default and kube-ops namespaces.
Main kube-prometheus YAML files
- alertmanager-prometheusRule.yaml
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.21.0
    prometheus: k8s
    role: alert-rules
  name: alertmanager-main-rules
  namespace: monitoring
spec:
  groups:
  - name: alertmanager.rules
    rules:
    - alert: AlertmanagerFailedReload
      annotations:
        description: Configuration has failed to load for {{ $labels.namespace }}/{{ $labels.pod}}.
        runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerfailedreload
        summary: Reloading an Alertmanager configuration has failed.
      expr: |
        # Without max_over_time, failed scrapes could create false negatives, see
        # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
        max_over_time(alertmanager_config_last_reload_successful{job="alertmanager-main",namespace="monitoring"}[5m]) == 0
      for: 10m
      labels:
        severity: critical
# Only part of the file is shown; the rest is just more alerting rules.
# Everything from "- alert:" through "severity: critical" is one alerting rule; to define your own,
# copy such a block and adjust it as needed.
```
Important labels: `prometheus: k8s` and `role: alert-rules`. These two labels are what prometheus-prometheus.yaml relies on: it selects the matching rules through its ruleSelector. So to define a custom alerting rule, all you need to do is create a PrometheusRule object that carries the labels `prometheus=k8s` and `role=alert-rules` (see the excerpt below).
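For reference, this is roughly the selector in prometheus-prometheus.yaml that does that matching in release-0.8 (an abridged excerpt):

```yaml
# prometheus-prometheus.yaml (excerpt)
spec:
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
```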
- kube-prometheus-prometheusRule.yaml
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-prometheus
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
    role: alert-rules
  name: kube-prometheus-rules
  namespace: monitoring
spec:
  groups:
  - name: general.rules
    rules:
    - alert: TargetDown
      annotations:
        description: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service }} targets in {{ $labels.namespace }} namespace are down.'
        runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/targetdown
        summary: One or more targets are unreachable.
      expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10
      for: 10m
      labels:
        severity: warning
    - alert: Watchdog
      annotations:
        description: |
          This is an alert meant to ensure that the entire alerting pipeline is functional.
          This alert is always firing, therefore it should always be firing in Alertmanager
          and always fire against a receiver. There are integrations with various notification
          mechanisms that send a notification when this alert is not firing. For example the
          "DeadMansSnitch" integration in PagerDuty.
        runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/watchdog
        summary: An alert that should always be firing to certify that Alertmanager is working properly.
      expr: vector(1)
      labels:
        severity: none
  - name: node-network
    rules:
    - alert: NodeNetworkInterfaceFlapping
      annotations:
        message: Network interface "{{ $labels.device }}" changing it's up status often on node-exporter {{ $labels.namespace }}/{{ $labels.pod }}
        runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/nodenetworkinterfaceflapping
      expr: |
        changes(node_network_up{job="node-exporter",device!~"veth.+"}[2m]) > 2
      for: 2m
      labels:
        severity: warning
  - name: kube-prometheus-node-recording.rules
    rules:
    - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[3m])) BY (instance)
      record: instance:node_cpu:rate:sum
    - expr: sum(rate(node_network_receive_bytes_total[3m])) BY (instance)
      record: instance:node_network_receive_bytes:rate:sum
    - expr: sum(rate(node_network_transmit_bytes_total[3m])) BY (instance)
      record: instance:node_network_transmit_bytes:rate:sum
    - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[5m])) WITHOUT (cpu, mode) / ON(instance) GROUP_LEFT() count(sum(node_cpu_seconds_total) BY (instance, cpu)) BY (instance)
      record: instance:node_cpu:ratio
    - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[5m]))
      record: cluster:node_cpu:sum_rate5m
    - expr: cluster:node_cpu_seconds_total:rate5m / count(sum(node_cpu_seconds_total) BY (instance, cpu))
      record: cluster:node_cpu:ratio
  - name: kube-prometheus-general.rules
    rules:
    - expr: count without(instance, pod, node) (up == 1)
      record: count:up1
    - expr: count without(instance, pod, node) (up == 0)
      record: count:up0
```
This file is the same sort of definition as alertmanager-prometheusRule.yaml, which is easy to see from its kind and its `prometheus: k8s` / `role: alert-rules` labels.
- kubernetes-prometheusRule.yaml
This is the same sort of definition as the two files above, so it needs no further explanation.
- kubernetes-serviceMonitorCoreDNS.yaml
ServiceMonitors were already covered above; if anything is unclear, see the kube-prometheus custom monitoring section of this document.
- prometheus-adapter-*.yaml
  - Only a brief note on the adapter here; look up the details yourself if you are interested (a quick check is shown below).
  - The metrics Prometheus collects cannot be consumed by Kubernetes directly because the data formats are incompatible, so a component (prometheus-adapter) is needed to convert the data Prometheus collects into a format the Kubernetes API can understand.
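A quick way to confirm the adapter is answering (a hedged check; it assumes the stock kube-prometheus setup where prometheus-adapter serves the metrics.k8s.io API):

```sh
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | head -c 300; echo
kubectl top nodes
```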
kube-prometheus alerting flow
The alerting data flow in brief: exporter --> prometheus --> alertmanager --> alert receiving channel.
For exporters, see https://note.youdao.com/s/EQ3Ra7MD
- How Prometheus connects to Alertmanager
```yaml
global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    prometheus: monitoring/k8s
    prometheus_replica: prometheus-k8s-0
alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
  alertmanagers:
  - follow_redirects: true
    scheme: http
    path_prefix: /
    timeout: 10s
    api_version: v2
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: alertmanager-main
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: web
      replacement: $1
      action: keep
    kubernetes_sd_configs:
    - role: endpoints
      follow_redirects: true
      namespaces:
        names:
        - monitoring
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
```
This configuration comes from the Configuration page of the Prometheus web UI.
`regex: alertmanager-main` and `regex: web` keep only the Service named alertmanager-main and its port named web.
Check the alertmanager-main Service:
[root@k8s-master manifests]# kubectl get svc -n monitoring | grep alertmanager-main
alertmanager-main   ClusterIP   100.101.252.112   <none>   9093/TCP   117d
[root@k8s-master manifests]# kubectl describe svc alertmanager-main -n monitoring
Name:              alertmanager-main
Namespace:         monitoring
Labels:            alertmanager=main
                   app.kubernetes.io/component=alert-router
                   app.kubernetes.io/name=alertmanager
                   app.kubernetes.io/part-of=kube-prometheus
                   app.kubernetes.io/version=0.21.0
Annotations:       <none>
Selector:          alertmanager=main,app.kubernetes.io/component=alert-router,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=kube-prometheus,app=alertmanager
Type:              ClusterIP
IP Families:       <none>
IP:                100.101.252.112
IPs:               100.101.252.112
Port:              web  9093/TCP
TargetPort:        web/TCP
Endpoints:         10.244.129.125:9093,10.244.32.198:9093,10.244.32.254:9093
Session Affinity:  ClientIP
Events:            <none>
That Service is exactly the one generated by alertmanager-service.yaml.
kube-prometheus auto-discovery

- Configure auto-discovery
```yaml
- job_name: "endpoints"
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:                  # relabel before/while scraping
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep                    # keep only Services annotated with prometheus.io/scrape=true
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)   # RE2: + is one or more, ? is zero or one, (?: ) is a non-capturing group
    replacement: $1:$2
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_service
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod
  - source_labels: [__meta_kubernetes_node_name]
    action: replace
    target_label: kubernetes_node
```
Save the content above as prometheus-additional.yaml.
- Create the Secret
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
Here `--from-file=prometheus-additional.yaml` is the file generated above.
- Modify prometheus-prometheus.yaml
```yaml
......
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
```
Add this at the end of the file.
- Update prometheus-prometheus.yaml
kubectl apply -f prometheus-prometheus.yaml
Note: on older versions, `kubectl logs -f prometheus-k8s-0 prometheus -n monitoring` may show `forbidden` errors. This is an RBAC permissions problem. Prometheus runs under the ServiceAccount set by `serviceAccountName: prometheus-k8s` in prometheus-prometheus.yaml; tracing that ServiceAccount shows that the role it is bound to is defined in prometheus-clusterRole.yaml.
prometheus-clusterRole.yaml
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.26.0
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```
Its resources already include services and pods, so no changes are needed.
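If you still suspect RBAC, a hedged way to test exactly what that ServiceAccount may do in a target namespace:

```sh
kubectl auth can-i list pods -n kube-ops \
  --as=system:serviceaccount:monitoring:prometheus-k8s
kubectl auth can-i list services -n kube-ops \
  --as=system:serviceaccount:monitoring:prometheus-k8s
```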
- Check the targets in the Prometheus web UI
- Auto-discovering a custom Service
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    app: check-secret-tls
    release: prod
  name: check-secret-tls
  namespace: kube-ops
spec:
  ports:
  - name: check-secret-tls
    port: 8090
    protocol: TCP
    targetPort: 8090
  selector:
    app: check-secret-tls
    release: prod
  type: ClusterIP
```
Just add the annotation `prometheus.io/scrape: "true"`.
- Verify that auto-discovery works
Ignore that the newly added target shows as down; that is a problem in my environment that keeps the application from starting.
Note: for developing a custom exporter, see https://note.youdao.com/s/EQ3Ra7MD
- About the metrics path
Some projects expose metrics under a non-default path, which must be specified explicitly.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: chekc-secret-tls
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    path: /api/metrics
    port: check-secret-tls
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app: 'check-secret-tls'
```
- Create a ServiceMonitor that matches the Service through `selector.matchLabels`; a Service matched this way does not need the `prometheus.io/scrape: "true"` annotation.
- The `spec.endpoints.path` field is what sets the metrics path.
- Result
Ignore the down status; this only exercises the Prometheus side.
If you want auto-discovery itself to use a different metrics path for all future projects:
- Modify prometheus-additional.yaml
```yaml
- job_name: "endpoints"
  metrics_path: /api/v2/metrics
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:                  # relabel before/while scraping
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep                    # keep only Services annotated with prometheus.io/scrape=true
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)   # RE2: + is one or more, ? is zero or one, (?: ) is a non-capturing group
    replacement: $1:$2
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_service
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod
  - source_labels: [__meta_kubernetes_node_name]
    action: replace
    target_label: kubernetes_node
```
The added field is `metrics_path: /api/v2/metrics`; more fields are documented at https://prometheus.io/docs/prometheus/latest/configuration/configuration/. Then repeat the earlier steps of this section: create the Secret --> modify prometheus-prometheus.yaml --> update prometheus-prometheus.yaml.
kube-prometheus custom alerting rules

As noted in the main YAML files section, prometheus-prometheusRule.yaml actually defines alerting rules: its kind is `PrometheusRule` and its metadata.labels are `prometheus: k8s` and `role: alert-rules`. Those two labels are required because prometheus-prometheus.yaml matches rules through `spec.ruleSelector.matchLabels`.
- Create the alerting rule file: customize_rule.yaml
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: customize-rules
  namespace: monitoring
spec:
  groups:
  - name: test
    rules:
    - alert: CustomizeRule
      annotations:
        summary: CustomizeRule summary
        description: Customize Rule use for test
      expr: |
        coredns_forward_request_duration_seconds_bucket >=3000
      for: 3m
      labels:
        severity: warning
```
Create the rule:
kubectl apply -f customize_rule.yaml
Note: you can also simply keep adding rules to prometheus-prometheusRule.yaml.
- Check that the alert was created
At this point the alerting rule is in place; create more according to your own needs.
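Besides the Prometheus web UI, the rule object itself can be checked from the command line:

```sh
kubectl get prometheusrule -n monitoring
kubectl get prometheusrule customize-rules -n monitoring -o yaml
```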
kube-prometheus custom alert channels

- A few commonly used alert channels
  - email
  - webhook
- Look at the Alertmanager web UI
The config shown there is actually what `alertmanager-secret.yaml` defines, stored base64-encoded; for the details, look at the `alertmanager-main` Secret.
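One way to read that live configuration back out (a hedged example; it assumes the default secret name and key used by release-0.8):

```sh
kubectl -n monitoring get secret alertmanager-main \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
```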
- Custom webhook alert channel
- Define the alert configuration: alertmanager-config.yaml
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: config-example
  namespace: monitoring
  labels:
    alertmanagerConfig: example
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'webhook'
  receivers:
  - name: 'webhook'
    webhookConfigs:
    - url: 'http://192.168.10.70:8008/api/v1/ping'
```
- Modify alertmanager-alertmanager.yaml
```yaml
......
  configSecret:
  alertmanagerConfigSelector:   # match the labels on the AlertmanagerConfig
    matchLabels:
      alertmanagerConfig: example
```
Add this label selector at the end of the file.
- Apply the modified files
kubectl apply -f alertmanager-config.yaml
kubectl apply -f alertmanager-alertmanager.yaml
- Check the Alertmanager web UI again
- Check the webhook endpoint (/api/v1/ping)
This endpoint is only something I used for testing; the test confirmed that the custom webhook alert channel works (a rough way to poke it yourself is shown below).
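If you do not want to wait for a real alert, a crude smoke test is to POST a minimal Alertmanager-style payload at the receiver yourself (a hedged sketch; the real payload Alertmanager sends contains more fields, and the URL is just the test endpoint from the config above):

```sh
curl -X POST http://192.168.10.70:8008/api/v1/ping \
  -H 'Content-Type: application/json' \
  -d '{"status":"firing","alerts":[{"labels":{"alertname":"CustomizeRule","severity":"warning"}}]}'
```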
- Recommended webhook alerting tools
- More alert channels
- References
Thanos
Advantages
- Global query view
- Long-term storage
- Compatible with Prometheus
Architecture
- Sidecar: connects to Prometheus and reads its data for queries and/or uploads it to cloud object storage
- Store Gateway: serves the metric data stored in object storage
- Compact: downsamples, compacts and cleans up the data in object storage
- Ruler: evaluates recording and alerting rules against the data in Thanos, for exposure and upload
- Query: implements the Prometheus API v1 and aggregates results
- Object Storage: the object store for the metric data
- Frontend: query caching
Integrating Thanos + MinIO + kube-prometheus
MinIO must be deployed in advance; see https://note.youdao.com/s/KyC0xeI3
- Modify prometheus-prometheus.yaml
```yaml
  thanos:
    baseImage: quay.io/thanos/thanos
    version: v0.29.0
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objstore-config
```
Add this at the end of the file; its main purpose is to attach the Thanos sidecar to the Prometheus pods.
objectStorageConfig: the configuration is provided through a Secret.
- Create the Secret referenced by objectStorageConfig in prometheus-prometheus.yaml
- The thanos-config.yaml file
```yaml
type: s3
config:
  bucket: thanos
  endpoint: 192.168.70.201:9090
  access_key: admin
  secret_key: adminadmin
  insecure: true
  signature_version2: false
  http_config:
    idle_conn_timeout: 90s
    insecure_skip_verify: true
```
- bucket: the bucket name
- access_key: the MinIO login account
- secret_key: the MinIO login password
- insecure: true: use HTTP
- idle_conn_timeout: idle connection timeout
- insecure_skip_verify: true: skip TLS certificate verification
More configuration parameters: https://thanos.io/tip/thanos/storage.md/#s3
- Create the Secret
kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=./thanos-config.yaml
- After updating prometheus-prometheus.yaml, check the pods
[root@k8s-master manifests]# kubectl get pod -n monitoring | grep prometheus-k8s
prometheus-k8s-0   3/3   Running   1   22h
prometheus-k8s-1   3/3   Running   1   22h
Both pods show 3/3 containers Running; the extra container is the Thanos sidecar:
[root@k8s-master manifests]# kubectl describe pod prometheus-k8s-0 -n monitoring
  thanos-sidecar:
    Container ID:  docker://b8ee9554fc2f4cb37480987a5a65da2c26c8aa16395103fed79c2ea1cdf043b9
    Image:         quay.io/thanos/thanos:v0.29.0
    Image ID:      docker-pullable://quay.io/thanos/thanos@sha256:4766a6caef0d834280fed2d8d059e922bc8781e054ca11f62de058222669d9dd
    Ports:         10902/TCP, 10901/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      sidecar
      --prometheus.url=http://localhost:9090/
      --grpc-address=[$(POD_IP)]:10901
      --http-address=[$(POD_IP)]:10902
      --objstore.config=$(OBJSTORE_CONFIG)
      --tsdb.path=/prometheus
- Download the YAML files needed by kube-thanos
git clone https://github.com/thanos-io/kube-thanos
The required files are in the manifests directory.
- kube-thanos manifest listing
[root@k8s-master manifests]# ll
total 72
-rw-r--r-- 1 root root 2604 Jan 31 09:52 thanos-query-deployment.yaml
-rw-r--r-- 1 root root 285 Nov 3 23:20 thanos-query-serviceAccount.yaml
-rw-r--r-- 1 root root 603 Nov 3 23:20 thanos-query-serviceMonitor.yaml
-rw-r--r-- 1 root root 539 Jan 30 11:19 thanos-query-service.yaml
-rw-r--r-- 1 root root 790 Nov 3 23:20 thanos-receive-ingestor-default-service.yaml
-rw-r--r-- 1 root root 4779 Nov 3 23:20 thanos-receive-ingestor-default-statefulSet.yaml
-rw-r--r-- 1 root root 321 Nov 3 23:20 thanos-receive-ingestor-serviceAccount.yaml
-rw-r--r-- 1 root root 729 Nov 3 23:20 thanos-receive-ingestor-serviceMonitor.yaml
-rw-r--r-- 1 root root 268 Nov 3 23:20 thanos-receive-router-configmap.yaml
-rw-r--r-- 1 root root 2676 Nov 3 23:20 thanos-receive-router-deployment.yaml
-rw-r--r-- 1 root root 308 Nov 3 23:20 thanos-receive-router-serviceAccount.yaml
-rw-r--r-- 1 root root 661 Nov 3 23:20 thanos-receive-router-service.yaml
-rw-r--r-- 1 root root 294 Nov 3 23:20 thanos-store-serviceAccount.yaml
-rw-r--r-- 1 root root 621 Nov 3 23:20 thanos-store-serviceMonitor.yaml
-rw-r--r-- 1 root root 560 Nov 3 23:20 thanos-store-service.yaml
-rw-r--r-- 1 root root 3331 Jan 30 18:10 thanos-store-statefulSet.yaml
- Modify thanos-query-deployment.yaml
```yaml
      containers:
      - args:
        - query
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --log.level=info
        - --log.format=logfmt
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=dnssrv+prometheus-operated.monitoring.svc.cluster.local:10901
        - --query.auto-downsampling
```
The added line is `--store=dnssrv+prometheus-operated.monitoring.svc.cluster.local:10901`, which reaches the Prometheus Service across namespaces.
- Create the thanos namespace
kubectl create ns thanos
- Apply the query-related YAML
kubectl apply -f thanos-query-deployment.yaml -f thanos-query-serviceAccount.yaml -f thanos-query-serviceMonitor.yaml -f thanos-query-service.yaml
- Change the query Service type: thanos-query-service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
    app.kubernetes.io/version: v0.29.0
  name: thanos-query
  namespace: thanos
spec:
  type: NodePort
  ports:
  - name: grpc
    port: 10901
    targetPort: 10901
  - name: http
    port: 9090
    targetPort: 9090
  selector:
    app.kubernetes.io/component: query-layer
    app.kubernetes.io/instance: thanos-query
    app.kubernetes.io/name: thanos-query
```
- Access the Query UI
- Modify thanos-store-statefulSet.yaml
```yaml
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
      name: data
    spec:
      storageClassName: 'nfs-storage'
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```
The change is at the end of the file: `storageClassName: 'nfs-storage'`.
The StorageClass must be created in advance; see https://note.youdao.com/s/IGwRd3gk
Note: because this lab runs Kubernetes v1.20.15, using the NFS-backed StorageClass also requires editing /etc/kubernetes/manifests/kube-apiserver.yaml, otherwise the volume fails to provision; newer Kubernetes releases removed some fields. Add:
`- --feature-gates=RemoveSelfLink=false`
- Apply the store-related YAML
kubectl apply -f thanos-store-serviceAccount.yaml -f thanos-store-serviceMonitor.yaml -f thanos-store-service.yaml -f thanos-store-statefulSet.yaml
- Modify thanos-query-deployment.yaml to connect it to the Store
```yaml
......
      containers:
      - args:
        - query
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --log.level=info
        - --log.format=logfmt
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=dnssrv+prometheus-operated.monitoring.svc.cluster.local:10901
        - --store=dnssrv+_grpc._tcp.thanos-store.thanos.svc.cluster.local:10901
        - --query.auto-downsampling
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.29.0
        imagePullPolicy: IfNotPresent
......
```
The added line is `--store=dnssrv+_grpc._tcp.thanos-store.thanos.svc.cluster.local:10901`.
- Apply thanos-query-deployment.yaml
kubectl apply -f thanos-query-deployment.yaml
- Access the Query UI again
- Notes
  - Hooking up additional clusters follows steps similar to the above; note that a Prometheus deployed by kube-prometheus already carries the external_labels label `prometheus_replica`.
Thanos Query
![thanos-querier](https://gitee.com/root_007/md_file_image/raw/master/202302071244144.svg)
When thanos-query serves a metric query, it fans the query out to the stores over the StoreAPI (gRPC).
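Besides the Stores page in the UI, a hedged way to see which stores the Query has discovered is its HTTP API (replace <node-ip>:<node-port> with the NodePort assigned to the thanos-query http port):

```sh
curl -s http://<node-ip>:<node-port>/api/v1/stores
```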
Query and Sidecar
The Sidecar uploads data to object storage
Get in touch
Blog: https://kubesre.com/
WeChat official account: 云原生運維圈