Integrating jvm-exporter with k8s + Prometheus for monitoring and alerting

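Background: the article "Monitoring the JVM with Prometheus + Grafana" described how to monitor Java applications with jvm-exporter. In our case the JVMs to be monitored run inside a k8s cluster, so this article covers how to extend the k8s and Prometheus integration. It assumes Prometheus is already deployed in the cluster (see "Integrating Prometheus + Grafana monitoring with Kubernetes"). kube-prometheus does not include jvm-exporter, so we have to wire it up ourselves.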

  1. Integrate jvm-exporter into the application

Integration is simple: add jvm-exporter as a javaagent to the Java startup command. See "Monitoring the JVM with Prometheus + Grafana" for details.
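
In a k8s workload, one way to pass the javaagent flag without changing the image entrypoint is the JAVA_TOOL_OPTIONS environment variable. The fragment below is only a sketch: the Deployment name, image, jar path, and config path are hypothetical, and it assumes the agent jar and its config file are already present in the image. The port (9093), port name (http-metrics), and label (metrics: jmx-metrics) are chosen to line up with the manifests in the next step.

```yaml
# Sketch of a Deployment that starts the JVM with the exporter agent attached.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-app                # hypothetical name
  namespace: my-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      metrics: jmx-metrics
  template:
    metadata:
      labels:
        metrics: jmx-metrics       # label the Service / PodMonitor selectors match on
    spec:
      containers:
      - name: app
        image: my-registry/my-java-app:1.0   # hypothetical image containing the agent jar
        env:
        - name: JAVA_TOOL_OPTIONS
          # Agent argument format: -javaagent:<jar>=<port>:<config file>
          # Both paths are hypothetical.
          value: "-javaagent:/opt/jmx/jmx_prometheus_javaagent.jar=9093:/opt/jmx/config.yaml"
        ports:
        - name: http-metrics       # port name referenced by the monitors
          containerPort: 9093
```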

  2. Configure Prometheus service discovery

For workloads exposed through a Service, we can configure service discovery with the ServiceMonitor CRD defined by the prometheus-operator project. A template:

--- # ServiceMonitor: service discovery rule
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor # CRD defined by prometheus-operator
metadata:
  name: jmx-metrics
  namespace: monitoring
  labels:
    k8s-apps: jmx-metrics
spec:
  jobLabel: metrics # take the job label from the Service's `metrics` label, i.e. job=jmx-metrics
  selector:
    matchLabels:
      metrics: jmx-metrics # discover Services labeled metrics: jmx-metrics
  namespaceSelector:
    matchNames: # namespaces to discover in; more than one may be listed
    - my-namespace
  endpoints:
  - port: http-metrics # port to scrape; this is the Service port name, i.e. spec.ports[].name in the Service YAML
    interval: 15s # scrape interval

--- # Service template
apiVersion: v1
kind: Service
metadata:
  labels:
    metrics: jmx-metrics # the label the ServiceMonitor selects on
  name: jmx-metrics
  namespace: my-namespace
spec:
  ports:
  - name: http-metrics # matches spec.endpoints[].port in the ServiceMonitor
    port: 9093 # port exposed by jmx-exporter
    targetPort: http-metrics # name of the container port declared in the Pod YAML
  selector:
    metrics: jmx-metrics # the Service's own Pod label selector

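The configuration above sets up service discovery for the jmx-metrics Service in the my-namespace namespace. Prometheus adds every Pod behind this Service as a scrape target and keeps the Pod list up to date through the apiserver, so new replicas are picked up automatically when the workload scales out.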

For workloads without a Service, such as instances exposed outside the cluster through HostPort, we can use a PodMonitor for discovery instead. An example:

--- # PodMonitor: discovery rule; supported by recent prometheus-operator versions, older versions may not support it
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor # CRD defined by prometheus-operator
metadata:
  name: jmx-metrics
  namespace: monitoring
  labels:
    k8s-apps: jmx-metrics
spec:
  jobLabel: metrics # take the job label from the Pod's `metrics` label, i.e. job=jmx-metrics
  selector:
    matchLabels:
      metrics: jmx-metrics # discover Pods labeled metrics: jmx-metrics
  namespaceSelector:
    matchNames: # namespaces to discover in; more than one may be listed
    - my-namespace
  podMetricsEndpoints:
  - port: http-metrics # name of the metrics port in the Pod YAML, i.e. spec.containers[].ports[].name
    interval: 15s # scrape interval

--- # Template of the Pod to be monitored
apiVersion: v1
kind: Pod
metadata:
  labels:
    metrics: jmx-metrics
  name: jmx-metrics
  namespace: my-namespace
spec:
  containers:
  - image: tomcat:9.0
    name: tomcat
    ports:
    - containerPort: 9093
      name: http-metrics

  3. Grant the Prometheus serviceAccount permissions in the target namespace

--- # Create the Role in the target namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: my-namespace
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
--- # Bind the prometheus-k8s ServiceAccount to the Role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s # serviceAccount used by the Prometheus container; kube-prometheus uses prometheus-k8s by default
  namespace: monitoring
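
  4. Check service discovery in the Prometheus web UI

Once service discovery is configured correctly, the new job shows up in the Prometheus web UI, under Status > Targets and Status > Service Discovery.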
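
If the job does not show up, one thing worth checking is whether the Prometheus custom resource selects the new ServiceMonitor or PodMonitor at all. The fragment below is a sketch of what a permissive configuration could look like. It assumes the Prometheus object is named k8s in the monitoring namespace, and the empty selectors are an assumption about your installation, so compare it with the output of kubectl -n monitoring get prometheus -o yaml rather than applying it as-is.

```yaml
# Fragment of a Prometheus custom resource (sketch; compare with your own object).
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                          # assumed object name
  namespace: monitoring
spec:
  serviceAccountName: prometheus-k8s
  # An empty selector ({}) matches every object of that kind.
  serviceMonitorSelector: {}
  serviceMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  podMonitorNamespaceSelector: {}
```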

  5. Add alerting rules

Create an alerting rule file, jvm-alert-rules.yaml, with the following content:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: jvm-metrics-rules
  namespace: monitoring
spec:
  groups:
  - name: jvm-metrics-rules
    rules:
    # GC time exceeds 10% of the last 5 minutes (more than 30s out of 300s)
    - alert: GcTimeTooMuch
      expr: increase(jvm_gc_collection_seconds_sum[5m]) / 300 * 100 > 10
      for: 5m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} GC time ratio above 10%"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} GC time ratio above 10%, current value ({{ $value }}%)"
    # Too many GCs
    - alert: GcCountTooMuch
      expr: increase(jvm_gc_collection_seconds_count[1m]) > 30
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} more than 30 GCs in 1 minute"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} more than 30 GCs in 1 minute, current value ({{ $value }})"
    # Too many full GCs (the gc label value below is specific to the CMS collector)
    - alert: FgcCountTooMuch
      expr: increase(jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep"}[1h]) > 3
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} more than 3 full GCs in 1 hour"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} more than 3 full GCs in 1 hour, current value ({{ $value }})"
    # Non-heap memory usage above 80%
    - alert: NonheapUsageTooMuch
      expr: jvm_memory_bytes_used{job="jmx-metrics", area="nonheap"} / jvm_memory_bytes_max * 100 > 80
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} non-heap memory usage above 80%"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} non-heap memory usage above 80%, current value ({{ $value }}%)"
    # Memory usage warning
    - alert: HighMemUsage
      expr: process_resident_memory_bytes{job="jmx-metrics"} / os_total_physical_memory_bytes * 100 > 85
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} RSS memory usage above 85%"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} RSS memory usage above 85%, current value ({{ $value }}%)"

Run kubectl apply -f jvm-alert-rules.yaml to apply the rules.
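
The prometheus: k8s and role: alert-rules labels on the PrometheusRule only matter if the Prometheus custom resource selects rules by label. The fragment below sketches such a selector. It assumes the Prometheus object is named k8s in the monitoring namespace and that it filters on exactly these two labels; your installation may use a different or empty selector, so check your own object before relying on it.

```yaml
# Fragment of a Prometheus custom resource showing a label-based rule selector (sketch).
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                  # assumed object name
  namespace: monitoring
spec:
  ruleSelector:
    matchLabels:
      prometheus: k8s        # must match metadata.labels on the PrometheusRule
      role: alert-rules
```

If the rule is selected, it should appear on the Alerts page of the Prometheus web UI shortly after it is applied.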

  6. Add alert receivers

Edit the receiver configuration:

global:
  resolve_timeout: 5m
route:
  group_by: ['job', 'alertname', 'pod']
  group_interval: 2m
  receiver: my-alert-receiver
  routes:
  - match: 
      job: jmx-metrics
    receiver: my-alert-receiver
    repeat_interval: 3h
receivers:
- name: my-alert-receiver
  webhook_configs:
  - url: http://mywebhook.com/
    max_alerts: 1
    send_resolved: true

Base64-encode this configuration with a tool of your choice and put it into the Secret that holds the Alertmanager configuration:
kubectl edit -n monitoring Secret alertmanager-main

apiVersion: v1
data:
  alertmanager.yaml: KICAgICJyZWNlaXZlciI6ICJudWxsIg== # put the base64 string here
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque

After you save and exit the editor, the change takes effect after a short while.
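
As an alternative to encoding by hand, a Secret manifest can carry the configuration as plain text under stringData, and Kubernetes stores it base64-encoded under data on write. The manifest below is a sketch that reuses the receiver configuration from this step. It assumes alertmanager.yaml is the only key you need in the Secret, and the webhook URL is the same placeholder as above.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque
stringData:
  # Plain text here; stored base64-encoded under data.alertmanager.yaml
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job', 'alertname', 'pod']
      group_interval: 2m
      receiver: my-alert-receiver
      routes:
      - match:
          job: jmx-metrics
        receiver: my-alert-receiver
        repeat_interval: 3h
    receivers:
    - name: my-alert-receiver
      webhook_configs:
      - url: http://mywebhook.com/
        max_alerts: 1
        send_resolved: true
```

Keeping the readable configuration in a file like this also makes later changes easier to review than editing an encoded string.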

That completes the JVM monitoring setup.

Appendix: sample output from the jvm-exporter endpoint; pick whichever metrics you need.

# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 218.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 40.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 219.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 249.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jvm_threads_state Current count of threads by state
# TYPE jvm_threads_state gauge
jvm_threads_state{state="NEW",} 0.0
jvm_threads_state{state="RUNNABLE",} 49.0
jvm_threads_state{state="TIMED_WAITING",} 141.0
jvm_threads_state{state="TERMINATED",} 0.0
jvm_threads_state{state="WAITING",} 28.0
jvm_threads_state{state="BLOCKED",} 0.0
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_261-b12",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.553562144E9
jvm_memory_bytes_used{area="nonheap",} 6.5181496E7
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 4.08027136E9
jvm_memory_bytes_committed{area="nonheap",} 6.8747264E7
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 4.08027136E9
jvm_memory_bytes_max{area="nonheap",} 1.317011456E9
# HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{area="heap",} 4.294967296E9
jvm_memory_bytes_init{area="nonheap",} 2555904.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 2.096832E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 3.9320064E7
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 4893112.0
jvm_memory_pool_bytes_used{pool="Par Eden Space",} 1.71496168E8
jvm_memory_pool_bytes_used{pool="Par Survivor Space",} 7.1602832E7
jvm_memory_pool_bytes_used{pool="CMS Old Gen",} 1.310463144E9
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.3396352E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 4.0239104E7
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 5111808.0
jvm_memory_pool_bytes_committed{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_committed{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_committed{pool="CMS Old Gen",} 2.147483648E9
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} 5.36870912E8
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 5.28482304E8
jvm_memory_pool_bytes_max{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_max{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_max{pool="CMS Old Gen",} 2.147483648E9
# HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0
jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0
jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0
jvm_memory_pool_bytes_init{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_init{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_init{pool="CMS Old Gen",} 2.147483648E9
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP os_free_physical_memory_bytes FreePhysicalMemorySize (java.lang<type=OperatingSystem><>FreePhysicalMemorySize)
# TYPE os_free_physical_memory_bytes gauge
os_free_physical_memory_bytes 9.1234304E8
# HELP os_committed_virtual_memory_bytes CommittedVirtualMemorySize (java.lang<type=OperatingSystem><>CommittedVirtualMemorySize)
# TYPE os_committed_virtual_memory_bytes gauge
os_committed_virtual_memory_bytes 2.2226296832E10
# HELP os_total_swap_space_bytes TotalSwapSpaceSize (java.lang<type=OperatingSystem><>TotalSwapSpaceSize)
# TYPE os_total_swap_space_bytes gauge
os_total_swap_space_bytes 0.0
# HELP os_max_file_descriptor_count MaxFileDescriptorCount (java.lang<type=OperatingSystem><>MaxFileDescriptorCount)
# TYPE os_max_file_descriptor_count gauge
os_max_file_descriptor_count 1048576.0
# HELP os_system_load_average SystemLoadAverage (java.lang<type=OperatingSystem><>SystemLoadAverage)
# TYPE os_system_load_average gauge
os_system_load_average 4.97
# HELP os_total_physical_memory_bytes TotalPhysicalMemorySize (java.lang<type=OperatingSystem><>TotalPhysicalMemorySize)
# TYPE os_total_physical_memory_bytes gauge
os_total_physical_memory_bytes 1.073741824E10
# HELP os_system_cpu_load SystemCpuLoad (java.lang<type=OperatingSystem><>SystemCpuLoad)
# TYPE os_system_cpu_load gauge
os_system_cpu_load 1.0
# HELP os_free_swap_space_bytes FreeSwapSpaceSize (java.lang<type=OperatingSystem><>FreeSwapSpaceSize)
# TYPE os_free_swap_space_bytes gauge
os_free_swap_space_bytes 0.0
# HELP os_available_processors AvailableProcessors (java.lang<type=OperatingSystem><>AvailableProcessors)
# TYPE os_available_processors gauge
os_available_processors 6.0
# HELP os_process_cpu_load ProcessCpuLoad (java.lang<type=OperatingSystem><>ProcessCpuLoad)
# TYPE os_process_cpu_load gauge
os_process_cpu_load 0.14194299011052938
# HELP os_open_file_descriptor_count OpenFileDescriptorCount (java.lang<type=OperatingSystem><>OpenFileDescriptorCount)
# TYPE os_open_file_descriptor_count gauge
os_open_file_descriptor_count 717.0
# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 0.004494197
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 0.0
# HELP jmx_scrape_cached_beans Number of beans with their matching rule cached
# TYPE jmx_scrape_cached_beans gauge
jmx_scrape_cached_beans 0.0
# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="direct",} 2.3358974E7
jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{pool="direct",} 2.3358974E7
jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{pool="direct",} 61.0
jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="ParNew",} 77259.0
jvm_gc_collection_seconds_sum{gc="ParNew",} 2399.831
jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep",} 1.0
jvm_gc_collection_seconds_sum{gc="ConcurrentMarkSweep",} 0.29
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1759604.89
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.608630226597E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 717.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.2226292736E10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.644765696E9
# HELP jmx_exporter_build_info A metric with a constant '1' value labeled with the version of the JMX exporter.
# TYPE jmx_exporter_build_info gauge
jmx_exporter_build_info{version="0.14.0",name="jmx_prometheus_javaagent",} 1.0
# HELP jvm_memory_pool_allocated_bytes_total Total bytes allocated in a given JVM memory pool. Only updated after GC, not continuously.
# TYPE jvm_memory_pool_allocated_bytes_total counter
jvm_memory_pool_allocated_bytes_total{pool="Par Survivor Space",} 1.42928399936E11
jvm_memory_pool_allocated_bytes_total{pool="CMS Old Gen",} 2.862731656E9
jvm_memory_pool_allocated_bytes_total{pool="Code Cache",} 2.8398656E7
jvm_memory_pool_allocated_bytes_total{pool="Compressed Class Space",} 4912848.0
jvm_memory_pool_allocated_bytes_total{pool="Metaspace",} 3.9438872E7
jvm_memory_pool_allocated_bytes_total{pool="Par Eden Space",} 1.32737951722432E14
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 7282.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 7317.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 35.0
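
Other metrics from the output above can be turned into rules in the same way as in step 5. The sketch below adds two alerts built on jvm_threads_deadlocked and the heap series of jvm_memory_bytes_used / jvm_memory_bytes_max. The rule name, alert names, thresholds, and durations are assumptions for illustration and are not part of the original setup.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: jvm-extra-rules              # hypothetical name
  namespace: monitoring
spec:
  groups:
  - name: jvm-extra-rules
    rules:
    # Any deadlocked thread is worth a look
    - alert: JvmThreadsDeadlocked    # hypothetical alert name
      expr: jvm_threads_deadlocked{job="jmx-metrics"} > 0
      for: 1m
      labels:
        severity: red
      annotations:
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} has {{ $value }} deadlocked threads"
    # Heap usage above 90% of the configured maximum (threshold is an assumption)
    - alert: HeapUsageTooMuch        # hypothetical alert name
      expr: jvm_memory_bytes_used{job="jmx-metrics", area="heap"} / jvm_memory_bytes_max{job="jmx-metrics", area="heap"} * 100 > 90
      for: 5m
      labels:
        severity: red
      annotations:
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} heap usage above 90%, current value ({{ $value }}%)"
```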
