Prometheus
What is a TSDB (Time Series Database)?
Put simply, a TSDB is software optimized for handling time series data, in which the arrays of values are indexed by time.
- Most operations are writes.
- Writes are almost always sequential appends; most of the time data arrives already ordered by time.
- Writes rarely target data from long ago, and updates are rare. In most cases data is written to the database within seconds or minutes of being collected.
- Deletes are generally done in blocks: pick a starting point in history and specify the blocks that follow. Deleting the data for a single point in time, or for scattered random times, is rare.
- The data is usually far larger than memory, so caching is of little use. The system is generally I/O bound.
- Reads are typically sequential scans in ascending or descending time order.
- Highly concurrent reads are very common.
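The access patterns listed above can be illustrated with a toy in-memory series. This is only a minimal sketch, not how Prometheus or any real TSDB stores data: the `TinySeries` class and its method names are made up for this example, and everything is kept in memory rather than on disk.

```python
import bisect


class TinySeries:
    """Toy time series: timestamps kept sorted, values stored alongside."""

    def __init__(self):
        self.timestamps = []  # always sorted ascending
        self.values = []

    def append(self, ts, value):
        # Fast path: data normally arrives in time order, so this is a plain append.
        if not self.timestamps or ts >= self.timestamps[-1]:
            self.timestamps.append(ts)
            self.values.append(value)
            return
        # Rare path: a late sample is inserted at its sorted position.
        i = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def read_range(self, start, end):
        # Sequential read of every sample with start <= ts <= end.
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

    def delete_before(self, cutoff):
        # Block delete: drop every sample older than the cutoff in one step.
        i = bisect.bisect_left(self.timestamps, cutoff)
        del self.timestamps[:i]
        del self.values[:i]
        return i  # number of samples removed


series = TinySeries()
for ts, v in [(10, 1.0), (20, 2.0), (30, 3.0), (25, 2.5)]:
    series.append(ts, v)
```

Here the sample with timestamp 25 arrives late and is slotted into place, `series.read_range(20, 30)` walks the samples in time order, and `series.delete_before(25)` removes the oldest samples in a single call instead of one by one.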
What is Prometheus
Prometheus is an open-source monitoring and alerting system and time series database (TSDB) developed by SoundCloud.
Prometheus joined the CNCF (Cloud Native Computing Foundation) in 2016, becoming the second project hosted by the foundation after Kubernetes.
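Features of Prometheus
- A multi-dimensional data model (a time series is identified by a metric name and a set of key/value pairs)
- A flexible query language (PromQL) that works across those dimensions
- No dependency on distributed storage; a single server node works on its own
- Time series are collected with a pull model over HTTP
- Time series can be pushed through an intermediary gateway
- Targets are found through service discovery or static configuration
- Support for several kinds of visualization and dashboards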
The Prometheus ecosystem
- The main Prometheus server, which scrapes and stores time series data
- Client libraries for instrumenting applications or writing exporter code (Go, Java, Python, Ruby)
- A push gateway to support short-lived jobs
- Visual dashboards (two options, PromDash and Grafana; Grafana is currently the mainstream choice)
- Special-purpose exporters (for services such as HAProxy, StatsD, and Graphite)
- An experimental alert manager (Alertmanager, which separately handles alert aggregation, dispatch, silencing, and so on)
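Deployment and configuration
Download
URL: https://prometheus.io/download/
Deployment
Download prometheus-*.tar.gz
Extract it
Configuration
The prometheus directory contains a main configuration file named prometheus.yml. It holds most of the standard settings as well as the configuration Prometheus uses to monitor itself. The file looks like this: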
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. [interval between scrapes]
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. [interval between rule evaluations]
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - '172.17.20.231:20507' # [the Alertmanager to connect to]
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "first_rules.yml"
  - "second_rules.yml"
  - "alert-rule.yml" # [two kinds of rules are loaded here: recording rules and alerting rules]
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label job=<job_name>
  # to any timeseries scraped from this config.
  - job_name: 'prometheus' # [the scrape target]
    # metrics_path defaults to '/metrics' [the exporter built into Prometheus]
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:20504'] # [the port Prometheus was started on]
  - job_name: 'spring-boot'
    metrics_path: '/prometheus' # [path of the exporter written for the Spring Boot application]
    static_configs:
      - targets: ['localhost:20506'] # [the port the Spring Boot application was started on]
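The `spring-boot` job pulls metrics over HTTP from a path that the application itself exposes. As a rough illustration of what such an endpoint serves, here is a sketch that uses only the Python standard library. The metric name `demo_requests_total`, the `REQUEST_COUNTS` table, and the helper names are assumptions made for this example; a real application would normally rely on an official client library rather than hand-written output.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-error-code request counters kept by the application.
REQUEST_COUNTS = {"0": 5, "4": 3}


def render_metrics():
    """Render the counters in the Prometheus text exposition format."""
    lines = ["# TYPE demo_requests_total counter"]
    for code, value in sorted(REQUEST_COUNTS.items()):
        lines.append('demo_requests_total{code="%s"} %d' % (code, value))
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The path must match metrics_path in the scrape job.
        if self.path != "/prometheus":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=20506):
    # Blocks forever; the port must match the target listed in static_configs.
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus would then fetch `http://localhost:20506/prometheus` at every scrape interval and store each returned line as a sample.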
Startup
Write a startup script
nohup ./prometheus --config.file=prometheus.yml --web.enable-admin-api --web.listen-address=:20504 >/dev/null 2>&1 &
This starts Prometheus silently in the background; --web.listen-address sets the listening port.
Data types
- Counter: a Counter holds a cumulative value that only ever increases (or is reset to zero when the process restarts).
- Gauge: a Gauge holds an instantaneous value that can go up or down arbitrarily.
- Histogram: a Histogram samples observations over a period of time (typically request durations or response sizes), counts them in configurable buckets, and also tracks the total count.
- Summary: a Summary is very similar to a Histogram. It also samples observations over a period of time (typically request durations or response sizes), but it stores the quantile values directly instead of computing them from bucket counts.
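To make the bucket idea behind the Histogram type more concrete, the following sketch counts observations into cumulative buckets. It is a simplified illustration, not the client library's implementation: the `ToyHistogram` class, its bucket bounds, and the observed durations are all made up for this example.

```python
import bisect


class ToyHistogram:
    """Toy histogram that reports cumulative bucket counts."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)  # bucket upper limits
        # One slot per bound plus a final slot for everything above the last bound.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.count = 0    # total number of observations
        self.total = 0.0  # sum of all observed values

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        i = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[i] += 1
        self.count += 1
        self.total += value

    def cumulative(self):
        # Each bucket reports how many observations were <= its bound.
        result, running = [], 0
        bounds = self.upper_bounds + [float("inf")]
        for bound, n in zip(bounds, self.bucket_counts):
            running += n
            result.append((bound, running))
        return result


h = ToyHistogram([0.1, 0.5, 1.0])
for seconds in [0.05, 0.2, 0.3, 0.7, 2.0]:
    h.observe(seconds)
```

With these five observations, `h.cumulative()` yields one observation at or below 0.1, three at or below 0.5, four at or below 1.0, and five in the final unbounded bucket. A Summary would instead keep ready-made quantile values, so nothing like this bucket table would be needed at query time.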
Time series data, instrumentation, and querying
Each time series sample consists of a metric (the metric name), one or more labels, and a float64 value.
The standard format is <metric name>{<label name>=<label value>, ...}
For example:
rpc_invoke_cnt_c{code="0",method="Session.GenToken",job="Center"} 5
rpc_invoke_cnt_c{code="0",method="Relation.GetUserInfo",job="Center"} 12
rpc_invoke_cnt_c{code="0",method="Message.SendGroupMsg",job="Center"} 12
rpc_invoke_cnt_c{code="4",method="Message.SendGroupMsg",job="Center"} 3
rpc_invoke_cnt_c{code="0",method="Tracker.Tracker.Get",job="Center"} 70
This is a set of monitoring data that counts how many times each RPC interface has been handled.
Here rpc_invoke_cnt_c is the metric name, and each sample carries three labels: code is the error code, job is the service the metric belongs to, and method is the method the metric belongs to. The trailing number is the sample value.
In this example there are four dimensions in total (one metric name and three labels), which lets us run very sophisticated queries with PromQL, Prometheus's powerful query language.
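As a small illustration of how one such line breaks down into a name, labels, and a value, here is a parsing sketch. It is deliberately simplified and is not Prometheus's own parser: the function name `parse_sample` and both regular expressions are assumptions for this example, and optional timestamps and escape sequences inside label values are not handled.

```python
import re

# metric name, optional {labels}, then the value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# one label="value" pair
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError("not a sample line: %r" % line)
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))


name, labels, value = parse_sample(
    'rpc_invoke_cnt_c{code="4",method="Message.SendGroupMsg",job="Center"} 3'
)
```

After this call, `name` holds the metric name, `labels` maps `code`, `method`, and `job` to their values, and `value` is the sample as a float. Together the name and the label set identify exactly one time series.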
PromQL
PromQL (Prometheus Query Language) is the DSL that Prometheus developed for querying its data. It is highly expressive, supports conditional queries and operators, and ships with a large number of built-in functions for querying monitoring data along its various dimensions.
To measure the rate of Relation.GetUserInfo calls in the Center component, use the following query:
rate(rpc_invoke_cnt_c{method="Relation.GetUserInfo",job="Center"}[1m])
Or, to measure Center's overall RPC error rate broken down by method and error code:
sum by (method, code)(rate(rpc_invoke_cnt_c{job="Center",code!="0"}[1m]))
To measure the average latency of each Center method, the following query is enough:
rate(rpc_invoke_time_h_sum{job="Center"}[1m]) / rate(rpc_invoke_time_h_count{job="Center"}[1m])

rate(http_requests_total[5m])
Returns the per-second rate of HTTP requests measured over the last 5 minutes, for each time series in the range vector.
increase(http_requests_total[5m])
Returns the number of HTTP requests measured over the last 5 minutes, for each time series in the range vector.
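The difference between the two functions can be sketched in a few lines. This is a simplified model under stated assumptions, not Prometheus's actual algorithm: the real `rate()` and `increase()` also extrapolate to the edges of the time window, which is left out here. The function names and the sample data are hypothetical.

```python
def simple_increase(samples):
    """Total increase of a counter across the samples, tolerating counter resets.

    samples: list of (timestamp_seconds, value) in ascending time order.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset, so the new value counts from zero.
        total += (cur - prev) if cur >= prev else cur
    return total


def simple_rate(samples):
    """Average per-second increase between the first and the last sample."""
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    elapsed = samples[-1][0] - samples[0][0]
    return simple_increase(samples) / elapsed


# Hypothetical scrapes of http_requests_total, 60 seconds apart, with one restart.
window = [(0, 100.0), (60, 160.0), (120, 220.0), (180, 10.0), (240, 70.0)]
```

For this window the counter grows by 60, 60, 10 (after the reset), and 60, so `simple_increase(window)` gives 190 requests and `simple_rate(window)` divides that by the 240 elapsed seconds. The two results carry the same information: the rate is the increase expressed per second.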
Official function reference: https://prometheus.io/docs/querying/functions/
In addition, to make querying easier, some care should go into how metric and label names are chosen when instrumenting.
rpc_invoke_cnt_c: RPC call statistics
api_req_num_cv: HTTP API call statistics
msg_queue_cnt_c: queue length statistics
Official naming guide: https://prometheus.io/docs/practices/naming/
Alerting
Deployment and installation
Download URL: https://prometheus.io/download/
Create a startup script
nohup ./alertmanager --web.listen-address=:20507 >/dev/null 2>&1 &
Adjust the configuration file
The alertmanager.yml file
Define alerting rules
First define the alerting rules by listing the rule files in the Prometheus configuration:
rule_files:
  - "first_rules.yml"
  - "second_rules.yml"
  - "alert-rule.yml" # [two kinds of rules are loaded here: recording rules and alerting rules]
Then write the corresponding alerting rule yourself:
groups:
  - name: example
    interval: 1s
    rules:
      # Alert for any instance that is unreachable for more than 1 second.
      - alert: InstanceDown
        expr: up == 0
        for: 1s
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down"
The above is the alerting rule for an instance going down.
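The effect of the `for` clause can be sketched as a small state machine: the alert stays pending while the expression has been true for less than the configured duration, and fires once it has held long enough. This is an illustrative model only; the function name, the state strings, and the evaluation timeline are assumptions, and a longer `for` of 30 seconds is used so that the pending phase is visible.

```python
def alert_states(evaluations, hold_for):
    """Report the alert state after each rule evaluation.

    evaluations: list of (timestamp_seconds, condition_is_true) in time order.
    hold_for: seconds the condition must stay true before the alert fires.
    """
    states = []
    active_since = None  # start of the current streak of true evaluations
    for ts, is_true in evaluations:
        if not is_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= hold_for:
            states.append("firing")
        else:
            states.append("pending")
    return states


# Hypothetical evaluations of `up == 0` every 15 seconds.
timeline = [(0, False), (15, True), (30, True), (45, True), (60, False), (75, True)]
states = alert_states(timeline, hold_for=30)
```

In this timeline the alert only fires at the 45 second mark, after the condition has held for 30 seconds, and a single healthy evaluation at 60 seconds resets it. With `for: 1s`, as in the rule above, an alert would fire almost as soon as an instance is seen down.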
Configure alert delivery
A simple configuration follows:
global:
  smtp_smarthost: 'smtp.exmail.qq.com:25' # SMTP server used to send mail
  smtp_from: 'xxx@ulopay.com'
  smtp_auth_username: 'xxx@ulopay.com'
  smtp_auth_password: 'xxx'
# The directory from which notification templates are read.
templates:
  - '/etc/alertmanager/template/*.tmpl'
# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster', 'service'] # grouping keys, used by the settings that follow
  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification. [how long a newly created group waits before sending, so related alerts can go out together]
  group_wait: 5s
  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 1m # interval between notifications for one group
  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3h # interval before resending
  # A default receiver
  receiver: zhangm # default recipient
receivers: # all recipients are configured here
  - name: 'zhangm'
    email_configs:
      - to: 'zhangm@ulopay.com'
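The grouping controlled by `group_by` can be pictured as bucketing alerts by the values of the listed labels, so that each bucket produces one notification. The sketch below shows only that bucketing step under simplifying assumptions: timing (`group_wait`, `group_interval`, `repeat_interval`) is not modeled, and the function name and the sample alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels.

    A label that is missing from an alert is treated as an empty string.
    """
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical incoming alerts, each represented only by its label set.
incoming = [
    {"alertname": "InstanceDown", "cluster": "A", "service": "api", "instance": "10.0.0.1:9100"},
    {"alertname": "InstanceDown", "cluster": "A", "service": "api", "instance": "10.0.0.2:9100"},
    {"alertname": "LatencyHigh", "cluster": "A", "service": "api", "instance": "10.0.0.1:9100"},
]
batches = group_alerts(incoming, ["alertname", "cluster", "service"])
```

The two `InstanceDown` alerts differ only in `instance`, which is not a grouping label, so they land in the same batch and would be reported in one email; the `LatencyHigh` alert forms a second batch.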
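Graphing and visualization
Startup
Install Grafana: https://grafana.com/
Download the grafana.tar.gz package
Extract it
Go to the bin directory
nohup ./grafana-server >/dev/null 2>&1 &
This starts Grafana in the background.
Configuration
To change the port, edit the http_port parameter in defaults.ini under the conf directory.
Interface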
[Screenshot: Grafana interface]
Account and password
Default account: admin, password: admin
Add a data source
[Screenshots: adding a data source in Grafana]
Integration
For integration, see: [Official Prometheus examples] [Integrating Prometheus with Play] [Integrating Prometheus with Spring]
References
[Getting Started with Prometheus](http://www.10tiao.com/html/357/201705/2247485232/1.html)
[Advanced Prometheus](http://www.10tiao.com/html/357/201705/2247485249/1.html)