Overview
Our production environment is a mixed-language setting that runs Node.js, Go, Java, Ruby, Scala, and other programs. Twitter's Finagle is an RPC framework built on the Thrift protocol, and Zipkin is a tool that traces Thrift RPC call chains made through Finagle: it collects call data from every service and provides analysis, query, and visualization features. It helps identify which call in a distributed RPC chain is slow, has performance problems, or raised an exception, making it feasible to diagnose the performance of a distributed system.
Basic concepts
Trace
One traced service call chain, composed of a set of Spans. A TraceID must be generated at the main web entry point and must remain accessible from the current request context.
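As an illustration of "accessible from the current request context", here is a minimal Java sketch using a ThreadLocal holder; the TraceContext class and its fields are hypothetical and only stand in for whatever context mechanism each language runtime provides:

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical holder keeping the current trace state reachable from
 *  anywhere on the request's call path. */
public final class TraceContext {
    private static final ThreadLocal<TraceContext> CURRENT = new ThreadLocal<>();

    public final long traceId;
    public final long spanId;

    private TraceContext(long traceId, long spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    /** Called once at the web entry point: generate a new TraceID. */
    public static TraceContext startTrace() {
        long traceId = ThreadLocalRandom.current().nextLong();
        // reusing the trace id as the root span id is a common convention
        TraceContext ctx = new TraceContext(traceId, traceId);
        CURRENT.set(ctx);
        return ctx;
    }

    public static TraceContext current() { return CURRENT.get(); }
    public static void clear() { CURRENT.remove(); }
}
```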
Annotation
Represents an Event that occurred at a specific point in time.
Event types
- cs (Client Send): the client sends the request
- sr (Server Receive): the server receives the request
- ss (Server Send): the server finishes processing and sends the response
- cr (Client Receive): the client receives the response
When is it generated?
Generated when the client sends a Request, the client receives a Response, the server receives a Request, and the server sends a Response. An Annotation belongs to a Span: each newly generated Annotation must be appended to the annotations array of the Span in the current context.
Thrift data structure
```thrift
/**
 * Associates an event that explains latency with a timestamp.
 *
 * Unlike log statements, annotations are often codes: for example "sr".
 */
struct Annotation {
  /**
   * Microseconds from epoch.
   *
   * This value should use the most precise value possible. For example,
   * gettimeofday or syncing nanoTime against a tick of currentTimeMillis.
   */
  1: i64 timestamp
  /**
   * Usually a short tag indicating an event, like "sr" or "finagle.retry".
   */
  2: string value
  /**
   * The host that recorded the value, primarily for query by service name.
   */
  3: optional Endpoint host
  // don't reuse 4: optional i32 OBSOLETE_duration // how long did the operation take? microseconds
}
```
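For example, a minimal sketch of creating a "cs" annotation and attaching it to the span in the current context, assuming the Java classes the Thrift compiler generates from the definitions in this document (it emits setters such as setTimestamp and list helpers such as addToAnnotations):

```java
// Build a "cs" (Client Send) annotation right before the request goes out.
// Annotation/Endpoint/Span are assumed to be the Thrift-generated classes.
static void markClientSend(Span currentSpan, Endpoint localEndpoint) {
    Annotation cs = new Annotation();
    cs.setTimestamp(System.currentTimeMillis() * 1000); // microseconds from epoch
    cs.setValue("cs");                                  // event code
    cs.setHost(localEndpoint);                          // who recorded the event
    currentSpan.addToAnnotations(cs);                   // append to the current span
}
```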
BinaryAnnotation
Stores user-defined information, such as a sessionID, userID, userIP, or exception details.
When is it generated?
It can be generated whenever custom tracing information needs to be recorded, for example an exception or a SessionID. Like an Annotation, a BinaryAnnotation belongs to a Span: each newly generated BinaryAnnotation must be appended to the binary_annotations array of the Span in the current context.
Thrift data structure
```thrift
/**
 * Binary annotations are tags applied to a Span to give it context. For
 * example, a binary annotation of "http.uri" could be the path to a resource
 * in an RPC call.
 *
 * Binary annotations of type STRING are always queryable, though more a
 * historical implementation detail than a structural concern.
 *
 * Binary annotations can repeat, and vary on the host. Similar to Annotation,
 * the host indicates who logged the event. This allows you to tell the
 * difference between the client and server side of the same key. For example,
 * the key "http.uri" might be different on the client and server side due to
 * rewriting, like "/api/v1/myresource" vs "/myresource". Via the host field,
 * you can see the different points of view, which often help in debugging.
 */
struct BinaryAnnotation {
  /**
   * Name used to lookup spans, such as "http.uri" or "finagle.version".
   */
  1: string key,
  /**
   * Serialized thrift bytes, in TBinaryProtocol format.
   *
   * For legacy reasons, byte order is big-endian. See THRIFT-3217.
   */
  2: binary value,
  /**
   * The thrift type of value, most often STRING.
   *
   * annotation_type shouldn't vary for the same key.
   */
  3: AnnotationType annotation_type,
  /**
   * The host that recorded value, allowing query by service name or address.
   *
   * There are two exceptions: when key is "ca" or "sa", this is the source or
   * destination of an RPC. This exception allows zipkin to display network
   * context of uninstrumented services, such as browsers or databases.
   */
  4: optional Endpoint host
}
```
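A minimal sketch of recording a session id, again assuming the Thrift-generated Java classes; the "session_id" key matches the convention used in the collection format later in this document:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Tag the current span with the user's session id under the agreed key.
static void tagSessionId(Span currentSpan, Endpoint localEndpoint, String sessionId) {
    BinaryAnnotation tag = new BinaryAnnotation();
    tag.setKey("session_id");
    tag.setValue(ByteBuffer.wrap(sessionId.getBytes(StandardCharsets.UTF_8)));
    tag.setAnnotation_type(AnnotationType.STRING);
    tag.setHost(localEndpoint);
    currentSpan.addToBinary_annotations(tag); // append to the current span
}
```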
AnnotationType structure
```thrift
/**
 * A subset of thrift base types, except BYTES.
 */
enum AnnotationType { BOOL, BYTES, I16, I32, I64, DOUBLE, STRING }
```
Span
Represents one complete RPC call and is composed of a set of Annotations and BinaryAnnotations. It is the basic structure for tracing a service call; multiple spans form a tree that makes up one Trace. Spans have parent-child relationships. For example, the calls Client A, Client A -> B, B -> C, and C -> D produce four Spans in total: when Client A receives the request it generates Span A, and when Client A calls B it generates Span A-B, whose parent is Span A.
When is it generated?
- When a service receives a Request and the current Request is not yet associated with any Span, generate a new Span, including a Span ID and the TraceID
- When sending a Request to a downstream service, generate another Span and set its parent to the Span generated in the previous step (see the sketch below)
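A minimal sketch of these two rules in Java, assuming the Thrift-generated Span class; newId() stands in for a hypothetical random 64-bit id generator:

```java
// Rule 1: no span is associated with the incoming request, so start a new one.
static Span spanForIncomingRequest() {
    Span span = new Span();
    span.setTrace_id(newId());
    span.setId(span.getTrace_id()); // a root span commonly reuses the trace id
    return span;
}

// Rule 2: calling a downstream service creates a new span in the same trace,
// with its parent set to the span generated in the previous step.
static Span spanForDownstreamCall(Span parent) {
    Span child = new Span();
    child.setTrace_id(parent.getTrace_id());
    child.setId(newId());
    child.setParent_id(parent.getId());
    return child;
}
```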
Thrift structure
```thrift
/**
 * A trace is a series of spans (often RPC calls) which form a latency tree.
 *
 * Spans are usually created by instrumentation in RPC clients or servers, but
 * can also represent in-process activity. Annotations in spans are similar to
 * log statements, and are sometimes created directly by application developers
 * to indicate events of interest, such as a cache miss.
 *
 * The root span is where parent_id = Nil; it usually has the longest duration
 * in the trace.
 *
 * Span identifiers are packed into i64s, but should be treated opaquely.
 * String encoding is fixed-width lower-hex, to avoid signed interpretation.
 */
struct Span {
  /**
   * Unique 8-byte identifier for a trace, set on all spans within it.
   */
  1: i64 trace_id
  /**
   * Span name in lowercase, rpc method for example. Conventionally, when the
   * span name isn't known, name = "unknown".
   */
  3: string name,
  /**
   * Unique 8-byte identifier of this span within a trace. A span is uniquely
   * identified in storage by (trace_id, id).
   */
  4: i64 id,
  /**
   * The parent's Span.id; absent if this is the root span in a trace.
   */
  5: optional i64 parent_id,
  /**
   * Associates events that explain latency with a timestamp. Unlike log
   * statements, annotations are often codes: for example SERVER_RECV("sr").
   * Annotations are sorted ascending by timestamp.
   */
  6: list<Annotation> annotations,
  /**
   * Tags a span with context, usually to support query or aggregation. For
   * example, a binary annotation key could be "http.uri".
   */
  8: list<BinaryAnnotation> binary_annotations
  /**
   * True is a request to store this span even if it overrides sampling policy.
   */
  9: optional bool debug = 0
  /**
   * Epoch microseconds of the start of this span, absent if this is an
   * incomplete span.
   *
   * This value should be set directly by instrumentation, using the most
   * precise value possible. For example, gettimeofday or syncing nanoTime
   * against a tick of currentTimeMillis.
   *
   * For compatibility with instrumentation that precede this field, collectors
   * or span stores can derive this via Annotation.timestamp.
   * For example, SERVER_RECV.timestamp or CLIENT_SEND.timestamp.
   *
   * Timestamp is nullable for input only. Spans without a timestamp cannot be
   * presented in a timeline: Span stores should not output spans missing a
   * timestamp.
   *
   * There are two known edge-cases where this could be absent: both cases
   * exist when a collector receives a span in parts and a binary annotation
   * precedes a timestamp. This is possible when..
   * - The span is in-flight (ex not yet received a timestamp)
   * - The span's start event was lost
   */
  10: optional i64 timestamp,
  /**
   * Measurement in microseconds of the critical path, if known.
   *
   * This value should be set directly, as opposed to implicitly via annotation
   * timestamps. Doing so encourages precision decoupled from problems of
   * clocks, such as skew or NTP updates causing time to move backwards.
   *
   * For compatibility with instrumentation that precede this field, collectors
   * or span stores can derive this by subtracting Annotation.timestamp.
   * For example, SERVER_SEND.timestamp - SERVER_RECV.timestamp.
   *
   * If this field is persisted as unset, zipkin will continue to work, except
   * duration query support will be implementation-specific. Similarly, setting
   * this field non-atomically is implementation-specific.
   *
   * This field is i64 vs i32 to support spans longer than 35 minutes.
   */
  11: optional i64 duration
}
```
Information that must be passed between services
The basic Trace information must be passed between upstream and downstream services. The following fields are required (a propagation sketch follows the list):
- Trace ID: the TraceID generated by the originating (root) service
- Span ID: the Span ID generated for the call to the downstream service
- Parent Span ID: the parent Span's ID
- Is Sampled: whether this trace should be sampled
- Flags: tells the downstream service whether this is a debug Request
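A minimal propagation sketch, assuming the transport lets us attach string headers to an outgoing request; the header names follow the common B3 convention and should be replaced by whatever the Thrift header extension actually uses:

```java
import java.util.HashMap;
import java.util.Map;

// Pack the five required fields into string headers for the downstream call.
static Map<String, String> propagationHeaders(Span span, boolean sampled, boolean debug) {
    Map<String, String> h = new HashMap<>();
    h.put("X-B3-TraceId", Long.toHexString(span.getTrace_id()));
    h.put("X-B3-SpanId", Long.toHexString(span.getId()));
    if (span.isSetParent_id()) {
        h.put("X-B3-ParentSpanId", Long.toHexString(span.getParent_id()));
    }
    h.put("X-B3-Sampled", sampled ? "1" : "0");
    h.put("X-B3-Flags", debug ? "1" : "0");
    return h;
}
```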
Composition of the Trace Tree
A complete Trace consists of a set of Spans that all share the same TraceID. Spans have parent-child relationships: any Span that is a child node must carry a parent_id, and each Span is made up of a set of Annotations and BinaryAnnotations. The whole Trace Tree is strung together by the Trace ID, Span ID, and parent Span ID.
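A minimal sketch of how a trace tree can be rebuilt from a flat list of spans sharing one TraceID, linking each span's parent_id back to its parent's id; Span is again the Thrift-generated class:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rebuild and print the call tree for one trace.
static void printTraceTree(List<Span> spansOfOneTrace) {
    Map<Long, List<Span>> children = new HashMap<>();
    Span root = null;
    for (Span s : spansOfOneTrace) {
        if (!s.isSetParent_id()) {
            root = s; // no parent_id: this is the root span
        } else {
            children.computeIfAbsent(s.getParent_id(), k -> new ArrayList<>()).add(s);
        }
    }
    if (root == null) return; // incomplete trace: root span missing
    Deque<Span> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) { // depth-first walk over the tree
        Span s = stack.pop();
        System.out.println(s.getName() + " (span " + Long.toHexString(s.getId()) + ")");
        children.getOrDefault(s.getId(), Collections.emptyList()).forEach(stack::push);
    }
}
```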
Other requirements
- At the web entry point, record the SessionID, UserID (if logged in), user IP, and similar information in BinaryAnnotations
- Key sub-calls must also be traced with zipkin; for example, when the order service calls MySQL, the latency of that call must also be recorded in an Annotation
- Key error logs and exceptions must also be recorded in BinaryAnnotations
With these three rules in place, all backend service calls triggered by any user request can be strung together into a call tree. From that tree you can clearly pinpoint how long each call took and whether it raised an exception.
Complete example
testService (web service) -> OrderServ (Thrift) -> StockServ & PayServ (Thrift). There are four services in total: testService calls OrderServ, and OrderServ calls both StockServ and PayServ. The Trace information that must be generated is as follows:
- When testService receives the HTTP Request, it generates a TraceID, a SpanID, and a Span object at the entry point; call it Span1.
- When testService sends a Thrift Request to OrderServ, it generates a new Span2 and sets its parent ID to Span1's spanID. At the same time it modifies the Thrift header to pass Span2's spanID, the parent ID, and the TraceID to the downstream service. It also generates a "cs" Annotation attached to Span2; when it receives OrderServ's Response, it generates a "cr" Annotation, also attached to Span2.
- After OrderServ receives the Thrift Request, it parses the TraceID, parent ID, and Span ID (Span2) out of the Thrift header and keeps them in its context. It also generates an "sr" Annotation attached to Span2; when processing completes and the Response is sent, it generates an "ss" Annotation, also attached to Span2.
- When OrderServ sends a Thrift Request to StockServ, it generates a new Span3 and sets its parentID to the Span ID from the previous step (Span2). Annotations are handled as above.
- When OrderServ sends a request to PayServ, it generates a new Span4 and sets its parentID to Span2's Span ID. Annotations are handled as above.
Summary: what does the client side need to do?
Thrift protocol extension (modeled on Finagle)
- The Thrift client writes TraceID, SpanID, ParentID, Flags, and Is Sampled into the Thrift protocol header of each outgoing Request (a framing sketch follows the list)
- The Thrift server parses TraceID, SpanID, ParentID, Flags, and Is Sampled out of each incoming Thrift Request
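A minimal sketch of the framing idea, modeled on finagle-thrift's tracing.thrift, which defines a RequestHeader struct carrying trace_id, span_id, parent_span_id, sampled, and flags; the exact generated field names here are an assumption, not a confirmed API:

```java
import org.apache.thrift.TBase;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TMemoryBuffer;

// Serialize a tracing header struct in front of the normal RPC payload.
static byte[] frameWithTracingHeader(RequestHeader header, TBase<?, ?> args) throws TException {
    TMemoryBuffer buf = new TMemoryBuffer(256);
    TBinaryProtocol proto = new TBinaryProtocol(buf);
    header.write(proto); // tracing header first...
    args.write(proto);   // ...then the original RPC arguments
    return buf.getArray();
}
```

The server side does the mirror image: it first reads the RequestHeader off the wire, stores the trace fields in its context, then decodes the RPC arguments as usual.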
Trace data generation
- At the web entry point, generate the TraceID, SpanID, and a Span object, and record the sessionID, userID, and userIP in BinaryAnnotations.
- Generate a Span object for every call to a downstream service.
- When the Thrift client sends a Request, generate a "cs" Annotation.
- When the Thrift client receives a Response, generate a "cr" Annotation.
- When the Thrift server receives a Request, generate an "sr" Annotation.
- When the Thrift server sends a Response, generate an "ss" Annotation.
- Exceptions and key system-call data must also be recorded in Annotations (see the client-side sketch below).
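Put together, the client-side rules amount to bracketing every downstream call, as in this minimal sketch; annotate(), report(), currentSpan(), and the OrderServ client method are hypothetical helpers used only for illustration:

```java
// Bracket a downstream Thrift call with "cs"/"cr" annotations.
static Response tracedCreateOrder(OrderServ.Client client, Request request) throws TException {
    Span span = spanForDownstreamCall(currentSpan()); // from the earlier sketch
    annotate(span, "cs"); // Client Send
    try {
        return client.createOrder(request);
    } finally {
        annotate(span, "cr"); // Client Receive, even on failure
        report(span);         // flush asynchronously, never on the request path
    }
}
```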
Trace data transfer and collection
- Flushing trace data must not affect business performance, and a failed flush must never break normal business logic (see the async logging sketch after this list).
- Context propagation
  - The TraceID and SpanID generated at the entry point must be readable from the current Request context
  - For Ruby, Java, and NodeJS, context propagation must be transparent to business code; Golang is handled separately
- The client generates Trace data and writes it to a local log file
- logstash (or a similar tool) collects the logs centrally and writes them to kafka
- A kafka consumer reads the data from kafka and indexes it into ES
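A minimal sketch of the first rule: report() never blocks the business thread (the span is dropped if the queue is full), and the actual file write happens on a background thread whose failures are swallowed. The JSON encoding and log destination are placeholders:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public final class AsyncSpanLogger {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

    public AsyncSpanLogger() {
        Thread writer = new Thread(() -> {
            while (true) {
                try {
                    String line = queue.take();
                    appendToLocalLog(line); // picked up later by logstash
                } catch (Exception e) {
                    // swallow: a failed flush must never break business logic
                }
            }
        }, "span-log-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Non-blocking: if the queue is full, the span is silently dropped. */
    public void report(String spanJson) {
        queue.offer(spanJson);
    }

    private void appendToLocalLog(String line) {
        // placeholder: append `line` to the local trace log file
    }
}
```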
Zipkin trace data collection format definitions
Our unified entry point for collecting trace data is kafka: all zipkin data produced by applications is sent to the Kafka cluster. The specific channel is EAGLEYE_ZIPKIN_CHANNEL.
Span (records information about each call)
Sample JSON (formatted):
```
span{
  "app": "app",               // owning application
  "flag": "flag",             // "cscr" marks the caller side, "srss" the callee side: a span logically has four points (cs, cr, sr, ss), but they can usually only be captured two at a time, so one span is split into a cs/cr half and an sr/ss half and sent separately
  "ip": "192.168.10.100",     // ip address
  "mname": "mname",           // method name
  "pid": "pid",               // process id
  "port": 1000,               // port; 0 on the cs/cr (client) side
  "psid": "psid",             // parent span id
  "sid": "sid",               // span id
  "sname": "sname",           // service name
  "etime": 1449072000000,     // end time in ms: the cr point on the client side, the ss point on the server side
  "stime": 1449039924512,     // start time in ms: the cs point on the client side, the sr point on the server side
  "duration": 32075488,       // end time minus start time, i.e. the duration of the call, in ms
  "tid": "tid",               // trace id
  "timestamp": 1449039924548  // time the record was produced
}
```
Note: the span JSON is prefixed with a 'span' header; when the records are collected through kafka, this header routes span records into their own ES index. Each span record is really half a span: it is produced either by the client side or by the server side.
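A minimal publishing sketch using the standard Kafka producer API; the broker address is a placeholder, and in production a single long-lived producer would be reused rather than created per span:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public final class SpanPublisher {
    public static void publish(String spanJson) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // the "span" prefix is the routing header described above
            producer.send(new ProducerRecord<>("EAGLEYE_ZIPKIN_CHANNEL", "span" + spanJson));
        }
    }
}
```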
BinaryAnnotation (can carry the user's session_id, or other business information)
Sample JSON (formatted):
{ "app": "app", //所屬應用 "ip": "ip", //ip地址,冗余信息 "key": "key", //key, 可以設為存儲用戶session的key, 如果是用來傳遞用戶session信息的, 可以統一約定為: session_id "mname": "mname", //方法名 "pid": "10000", //進程id,冗余信息 "sid": "sid", //spanId "sname": "sname", //服務名 "tid": "tid", //traceId "timestamp": 1449038780194, //產生的時間戳, 長整型, 精確到毫秒 "type": "type", //類型,用來區分是記錄異常的還是業務流程的等等, 默認是'common'即可 "value": "value" //如果是傳遞用戶session信息 ,可以直接寫在該字段中. }
Note: BinaryAnnotations are recorded independently here and sent to the kafka message queue; they are joined back to the corresponding span via sid.
Experience using zipkin from Java
- When the client fails to call the server (server not started), the client's annotations contain cs and cr
- On a client timeout (waiting for the response timed out), the server's annotations contain sr and ss, and the client's annotations contain cs and cr
- On a server timeout, the server's annotations contain timeout and sr, with no ss and no binary_annotation; the client's annotations contain cs and cr
- The client cannot record binary_annotations
Actual results
We collect the RPC call-chain information produced by Zipkin and provide trace search for the service governance framework (double-clicking a problematic Span row shows the details of that call chain, revealing the network behavior and business execution of the call).