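To create a session, the client first sends a session-creation request. Once the server ensemble has created the session, it sends a response back to the client.
Client session request
In ZooKeeper Source Code Analysis (2): Client Startup Flow, we saw that the first request a client sends is the session-creation request. On the client side, the bytes that actually go over the wire for any request are built by ClientCnxn.Packet.createBB: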
static class Packet {
// request header; only the session-creation request is sent without one
RequestHeader requestHeader;
// reply header, built from the server's response once it arrives
ReplyHeader replyHeader;
// request body
Record request;
// response body
Record response;
// the bytes that are actually sent over the wire
ByteBuffer bb;
/** Client's view of the path (may differ due to chroot) **/
String clientPath;
/** Servers's view of the path (may differ due to chroot) **/
String serverPath;
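// whether the request has completed
boolean finished;
// asynchronous callback invoked when the request completes
AsyncCallback cb;
Object ctx;
// watch registration
WatchRegistration watchRegistration;
// whether the client may be served in read-only mode
public boolean readOnly;
// build the bytes to send for this request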
public void createBB() {
try {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
// 1. write the total length len first (as a placeholder)
boa.writeInt(-1, "len"); // We'll fill this in later
// 2. if requestHeader is not null, write the serialized request header
if (requestHeader != null) {
requestHeader.serialize(boa, "header");
}
// 3. write the serialized request body
if (request instanceof ConnectRequest) {
request.serialize(boa, "connect");
// append "am-I-allowed-to-be-readonly" flag
// for a session-creation request, also write readOnly
boa.writeBool(readOnly, "readOnly");
} else if (request != null) {
request.serialize(boa, "request");
}
baos.close();
this.bb = ByteBuffer.wrap(baos.toByteArray());
// 4. go back and write the real value of "len"
this.bb.putInt(this.bb.capacity() - 4);
this.bb.rewind();
} catch (IOException e) {
LOG.warn("Ignoring unexpected exception", e);
}
}
}
The Packet for the session-creation request is constructed as follows:
// in sendThread.primeConnection
// protocolVersion: protocol version, 0 by default
// lastZxidSeen: the largest zxid the client has seen from the server (lastZxid)
// timeOut: session timeout (sessionTimeout)
// sessionId: 0 at this point
// passwd: session password, also empty at this point
ConnectRequest conReq = new ConnectRequest(0, lastZxid,sessionTimeout, sessId, sessionPasswd);
// the request body is conReq
Packet packet = new Packet(null, null, conReq, null, null, readOnly);
So the bytes that are actually sent are: len + ConnectRequest(protocolVersion + lastZxidSeen + timeOut + sessionId + passwd) + readOnly.
After sending this connect request, the client waits for the server's response, deserializes it, updates sessionId and the related fields, and the session is established.
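The length-prefix trick used by createBB can be tried in isolation. The following is a minimal sketch, not ZooKeeper code: FrameSketch and frame are made-up names, and a plain byte array stands in for the serialized ConnectRequest. It writes a placeholder length, appends the payload, and then overwrites the first four bytes with the real length.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class FrameSketch {
    /** Builds [len][payload]; len counts only the bytes after the 4-byte prefix. */
    static ByteBuffer frame(byte[] payload) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeInt(-1);   // placeholder for the length, filled in below
            out.write(payload); // stands in for the serialized header and request
            out.close();
            ByteBuffer bb = ByteBuffer.wrap(baos.toByteArray());
            bb.putInt(bb.capacity() - 4); // overwrite the placeholder at position 0
            bb.rewind();                  // send from the first byte again
            return bb;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer bb = frame(new byte[] {1, 2, 3});
        System.out.println("len=" + bb.getInt(0) + ", total=" + bb.capacity());
    }
}
```

For a 3-byte payload this prints len=3, total=7: the prefix does not count itself, which is why createBB subtracts 4 from the buffer capacity.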
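Server-side session creation
Session creation is normally a transactional request. It can be broken into six stages: request reception, session creation, preprocessing, transaction processing, transaction application, and session response. The server the client connects to may be a Leader, a Follower, or an Observer. A Follower or Observer that receives a transactional request hands it to the Leader. This article only covers the case where the client is connected to a Follower.
Before reading on, it helps to understand how the request processor chain is initialized on each server; see ZooKeeper Source Code Analysis (7): Initialization of the Server Request Processor Chain.
The server-side flow for session creation is as follows:
Follower receives the request
1. The I/O layer receives the client request
When a client connection is first accepted, the server creates a NIOServerCnxn instance for that client, which handles all requests from it. When the server detects a read event on the client's channel, it eventually calls NIOServerCnxn.doIO, which handles both reads and writes for that client.
NIOServerCnxn.doIO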
void doIO(SelectionKey k) throws InterruptedException {
try {
if (isSocketOpen() == false) {
LOG.warn("trying to do i/o on a null socket for session:0x"
+ Long.toHexString(sessionId));
return;
}
if (k.isReadable()) {
int rc = sock.read(incomingBuffer);
if (rc < 0) {
throw new EndOfStreamException(
"Unable to read additional data from client sessionid 0x"
+ Long.toHexString(sessionId)
+ ", likely client has closed socket");
}
if (incomingBuffer.remaining() == 0) {
boolean isPayload;
if (incomingBuffer == lenBuffer) { // start of next request
incomingBuffer.flip();
isPayload = readLength(k);
incomingBuffer.clear();
} else {
// continuation
isPayload = true;
}
// handle the payload that was read
if (isPayload) { // not the case for 4letterword
readPayload();
}
else {
// four letter words take care
// need not do anything else
return;
}
}
}
// ... write handling and exception handling omitted ...
}
As shown, the channel data is read into incomingBuffer. If the request is not a four-letter-word command, readPayload is called.
private void readPayload() throws IOException, InterruptedException {
if (incomingBuffer.remaining() != 0) { // have we read length bytes?
int rc = sock.read(incomingBuffer); // sock is non-blocking, so ok
if (rc < 0) {
throw new EndOfStreamException(
"Unable to read additional data from client sessionid 0x"
+ Long.toHexString(sessionId)
+ ", likely client has closed socket");
}
}
if (incomingBuffer.remaining() == 0) { // have we read length bytes?
packetReceived();
incomingBuffer.flip();
if (!initialized) {
readConnectRequest();
} else {
// any request other than the initial connect request
readRequest();
}
lenBuffer.clear();
// reset incomingBuffer so that it can receive the length of the next request
incomingBuffer = lenBuffer;
}
}
If the NIOServerCnxn has not been initialized yet (!initialized), this must be the first request on the connection, that is, the ConnectRequest.
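The way doIO and readPayload switch between lenBuffer and a payload buffer can be sketched without sockets. This is a simplified model under assumptions: FrameReaderSketch and feed are hypothetical names, bytes arrive as in-memory chunks instead of from a SocketChannel, and four-letter-word handling is left out. It first collects the 4-byte length, then allocates a buffer of exactly that size, and only hands the frame off once that buffer is full.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class FrameReaderSketch {
    private final ByteBuffer lenBuffer = ByteBuffer.allocate(4);
    private ByteBuffer incoming = lenBuffer; // start by collecting a length prefix
    final List<byte[]> frames = new ArrayList<>();

    /** Feeds a chunk that may contain a partial frame, one frame, or several frames. */
    void feed(ByteBuffer chunk) {
        while (true) {
            while (chunk.hasRemaining() && incoming.hasRemaining()) {
                incoming.put(chunk.get());
            }
            if (incoming.hasRemaining()) {
                return; // current buffer not full yet: wait for more bytes
            }
            if (incoming == lenBuffer) {
                // Length prefix complete: switch to a payload buffer of that size.
                lenBuffer.flip();
                int len = lenBuffer.getInt();
                if (len < 0) {
                    throw new IllegalArgumentException("negative length " + len);
                }
                incoming = ByteBuffer.allocate(len);
            } else {
                // Payload complete: deliver it and go back to reading a length.
                frames.add(incoming.array());
                lenBuffer.clear();
                incoming = lenBuffer;
            }
        }
    }
}
```

Because the state lives in the incoming field, a frame split across several reads is reassembled transparently, which is the same reason the server code can return early from doIO and continue on the next read event.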
2. Deserializing the ConnectRequest
NIOServerCnxn.readConnectRequest
private void readConnectRequest() throws IOException, InterruptedException {
if (!isZKServerRunning()) {
throw new IOException("ZooKeeperServer not running");
}
zkServer.processConnectRequest(this, incomingBuffer);
// mark the connection as initialized
initialized = true;
}
This calls zkServer.processConnectRequest:
public void processConnectRequest(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
BinaryInputArchive bia = BinaryInputArchive.getArchive(new ByteBufferInputStream(incomingBuffer));
ConnectRequest connReq = new ConnectRequest();
connReq.deserialize(bia, "connect");
boolean readOnly = false;
try {
readOnly = bia.readBool("readOnly");
cnxn.isOldClient = false;
} catch (IOException e) {
// this is ok -- just a packet from an old client which
// doesn't contain readOnly field
LOG.warn("Connection request from old client "
+ cnxn.getRemoteSocketAddress()
+ "; will be dropped if server is in r-o mode");
}
if (!readOnly && this instanceof ReadOnlyZooKeeperServer) {
String msg = "Refusing session request for not-read-only client "
+ cnxn.getRemoteSocketAddress();
LOG.info(msg);
throw new CloseRequestException(msg);
}
if (connReq.getLastZxidSeen() > zkDb.dataTree.lastProcessedZxid) {
String msg = "Refusing session request for client "
+ cnxn.getRemoteSocketAddress()
+ " as it has seen zxid 0x"
+ Long.toHexString(connReq.getLastZxidSeen())
+ " our last zxid is 0x"
+ Long.toHexString(getZKDatabase().getDataTreeLastProcessedZxid())
+ " client must try another server";
LOG.info(msg);
throw new CloseRequestException(msg);
}
int sessionTimeout = connReq.getTimeOut();
byte passwd[] = connReq.getPasswd();
int minSessionTimeout = getMinSessionTimeout();
if (sessionTimeout < minSessionTimeout) {
sessionTimeout = minSessionTimeout;
}
int maxSessionTimeout = getMaxSessionTimeout();
if (sessionTimeout > maxSessionTimeout) {
sessionTimeout = maxSessionTimeout;
}
cnxn.setSessionTimeout(sessionTimeout);
// We don't want to receive any packets until we are sure that the
// session is setup
cnxn.disableRecv();
long sessionId = connReq.getSessionId();
if (sessionId == 0) {
// first connection from this client
LOG.info("Client attempting to establish new session at "
+ cnxn.getRemoteSocketAddress());
createSession(cnxn, passwd, sessionTimeout);
} else {
// the client is reconnecting
long clientSessionId = connReq.getSessionId();
LOG.info("Client attempting to renew session 0x"
+ Long.toHexString(clientSessionId)
+ " at " + cnxn.getRemoteSocketAddress());
if (serverCnxnFactory != null) {
serverCnxnFactory.closeSession(sessionId);
}
if (secureServerCnxnFactory != null) {
secureServerCnxnFactory.closeSession(sessionId);
}
cnxn.setSessionId(sessionId);
reopenSession(cnxn, sessionId, passwd, sessionTimeout);
}
}
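The main steps are:
1. Deserialize the bytes sent by the client into a ConnectRequest object (connReq).
2. Check whether the server is running in read-only mode (ReadOnlyZooKeeperServer). Such a server cannot handle writes, so a client that did not set readOnly is refused.
3. Check whether the zxid last seen by the client is greater than the server's. If it is, an exception is thrown so that the client tries another server.
4. Validate the session timeout sessionTimeout so that it falls between minSessionTimeout and maxSessionTimeout.
5. Use sessionId to tell a first connection from a reconnect. On a first connection sessionId == 0, and a new session has to be created: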
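Step 4 is a plain clamp. Here is a small sketch of it with hypothetical names (SessionTimeoutSketch, negotiate). The example bounds assume the documented defaults of 2 and 20 times tickTime with tickTime = 2000 ms; a real deployment may configure other values.

```java
final class SessionTimeoutSketch {
    /** Raises the requested timeout to min if it is too small, then caps it at max. */
    static int negotiate(int requested, int min, int max) {
        return Math.min(max, Math.max(min, requested));
    }

    public static void main(String[] args) {
        int tickTime = 2000;     // assumed tickTime in milliseconds
        int min = 2 * tickTime;  // assumed default minSessionTimeout
        int max = 20 * tickTime; // assumed default maxSessionTimeout
        System.out.println(negotiate(1000, min, max) + " "
                + negotiate(30000, min, max) + " "
                + negotiate(60000, min, max));
    }
}
```

With these bounds the program prints 4000 30000 40000, so the timeout the client asked for is only a request: the value the server stores on the connection may differ.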
createSession
long createSession(ServerCnxn cnxn, byte passwd[], int timeout) {
if (passwd == null) {
// Possible since it's just deserialized from a packet on the wire.
passwd = new byte[0];
}
long sessionId = sessionTracker.createSession(timeout);
Random r = new Random(sessionId ^ superSecret);
r.nextBytes(passwd);
ByteBuffer to = ByteBuffer.allocate(4);
to.putInt(timeout);
cnxn.setSessionId(sessionId);
Request si = new Request(cnxn, sessionId, 0, OpCode.createSession, to, null);
setLocalSessionFlag(si);
submitRequest(si);
return sessionId;
}
- The sessionId is generated by sessionTracker.createSession(timeout). Every server initializes a session tracker (sessionTracker) at startup; on a Follower this is LearnerSessionTracker. The tracker also initializes a base sessionId for the server, and every new client session simply gets the next value above that base.
Because the sessionId is the key identifier of a ZooKeeper session, it must be globally unique. The base value is computed by:
SessionTrackerImpl.initializeNextSession
// id is the value of myid
public static long initializeNextSession(long id) {
long nextSid;
nextSid = (Time.currentElapsedTime() << 24) >>> 8;
nextSid = nextSid | (id <<56);
if (nextSid == EphemeralType.CONTAINER_EPHEMERAL_OWNER) {
++nextSid; // this is an unlikely edge case, but check it just in case
}
return nextSid;
}
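nextSid is the base sessionId: the low 56 bits come from a timestamp and the high 8 bits hold the server id.
First, the millisecond value from Time.currentElapsedTime() is shifted left by 24 bits, which moves its significant bits to the high end. The unsigned right shift by 8 bits then guarantees that the top 8 bits are zero, so the timestamp cannot interfere with the server id.
Next, the server id (myid) is shifted left by 56 bits into the top 8 bits and combined with the timestamp using a bitwise OR (not XOR), nextSid | (id << 56). The result is the base sessionId.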
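The bit arithmetic is easier to follow with concrete numbers. The sketch below repeats the two shifts and the OR from initializeNextSession with a fixed, made-up timestamp instead of Time.currentElapsedTime(); SessionIdSketch and baseSessionId are hypothetical names, and the CONTAINER_EPHEMERAL_OWNER edge case is left out.

```java
final class SessionIdSketch {
    /** Same shifts as initializeNextSession, with the time passed in explicitly. */
    static long baseSessionId(long elapsedMillis, long serverId) {
        long sid = (elapsedMillis << 24) >>> 8; // low 40 bits of the time land in bits 16..55
        return sid | (serverId << 56);          // server id occupies bits 56..63
    }

    public static void main(String[] args) {
        long sid = baseSessionId(0x0123456789L, 3);
        System.out.println("0x" + Long.toHexString(sid));
    }
}
```

This prints 0x301234567890000. The leading 03 is the server id, 0123456789 is the timestamp, and the trailing 0000 shows that the low 16 bits start at zero, which leaves room for the per-session increments on top of the base value.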
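- A Follower does not register or activate the session at this point. It wraps the request in a Request object (si) and hands it to the processor chain through submitRequest: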
public void submitRequest(Request si) {
if (firstProcessor == null) {
synchronized (this) {
try {
// Since all requests are passed to the request
// processor it should wait for setting up the request
// processor chain. The state will be updated to RUNNING
// after the setup.
while (state == State.INITIAL) {
wait(1000);
}
} catch (InterruptedException e) {
LOG.warn("Unexpected interruption", e);
}
if (firstProcessor == null || state != State.RUNNING) {
throw new RuntimeException("Not started");
}
}
}
try {
touch(si.cnxn);
boolean validpacket = Request.isValid(si.type);
if (validpacket) {
firstProcessor.processRequest(si);
if (si.cnxn != null) {
incInProcess();
}
} else {
LOG.warn("Received packet at server of unknown type " + si.type);
new UnimplementedRequestProcessor().processRequest(si);
}
} catch (MissingSessionException e) {
if (LOG.isDebugEnabled()) {
LOG.debug("Dropping request: " + e.getMessage());
}
} catch (RequestProcessorException e) {
LOG.error("Unable to process request:" + e.getMessage(), e);
}
}
1. touch(si.cnxn) records the session in the Follower's session table, which ends up in
LearnerSessionTracker.touchSession
public boolean touchSession(long sessionId, int sessionTimeout) {
if (localSessionsEnabled) {
if (localSessionTracker.touchSession(sessionId, sessionTimeout)) {
return true;
}
if (!isGlobalSession(sessionId)) {
return false;
}
}
touchTable.get().put(sessionId, sessionTimeout);
return true;
}
touchTable (an AtomicReference<Map<Long, Integer>>) stores the global sessions touched on this Follower.
2. The request is handed to the processor chain. The first request processor on a Follower is FollowerRequestProcessor.
FollowerRequestProcessor.processRequest
public void processRequest(Request request) {
if (!finished) {
// Before sending the request, check if the request requires a
// global session and what we have is a local session. If so do
// an upgrade.
// if this is a transactional request but the session is a local session, the session must be upgraded to a global session
Request upgradeRequest = null;
try {
upgradeRequest = zks.checkUpgradeSession(request);
} catch (KeeperException ke) {
if (request.getHdr() != null) {
request.getHdr().setType(OpCode.error);
request.setTxn(new ErrorTxn(ke.code().intValue()));
}
request.setException(ke);
LOG.info("Error creating upgrade request", ke);
} catch (IOException ie) {
LOG.error("Unexpected error in upgrade", ie);
}
if (upgradeRequest != null) {
queuedRequests.add(upgradeRequest);
}
queuedRequests.add(request);
}
}
The main logic is to put the request on the queuedRequests queue; the run method keeps consuming the requests that arrive there.
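Most processors in the chain follow this same shape: processRequest only enqueues, and a dedicated thread drains the queue in run until it sees a sentinel. A minimal sketch of that pattern, with made-up names (QueueProcessorSketch, Req) and a list standing in for the next processor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueProcessorSketch extends Thread {
    /** Hypothetical request type; the real code uses ZooKeeper's Request class. */
    static final class Req {
        final String name;
        Req(String name) { this.name = name; }
    }

    /** Sentinel that stops the worker; compared by identity, like Request.requestOfDeath. */
    static final Req REQUEST_OF_DEATH = new Req("death");

    private final LinkedBlockingQueue<Req> queuedRequests = new LinkedBlockingQueue<>();
    private final List<String> handled = new ArrayList<>();

    /** Called by the previous stage: enqueue and return immediately. */
    void processRequest(Req request) {
        queuedRequests.add(request);
    }

    @Override
    public void run() {
        try {
            while (true) {
                Req request = queuedRequests.take(); // blocks until a request arrives
                if (request == REQUEST_OF_DEATH) {
                    break;
                }
                handled.add(request.name); // stands in for nextProcessor.processRequest
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs the worker on two requests and returns what it handled, in order. */
    static List<String> demo() {
        QueueProcessorSketch p = new QueueProcessorSketch();
        p.start();
        p.processRequest(new Req("createSession"));
        p.processRequest(new Req("getData"));
        p.processRequest(REQUEST_OF_DEATH);
        try {
            p.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return p.handled;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [createSession, getData]
    }
}
```

The queue decouples the I/O thread that calls processRequest from the thread that does the work, and a single consumer keeps requests in arrival order.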
FollowerRequestProcessor.run
public void run() {
try {
while (!finished) {
Request request = queuedRequests.take();
if (LOG.isTraceEnabled()) {
ZooTrace.logRequest(LOG, ZooTrace.CLIENT_REQUEST_TRACE_MASK,
'F', request, "");
}
if (request == Request.requestOfDeath) {
break;
}
// We want to queue the request to be processed before we submit
// the request to the leader so that we are ready to receive
// the response
nextProcessor.processRequest(request);
// We now ship the request to the leader. As with all
// other quorum operations, sync also follows this code
// path, but different from others, we need to keep track
// of the sync operations this follower has pending, so we
// add it to pendingSyncs.
switch (request.type) {
case OpCode.sync:
zks.pendingSyncs.add(request);
zks.getFollower().request(request);
break;
case OpCode.create:
case OpCode.create2:
case OpCode.createTTL:
case OpCode.createContainer:
case OpCode.delete:
case OpCode.deleteContainer:
case OpCode.setData:
case OpCode.reconfig:
case OpCode.setACL:
case OpCode.multi:
case OpCode.check:
zks.getFollower().request(request);
break;
case OpCode.createSession:
case OpCode.closeSession:
// Don't forward local sessions to the leader.
if (!request.isLocalSession()) {
zks.getFollower().request(request);
}
break;
}
}
} catch (Exception e) {
handleException(this.getName(), e);
}
LOG.info("FollowerRequestProcessor exited loop!");
}
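1. Pass the request to the next processor, CommitProcessor (analyzed later).
2. Forward transactional requests to the Leader.
A session-creation request has request.type = OpCode.createSession and is normally not a local session, so it is forwarded by calling
FollowerZooKeeperServer.getFollower().request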
void request(Request request) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream oa = new DataOutputStream(baos);
oa.writeLong(request.sessionId);
oa.writeInt(request.cxid);
oa.writeInt(request.type);
if (request.request != null) {
request.request.rewind();
int len = request.request.remaining();
byte b[] = new byte[len];
request.request.get(b);
request.request.rewind();
oa.write(b);
}
oa.close();
QuorumPacket qp = new QuorumPacket(Leader.REQUEST, -1, baos
.toByteArray(), request.authInfo);
// send QuorumPacket qp to the Leader and flush so that it goes out immediately
writePacket(qp, true);
}
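When a Follower synchronizes with the Leader, a socket connection is established between them, and the Leader's LearnerHandler keeps reading packets sent by that Follower. A packet whose type is Leader.REQUEST is handed to the Leader's processor chain.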
LearnerHandler.run
public void run() {
// ... unrelated code omitted ...
while (true) {
// ... unrelated code omitted ...
case Leader.REQUEST:
bb = ByteBuffer.wrap(qp.getData());
sessionId = bb.getLong();
cxid = bb.getInt();
type = bb.getInt();
bb = bb.slice();
Request si;
if(type == OpCode.sync){
si = new LearnerSyncRequest(this, sessionId, cxid, type, bb, qp.getAuthinfo());
} else {
si = new Request(null, sessionId, cxid, type, bb, qp.getAuthinfo());
}
si.setOwner(this);
leader.zk.submitLearnerRequest(si);
break;
// ... unrelated code omitted ...
}
}
public void submitLearnerRequest(Request request) {
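// The session-creation request has already been validated by the Follower (and the session upgraded if needed), and the
// Leader's processor chain is initialized by now, so the request can go straight to prepRequestProcessor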
prepRequestProcessor.processRequest(request);
}
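The payload that Follower.request writes and LearnerHandler reads back has a fixed layout: sessionId (8 bytes), cxid (4 bytes), type (4 bytes), then the original request bytes. The round trip below is a sketch under assumptions: ForwardedRequestSketch, encode and decode are invented names, the QuorumPacket envelope and authInfo are omitted, and -10 is assumed to be the value of OpCode.createSession.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class ForwardedRequestSketch {
    static final int TYPE_CREATE_SESSION = -10; // assumed value of OpCode.createSession

    /** Follower side: sessionId + cxid + type + request body. */
    static byte[] encode(long sessionId, int cxid, int type, byte[] body) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeLong(sessionId);
            out.writeInt(cxid);
            out.writeInt(type);
            out.write(body);
            out.close();
            return baos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Leader side: read the three fixed fields, the rest is the request body. */
    static String decode(byte[] data) {
        ByteBuffer bb = ByteBuffer.wrap(data);
        long sessionId = bb.getLong();
        int cxid = bb.getInt();
        int type = bb.getInt();
        ByteBuffer body = bb.slice(); // remaining bytes, without copying
        return "session=0x" + Long.toHexString(sessionId) + " cxid=" + cxid
                + " type=" + type + " bodyBytes=" + body.remaining();
    }

    public static void main(String[] args) {
        // For createSession the body is just the 4-byte session timeout.
        byte[] timeout = ByteBuffer.allocate(4).putInt(30000).array();
        System.out.println(decode(encode(0x1L, 0, TYPE_CREATE_SESSION, timeout)));
    }
}
```

Both DataOutputStream and ByteBuffer default to big-endian, which is why the two sides agree on the field values without any extra configuration.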
PrepRequestProcessor.processRequest
public void processRequest(Request request) {
submittedRequests.add(request);
}
The request is placed on the submittedRequests queue and consumed by the run method.
PrepRequestProcessor.run
public void run() {
try {
while (true) {
Request request = submittedRequests.take();
long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
if (request.type == OpCode.ping) {
traceMask = ZooTrace.CLIENT_PING_TRACE_MASK;
}
if (Request.requestOfDeath == request) {
break;
}
pRequest(request);
}
}
// ... exception handling and logging omitted ...
}
pRequest(request) is then called to handle the session request:
protected void pRequest(Request request) throws RequestProcessorException {
// LOG.info("Prep>>> cxid = " + request.cxid + " type = " +
// request.type + " id = 0x" + Long.toHexString(request.sessionId));
request.setHdr(null);
request.setTxn(null);
try {
switch (request.type) {
// ... unrelated request types omitted ...
//create/close session don't require request record
case OpCode.createSession:
case OpCode.closeSession:
if (!request.isLocalSession()) {
pRequest2Txn(request.type, zks.getNextZxid(), request,
null, true);
}
break;
default:
LOG.warn("unknown type " + request.type);
break;
}
}
// ... exception handling and logging omitted ...
}
request.zxid = zks.getZxid();
nextProcessor.processRequest(request);
}
The main steps are:
1. Preprocess the session request
protected void pRequest2Txn(int type, long zxid, Request request,
Record record, boolean deserialize)
throws KeeperException, IOException, RequestProcessorException
{
request.setHdr(new TxnHeader(request.sessionId, request.cxid, zxid,
Time.currentWallTime(), type));
switch (type) {
// ... unrelated request types omitted ...
case OpCode.createSession:
request.request.rewind();
int to = request.request.getInt();
request.setTxn(new CreateSessionTxn(to));
request.request.rewind();
if (request.isLocalSession()) {
// This will add to local session tracker if it is enabled
zks.sessionTracker.addSession(request.sessionId, to);
} else {
// Explicitly add to global session if the flag is not set
zks.sessionTracker.addGlobalSession(request.sessionId, to);
}
zks.setOwner(request.sessionId, request.getOwner());
break;
default:
LOG.warn("unknown type " + type);
break;
}
}
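The main steps are:
- Set the transaction header, TxnHeader, which has the following fields:
// the client's sessionId, which uniquely identifies the client that issued the request
private long clientId;
// the client's request sequence number
private int cxid;
// the zxid assigned to this transactional request
private long zxid;
// the time at which the Leader started processing this transactional request
private long time;
// the type of the transactional request, for example OpCode.createSession
private int type;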
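The zxid starts from a base zxid that is fixed when the servers synchronize, and is incremented for every transaction to give the zxid of the current one.
- Set the transaction body, Txn, to a CreateSessionTxn, which carries the session timeout:
private int timeOut;
- Register and activate the session
The session is added to the global sessions and handed to the session tracker.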
LeaderZooKeeperServer.sessionTracker.addGlobalSession
public boolean addGlobalSession(long sessionId, int sessionTimeout) {
boolean added =
globalSessionTracker.addSession(sessionId, sessionTimeout);
return added;
}
SessionTrackerImpl.addSession
public synchronized boolean addSession(long id, int sessionTimeout) {
sessionsWithTimeout.put(id, sessionTimeout);
boolean added = false;
SessionImpl session = sessionsById.get(id);
if (session == null){
session = new SessionImpl(id, sessionTimeout);
}
// findbugs2.0.3 complains about get after put.
// long term strategy would be use computeIfAbsent after JDK 1.8
SessionImpl existedSession = sessionsById.putIfAbsent(id, session);
if (existedSession != null) {
session = existedSession;
} else {
added = true;
LOG.debug("Adding session 0x" + Long.toHexString(id));
}
updateSessionExpiry(session, sessionTimeout);
return added;
}
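The session is saved in SessionTrackerImpl.sessionsWithTimeout and SessionTrackerImpl.sessionsById and placed in SessionTrackerImpl.sessionExpiryQueue, which activates expiry tracking for it. See the article on session management.
2. Pass the request to the next processor, ProposalRequestProcessor, for transaction processing: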
public void processRequest(Request request) throws RequestProcessorException {
if (request instanceof LearnerSyncRequest){
zks.getLeader().processSync((LearnerSyncRequest)request);
} else {
// pass the request on: Commit flow
nextProcessor.processRequest(request);
if (request.getHdr() != null) {
// We need to sync and get consensus on any transactions
try {
// Proposal flow
zks.getLeader().propose(request);
} catch (XidRolloverException e) {
throw new RequestProcessorException(e.getMessage(), e);
}
// write the transaction log: Sync flow
syncProcessor.processRequest(request);
}
}
}
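This involves three flows:
1. Proposal flow
In ZooKeeper, every transactional request must be accepted by more than half of the voting servers in the ensemble before it can be applied to the in-memory database. This voting and counting process is the Proposal flow.
2. Sync flow
Every voting server must record the transaction in its transaction log. Once the log record is written, the server sends an ACK to the Leader as its vote.
3. Commit flow
Every transactional request has to be committed on all servers. Once more than half of the votes are in, all servers are told to commit the request.
[Figure: transaction flow diagram]
Commit flow
ProposalRequestProcessor passes the request to CommitProcessor, which controls when transactions are committed.
CommitProcessor.processRequest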
public void processRequest(Request request) {
if (stopped) {
return;
}
if (LOG.isDebugEnabled()) {
LOG.debug("Processing request:: " + request);
}
queuedRequests.add(request);
wakeup();
}
The request is placed on the queuedRequests queue and the CommitProcessor thread is woken up, which signals that a request has arrived and the run method can continue its work.
CommitProcessor.run
public void run() {
try {
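// Each iteration records how many requests are currently in queuedRequests and processes only that batch,
// so that a steady stream of incoming requests cannot keep the loop from moving on to the commit step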
int requestsToProcess = 0;
boolean commitIsWaiting = false;
do {
// Followers and the Leader share this commit processor.
// On a Follower, client requests arrive here; a commit notification from the Leader also wakes this loop, in which case committedRequests is not empty.
commitIsWaiting = !committedRequests.isEmpty();
requestsToProcess = queuedRequests.size();
// Avoid sync if we have something to do
if (requestsToProcess == 0 && !commitIsWaiting){
// Waiting for requests to process
synchronized (this) {
while (!stopped && requestsToProcess == 0
&& !commitIsWaiting) {
// wait for work
wait();
// woken up: either a request can be committed or a new request has arrived
commitIsWaiting = !committedRequests.isEmpty();
requestsToProcess = queuedRequests.size();
}
}
}
/*
* Processing up to requestsToProcess requests from the incoming
* queue (queuedRequests), possibly less if a committed request
* is present along with a pending local write. After the loop,
* we process one committed request if commitIsWaiting.
*/
Request request = null;
while (!stopped && requestsToProcess > 0
&& (request = queuedRequests.poll()) != null) {
requestsToProcess--;
if (needCommit(request)
|| pendingRequests.containsKey(request.sessionId)) {
// transactional requests go into pendingRequests, keyed by the client's sessionId
LinkedList<Request> requests = pendingRequests
.get(request.sessionId);
if (requests == null) {
requests = new LinkedList<Request>();
pendingRequests.put(request.sessionId, requests);
}
requests.addLast(request);
}
else {
// non-transactional requests go straight to nextProcessor
sendToNextProcessor(request);
}
// check whether a committed request is ready
if (!pendingRequests.isEmpty() && !committedRequests.isEmpty()){
/*
* We set commitIsWaiting so that we won't check
* committedRequests again.
*/
// if so, stop reading queuedRequests and handle the committed request
commitIsWaiting = true;
break;
}
}
// Handle a single committed request
if (commitIsWaiting && !stopped){
// a transactional request must wait until all earlier requests have finished
waitForEmptyPool();
if (stopped){
return;
}
// Process committed head
if ((request = committedRequests.poll()) == null) {
throw new IOException("Error: committed head is null");
}
/*
* Check if request is pending, if so, update it with the committed info
*/
LinkedList<Request> sessionQueue = pendingRequests
.get(request.sessionId);
if (sessionQueue != null) {
// If session queue != null, then it is also not empty.
Request topPending = sessionQueue.poll();
if (request.cxid != topPending.cxid) {
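// The session may have moved from server A to server B while a transactional request submitted through A was already on its way to the Leader.
// That request is still processed here, but its cxid may be smaller than the cxid at the head of the pending queue.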
LOG.warn("Got request " + request +
" but we are expecting request " + topPending);
sessionQueue.addFirst(topPending);
} else {
// normal case
topPending.setHdr(request.getHdr());
topPending.setTxn(request.getTxn());
topPending.zxid = request.zxid;
request = topPending;
}
}
sendToNextProcessor(request);
// wait for the current transactional request to finish
waitForEmptyPool();
/*
* Process following reads if any, remove session queue if
* empty.
*/
if (sessionQueue != null) {
while (!stopped && !sessionQueue.isEmpty()
&& !needCommit(sessionQueue.peek())) {
sendToNextProcessor(sessionQueue.poll());
}
// Remove empty queues
if (sessionQueue.isEmpty()) {
pendingRequests.remove(request.sessionId);
}
}
}
} while (!stoppedMainLoop);
} catch (Throwable e) {
handleException(this.getName(), e);
}
LOG.info("CommitProcessor exited loop!");
}
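The overall flow is:
Requests are taken from the request queue one after another. A non-transactional request is passed directly to nextProcessor: it is wrapped as a task and handed to the workerPool thread pool.
A transactional request is first placed in the pendingRequests queue to wait for the commit notification. When it can be committed, it is added to the committedRequests queue. It is then wrapped as a task for nextProcessor, but that task runs only after the thread pool has finished all tasks currently in flight (similar to a read-write lock).
Suppose this session request is the only client request in the ensemble. Then:
On the Follower that received the request, the CommitProcessor moves the request from queuedRequests to pendingRequests, and the thread waits until the Leader's commit notification arrives and committedRequests is no longer empty, at which point processing continues down the chain. The Leader's CommitProcessor goes through essentially the same flow.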
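The interplay between pendingRequests and committedRequests can be modelled in a few lines. The following is a deliberately simplified, single-threaded sketch: CommitMatchSketch, Req, submit and commit are hypothetical names, the worker pool and waitForEmptyPool are left out, and a list named applied stands in for nextProcessor. It keeps only the ordering rule: a write waits for its commit, and reads from the same session queue up behind it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CommitMatchSketch {
    /** Hypothetical request: session, client sequence number, and whether it is a write. */
    static final class Req {
        final long sessionId;
        final int cxid;
        final boolean write;
        Req(long sessionId, int cxid, boolean write) {
            this.sessionId = sessionId;
            this.cxid = cxid;
            this.write = write;
        }
        @Override
        public String toString() {
            return (write ? "W" : "R") + sessionId + "/" + cxid;
        }
    }

    private final Map<Long, Deque<Req>> pending = new HashMap<>();
    final List<String> applied = new ArrayList<>();

    /** A request arrives from a local client. */
    void submit(Req r) {
        Deque<Req> q = pending.get(r.sessionId);
        if (r.write || q != null) {
            // Writes wait for a commit; anything behind a waiting write waits too.
            if (q == null) {
                q = new ArrayDeque<>();
                pending.put(r.sessionId, q);
            }
            q.addLast(r);
        } else {
            applied.add(r.toString());
        }
    }

    /** A commit notification for (sessionId, cxid) arrives. */
    void commit(long sessionId, int cxid) {
        Deque<Req> q = pending.get(sessionId);
        if (q != null && !q.isEmpty() && q.peekFirst().cxid == cxid) {
            applied.add(q.pollFirst().toString());
            // Release the reads queued behind this write, up to the next write.
            while (!q.isEmpty() && !q.peekFirst().write) {
                applied.add(q.pollFirst().toString());
            }
            if (q.isEmpty()) {
                pending.remove(sessionId);
            }
        } else {
            // Committed request that is not at the head of a local pending queue.
            applied.add("W" + sessionId + "/" + cxid);
        }
    }
}
```

Submitting a read, a write, and another read for session 1 applies only the first read right away; the write and the second read are applied, in that order, once commit(1, 2) arrives. This per-session ordering is what the pendingRequests map protects in the real processor.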
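On the Leader, nextProcessor is ToBeAppliedRequestProcessor, which maintains a queue of committable transactional requests, leader.toBeApplied. It removes the session request from that queue and hands it to its own next processor, FinalRequestProcessor: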
public void processRequest(Request request) {
ProcessTxnResult rc = null;
synchronized (zks.outstandingChanges) {
// Need to process local session requests
rc = zks.processTxn(request);
// request.hdr is set for write requests, which are the only ones
// that add to outstandingChanges.
if (request.getHdr() != null) {
TxnHeader hdr = request.getHdr();
Record txn = request.getTxn();
long zxid = hdr.getZxid();
while (!zks.outstandingChanges.isEmpty()
&& zks.outstandingChanges.peek().zxid <= zxid) {
ChangeRecord cr = zks.outstandingChanges.remove();
if (cr.zxid < zxid) {
LOG.warn("Zxid outstanding " + cr.zxid
+ " is less than current " + zxid);
}
if (zks.outstandingChangesForPath.get(cr.path) == cr) {
zks.outstandingChangesForPath.remove(cr.path);
}
}
}
// do not add non quorum packets to the queue.
if (request.isQuorum()) {
zks.getZKDatabase().addCommittedProposal(request);
}
}
}
// cnxn == null means the request was forwarded to the Leader by another server: return without building a response
if (request.cnxn == null) {
return;
}
ServerCnxn cnxn = request.cnxn;
String lastOp = "NA";
zks.decInProcess();
Code err = Code.OK;
Record rsp = null;
switch (request.type) {
case OpCode.createSession: {
zks.serverStats().updateLatency(request.createTime);
lastOp = "SESS";
cnxn.updateStatsForResponse(request.cxid, request.zxid, lastOp,
request.createTime, Time.currentElapsedTime());
zks.finishSessionInit(request.cnxn, true);
return;
}
}
}
long lastZxid = zks.getZKDatabase().getDataTreeLastProcessedZxid();
ReplyHeader hdr =
new ReplyHeader(request.cxid, lastZxid, err.intValue());
zks.serverStats().updateLatency(request.createTime);
cnxn.updateStatsForResponse(request.cxid, lastZxid, lastOp,
request.createTime, Time.currentElapsedTime());
// ... unrelated code and exception handling omitted ...
}
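The main flow is to apply the session transaction to the in-memory database (ZKDatabase), update the server's transaction state such as the zxid together with the session tracking information, and build the response for the client.
Proposal flow && Sync flow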
Leader.propose
public Proposal propose(Request request) throws XidRolloverException {
if ((request.zxid & 0xffffffffL) == 0xffffffffL) {
String msg =
"zxid lower 32 bits have rolled over, forcing re-election, and therefore new epoch start";
shutdown(msg);
throw new XidRolloverException(msg);
}
byte[] data = SerializeUtils.serializeRequest(request);
proposalStats.setLastProposalSize(data.length);
QuorumPacket pp = new QuorumPacket(Leader.PROPOSAL, request.zxid, data, null);
Proposal p = new Proposal();
p.packet = pp;
p.request = request;
synchronized(this) {
p.addQuorumVerifier(self.getQuorumVerifier());
if (request.getHdr().getType() == OpCode.reconfig){
self.setLastSeenQuorumVerifier(request.qv, true);
}
if (self.getQuorumVerifier().getVersion()<self.getLastSeenQuorumVerifier().getVersion()) {
p.addQuorumVerifier(self.getLastSeenQuorumVerifier());
}
if (LOG.isDebugEnabled()) {
LOG.debug("Proposing:: " + request);
}
lastProposed = p.packet.getZxid();
// record the latest Proposal
outstandingProposals.put(lastProposed, p);
sendPacket(pp);
}
return p;
}
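The main flow is:
The request is wrapped in a QuorumPacket and sent to the Followers that are eligible to vote. The packet contains the packet type Leader.PROPOSAL, the transaction zxid, and the serialized session request.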
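The rollover check at the top of propose makes more sense once the zxid layout is spelled out. As a sketch, assuming the usual layout in which the high 32 bits of a zxid hold the epoch and the low 32 bits hold a per-epoch counter (ZxidSketch and its methods are illustrative names, not ZooKeeper classes):

```java
final class ZxidSketch {
    /** Combines an epoch and a counter into one 64-bit zxid. */
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /** Same test as in Leader.propose: the counter half has no room left. */
    static boolean counterExhausted(long zxid) {
        return (zxid & 0xffffffffL) == 0xffffffffL;
    }

    public static void main(String[] args) {
        long zxid = makeZxid(5, 7);
        System.out.println("0x" + Long.toHexString(zxid)
                + " epoch=" + epochOf(zxid) + " counter=" + counterOf(zxid));
    }
}
```

The example prints 0x500000007 epoch=5 counter=7. When the counter half reaches 0xffffffff, the next increment would spill into the epoch bits, which is why propose shuts the Leader down and forces a new election instead.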
When a Follower receives this PROPOSAL, it writes the transaction to its transaction log.
Follower.followLeader
void followLeader() throws InterruptedException {
QuorumPacket qp = new QuorumPacket();
while (this.isRunning()) {
readPacket(qp);
processPacket(qp);
}
// ... unrelated code omitted ...
}
protected void processPacket(QuorumPacket qp) throws Exception{
switch (qp.getType()) {
case Leader.PING:
ping(qp);
break;
case Leader.PROPOSAL:
TxnHeader hdr = new TxnHeader();
Record txn = SerializeUtils.deserializeTxn(qp.getData(), hdr);
if (hdr.getZxid() != lastQueued + 1) {
LOG.warn("Got zxid 0x"
+ Long.toHexString(hdr.getZxid())
+ " expected 0x"
+ Long.toHexString(lastQueued + 1));
}
lastQueued = hdr.getZxid();
if (hdr.getType() == OpCode.reconfig){
SetDataTxn setDataTxn = (SetDataTxn) txn;
QuorumVerifier qv = self.configFromString(new String(setDataTxn.getData()));
self.setLastSeenQuorumVerifier(qv, true);
}
// write the transaction to the transaction log
fzk.logRequest(hdr, txn);
break;
case Leader.COMMIT:
// commit the transaction
fzk.commit(qp.getZxid());
break;
// ... unrelated code omitted ...
}
FollowerZooKeeperServer.logRequest is called to log the transaction:
public void logRequest(TxnHeader hdr, Record txn) {
Request request = new Request(hdr.getClientId(), hdr.getCxid(), hdr.getType(), hdr, txn, hdr.getZxid());
if ((request.zxid & 0xffffffffL) != 0) {
pendingTxns.add(request);
}
syncProcessor.processRequest(request);
}
SyncRequestProcessor is the first processor of the sync chain. syncProcessor.processRequest places the request on the queuedRequests queue, and the processor thread's run method keeps consuming the queued requests.
SyncRequestProcessor.run
public void run() {
try {
int logCount = 0;
// random offset for the snapshot threshold, so that the servers do not all take snapshots at the same time
int randRoll = r.nextInt(snapCount/2);
while (true) {
Request si = null;
if (toFlush.isEmpty()) {
si = queuedRequests.take();
} else {
si = queuedRequests.poll();
if (si == null) {
// nothing queued: flush what has been buffered
flush(toFlush);
continue;
}
}
if (si == requestOfDeath) {
break;
}
if (si != null) {
// append to the transaction log
if (zks.getZKDatabase().append(si)) {
logCount++;
if (logCount > (snapCount / 2 + randRoll)) {
// time to roll the log and take a snapshot
randRoll = r.nextInt(snapCount/2);
// roll the log
zks.getZKDatabase().rollLog();
// take a snapshot
if (snapInProcess != null && snapInProcess.isAlive()) {
LOG.warn("Too busy to snap, skipping");
} else {
// take the snapshot asynchronously on a separate thread
snapInProcess = new ZooKeeperThread("Snapshot Thread") {
public void run() {
try {
zks.takeSnapshot();
} catch(Exception e) {
LOG.warn("Unexpected exception", e);
}
}
};
snapInProcess.start();
}
logCount = 0;
}
} else if (toFlush.isEmpty()) {
// optimization for read heavy workloads
// iff this is a read, and there are no pending
// flushes (writes), then just pass this to the next
// processor
if (nextProcessor != null) {
nextProcessor.processRequest(si);
if (nextProcessor instanceof Flushable) {
((Flushable)nextProcessor).flush();
}
}
continue;
}
//
toFlush.add(si);
if (toFlush.size() > 1000) {
flush(toFlush);
}
}
}
} catch (Throwable t) {
handleException(this.getName(), t);
} finally{
running = false;
}
LOG.info("SyncRequestProcessor exited!");
}
The request is appended to the transaction log and added to the toFlush queue; the flush method is then called to force the log to disk:
private void flush(LinkedList<Request> toFlush)
throws IOException, RequestProcessorException
{
if (toFlush.isEmpty())
return;
// flush the log to disk
zks.getZKDatabase().commit();
while (!toFlush.isEmpty()) {
Request i = toFlush.remove();
if (nextProcessor != null) {
// let nextProcessor answer the proposal with a vote
nextProcessor.processRequest(i);
}
}
if (nextProcessor != null && nextProcessor instanceof Flushable) {
// force the response out
((Flushable)nextProcessor).flush();
}
}
The flow is: flush the transaction log to disk, then call nextProcessor to answer the proposal. On a Follower, nextProcessor is SendAckRequestProcessor, which implements the Flushable interface and so forces the response out to the Leader.
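The loop in SyncRequestProcessor.run together with flush amounts to a group commit: several appended transactions share one disk flush. Below is a toy model of that batching rule only. GroupCommitSketch, drain and flushSizes are invented for illustration, the queue is drained non-blockingly, and snapshotting is ignored; the batch limit is a parameter here, whereas the code above flushes once toFlush holds more than 1000 entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class GroupCommitSketch {
    final List<String> log = new ArrayList<>();         // stands in for the transaction log
    final List<Integer> flushSizes = new ArrayList<>(); // how many txns each flush covered
    private final List<String> toFlush = new ArrayList<>();
    private final int maxBatch;

    GroupCommitSketch(int maxBatch) {
        this.maxBatch = maxBatch;
    }

    /** Drains the queue, flushing when the batch is too large or the queue runs dry. */
    void drain(Queue<String> queuedRequests) {
        while (true) {
            String txn = queuedRequests.poll();
            if (txn == null) {
                flush(); // nothing else queued: do not keep the batch waiting
                return;
            }
            log.add(txn);     // buffered append
            toFlush.add(txn);
            if (toFlush.size() > maxBatch) {
                flush();
            }
        }
    }

    private void flush() {
        if (toFlush.isEmpty()) {
            return;
        }
        flushSizes.add(toFlush.size()); // one disk flush covers the whole batch
        toFlush.clear();                // the real code passes each request to nextProcessor here
    }
}
```

With maxBatch = 2 and five queued transactions, the model performs two flushes covering 3 and 2 transactions. Under load the batches grow and the cost of each flush is spread over more transactions; when the server is idle, a single transaction is flushed right away.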
SendAckRequestProcessor.processRequest:
Sends the ACK response to the Leader
public void processRequest(Request si) {
if(si.type != OpCode.sync){
QuorumPacket qp = new QuorumPacket(Leader.ACK, si.getHdr().getZxid(), null,
null);
try {
learner.writePacket(qp, false);
} catch (IOException e) {
LOG.warn("Closing connection to leader, exception during packet send", e);
try {
if (!learner.sock.isClosed()) {
learner.sock.close();
}
} catch (IOException e1) {
// Nothing to do, we are shutting things down, so an exception here is irrelevant
LOG.debug("Ignoring error closing the connection", e1);
}
}
}
}
Likewise, for a transactional request the Leader logs the transaction itself by calling SyncRequestProcessor.processRequest. The difference from a Follower is that the Leader's nextProcessor is AckRequestProcessor, which only needs to record the vote locally.
AckRequestProcessor.processRequest
public void processRequest(Request request) {
QuorumPeer self = leader.self;
if(self != null)
leader.processAck(self.getId(), request.zxid, null);
else
LOG.error("Null QuorumPeer");
}
To record its own vote, the Leader calls leader.processAck, which counts the votes.
When the Leader receives a vote ACK from a Follower, it is handled as follows:
LearnerHandler.run
public void run() {
// ... unrelated code omitted ...
while (true) {
qp = new QuorumPacket();
ia.readRecord(qp, "packet");
long traceMask = ZooTrace.SERVER_PACKET_TRACE_MASK;
if (qp.getType() == Leader.PING) {
traceMask = ZooTrace.SERVER_PING_TRACE_MASK;
}
if (LOG.isTraceEnabled()) {
ZooTrace.logQuorumPacket(LOG, traceMask, 'i', qp);
}
tickOfNextAckDeadline = leader.self.tick.get() + leader.self.syncLimit;
ByteBuffer bb;
long sessionId;
int cxid;
int type;
switch (qp.getType()) {
case Leader.ACK:
if (this.learnerType == LearnerType.OBSERVER) {
if (LOG.isDebugEnabled()) {
LOG.debug("Received ACK from Observer " + this.sid);
}
}
syncLimitCheck.updateAck(qp.getZxid());
leader.processAck(this.sid, qp.getZxid(), sock.getLocalSocketAddress());
break;
// ... unrelated code omitted ...
}
}
As shown, this also calls leader.processAck to count the votes.
leader.processAck
synchronized public void processAck(long sid, long zxid, SocketAddress followerAddr) {
if (!allowedToCommit) return; // last op committed was a leader change - from now on
// the new leader should commit
if (outstandingProposals.size() == 0) {
if (LOG.isDebugEnabled()) {
LOG.debug("outstanding is 0");
}
return;
}
if (lastCommitted >= zxid) {
if (LOG.isDebugEnabled()) {
LOG.debug("proposal has already been committed, pzxid: 0x{} zxid: 0x{}",
Long.toHexString(lastCommitted), Long.toHexString(zxid));
}
// The proposal has already been committed
return;
}
Proposal p = outstandingProposals.get(zxid);
if (p == null) {
LOG.warn("Trying to commit future proposal: zxid 0x{} from {}",
Long.toHexString(zxid), followerAddr);
return;
}
p.addAck(sid);
// if more than half of the servers have acknowledged, the request can be committed
boolean hasCommitted = tryToCommit(p, zxid, followerAddr);
if (hasCommitted && p.request!=null && p.request.getHdr().getType() == OpCode.reconfig){
long curZxid = zxid;
while (allowedToCommit && hasCommitted && p!=null){
curZxid++;
p = outstandingProposals.get(curZxid);
if (p !=null) hasCommitted = tryToCommit(p, curZxid, null);
}
}
}
The main flow is: look up the Proposal for this transaction in outstandingProposals, record the ACK, and call tryToCommit, which commits the Proposal once more than half of the votes are in.
Leader.tryToCommit
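The "more than half" rule behind p.addAck and p.hasAllQuorums can be sketched as a simple majority count. This is an assumption-laden simplification: AckCounterSketch is a made-up class, it models a plain majority over a fixed voter set, and it ignores the QuorumVerifier abstraction and the reconfig case in which one proposal needs quorums from two configurations.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class AckCounterSketch {
    private final Set<Long> voters;
    private final Set<Long> acks = new HashSet<>();

    AckCounterSketch(Set<Long> voters) {
        this.voters = voters;
    }

    /**
     * Records an ACK from server sid. ACKs from servers outside the voter set are
     * ignored, and duplicates count once. Returns true once a majority has acked.
     */
    boolean addAck(long sid) {
        if (voters.contains(sid)) {
            acks.add(sid);
        }
        return acks.size() > voters.size() / 2;
    }

    public static void main(String[] args) {
        AckCounterSketch p = new AckCounterSketch(new HashSet<>(Arrays.asList(1L, 2L, 3L)));
        System.out.println(p.addAck(1L)); // false: 1 of 3
        System.out.println(p.addAck(2L)); // true: 2 of 3
    }
}
```

In a three-server ensemble the Leader's own ACK plus one Follower's ACK is enough, so a proposal can commit without waiting for the slowest voter.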
synchronized public boolean tryToCommit(Proposal p, long zxid, SocketAddress followerAddr) {
if (outstandingProposals.containsKey(zxid - 1)) return false;
// in order to be committed, a proposal must be accepted by a quorum.
//
// getting a quorum from all necessary configurations.
if (!p.hasAllQuorums()) {
return false;
}
// commit proposals in order
if (zxid != lastCommitted+1) {
LOG.warn("Commiting zxid 0x" + Long.toHexString(zxid)
+ " from " + followerAddr + " not first!");
LOG.warn("First is "
+ (lastCommitted+1));
}
outstandingProposals.remove(zxid);
if (p.request != null) {
// add to the queue of committable proposals
toBeApplied.add(p);
}
if (p.request == null) {
LOG.warn("Going to commmit null: " + p);
} else if (p.request.getHdr().getType() == OpCode.reconfig) {
LOG.debug("Committing a reconfiguration! " + outstandingProposals.size());
QuorumVerifier newQV = p.qvAcksetPairs.get(p.qvAcksetPairs.size()-1).getQuorumVerifier();
self.processReconfig(newQV, designatedLeader, zk.getZxid(), true);
if (designatedLeader != self.getId()) {
allowedToCommit = false;
}
} else {
// send COMMIT to the Followers so that they commit the request
commit(zxid);
// send INFORM to the Observers, including the transaction data
inform(p);
}
// commit the request locally
zk.commitProcessor.commit(p.request);
if(pendingSyncs.containsKey(zxid)){
for(LearnerSyncRequest r: pendingSyncs.remove(zxid)) {
sendSync(r);
}
}
return true;
}
The main steps are:
1. Tell all Followers that the request can be committed
public void commit(long zxid) {
synchronized(this){
lastCommitted = zxid;
}
QuorumPacket qp = new QuorumPacket(Leader.COMMIT, zxid, null, null);
sendPacket(qp);
}
Leader.COMMIT is sent to all Followers. It carries only the zxid, because the request data was already sent with the Proposal.
2. Tell all Observers that the request can be committed
public void inform(Proposal proposal) {
QuorumPacket qp = new QuorumPacket(Leader.INFORM, proposal.request.zxid,
proposal.packet.getData(), null);
sendObserverPacket(qp);
}
Leader.INFORM is sent to all Observers. It carries both the zxid and the request data.
3. Tell the local Leader that the transactional request can be committed
CommitProcessor.commit
public void commit(Request request) {
if (stopped || request == null) {
return;
}
if (LOG.isDebugEnabled()) {
LOG.debug("Committing request:: " + request);
}
committedRequests.add(request);
wakeup();
}
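In other words, the request is added to the CommitProcessor.committedRequests queue, after which it is passed to the next processor, ToBeAppliedRequestProcessor, for handling.
Thanks for reading. I'm Monica23334 || Monica2333, and I've set myself the goal of writing one original article every week. Follow me and see whether I manage to keep it up~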