I. Overview of the Client Layer
Before diving into the Client layer source code, let's look at how a client communicates with the server. The demo code is as follows:
TaskClient taskClient = new TaskClient();
taskClient.setRootURI("http://localhost:8080/api/"); //Point this to the server API
int threadCount = 2; //number of threads used to execute workers. To avoid starvation, should be same or more than number of workers
Worker worker1 = new OrderWorker("order");
Worker worker2 = new PaymentWorker("payment");
//Create WorkflowTaskCoordinator
WorkflowTaskCoordinator.Builder builder = new WorkflowTaskCoordinator.Builder();
WorkflowTaskCoordinator coordinator = builder.withWorkers(worker1, worker2).withThreadCount(threadCount).withTaskClient(taskClient).build();
//Start for polling and execution of the tasks
coordinator.init();
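Code notes:
1. First, create a TaskClient and set the server's API root URL so that the client can communicate with the server.
2. Create the Worker objects; the actual tasks are executed by the Workers.
3. Pass the Worker objects to the WorkflowTaskCoordinator, which starts a thread pool to run the Workers' tasks and also maintains the heartbeat with the server and pulls the latest task data.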
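The code above introduces several class names that need explaining:
- WorkflowTaskCoordinator: the workflow coordinator, responsible for managing the thread pool of task Workers and for communicating with the server.
- TaskClient: Conductor's task management client, responsible for polling tasks from the server, updating task status, and so on.
- Builder: the builder class used to create WorkflowTaskCoordinator instances.
The class diagram for these three classes is shown in Figure 1-1, which illustrates their dependency and composition relationships.
Figure 1-1 shows the relationships among the three core classes of the Client layer; the source code analysis that follows is organized around them.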
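The package structure and key classes of the whole Client module are shown in Figure 1-2:
Where:
- the config package contains the Client's configuration classes
- the exceptions package contains the custom client exception classes
- the http package contains the classes for communicating with the server, including the base class ClientBase and the clients for metadata, payloads, tasks, and workflows
- the task package mainly contains the workflow coordinator and the workflow task metrics class
- the worker package mainly contains the Worker interface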
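To make the role of the Worker interface more concrete, here is a minimal sketch of what a worker such as OrderWorker might look like. It does not use Conductor's real classes: SketchTask, SketchTaskResult, SketchWorker and SketchOrderWorker are simplified, hypothetical stand-ins introduced here so that the example runs on the JDK alone, and the orderId input field is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-ins for Conductor's Task, TaskResult and Worker.
// They exist only for this illustration and omit most of the real API.
class SketchTask {
    final String taskId;
    final Map<String, Object> inputData = new HashMap<>();

    SketchTask(String taskId) {
        this.taskId = taskId;
    }
}

class SketchTaskResult {
    enum Status { COMPLETED, FAILED }

    final String taskId;
    Status status;
    final Map<String, Object> outputData = new HashMap<>();

    SketchTaskResult(SketchTask task) {
        this.taskId = task.taskId;
    }
}

interface SketchWorker {
    // Name of the task type this worker polls for.
    String getTaskDefName();

    // User-defined business logic for one task.
    SketchTaskResult execute(SketchTask task);
}

// Hypothetical order worker: completes when an "orderId" input is present, fails otherwise.
class SketchOrderWorker implements SketchWorker {
    private final String taskDefName;

    SketchOrderWorker(String taskDefName) {
        this.taskDefName = taskDefName;
    }

    @Override
    public String getTaskDefName() {
        return taskDefName;
    }

    @Override
    public SketchTaskResult execute(SketchTask task) {
        SketchTaskResult result = new SketchTaskResult(task);
        Object orderId = task.inputData.get("orderId");
        if (orderId == null) {
            result.status = SketchTaskResult.Status.FAILED;
            result.outputData.put("reason", "missing orderId");
        } else {
            result.status = SketchTaskResult.Status.COMPLETED;
            result.outputData.put("orderId", orderId);
        }
        return result;
    }
}
```

In the real client, a worker of this kind would be handed to the coordinator through withWorkers, as in the demo at the start of this section.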
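II. End-to-End Walkthrough of the Client Source Code
We use the example from the article 深入淺出Netflix Conductor使用 to walk through the source code (that article covers the DSL definitions of the tasks and the workflow and how to use them). The workflow diagram is shown in Figure 1-3:
The diagram is straightforward: the user's order flow goes to the order module; if the order succeeds, the flow proceeds to the payment module to pay; if the order fails, the flow goes to the failure module for retries and similar handling.
Enter the following parameters in the Swagger UI to start the workflow, as shown in Figure 1-4.
Starting the workflow actually calls the relevant server-side classes through the Swagger API, whereas the client learns which tasks it has to execute, and their input parameters, by pulling from the server.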
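The pull model described above can be illustrated without a Conductor server. The sketch below is not Conductor code: FakeServer is a hypothetical in-memory stand-in for the server's queue of pending tasks, and pullTasks is a hypothetical helper that polls it with scheduleWithFixedDelay until the expected number of tasks has been pulled or a timeout elapses.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PullModelSketch {
    // Hypothetical stand-in for the server-side queue of pending tasks of one task type.
    static final class FakeServer {
        private final Queue<String> pending = new ConcurrentLinkedQueue<>();

        void schedule(String taskInput) {
            pending.add(taskInput);
        }

        // Returns the next pending task input, or null when nothing is pending.
        String poll() {
            return pending.poll();
        }
    }

    // Polls the fake server with a fixed delay until `expected` tasks have been pulled
    // or the timeout elapses; returns the inputs in the order they were pulled.
    static List<String> pullTasks(FakeServer server, int expected, long intervalMs, long timeoutMs) {
        List<String> received = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(expected);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            String input = server.poll();
            if (input != null) {
                received.add(input);
                done.countDown();
            }
        }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        try {
            done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return received;
    }
}
```

The server never pushes anything in this sketch; work reaches the client only when the client asks for it, which is the property the paragraph above describes.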
After the workflow has been started, the client-side code enters the init method of WorkflowTaskCoordinator, shown below:
public synchronized void init() {
    if(threadCount == -1) {
        threadCount = workers.size();
    }
    logger.info("Initialized the worker with {} threads", threadCount);
    this.workerQueue = new LinkedBlockingQueue<Runnable>(workerQueueSize);
    AtomicInteger count = new AtomicInteger(0);
    this.executorService = new ThreadPoolExecutor(threadCount, threadCount,
            0L, TimeUnit.MILLISECONDS,
            workerQueue,
            (runnable) -> {
                Thread thread = new Thread(runnable);
                thread.setName(workerNamePrefix + count.getAndIncrement());
                return thread;
            });
    this.scheduledExecutorService = Executors.newScheduledThreadPool(workers.size());
    // Polling schedule: poll the server for each worker at its polling interval (1 second by default), fetching tasks by task name
    workers.forEach(worker -> {
        scheduledExecutorService.scheduleWithFixedDelay(()->pollForTask(worker), worker.getPollingInterval(), worker.getPollingInterval(), TimeUnit.MILLISECONDS);
    });
}
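Code notes (init):
This code uses the JDK's ScheduledExecutorService.scheduleWithFixedDelay to poll the server at each worker's polling interval (worker.getPollingInterval(), one second by default). The method that polls for tasks is pollForTask, shown below: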
private void pollForTask(Worker worker) {
    if(eurekaClient != null && !eurekaClient.getInstanceRemoteStatus().equals(InstanceStatus.UP)) {
        logger.debug("Instance is NOT UP in discovery - will not poll");
        return;
    }
    if(worker.paused()) {
        WorkflowTaskMetrics.incrementTaskPausedCount(worker.getTaskDefName());
        logger.debug("Worker {} has been paused. Not polling anymore!", worker.getClass());
        return;
    }
    String domain = Optional.ofNullable(PropertyFactory.getString(worker.getTaskDefName(), DOMAIN, null))
            .orElse(PropertyFactory.getString(ALL_WORKERS, DOMAIN, null));
    logger.debug("Polling {}, domain={}, count = {} timeout = {} ms", worker.getTaskDefName(), domain, worker.getPollCount(), worker.getLongPollTimeoutInMS());
    List<Task> tasks = Collections.emptyList();
    try{
        // get the remaining capacity of worker queue to prevent queue full exception
        int realPollCount = Math.min(workerQueue.remainingCapacity(), worker.getPollCount());
        if (realPollCount <= 0) {
            logger.warn("All workers are busy, not polling. queue size = {}, max = {}", workerQueue.size(), workerQueueSize);
            return;
        }
        // Task name (task type) handled by this worker
        String taskType = worker.getTaskDefName();
        // Ask the server's state machine, by task name, whether there are tasks for this client to run; a task that is returned can be fetched only once.
        tasks = getPollTimer(taskType)
                .record(() -> taskClient.batchPollTasksInDomain(taskType, domain, worker.getIdentity(), realPollCount, worker.getLongPollTimeoutInMS()));
        incrementTaskPollCount(taskType, tasks.size());
        logger.debug("Polled {}, domain {}, received {} tasks in worker - {}", worker.getTaskDefName(), domain, tasks.size(), worker.getIdentity());
    } catch (Exception e) {
        WorkflowTaskMetrics.incrementTaskPollErrorCount(worker.getTaskDefName(), e);
        logger.error("Error when polling for tasks", e);
    }
    // Submit each polled task to the thread pool for execution
    for (Task task : tasks) {
        try {
            executorService.submit(() -> {
                try {
                    logger.debug("Executing task {}, taskId - {} in worker - {}", task.getTaskDefName(), task.getTaskId(), worker.getIdentity());
                    // This step runs the user-defined task logic
                    execute(worker, task);
                } catch (Throwable t) {
                    // Execution failed: mark the task as FAILED and report the failure to the server
                    task.setStatus(Task.Status.FAILED);
                    TaskResult result = new TaskResult(task);
                    handleException(t, result, worker, task);
                }
            });
        } catch (RejectedExecutionException e) {
            WorkflowTaskMetrics.incrementTaskExecutionQueueFullCount(worker.getTaskDefName());
            logger.error("Execution queue is full, returning task: {}", task.getTaskId(), e);
            returnTask(worker, task);
        }
    }
}
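One detail of pollForTask that is easy to miss is how the poll count is capped by the free space in the bounded worker queue, and what happens when a submit still hits a full queue. The following is a minimal JDK-only sketch of that behaviour. PollCapacitySketch, realPollCount and demo are hypothetical names, and the pool size of 1, queue size of 2 and poll count of 5 are arbitrary values chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PollCapacitySketch {
    // Same rule as in pollForTask: never ask for more tasks than the queue can still hold.
    static int realPollCount(int remainingCapacity, int workerPollCount) {
        return Math.min(remainingCapacity, workerPollCount);
    }

    // Fills a one-thread pool that has a bounded queue and returns
    // {poll count while the queue is empty, poll count once it is full, 1 if an extra submit was rejected else 0}.
    static int[] demo() {
        int queueSize = 2;
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(queueSize);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        // A task that blocks until released, so the pool stays busy during the demo.
        Runnable blocked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.submit(blocked); // handed directly to the single pool thread, not queued
            int whenEmpty = realPollCount(queue.remainingCapacity(), 5);
            pool.submit(blocked); // queued
            pool.submit(blocked); // queued, the queue is now full
            int whenFull = realPollCount(queue.remainingCapacity(), 5);
            int rejected = 0;
            try {
                pool.submit(blocked); // no free thread and no queue space left
            } catch (RejectedExecutionException e) {
                rejected = 1;
            }
            return new int[] {whenEmpty, whenFull, rejected};
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

Because several workers poll on separate scheduler threads but share one queue, the capacity check alone presumably cannot rule out a full queue at submit time, which would explain why pollForTask also catches RejectedExecutionException and returns the task.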
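Code notes (pollForTask):
At every polling interval (one second by default), the client fetches the list of tasks it currently needs to execute from the server endpoint tasks/poll/batch/{taskType}; a task can be fetched only once and cannot be fetched again. Each task is then submitted to the thread pool and executed asynchronously. The logic of every task is user-defined, and its return value is wrapped in a TaskResult. The execute method is shown below: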
private void execute(Worker worker, Task task) {
    String taskType = task.getTaskDefName();
    try {
        if(!worker.preAck(task)) {
            logger.debug("Worker decided not to ack the task {}, taskId = {}", taskType, task.getTaskId());
            return;
        }
        if (!taskClient.ack(task.getTaskId(), worker.getIdentity())) {
            WorkflowTaskMetrics.incrementTaskAckFailedCount(worker.getTaskDefName());
            logger.error("Ack failed for {}, taskId = {}", taskType, task.getTaskId());
            returnTask(worker, task);
            return;
        }
    } catch (Exception e) {
        logger.error(String.format("ack exception for task %s, taskId = %s in worker - %s", task.getTaskDefName(), task.getTaskId(), worker.getIdentity()), e);
        WorkflowTaskMetrics.incrementTaskAckErrorCount(worker.getTaskDefName(), e);
        returnTask(worker, task);
        return;
    }
    com.google.common.base.Stopwatch stopwatch = com.google.common.base.Stopwatch.createStarted();
    TaskResult result = null;
    try {
        // Most of the code above handles task acknowledgement and metrics and is not covered in detail here
        // This is where the user's task is actually executed; its return value is wrapped in a TaskResult
        logger.debug("Executing task {} in worker {} at {}", task, worker.getClass().getSimpleName(), worker.getIdentity());
        result = worker.execute(task);
        result.setWorkflowInstanceId(task.getWorkflowInstanceId());
        result.setTaskId(task.getTaskId());
        result.setWorkerId(worker.getIdentity());
    } catch (Exception e) {
        logger.error("Unable to execute task {}", task, e);
        if (result == null) {
            task.setStatus(Task.Status.FAILED);
            result = new TaskResult(task);
        }
        handleException(e, result, worker, task);
    } finally {
        stopwatch.stop();
        WorkflowTaskMetrics.getExecutionTimer(worker.getTaskDefName())
                .record(stopwatch.elapsed(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
    }
    logger.debug("Task {} executed by worker {} at {} with status {}", task.getTaskId(), worker.getClass().getSimpleName(), worker.getIdentity(), task.getStatus());
    // Update the task status on the server, whether it succeeded or failed
    updateWithRetry(updateRetryCount, task, result, worker);
}
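Code notes (execute):
The worker.execute method runs the user-defined task logic. Whether or not it succeeds, updateWithRetry is then called to update the task's status and execution result on the server.
The URL accessed is /api/tasks.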
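The body of updateWithRetry is not shown in this article, so the following is only a generic sketch of the retry idea, under the assumption that the update is attempted up to a fixed number of times until it succeeds. RetrySketch and its updateWithRetry signature are hypothetical and differ from the four-argument method called above; the BooleanSupplier stands in for the call that posts the TaskResult to the server.

```java
import java.util.function.BooleanSupplier;

class RetrySketch {
    // Calls `update` up to `maxAttempts` times and returns the number of the
    // attempt that succeeded, or -1 if every attempt failed.
    static int updateWithRetry(int maxAttempts, BooleanSupplier update) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (update.getAsBoolean()) {
                    return attempt;
                }
            } catch (RuntimeException e) {
                // Treat an exception like a failed attempt and try again.
            }
        }
        return -1;
    }
}
```

A production version would probably wait between attempts and record metrics on failure; both are left out here to keep the control flow visible.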
III. Complete Sequence Diagram