[SPARK][CORE] Interview Questions: The Fine Details of the Shuffle Reader (Part 2)

Follow the WeChat official account "Tim在路上".

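Spark has three ShuffleWriter implementations: BypassMergeSortShuffleWriter, UnsafeShuffleWriter, and SortShuffleWriter. The ShuffleReader, however, has only one implementation: BlockStoreShuffleReader.

As the previous part showed, by this point Spark has obtained the shuffle metadata, including each mapId and its location, and passed it to the BlockStoreShuffleReader class. Let's now look in detail at how BlockStoreShuffleReader is implemented.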

// BlockStoreShuffleReader
override def read(): Iterator[Product2[K, C]] = {
  // [1] Initialize ShuffleBlockFetcherIterator, which fetches shuffle blocks from executors
  val wrappedStreams = new ShuffleBlockFetcherIterator(
    context,
    blockManager.blockStoreClient,
    blockManager,
    mapOutputTracker,
    blocksByAddress,
    ...
    readMetrics,
    fetchContinuousBlocksInBatch).toCompletionIterator

  val serializerInstance = dep.serializer.newInstance()

  // [2] Deserialize the shuffle blocks into an iterator of records
  // Create a key/value iterator for each stream
  val recordIter = wrappedStreams.flatMap { case (blockId, wrappedStream) =>
    // Note: the asKeyValueIterator below wraps a key/value iterator inside of a
    // NextIterator. The NextIterator makes sure that close() is called on the
    // underlying InputStream when all records have been read.
    serializerInstance.deserializeStream(wrappedStream).asKeyValueIterator
  }

  // Update the context task metrics for each record read.
  val metricIter = CompletionIterator[(Any, Any), Iterator[(Any, Any)]](
    recordIter.map { record =>
      readMetrics.incRecordsRead(1)
      record
    },
    context.taskMetrics().mergeShuffleReadMetrics())

  // An interruptible iterator must be used here in order to support task cancellation
  val interruptibleIter = new InterruptibleIterator[(Any, Any)](context, metricIter)
  // [3] Reduce-side aggregation: if the map side already combined the data, merge the
  // combiners that were read; otherwise aggregate the uncombined <k, v> records.
  val aggregatedIter: Iterator[Product2[K, C]] = if (dep.aggregator.isDefined) {
    if (dep.mapSideCombine) {
      // We are reading values that are already combined
      val combinedKeyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, C)]]
      dep.aggregator.get.combineCombinersByKey(combinedKeyValuesIterator, context)
    } else {
      // We don't know the value type, but also don't care -- the dependency *should*
      // have made sure its compatible w/ this aggregator, which will convert the value
      // type to the combined type C
      val keyValuesIterator = interruptibleIter.asInstanceOf[Iterator[(K, Nothing)]]
      dep.aggregator.get.combineValuesByKey(keyValuesIterator, context)
    }
  } else {
    interruptibleIter.asInstanceOf[Iterator[Product2[K, C]]]
  }
  // [4] Reduce-side sorting: sort by key if a key ordering is defined. The sort-based shuffle
  // only orders records by partitionId by default and does not sort within a partition, so
  // keyOrdering indicates whether the keys inside each partition must be sorted.
  // Sort the output if there is a sort ordering defined.
  val resultIter: Iterator[Product2[K, C]] = dep.keyOrdering match {
    case Some(keyOrd: Ordering[K]) =>
      // Create an ExternalSorter to sort the data.
      val sorter =
        new ExternalSorter[K, C, C](context, ordering = Some(keyOrd), serializer = dep.serializer)
      sorter.insertAllAndUpdateMetrics(aggregatedIter)
    case None =>
      aggregatedIter
  }

  // [5] Return the iterator over the result set
  resultIter match {
    case _: InterruptibleIterator[Product2[K, C]] => resultIter
    case _ =>
      // Use another interruptible iterator here to support task cancellation as aggregator
      // or(and) sorter may have consumed previous interruptible iterator.
      new InterruptibleIterator[Product2[K, C]](context, resultIter)
  }
}

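As shown above, BlockStoreShuffleReader.read() reads data in five steps:

[1] Initialize ShuffleBlockFetcherIterator, which fetches shuffle blocks from executors
[2] Deserialize the shuffle blocks into an iterator of records
[3] Reduce-side aggregation: if the map side already combined the data, merge the combiners that were read. If the map side did not combine, aggregate the uncombined <k, v> records.
[4] Reduce-side sorting: sort by key if a key ordering is defined. The sort-based shuffle only orders records by partitionId by default and does not sort within a partition, so keyOrdering indicates whether the keys inside each partition must be sorted.
[5] Return the iterator over the result set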
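
To make steps [2] to [5] concrete, here is a minimal sketch that runs the same pipeline on in-memory data, with no Spark dependency. The names ShuffleReadSketch and Agg are hypothetical stand-ins (Agg loosely mirrors Spark's Aggregator), and plain Map folds replace the spillable maps and ExternalSorter that Spark actually uses.

```scala
object ShuffleReadSketch {
  // Hypothetical stand-in for Spark's Aggregator: how to build and merge combiners.
  final case class Agg[V, C](
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C)

  // Map side did NOT combine: the records are raw (key, value) pairs.
  def combineValuesByKey[K, V, C](records: Iterator[(K, V)], agg: Agg[V, C]): Map[K, C] =
    records.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      val combined = acc.get(k) match {
        case Some(c) => agg.mergeValue(c, v)
        case None => agg.createCombiner(v)
      }
      acc.updated(k, combined)
    }

  // Map side already combined: the records are (key, combiner) pairs.
  def combineCombinersByKey[K, V, C](records: Iterator[(K, C)], agg: Agg[V, C]): Map[K, C] =
    records.foldLeft(Map.empty[K, C]) { case (acc, (k, c)) =>
      val merged = acc.get(k).map(prev => agg.mergeCombiners(prev, c)).getOrElse(c)
      acc.updated(k, merged)
    }

  // Steps [2]-[5]: flatten the blocks, count records, aggregate, then sort by key.
  // Returns the sorted result together with the number of records read.
  def read(
      blocks: Iterator[Seq[(String, Int)]],
      mapSideCombine: Boolean): (Seq[(String, Int)], Long) = {
    var recordsRead = 0L
    val records = blocks.flatMap(_.iterator).map { r => recordsRead += 1; r }
    val sum = Agg[Int, Int](identity, _ + _, _ + _)
    val aggregated =
      if (mapSideCombine) combineCombinersByKey(records, sum)
      else combineValuesByKey(records, sum)
    (aggregated.toSeq.sortBy(_._1), recordsRead)
  }
}
```

For a sum aggregation both branches give the same totals; the difference is only whether the incoming values are raw values or partial sums produced on the map side.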

Next, let's look in detail at how ShuffleBlockFetcherIterator fetches data.

How does ShuffleBlockFetcherIterator fetch data?

When the shuffle reader creates an instance of ShuffleBlockFetcherIterator, the iterator calls its initialize() method.

// ShuffleBlockFetcherIterator
private[this] def initialize(): Unit = {
  // Add a task completion callback (called in both success case and failure case) to cleanup.
  context.addTaskCompletionListener(onCompleteCallback)
  // Local blocks to fetch, excluding zero-sized blocks.
  val localBlocks = mutable.LinkedHashSet[(BlockId, Int)]()
  val hostLocalBlocksByExecutor =
    mutable.LinkedHashMap[BlockManagerId, Seq[(BlockId, Long, Int)]]()
  val pushMergedLocalBlocks = mutable.LinkedHashSet[BlockId]()
  // [1] Partition the requests by data source: local, host-local, and remote blocks
  // Partition blocks by the different fetch modes: local, host-local, push-merged-local and
  // remote blocks.
  val remoteRequests = partitionBlocksByFetchMode(
    blocksByAddress, localBlocks, hostLocalBlocksByExecutor, pushMergedLocalBlocks)
  // [2] Add the remote requests to our queue in random order
  // Add the remote requests into our queue in a random order
  fetchRequests ++= Utils.randomize(remoteRequests)
  assert((0 == reqsInFlight) == (0 == bytesInFlight),
    "expected reqsInFlight = 0 but found reqsInFlight = " + reqsInFlight +
    ", expected bytesInFlight = 0 but found bytesInFlight = " + bytesInFlight)

  // [3] Send the remote fetch requests
  // Send out initial requests for blocks, up to our maxBytesInFlight
  fetchUpToMaxBytes()

  val numDeferredRequest = deferredFetchRequests.values.map(_.size).sum
  val numFetches = remoteRequests.size - fetchRequests.size - numDeferredRequest
  logInfo(s"Started $numFetches remote fetches in ${Utils.getUsedTimeNs(startTimeNs)}" +
    (if (numDeferredRequest > 0) s", deferred $numDeferredRequest requests" else ""))
  // [4] Fetch the local blocks, then the host-local and push-merged local blocks
  // Get Local Blocks
  fetchLocalBlocks(localBlocks)
  logDebug(s"Got local blocks in ${Utils.getUsedTimeNs(startTimeNs)}")
  // Get host local blocks if any
  fetchAllHostLocalBlocks(hostLocalBlocksByExecutor)
  pushBasedFetchHelper.fetchAllPushMergedLocalBlocks(pushMergedLocalBlocks)
}

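In the shuffle fetch iterator, issuing the fetch requests takes the following six steps:

[1] Partition the blocks by fetch mode: local, host-local, and remote
[2] Add the remote requests to our queue in random order
[3] Send the remote fetch requests
[4] Fetch the local blocks
[5] Fetch the host-local blocks
[6] Fetch the push-merged local blocks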

Partitioning the requests by data source

private[this] def partitionBlocksByFetchMode(
    blocksByAddress: Iterator[(BlockManagerId, Seq[(BlockId, Long, Int)])],
    localBlocks: mutable.LinkedHashSet[(BlockId, Int)],
    hostLocalBlocksByExecutor: mutable.LinkedHashMap[BlockManagerId, Seq[(BlockId, Long, Int)]],
    pushMergedLocalBlocks: mutable.LinkedHashSet[BlockId]): ArrayBuffer[FetchRequest] = {
  ...

  val fallback = FallbackStorage.FALLBACK_BLOCK_MANAGER_ID.executorId
  val localExecIds = Set(blockManager.blockManagerId.executorId, fallback)
  for ((address, blockInfos) <- blocksByAddress) {
    checkBlockSizes(blockInfos)
    // [1] For push-merged blocks, check whether they are on this host or must be fetched remotely
    if (pushBasedFetchHelper.isPushMergedShuffleBlockAddress(address)) {
      // These are push-merged blocks or shuffle chunks of these blocks.
      if (address.host == blockManager.blockManagerId.host) {
        numBlocksToFetch += blockInfos.size
        pushMergedLocalBlocks ++= blockInfos.map(_._1)
        pushMergedLocalBlockBytes += blockInfos.map(_._2).sum
      } else {
        collectFetchRequests(address, blockInfos, collectedRemoteRequests)
      }
    // [2] If the executor ID is in localExecIds, add the blocks to localBlocks
    } else if (localExecIds.contains(address.executorId)) {
      val mergedBlockInfos = mergeContinuousShuffleBlockIdsIfNeeded(
        blockInfos.map(info => FetchBlockInfo(info._1, info._2, info._3)), doBatchFetch)
      numBlocksToFetch += mergedBlockInfos.size
      localBlocks ++= mergedBlockInfos.map(info => (info.blockId, info.mapIndex))
      localBlockBytes += mergedBlockInfos.map(_.size).sum
    // [3] If the blocks are host-local, add them to hostLocalBlocksByExecutor
    } else if (blockManager.hostLocalDirManager.isDefined &&
      address.host == blockManager.blockManagerId.host) {
      val mergedBlockInfos = mergeContinuousShuffleBlockIdsIfNeeded(
        blockInfos.map(info => FetchBlockInfo(info._1, info._2, info._3)), doBatchFetch)
      numBlocksToFetch += mergedBlockInfos.size
      val blocksForAddress =
        mergedBlockInfos.map(info => (info.blockId, info.size, info.mapIndex))
      hostLocalBlocksByExecutor += address -> blocksForAddress
      numHostLocalBlocks += blocksForAddress.size
      hostLocalBlockBytes += mergedBlockInfos.map(_.size).sum
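    // [4] Otherwise the blocks are remote: collect the fetch requests. Each request targets at
    // most max(maxBytesInFlight / 5, 1L) bytes. Keeping requests smaller than maxBytesInFlight
    // allows parallel fetches from up to 5 nodes instead of blocking on a single node.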
    } else {
      val (_, timeCost) = Utils.timeTakenMs[Unit] {
        collectFetchRequests(address, blockInfos, collectedRemoteRequests)
      }
      logDebug(s"Collected remote fetch requests for $address in $timeCost ms")
    }
  }
  val (remoteBlockBytes, numRemoteBlocks) =
    collectedRemoteRequests.foldLeft((0L, 0))((x, y) => (x._1 + y.size, x._2 + y.blocks.size))
  val totalBytes = localBlockBytes + remoteBlockBytes + hostLocalBlockBytes +
    pushMergedLocalBlockBytes
  val blocksToFetchCurrentIteration = numBlocksToFetch - prevNumBlocksToFetch
  ...
  this.hostLocalBlocks ++= hostLocalBlocksByExecutor.values
    .flatMap { infos => infos.map(info => (info._1, info._3)) }
  collectedRemoteRequests
}

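[1] For push-merged blocks, check whether they are on this host or must be fetched remotely
[2] If the executor ID is in localExecIds, add the blocks to localBlocks
[3] If the blocks are host-local, add them to hostLocalBlocksByExecutor
[4] Otherwise the blocks are remote: collect the fetch requests. Each request targets at most max(maxBytesInFlight / 5, 1L) bytes. Keeping requests smaller than maxBytesInFlight raises concurrency: it allows parallel fetches from up to 5 nodes instead of blocking on a single node, which makes better use of each node's resources.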
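
The classification and the request sizing can be sketched in a few lines of plain Scala. This is a simplified illustration, not Spark's code: Address, Block, and FetchMode are hypothetical types, push-merged blocks and the fallback storage ID are left out, and splitIntoRequests only applies the size cut-off (Spark also caps the number of blocks per request).

```scala
object FetchModeSketch {
  // Hypothetical, simplified stand-ins for BlockManagerId and block metadata.
  final case class Address(executorId: String, host: String)
  final case class Block(id: String, size: Long)

  sealed trait FetchMode
  case object Local extends FetchMode
  case object HostLocal extends FetchMode
  case object Remote extends FetchMode

  // Same executor => local; same host (when host-local reads are enabled) => host-local.
  def fetchMode(reader: Address, source: Address, hostLocalReadEnabled: Boolean): FetchMode =
    if (source.executorId == reader.executorId) Local
    else if (hostLocalReadEnabled && source.host == reader.host) HostLocal
    else Remote

  // Target size of one remote request, as quoted in the article.
  def targetRequestSize(maxBytesInFlight: Long): Long = math.max(maxBytesInFlight / 5, 1L)

  // Greedily pack the blocks of one address into requests, closing a request
  // as soon as its accumulated size reaches the target size.
  def splitIntoRequests(blocks: Seq[Block], targetSize: Long): Seq[Seq[Block]] = {
    val requests = Seq.newBuilder[Seq[Block]]
    var current = Vector.empty[Block]
    var currentSize = 0L
    blocks.foreach { b =>
      current = current :+ b
      currentSize += b.size
      if (currentSize >= targetSize) {
        requests += current
        current = Vector.empty[Block]
        currentSize = 0L
      }
    }
    if (current.nonEmpty) requests += current
    requests.result()
  }
}
```

With maxBytesInFlight = 100, the target size is 20 bytes, so at least five such requests can be outstanding before the byte budget is exhausted.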

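Once the requests have been partitioned by category, the iterator issues them in turn: the remote fetch requests, the local blocks, the host-local blocks, and the push-merged local blocks.

So how is the data actually fetched? Let's look at the fetchUpToMaxBytes() method.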

private def fetchUpToMaxBytes(): Unit = {
  // [1] Deferred requests first: while the next request is fetchable and its remote address
  // has not reached its in-flight limit, send the request
  if (deferredFetchRequests.nonEmpty) {
    for ((remoteAddress, defReqQueue) <- deferredFetchRequests) {
      while (isRemoteBlockFetchable(defReqQueue) &&
          !isRemoteAddressMaxedOut(remoteAddress, defReqQueue.front)) {
        val request = defReqQueue.dequeue()
        logDebug(s"Processing deferred fetch request for $remoteAddress with "
          + s"${request.blocks.length} blocks")
        send(remoteAddress, request)
        if (defReqQueue.isEmpty) {
          deferredFetchRequests -= remoteAddress
        }
      }
    }
  }

  // [2] Regular requests: send them while they are fetchable; if the remote address has
  // reached its in-flight limit, queue the request as a deferred request for that remoteAddress
  // Process any regular fetch requests if possible.
  while (isRemoteBlockFetchable(fetchRequests)) {
    val request = fetchRequests.dequeue()
    val remoteAddress = request.address
    if (isRemoteAddressMaxedOut(remoteAddress, request)) {
      logDebug(s"Deferring fetch request for $remoteAddress with ${request.blocks.size} blocks")
      val defReqQueue = deferredFetchRequests.getOrElse(remoteAddress, new Queue[FetchRequest]())
      defReqQueue.enqueue(request)
      deferredFetchRequests(remoteAddress) = defReqQueue
    } else {
      send(remoteAddress, request)
    }
  }
}

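Fetching the requested bytes:

[1] Deferred requests first: while the next request is fetchable and its remote address has not reached its in-flight limit, send the request
[2] Regular requests: send them while they are fetchable; if the remote address has reached its in-flight limit, queue the request as a deferred request for that remoteAddress

For each request, the method checks whether it should be deferred. If so, the request is added to deferredFetchRequests. Otherwise the iterator goes ahead and sends the request through the BlockStoreClient implementation (ExternalBlockStoreClient if the shuffle service is enabled, NettyBlockTransferService otherwise).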
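
The deferral logic is easier to follow in a stripped-down model. The sketch below is an assumption-laden simplification: the Throttle class, its two limits, and the complete() method are hypothetical, the check on the number of requests in flight is omitted, and "sending" just records the request.

```scala
import scala.collection.mutable

object ThrottleSketch {
  final case class Request(address: String, size: Long, numBlocks: Int)

  // Hypothetical limits that mirror the two checks described above.
  final class Throttle(maxBytesInFlight: Long, maxBlocksPerAddress: Int) {
    val pending = mutable.Queue.empty[Request]
    val deferred = mutable.LinkedHashMap.empty[String, mutable.Queue[Request]]
    val sent = mutable.ArrayBuffer.empty[Request]
    private var bytesInFlight = 0L
    private val blocksInFlight = mutable.Map.empty[String, Int].withDefaultValue(0)

    // Fetchable: nothing is in flight yet, or the head request still fits the byte budget.
    private def fetchable(q: mutable.Queue[Request]): Boolean =
      q.nonEmpty && (bytesInFlight == 0 || bytesInFlight + q.head.size <= maxBytesInFlight)

    // Maxed out: this address would exceed its limit of in-flight blocks.
    private def maxedOut(req: Request): Boolean =
      blocksInFlight(req.address) + req.numBlocks > maxBlocksPerAddress

    private def send(req: Request): Unit = {
      bytesInFlight += req.size
      blocksInFlight(req.address) = blocksInFlight(req.address) + req.numBlocks
      sent += req
    }

    // Called when a response has been consumed; releases the budget held by the request.
    def complete(req: Request): Unit = {
      bytesInFlight -= req.size
      blocksInFlight(req.address) = blocksInFlight(req.address) - req.numBlocks
    }

    def fetchUpToMaxBytes(): Unit = {
      // [1] Retry deferred requests first (iterate over a snapshot while mutating the map).
      for ((address, queue) <- deferred.toList) {
        while (fetchable(queue) && !maxedOut(queue.head)) send(queue.dequeue())
        if (queue.isEmpty) deferred -= address
      }
      // [2] Then regular requests; defer those whose address is saturated.
      while (fetchable(pending)) {
        val req = pending.dequeue()
        if (maxedOut(req)) {
          deferred.getOrElseUpdate(req.address, mutable.Queue.empty[Request]).enqueue(req)
        } else {
          send(req)
        }
      }
    }
  }
}
```

A deferred request does not block the queue: requests for other addresses behind it can still be sent, and the deferred one is retried on the next call once its address has capacity again.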

// ShuffleBlockFetcherIterator
private[this] def sendRequest(req: FetchRequest): Unit = {
  // ...
  // [1] Create a BlockFetchingListener, which is called back when the request completes
  val blockFetchingListener = new BlockFetchingListener {
    override def onBlockFetchSuccess(blockId: String, buf: ManagedBuffer): Unit = {
      // ...
      remainingBlocks -= blockId
      results.put(new SuccessFetchResult(BlockId(blockId), infoMap(blockId)._2,
        address, infoMap(blockId)._1, buf, remainingBlocks.isEmpty))
      // ...
    }
    override def onBlockFetchFailure(blockId: String, e: Throwable): Unit = {
      results.put(new FailureFetchResult(BlockId(blockId), infoMap(blockId)._2, address, e))
    }
  }

  // Fetch remote shuffle blocks to disk when the request is too large. Since the shuffle data is
  // already encrypted and compressed over the wire(w.r.t. the related configs), we can just fetch
  // the data and write it to file directly.
  // [2] If the request is larger than the maximum size that may be held in memory, pass a
  // DownloadFileManager (here the iterator itself) with the fetch request; otherwise pass null
  if (req.size > maxReqSizeShuffleToMem) {
    shuffleClient.fetchBlocks(address.host, address.port, address.executorId, blockIds.toArray,
      blockFetchingListener, this)
  } else {
    shuffleClient.fetchBlocks(address.host, address.port, address.executorId, blockIds.toArray,
      blockFetchingListener, null)
  }
}

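sendRequest performs two main steps:

[1] Create a BlockFetchingListener, which is called back when the request completes
[2] If the request is larger than the maximum size that may be held in memory, the iterator sends the fetch request with a DownloadFileManager, which is otherwise left undefined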


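First, ShuffleBlockFetcherIterator creates a BlockFetchingListener that defines the callbacks for success and for failure. On success, it first takes the synchronized lock on the iterator and then adds the block data to the results variable. On failure, it likewise takes the synchronized lock first and then adds a marker object that indicates the fetch failed.

Second, ShuffleBlockFetcherIterator calls the fetchBlocks method of BlockStoreClient. Before the call it checks the size of the request; if the size exceeds the threshold, it passes a DownloadFileManager, which causes the shuffle data to be downloaded to a temporary file.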
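
A minimal sketch of these two points follows. The Listener trait, the result case classes, and the byte-array payload are hypothetical simplifications of BlockFetchingListener, SuccessFetchResult/FailureFetchResult, and ManagedBuffer; the lock is taken on the queue here rather than on the iterator.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

object ListenerSketch {
  sealed trait FetchResult
  final case class SuccessResult(blockId: String, data: Array[Byte], isLastInRequest: Boolean)
    extends FetchResult
  final case class FailureResult(blockId: String, error: Throwable) extends FetchResult

  // Hypothetical, simplified counterpart of BlockFetchingListener.
  trait Listener {
    def onBlockFetchSuccess(blockId: String, data: Array[Byte]): Unit
    def onBlockFetchFailure(blockId: String, error: Throwable): Unit
  }

  // Builds the listener for one request. Callbacks may run on network threads, so the
  // shared `remaining` set is guarded by a lock and results go through a thread-safe queue.
  def listenerFor(blockIds: Seq[String], results: LinkedBlockingQueue[FetchResult]): Listener =
    new Listener {
      private val remaining = mutable.Set.empty[String] ++= blockIds
      def onBlockFetchSuccess(blockId: String, data: Array[Byte]): Unit =
        results.synchronized {
          remaining -= blockId
          results.put(SuccessResult(blockId, data, remaining.isEmpty))
        }
      def onBlockFetchFailure(blockId: String, error: Throwable): Unit =
        results.synchronized {
          results.put(FailureResult(blockId, error))
        }
    }

  // Requests above the threshold are streamed to a temporary file instead of memory.
  def fetchToDisk(requestSize: Long, maxReqSizeShuffleToMem: Long): Boolean =
    requestSize > maxReqSizeShuffleToMem
}
```

The isLastInRequest flag lets the consumer know when every block of a request has arrived, which is the point at which the request no longer counts as in flight.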

Next, let's see how fetchBlocks itself is implemented.

@Override
public void fetchBlocks(
    String host,
    int port,
    String execId,
    String[] blockIds,
    BlockFetchingListener listener,
    DownloadFileManager downloadFileManager) {
  checkInit();
  logger.debug("External shuffle fetch from {}:{} (executor id {})", host, port, execId);
  try {
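    // [1] First create and initialize the starter for RetryingBlockTransferor (named
    // RetryingBlockFetcher in earlier Spark versions), which is used to load the shuffle files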
    int maxRetries = transportConf.maxIORetries();
    RetryingBlockTransferor.BlockTransferStarter blockFetchStarter =
        (inputBlockId, inputListener) -> {
          // Unless this client is closed.
          if (clientFactory != null) {
            assert inputListener instanceof BlockFetchingListener :
              "Expecting a BlockFetchingListener, but got " + inputListener.getClass();
            TransportClient client = clientFactory.createClient(host, port, maxRetries > 0);
            // [2] Create a OneForOneBlockFetcher and use it to download the shuffle data
            new OneForOneBlockFetcher(client, appId, execId, inputBlockId,
              (BlockFetchingListener) inputListener, transportConf, downloadFileManager).start();
          } else {
            logger.info("This clientFactory was closed. Skipping further block fetch retries.");
          }
        };
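    ...
    // [3] Invoke the starter, which creates the OneForOneBlockFetcher and calls its start method
    blockFetchStarter.createAndStart(blockIds, listener);
  } // the catch block is elided in this excerpt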
}

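[1] First create and initialize the starter for RetryingBlockTransferor (named RetryingBlockFetcher in earlier Spark versions), which is used to load the shuffle files
[2] Create a OneForOneBlockFetcher and use it to download the shuffle data
[3] Invoke the starter, which calls the start method of OneForOneBlockFetcher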
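
The idea behind wrapping the fetch in a retrying starter can be sketched as follows. This is only an illustration of the pattern under stated assumptions: Starter and fetchWithRetry are hypothetical names, the starter is synchronous and returns the IDs it managed to fetch, and the wait between attempts that a real retry mechanism would apply is omitted.

```scala
import java.io.IOException

object RetrySketch {
  // Hypothetical starter: tries to fetch the given blocks and returns the IDs that
  // succeeded; throws IOException on a connection-level failure.
  type Starter = Seq[String] => Set[String]

  // Retries only the blocks that are still outstanding, up to maxRetries extra attempts.
  def fetchWithRetry(blockIds: Seq[String], maxRetries: Int, start: Starter): Set[String] = {
    var outstanding = blockIds
    var fetched = Set.empty[String]
    var attempt = 0
    while (outstanding.nonEmpty && attempt <= maxRetries) {
      try {
        fetched ++= start(outstanding)
      } catch {
        case _: IOException => // transient failure: fall through and try again
      }
      outstanding = outstanding.filterNot(fetched)
      attempt += 1
    }
    fetched
  }
}
```

Because the starter is a function, each retry can create a fresh client and a fresh fetcher, which matches the role of the lambda in the excerpt above.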

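Downloading shuffle data with OneForOneBlockFetcher

OneForOneBlockFetcher fetches shuffle data from each executor over RPC. Here is a brief overview:

First, the fetcher sends a FetchShuffleBlocks message to the executor that holds the shuffle files;
Next, the executor registers a new stream and returns a StreamHandle message, which carries the streamId, to the fetcher;
After receiving the StreamHandle response, the client streams or loads the block data;
If downloadFileManager is not null, the result is written to a temporary file; in the in-memory case, the shuffle bytes are loaded into an in-memory buffer;
Finally, whether the data went to a temporary file or to memory, the BlockFetchingListener callback defined in sendRequest is invoked.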


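The fetched shuffle data is put into the results queue, a LinkedBlockingQueue[FetchResult], and is consumed through the next() method. Once all available block data has been consumed, the iterator calls fetchUpToMaxBytes(), described earlier, again.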
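
The consumer side of that queue can be sketched like this. BlockIterator is a hypothetical class; requestMore stands in for fetchUpToMaxBytes(), and strings stand in for FetchResult objects.

```scala
import java.util.concurrent.LinkedBlockingQueue

object ResultQueueSketch {
  // Hypothetical iterator over fetched blocks, fed by callbacks on other threads.
  final class BlockIterator(
      numBlocksToFetch: Int,
      results: LinkedBlockingQueue[String],
      requestMore: () => Unit) extends Iterator[String] {
    private var numBlocksProcessed = 0

    def hasNext: Boolean = numBlocksProcessed < numBlocksToFetch

    def next(): String = {
      if (!hasNext) throw new NoSuchElementException("all blocks consumed")
      val block = results.take() // blocks until a fetch callback delivers a result
      numBlocksProcessed += 1
      requestMore() // budget was freed, so more requests may be sent now
      block
    }
  }
}
```

The blocking take() is what turns the asynchronous callbacks into an ordinary pull-based iterator for the rest of the read() pipeline.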

After ShuffleBlockFetcherIterator is initialized

With ShuffleBlockFetcherIterator initialized, let's look at the remaining work:

private class ShuffleFetchCompletionListener(var data: ShuffleBlockFetcherIterator)
  extends TaskCompletionListener {
  override def onTaskCompletion(context: TaskContext): Unit = {
    if (data != null) {
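      data.cleanup()
      // Null out the reference so that the iterator, which holds metadata on block
      // locations (blocksByAddress), can be garbage collected once reading is done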
      data = null
    }
  }
  def onComplete(context: TaskContext): Unit = this.onTaskCompletion(context)
}

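After initialization, ShuffleBlockFetcherIterator is converted into a CompletionIterator, whose main job is to release resources. The serializer instance then deserializes the shuffle blocks into an iterator of records, which is wrapped as metricIter to update the task metrics. Finally it is wrapped in an InterruptibleIterator. On every hasNext call, an interruptible iterator checks the task status and, if the task has been marked as killed, terminates the task that hosts the iterator. This matters mainly when speculative execution is enabled.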

val interruptibleIter = new InterruptibleIterator[(Any, Any)](context, metricIter)

def hasNext: Boolean = {
  // TODO(aarondav/rxin): Check Thread.interrupted instead of context.interrupted if interrupt
  // is allowed. The assumption is that Thread.interrupted does not have a memory fence in read
  // (just a volatile field in C), while context.interrupted is a volatile in the JVM, which
  // introduces an expensive read fence.
  context.killTaskIfInterrupted()
  delegate.hasNext
}
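
The two wrappers can be modelled in plain Scala as below. This is a sketch, not Spark's classes: CompletionIter, InterruptibleIter, and TaskKilled are hypothetical names, and the isKilled function stands in for the check that context.killTaskIfInterrupted() performs.

```scala
object IteratorWrappersSketch {
  // Hypothetical exception used here in place of Spark's task-kill signal.
  final class TaskKilled extends RuntimeException("task was killed")

  // Runs `completion` exactly once, when the underlying iterator is exhausted.
  final class CompletionIter[A](underlying: Iterator[A], completion: () => Unit)
    extends Iterator[A] {
    private var completed = false
    def hasNext: Boolean = {
      val more = underlying.hasNext
      if (!more && !completed) {
        completed = true
        completion()
      }
      more
    }
    def next(): A = underlying.next()
  }

  // Checks a kill flag on every hasNext call and aborts if the flag is set.
  final class InterruptibleIter[A](isKilled: () => Boolean, underlying: Iterator[A])
    extends Iterator[A] {
    def hasNext: Boolean = {
      if (isKilled()) throw new TaskKilled
      underlying.hasNext
    }
    def next(): A = underlying.next()
  }
}
```

Since aggregation and sorting consume the whole input before producing output, the wrapped iterator may already be exhausted by then, which is why read() wraps the final result in an interruptible iterator again.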

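Next come the reduce-side aggregation and sorting. Note that these require aggregator and keyOrdering to be defined on the ShuffleDependency, which is done by the operations in PairRDDFunctions.

Spark SQL, however, uses ShuffleExchangeExec, which defines neither aggregator nor keyOrdering. So how does Spark SQL implement aggregation and sorting?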

val aggregatedIter: Iterator[Product2[K, C]] = if (dep.aggregator.isDefined) {
    ...
  } else {
    interruptibleIter.asInstanceOf[Iterator[Product2[K, C]]]
  }

val resultIter: Iterator[Product2[K, C]] = dep.keyOrdering match {
    case Some(keyOrd: Ordering[K]) =>
      val sorter =
        new ExternalSorter[K, C, C](context, ordering = Some(keyOrd), serializer = dep.serializer)
      sorter.insertAllAndUpdateMetrics(aggregatedIter)
    case None =>
      aggregatedIter
  }

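The execution plan gives the answer: Spark SQL inserts a Sort operator into the plan to implement aggregation and sorting.

That completes a walk through the overall shuffle reader flow. Many important details were not explored in depth, so here is a summary of the whole process: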

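Preparation before the fetch

The shuffle reader is invoked mainly from ShuffledRDD and ShuffledRowRDD, which pass different arguments to getReader depending on the partition specs.
getReader first looks up the locations of the shuffle files for each mapId through mapOutputTracker, and then performs the shuffle fetch through BlockStoreShuffleReader, the only reader implementation;
On the driver, MapOutputTrackerMaster maintains the mapping from mapId to file location: the shuffleId is registered with the master tracker when the shuffle map stage is created, and the location recorded for each mapId under that shuffleId is updated when the map stage completes. On an executor, locations are read from MapOutputTrackerWorker; if they are not available locally, the worker sends a message to the master tracker and syncs them over;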
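
The register, update, and lookup flow between the two trackers can be sketched as follows. Master and Worker are hypothetical in-process stand-ins for MapOutputTrackerMaster and MapOutputTrackerWorker; locations are plain strings, and the RPC between executor and driver is replaced by a direct method call.

```scala
import scala.collection.mutable

object MapOutputTrackerSketch {
  // Hypothetical, simplified master: shuffleId -> (mapId -> location).
  final class Master {
    private val statuses = mutable.Map.empty[Int, mutable.Map[Long, String]]

    // Called when the shuffle map stage is created.
    def registerShuffle(shuffleId: Int): Unit = {
      statuses.getOrElseUpdate(shuffleId, mutable.Map.empty[Long, String])
      ()
    }

    // Called as map tasks finish and report where their output lives.
    def registerMapOutput(shuffleId: Int, mapId: Long, location: String): Unit =
      statuses(shuffleId)(mapId) = location

    def getStatuses(shuffleId: Int): Map[Long, String] = statuses(shuffleId).toMap
  }

  // Executor-side worker: serves from its local cache and asks the master on a miss.
  final class Worker(master: Master) {
    private val cache = mutable.Map.empty[Int, Map[Long, String]]
    var masterLookups = 0

    def getLocations(shuffleId: Int): Map[Long, String] =
      cache.getOrElseUpdate(shuffleId, {
        masterLookups += 1
        master.getStatuses(shuffleId)
      })
  }
}
```

Caching on the worker means that many reduce tasks on the same executor trigger only one round trip to the driver per shuffle.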

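Handling fetch requests

When BlockStoreShuffleReader fetches data, it first creates a ShuffleBlockFetcherIterator, which splits the fetch into local, host-local, and remote modes. Fetching is also subject to limits, including the number of outstanding fetch requests per executor and whether the fetched shuffle data exceeds the allotted memory. If the requested data is too large for the memory limit, it is written to temporary files instead; if too much data is in flight on the network, the fetcher stops reading and resumes when the next shuffle block is needed.
The actual fetch is done by OneForOneBlockFetcher: the fetcher sends a FetchShuffleBlocks message to the executor that holds the shuffle files; the executor registers a new stream and returns a StreamHandle message carrying the streamId to the fetcher; the client then loads the block data; finally the BlockFetchingListener callback is invoked.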

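Processing after the fetch

Reduce-side aggregation: if the map side already combined the data, merge the combiners that were read. If the map side did not combine, aggregate the uncombined <k, v> records.
Reduce-side sorting: sort by key if a key ordering is defined. The sort-based shuffle only orders records by partitionId by default and does not sort within a partition, so keyOrdering indicates whether the keys inside each partition must be sorted.
Also note that Spark SQL does not set sorting or aggregation on the ShuffleDependency; instead, rules insert a Sort operator into the plan tree.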

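Now that you have studied the Shuffle Reader, here are some questions to think about:

Why does the call to getReader pass different arguments depending on the partition specs? What is the main purpose?
What is the biggest difference between a remote fetch and a local fetch?
What are InterruptibleIterator and CompletionIterator for?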
