Before version 1.6, Spark managed memory with the static memory manager, StaticMemoryManager. From 1.6 onward it uses the unified memory manager, UnifiedMemoryManager, which manages memory dynamically and thereby improves memory utilization.
1. The Spark memory model
object UnifiedMemoryManager {

  // Set aside a fixed amount of memory for non-storage, non-execution purposes.
  // This serves a function similar to `spark.memory.fraction`, but guarantees that we reserve
  // sufficient memory for the system even for small heaps. E.g. if we have a 1GB JVM, then
  // the memory used for execution and storage will be (1024 - 300) * 0.6 = 434MB by default.
  private val RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024

  def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
    val maxMemory = getMaxMemory(conf)
    new UnifiedMemoryManager(
      conf,
      maxHeapMemory = maxMemory,
      onHeapStorageRegionSize =
        (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,
      numCores = numCores)
  }

  /**
   * Return the total amount of memory shared between execution and storage, in bytes.
   */
  private def getMaxMemory(conf: SparkConf): Long = {
    val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
    val reservedMemory = conf.getLong("spark.testing.reservedMemory",
      if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
    val minSystemMemory = (reservedMemory * 1.5).ceil.toLong
    if (systemMemory < minSystemMemory) {
      throw new IllegalArgumentException(s"System memory $systemMemory must " +
        s"be at least $minSystemMemory. Please increase heap size using the --driver-memory " +
        s"option or spark.driver.memory in Spark configuration.")
    }
    // SPARK-12759 Check executor memory to fail fast if memory is insufficient
    if (conf.contains("spark.executor.memory")) {
      val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
      if (executorMemory < minSystemMemory) {
        throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
          s"$minSystemMemory. Please increase executor memory using the " +
          s"--executor-memory option or spark.executor.memory in Spark configuration.")
      }
    }
    val usableMemory = systemMemory - reservedMemory
    val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
    (usableMemory * memoryFraction).toLong
  }
}
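The keys read above (spark.executor.memory, spark.memory.fraction, spark.memory.storageFraction) are ordinary Spark configuration entries. As a minimal sketch of where they come from, with illustrative values rather than a recommendation:

import org.apache.spark.SparkConf

// Illustrative values only; the 2.4 defaults (0.6 / 0.5) are usually left alone.
val conf = new SparkConf()
  .setAppName("memory-demo")                    // hypothetical app name
  .set("spark.executor.memory", "2g")           // read via conf.getSizeAsBytes above
  .set("spark.memory.fraction", "0.6")          // share of usable heap for execution + storage
  .set("spark.memory.storageFraction", "0.5")   // share of that region set aside for storage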
Spark's on-heap memory is divided into reserved memory and usable memory. Reserved memory is fixed at 300 MB. Usable memory is further split in two: user memory, which mainly holds the data needed by RDD transformations, and the unified region (usable memory * spark.memory.fraction), which is shared between execution memory and storage memory; the storage side amounts to usable memory * spark.memory.fraction * spark.memory.storageFraction. In Spark 2.4 the defaults for spark.memory.fraction and spark.memory.storageFraction are 0.6 and 0.5 respectively. For example, with executor-memory set to 2g, storage and execution memory together come to (2048 - 300) * 0.6 = 1048.8 MB.
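These sizes can be reproduced with a few lines of Scala that mirror the arithmetic in getMaxMemory. This is a standalone sketch, not Spark code; MemoryLayoutSketch and printMemoryLayout are made-up names, and the constants assume the 2.4 defaults quoted above.

object MemoryLayoutSketch {
  // Constants mirroring UnifiedMemoryManager (Spark 2.4 defaults), in MB.
  val ReservedMb      = 300.0   // RESERVED_SYSTEM_MEMORY_BYTES
  val MemoryFraction  = 0.6     // spark.memory.fraction
  val StorageFraction = 0.5     // spark.memory.storageFraction

  // Hypothetical helper: derive the four regions for a given heap size.
  def printMemoryLayout(executorMemoryMb: Double): Unit = {
    val usable    = executorMemoryMb - ReservedMb   // usableMemory
    val unified   = usable * MemoryFraction         // execution + storage
    val storage   = unified * StorageFraction       // onHeapStorageRegionSize
    val execution = unified - storage
    val user      = usable - unified                // user memory
    println(f"reserved=$ReservedMb%.1f MB  user=$user%.1f MB  " +
      f"storage=$storage%.1f MB  execution=$execution%.1f MB")
  }

  def main(args: Array[String]): Unit = {
    printMemoryLayout(2048.0)  // executor-memory = 2g, as in the example above
    // prints: reserved=300.0 MB  user=699.2 MB  storage=524.4 MB  execution=524.4 MB
  }
}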
[Figure: the Spark memory model]
2. The storage/execution dynamic occupancy mechanism
* Storage can borrow as much execution memory as is free until execution reclaims its space.
* When this happens, cached blocks will be evicted from memory until sufficient borrowed
* memory is released to satisfy the execution memory request.
*
* Similarly, execution can borrow as much storage memory as is free. However, execution
* memory is *never* evicted by storage due to the complexities involved in implementing this.
* The implication is that attempts to cache blocks may fail if execution has already eaten
* up most of the storage space, in which case the new blocks will be evicted immediately
* according to their respective storage levels.
The excerpt above is the scaladoc on UnifiedMemoryManager in the Spark source. In short: when storage runs short and execution has free memory, storage can borrow it, but execution will forcibly reclaim that space when it needs it back, evicting cached blocks until enough borrowed memory is released. Conversely, execution can borrow free storage memory when it runs short, but storage never forcibly reclaims memory occupied by execution; if execution has already eaten into the storage region, newly cached blocks may fail to fit and be evicted immediately according to their storage levels.
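To make the asymmetry concrete, here is a simplified, self-contained sketch of the borrowing rules. It is not the real implementation (Spark tracks this in ExecutionMemoryPool and StorageMemoryPool with per-task bookkeeping); UnifiedRegionSketch and its methods are illustrative names only.

// A single pool of `unified` bytes, tracked as two usage counters.
class UnifiedRegionSketch(unified: Long, storageRegionSize: Long) {
  private var storageUsed: Long = 0L
  private var executionUsed: Long = 0L

  private def free: Long = unified - storageUsed - executionUsed

  // Execution may take all free memory, and may additionally evict cached
  // blocks to shrink storage back down to its configured region size.
  def acquireExecutionMemory(bytes: Long): Long = {
    val evictable = math.max(0L, storageUsed - storageRegionSize)
    val granted   = math.min(bytes, free + evictable)
    val toEvict   = math.max(0L, granted - free)
    storageUsed   -= toEvict      // forced reclaim: cached blocks are dropped
    executionUsed += granted
    granted
  }

  // Storage may only use memory that is currently free; it never evicts
  // execution. If execution has eaten into the storage region, caching fails.
  def acquireStorageMemory(bytes: Long): Boolean = {
    if (bytes <= free) { storageUsed += bytes; true } else false
  }
}

With the 2 GB figures from section 1 (unified = 1048.8 MB, storage region = 524.4 MB), this model lets storage cache well beyond 524.4 MB while execution is idle, and then evicts those borrowed blocks, but only down to the 524.4 MB region size, once execution asks for the space back.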