memory

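Unlike traditional C/C++, Go is a garbage-collected language. In most cases memory allocation and reclamation are managed automatically by Go, and the compiler decides whether an object lives on the stack or on the heap. Users rarely need to take part in memory management; they simply use memory. Heap memory management in Go has two major components: the memory allocator, which allocates heap memory, and the garbage collector, which reclaims heap memory that is no longer needed. This article covers how the memory allocator works. Go's allocator was heavily influenced by Google's TCMalloc.

Allocators

Go has two kinds of memory allocator: a linear allocator and a linked allocator.

Linear allocation

The linear allocator corresponds to the runtime.linearAlloc struct, shown below.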

type linearAlloc struct {
  next   uintptr // next free byte
  mapped uintptr // one byte past end of mapped space
  end    uintptr // end of reserved space

  mapMemory bool // transition memory from Reserved to Ready if true
}

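The allocator reserves a contiguous range of memory from the operating system in advance. next points to the next usable address and end points to the end of the reserved range.

Allocation with the linear allocator is easy to follow: it checks whether the remaining space can hold the requested size, and if so it advances next and returns the start address of the space that was remaining. A simplified version of the code is shown below.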

func (l *linearAlloc) alloc(size, align uintptr, sysStat *sysMemStat) unsafe.Pointer {
  p := alignUp(l.next, align)
  if p+size > l.end {
    return nil
  }
  l.next = p + size
  return unsafe.Pointer(p)
}

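This approach is fast and simple, but the drawback is equally obvious: freed memory cannot be reused. The next field only ever points at the remaining untouched space, so the allocator has no knowledge of regions that were used and later freed, which can waste a great deal of memory.

For this reason linear allocation is not Go's main allocation strategy. It is only used on 32-bit machines to pre-reserve memory.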
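The bump-pointer behaviour described above can be sketched in a few lines. This is a minimal model, not the runtime's code: the bumpAlloc type is hypothetical and hands out offsets into an imaginary region instead of real addresses.

```go
package main

import "fmt"

// bumpAlloc is a hypothetical stand-in for runtime.linearAlloc that hands
// out offsets into a fixed-size region instead of real addresses.
type bumpAlloc struct {
	next uintptr // next free offset
	end  uintptr // one past the last usable offset
}

// alignUp rounds n up to a multiple of align, which must be a power of two.
func alignUp(n, align uintptr) uintptr {
	return (n + align - 1) &^ (align - 1)
}

// alloc returns the offset of a block of the given size, or false when the
// region cannot hold it. Nothing is ever given back.
func (b *bumpAlloc) alloc(size, align uintptr) (uintptr, bool) {
	p := alignUp(b.next, align)
	if p+size > b.end {
		return 0, false
	}
	b.next = p + size
	return p, true
}

func main() {
	region := &bumpAlloc{end: 64}
	first, _ := region.alloc(10, 8)
	second, _ := region.alloc(16, 8)
	_, ok := region.alloc(64, 8)
	fmt.Println(first, second, ok) // 0 16 false
}
```

Because there is no free operation at all, the six bytes skipped for alignment and any block that is no longer needed stay lost until the whole region is discarded.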

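Linked allocation

The linked allocator corresponds to the runtime.fixalloc struct. The memory it hands out is not contiguous; free memory is tracked as a singly linked list. The allocator is built from fixed-size chunks, each chunk is divided into fixed-size slots, and every allocation consumes exactly one slot.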

type fixalloc struct {
  size   uintptr
  first  func(arg, p unsafe.Pointer) // called first time p is returned
  arg    unsafe.Pointer
  list   *mlink
  chunk  uintptr // use uintptr instead of unsafe.Pointer to avoid write barriers
  nchunk uint32  // bytes remaining in current chunk
  nalloc uint32  // size of new chunks in bytes
  inuse  uintptr // in-use bytes now
  stat   *sysMemStat
  zero   bool // zero allocations
}

type mlink struct {
  _    sys.NotInHeap
  next *mlink
}

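Its fields are less self-explanatory than those of the linear allocator, so here is a brief description of the important ones.

size: the number of bytes handed out by each allocation.
list: head of the list of reusable slots; each slot is size bytes.
chunk: the next free address inside the chunk currently in use.
nchunk: bytes remaining in the current chunk.
nalloc: size of a chunk in bytes, which is 16KB rounded down to a multiple of size.
inuse: total bytes currently in use.
zero: whether a slot is zeroed when it is reused.

The linked allocator holds a reference to the current chunk and to the list of reusable slots. Each chunk is at most 16KB (_FixAllocChunk), and the exact size is fixed during initialization.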

const _FixAllocChunk = 16 << 10

func (f *fixalloc) init(size uintptr, first func(arg, p unsafe.Pointer), arg unsafe.Pointer, stat *sysMemStat) {
  if size > _FixAllocChunk {
    throw("runtime: fixalloc size too large")
  }
  if min := unsafe.Sizeof(mlink{}); size < min {
    size = min
  }

  f.size = size
  f.first = first
  f.arg = arg
  f.list = nil
  f.chunk = 0
  f.nchunk = 0
  f.nalloc = uint32(_FixAllocChunk / size * size)
  f.inuse = 0
  f.stat = stat
  f.zero = true
}

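Chunks are created one after another over time, but their addresses are not necessarily contiguous.

Each allocation from the linked allocator also has a fixed size, given by fixalloc.size. On allocation it first checks for a reusable slot on the free list and uses it if one exists. Otherwise it carves a slot out of the current chunk, and if the current chunk does not have enough space left it requests a new chunk first. The corresponding code is shown below.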

func (f *fixalloc) alloc() unsafe.Pointer {
  if f.size == 0 {
    print("runtime: use of FixAlloc_Alloc before FixAlloc_Init\n")
    throw("runtime: internal error")
  }

  if f.list != nil {
    v := unsafe.Pointer(f.list)
    f.list = f.list.next
    f.inuse += f.size
    if f.zero {
      memclrNoHeapPointers(v, f.size)
    }
    return v
  }
  if uintptr(f.nchunk) < f.size {
    f.chunk = uintptr(persistentalloc(uintptr(f.nalloc), 0, f.stat))
    f.nchunk = f.nalloc
  }

  v := unsafe.Pointer(f.chunk)
  if f.first != nil {
    f.first(f.arg, v)
  }
  f.chunk = f.chunk + f.size
  f.nchunk -= uint32(f.size)
  f.inuse += f.size
  return v
}

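The advantage of the linked allocator is that it can reuse freed memory. The unit of reuse is a fixed-size slot whose size is fixalloc.size. When memory is freed, the allocator pushes the slot onto the head of the free list, as shown below.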

func (f *fixalloc) free(p unsafe.Pointer) {
  f.inuse -= f.size
  v := (*mlink)(p)
  v.next = f.list
  f.list = v
}
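The reuse-first policy of alloc and free can be illustrated with a small model. This is a sketch under simplifying assumptions: the slotPool type is hypothetical, slots are integer indexes into a single chunk, and the free list is a slice used as a stack rather than an intrusive linked list.

```go
package main

import "fmt"

// slotPool is a hypothetical fixed-size slot allocator. Slots are indexes
// into one backing chunk; freed slots are kept on a LIFO free list.
type slotPool struct {
	free  []int // stack of reusable slot indexes
	next  int   // first never-used slot in the chunk
	limit int   // number of slots in the chunk
	inuse int   // slots currently handed out
}

// alloc prefers a freed slot and falls back to carving a fresh one.
func (p *slotPool) alloc() (int, bool) {
	if n := len(p.free); n > 0 {
		s := p.free[n-1]
		p.free = p.free[:n-1]
		p.inuse++
		return s, true
	}
	if p.next == p.limit {
		return 0, false // the runtime would request a new chunk here
	}
	s := p.next
	p.next++
	p.inuse++
	return s, true
}

// release pushes the slot onto the head of the free list.
func (p *slotPool) release(s int) {
	p.free = append(p.free, s)
	p.inuse--
}

func main() {
	pool := &slotPool{limit: 4}
	a, _ := pool.alloc()
	b, _ := pool.alloc()
	pool.release(a)
	c, _ := pool.alloc() // reuses the slot that a occupied
	fmt.Println(a, b, c, pool.inuse) // 0 1 0 2
}
```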

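Memory components

Go's memory allocator is built mainly from mspan, heaparena, mcache, mcentral, and mheap. They work in layers and together manage the whole Go heap.

mspan

runtime.mspan is the basic unit of memory allocation in Go. Its structure is as follows.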

type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none

    startAddr uintptr // address of first byte of span aka s.base()
    npages    uintptr // number of pages in span
    freeindex uintptr

    spanclass             spanClass     // size class and noscan (uint8)
    needzero              uint8         // needs to be zeroed before allocation
    elemsize              uintptr       // computed from sizeclass or from npages
    limit                 uintptr       // end of data in span
    state                 mSpanStateBox // mSpanInUse etc; accessed atomically (get/set methods)

    nelems uintptr // number of object in the span.
    allocCache uint64
    allocCount            uint16        // number of allocated objects
    ...
}

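mspan values are linked to one another through next and prev as a doubly linked list, and their addresses are not contiguous. Each mspan manages mspan.npages pages of runtime.pageSize bytes; a page is normally 8KB. mspan.startAddr records the start address of those pages and mspan.limit records the end of the data in the span. The element size elemsize of a given mspan is fixed, so the number of elements it can hold is fixed too. Because of that, objects are laid out in an mspan like elements of an array with indexes in [0, nelems), and freeindex records the index from which the search for the next free slot starts. An mspan can be in one of three states:

mSpanDead: the memory has been released.
mSpanInUse: allocated for the garbage-collected heap.
mSpanManual: allocated for manually managed memory, such as stacks.

The element size of an mspan is determined by its spanClass. spanClass is a uint8: the upper seven bits hold the size class, a value from 0 to 67, and the lowest bit is the noscan flag, which is set when the objects contain no pointers.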

type spanClass uint8

func (sc spanClass) sizeclass() int8 {
  return int8(sc >> 1)
}

func (sc spanClass) noscan() bool {
  return sc&1 != 0
}
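The packing can be tried out in isolation. The sketch below re-declares spanClass outside the runtime; makeSpanClass is written here from the bit layout described above, so treat its body as an illustration rather than a copy of the runtime source.

```go
package main

import "fmt"

type spanClass uint8

// makeSpanClass packs the size class into the upper seven bits and the
// noscan flag into the lowest bit.
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass << 1)
	if noscan {
		sc |= 1
	}
	return sc
}

func (sc spanClass) sizeclass() int8 { return int8(sc >> 1) }
func (sc spanClass) noscan() bool    { return sc&1 != 0 }

func main() {
	sc := makeSpanClass(5, true)
	fmt.Println(uint8(sc), sc.sizeclass(), sc.noscan()) // 11 5 true
}
```

With 68 size classes and one flag bit there are 136 distinct spanClass values, which is why arrays indexed by spanClass have 136 entries.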

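There are 68 size classes in total. All the values are stored as lookup tables in runtime/sizeclasses.go. At run time, the size class obtained from a spanClass indexes runtime.class_to_size to get the object size of the mspan, and class_to_allocnpages to get its page count.

class	max object size (bytes)	span size (bytes)	objects	tail waste (bytes)	max waste	min align (bytes)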
1	8	8192	1024	0	87.50%	8
2	16	8192	512	0	43.75%	16
3	24	8192	341	8	29.24%	8
4	32	8192	256	0	21.88%	32
5	48	8192	170	32	31.52%	16
6	64	8192	128	0	23.44%	64
7	80	8192	102	32	19.07%	16
8	96	8192	85	32	15.95%	32
9	112	8192	73	16	13.56%	16
10	128	8192	64	0	11.72%	128
11	144	8192	56	128	11.82%	16
12	160	8192	51	32	9.73%	32
13	176	8192	46	96	9.59%	16
14	192	8192	42	128	9.25%	64
15	208	8192	39	80	8.12%	16
16	224	8192	36	128	8.15%	32
17	240	8192	34	32	6.62%	16
18	256	8192	32	0	5.86%	256
19	288	8192	28	128	12.16%	32
20	320	8192	25	192	11.80%	64
21	352	8192	23	96	9.88%	32
22	384	8192	21	128	9.51%	128
23	416	8192	19	288	10.71%	32
24	448	8192	18	128	8.37%	64
25	480	8192	17	32	6.82%	32
26	512	8192	16	0	6.05%	512
27	576	8192	14	128	12.33%	64
28	640	8192	12	512	15.48%	128
29	704	8192	11	448	13.93%	64
30	768	8192	10	512	13.94%	256
31	896	8192	9	128	15.52%	128
32	1024	8192	8	0	12.40%	1024
33	1152	8192	7	128	12.41%	128
34	1280	8192	6	512	15.55%	256
35	1408	16384	11	896	14.00%	128
36	1536	8192	5	512	14.00%	512
37	1792	16384	9	256	15.57%	256
38	2048	8192	4	0	12.45%	2048
39	2304	16384	7	256	12.46%	256
40	2688	8192	3	128	15.59%	128
41	3072	24576	8	0	12.47%	1024
42	3200	16384	5	384	6.22%	128
43	3456	24576	7	384	8.83%	128
44	4096	8192	2	0	15.60%	4096
45	4864	24576	5	256	16.65%	256
46	5376	16384	3	256	10.92%	256
47	6144	24576	4	0	12.48%	2048
48	6528	32768	5	128	6.23%	128
49	6784	40960	6	256	4.36%	128
50	6912	49152	7	768	3.37%	256
51	8192	8192	1	0	15.61%	8192
52	9472	57344	6	512	14.28%	256
53	9728	49152	5	512	3.64%	512
54	10240	40960	4	0	4.99%	2048
55	10880	32768	3	128	6.24%	128
56	12288	24576	2	0	11.45%	4096
57	13568	40960	3	256	9.99%	256
58	14336	57344	4	0	5.35%	2048
59	16384	16384	1	0	12.49%	8192
60	18432	73728	4	0	11.11%	2048
61	19072	57344	3	128	3.57%	128
62	20480	40960	2	0	6.87%	4096
63	21760	65536	3	256	6.25%	256
64	24576	24576	1	0	11.45%	8192
65	27264	81920	3	128	10.00%	128
66	28672	57344	2	0	4.91%	4096
67	32768	32768	1	0	12.50%	8192

The logic that computes these values is in the printComment function of runtime/mksizeclasses.go. The maximum waste is computed as

float64((size-prevSize-1)*objects+tailWaste) / float64(spanSize)

For example, for class 2 the maximum waste is

((16-8-1)*512+0)/8192 = 0.4375
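The same arithmetic can be checked for other rows of the table. The helper below is a hypothetical wrapper around the formula; the inputs are copied from the rows for classes 1 to 3, with a previous size of 0 assumed for class 1.

```go
package main

import "fmt"

// maxWaste applies the formula above: the worst case is a span full of
// objects that are one byte larger than the previous size class, plus the
// unusable tail of the span.
func maxWaste(size, prevSize, objects, tailWaste, spanSize int) float64 {
	return float64((size-prevSize-1)*objects+tailWaste) / float64(spanSize)
}

func main() {
	fmt.Printf("class 1: %.2f%%\n", 100*maxWaste(8, 0, 1024, 0, 8192))
	fmt.Printf("class 2: %.2f%%\n", 100*maxWaste(16, 8, 512, 0, 8192))
	fmt.Printf("class 3: %.2f%%\n", 100*maxWaste(24, 16, 341, 8, 8192))
}
```

The three results, 87.50%, 43.75% and 29.24%, agree with the max waste column of the table.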

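A class value of 0 is the spanClass reserved for large objects of more than 32KB; such an object generally occupies an mspan of its own. The Go heap is therefore made up of many mspans of different fixed sizes.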
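To make the idea of rounding a request up to a size class concrete, here is a small sketch over the first eight rows of the table. The classSizes slice and classFor function are hypothetical, and the binary search is only for illustration: the runtime itself uses precomputed lookup tables rather than a search.

```go
package main

import (
	"fmt"
	"sort"
)

// classSizes holds the object sizes of classes 1 through 8 from the table;
// index 0 is a placeholder for class 0.
var classSizes = []int{0, 8, 16, 24, 32, 48, 64, 80, 96}

// classFor returns the smallest class whose object size can hold size bytes,
// or 0 when size falls outside this truncated table.
func classFor(size int) int {
	if size <= 0 || size > classSizes[len(classSizes)-1] {
		return 0
	}
	// SearchInts returns the first index i with classSizes[i] >= size.
	return sort.SearchInts(classSizes, size)
}

func main() {
	for _, size := range []int{1, 8, 9, 33, 96} {
		c := classFor(size)
		fmt.Printf("%d bytes -> class %d (%d-byte slots)\n", size, c, classSizes[c])
	}
}
```

A 33-byte request lands in class 5 and occupies a 48-byte slot, which is the kind of internal waste the table's max waste column bounds.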

heaparena

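As mentioned earlier, an mspan consists of a number of pages, but it only holds a reference to their addresses and does not manage the pages themselves. The pages are managed by runtime.heaparena. Each heaparena manages a number of pages; its size is runtime.heapArenaBytes, 64MB on most 64-bit platforms. bitmap records which words in the arena hold pointers, zeroedBase marks the offset within the arena from which pages have never been used and are therefore still zero, and spans records which mspan owns each page.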

type heapArena struct {
  _ sys.NotInHeap
  bitmap [heapArenaBitmapWords]uintptr
  noMorePtrs [heapArenaBitmapWords / 8]uint8
  spans [pagesPerArena]*mspan
  pageInUse [pagesPerArena / 8]uint8
  pageMarks [pagesPerArena / 8]uint8
  pageSpecials [pagesPerArena / 8]uint8
  checkmarks *checkmarksMap
  zeroedBase uintptr
}

The logic that records which mspan owns each page is in the mheap.setSpans method, shown below.

func (h *mheap) setSpans(base, npage uintptr, s *mspan) {
  p := base / pageSize
  ai := arenaIndex(base)
  ha := h.arenas[ai.l1()][ai.l2()]
  for n := uintptr(0); n < npage; n++ {
    i := (p + n) % pagesPerArena
    if i == 0 {
      ai = arenaIndex(base + n*pageSize)
      ha = h.arenas[ai.l1()][ai.l2()]
    }
    ha.spans[i] = s
  }
}

The Go heap manages all page memory through a two-dimensional array of heaparena pointers; see the mheap.arenas field.

type mheap struct {
  arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena
}

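On 64-bit Windows the first dimension is 1 << 6 and the second is 1 << 20; on 64-bit Linux the first dimension is 1 and the second is 1 << 22. This two-dimensional array of heaparenas makes up the virtual address space of the Go runtime heap.

Although heaparenas sit next to each other in the array, the page memory they manage is not necessarily contiguous.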
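How an address maps to a position in a two-level table can be sketched as follows. The constants are assumptions chosen for illustration (4MB arenas, a 6-bit first level and a 20-bit second level), the split function is hypothetical, and the platform-specific address offset that the runtime applies is ignored.

```go
package main

import "fmt"

// Illustrative constants, not tied to any particular build of the runtime.
const (
	logArenaBytes = 22 // 4MB arenas
	l1Bits        = 6
	l2Bits        = 20
)

// split turns an address into the two indexes used to walk a two-level
// arena table: the arena number is divided into a high and a low part.
func split(addr uint64) (l1, l2 uint64) {
	idx := addr >> logArenaBytes // arena number
	return idx >> l2Bits, idx & (1<<l2Bits - 1)
}

func main() {
	// Arena number (3<<20 | 5), plus 123 bytes into that arena.
	addr := uint64(3<<l2Bits|5)<<logArenaBytes | 123
	l1, l2 := split(addr)
	fmt.Println(l1, l2) // 3 5
	fmt.Println("addressable bits:", logArenaBytes+l1Bits+l2Bits)
}
```

Splitting the index keeps the table sparse: only the second-level arrays that are actually needed have to be allocated.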

mcache

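mcache corresponds to the runtime.mcache struct, which already appeared in the article on concurrent scheduling. Despite its name, it is bound to a processor P rather than to an M. mcache is the per-P memory cache. It contains alloc, an array of mspan pointers whose length is fixed at 136, exactly twice the number of size classes, and the tiny-object cache: tiny points to the start of the current tiny block, tinyoffset is the offset of the free space from that start, and tinyAllocs counts the tiny allocations performed. For the stack cache stackcache, see the article on stack memory allocation.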

type mcache struct {
    _ sys.NotInHeap

    nextSample uintptr // trigger heap sample after allocating this many bytes
    scanAlloc  uintptr // bytes of scannable heap allocated
    tiny       uintptr
    tinyoffset uintptr
    tinyAllocs uintptr

    alloc [numSpanClasses]*mspan
    stackcache [_NumStackOrders]stackfreelist
    flushGen atomic.Uint32
}

Right after initialization, every entry of alloc in an mcache points to runtime.emptymspan, a placeholder mspan with no usable memory.

func allocmcache() *mcache {
  var c *mcache
  systemstack(func() {
    lock(&mheap_.lock)
    c = (*mcache)(mheap_.cachealloc.alloc())
    c.flushGen.Store(mheap_.sweepgen)
    unlock(&mheap_.lock)
  })
  for i := range c.alloc {
    c.alloc[i] = &emptymspan
  }
  c.nextSample = nextSample()
  return c
}

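Only when an allocation is needed does the mcache request a new mspan from mcentral to replace the empty span. This is done by the mcache.refill method, which is reached only from runtime.mallocgc. A simplified version follows.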

func (c *mcache) refill(spc spanClass) {
  // Return the current cached span to the central lists.
  s := c.alloc[spc]

  // Get a new cached span from the central lists.
  s = mheap_.central[spc].mcentral.cacheSpan()
  if s == nil {
    throw("out of memory")
  }

  c.scanAlloc = 0

  c.alloc[spc] = s
}

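The benefit of mcache is that allocation needs no global lock. When it runs out of memory, however, it has to go to mcentral, and that path still requires synchronization.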

mcentral

runtime.mcentral manages all the mspans in the heap that hold small objects. When an mcache requests memory, mcentral is what serves it.

type mcentral struct {
    _         sys.NotInHeap
    spanclass spanClass
    partial [2]spanSet
    full    [2]spanSet
}

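mcentral has few fields. spanclass is the class of mspan it stores. partial and full are each a pair of spanSets, one for swept spans and one for unswept spans: partial holds mspans that still have free space and full holds mspans that have none. mcentral is managed directly by mheap, and there are 136 mcentrals at run time.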

type mheap struct {
    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [(cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize) % cpu.CacheLinePadSize]byte
    }
}

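mcentral has two main jobs: hand a usable mspan to mcache when one is available, and request a new mspan from mheap when none is. Handing an mspan to mcache is done by the mcentral.cacheSpan method. It first looks in the swept set of the partial list.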

// Try partial swept spans first.
sg := mheap_.sweepgen
if s = c.partialSwept(sg).pop(); s != nil {
    goto havespan
}

If nothing is found, it looks in the unswept set of the partial list.

for ; spanBudget >= 0; spanBudget-- {
    s = c.partialUnswept(sg).pop()
    if s == nil {
        break
    }
    if s, ok := sl.tryAcquire(s); ok {
        s.sweep(true)
        sweep.active.end(sl)
        goto havespan
    }
}

If that also fails, it looks in the unswept set of the full list.

for ; spanBudget >= 0; spanBudget-- {
    s = c.fullUnswept(sg).pop()
    if s == nil {
        break
    }
    if s, ok := sl.tryAcquire(s); ok {
        s.sweep(true)
        freeIndex := s.nextFreeIndex()
        if freeIndex != s.nelems {
            s.freeindex = freeIndex
            sweep.active.end(sl)
            goto havespan
        }
        c.fullSwept(sg).push(s.mspan)
    }
}

If it still finds nothing, mcentral.grow requests a new mspan from mheap.

s = c.grow()
if s == nil {
    return nil
}

Under normal conditions a usable mspan is always returned.

havespan:
  freeByteBase := s.freeindex &^ (64 - 1)
  whichByte := freeByteBase / 8
  // Init alloc bits cache.
  s.refillAllocCache(whichByte)
  s.allocCache >>= s.freeindex % 64

  return s

Requesting an mspan from mheap boils down to calling mheap.alloc, which returns a new mspan.

func (c *mcentral) grow() *mspan {
  npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()])
  size := uintptr(class_to_size[c.spanclass.sizeclass()])

  s := mheap_.alloc(npages, c.spanclass)
  if s == nil {
    return nil
  }

  n := s.divideByElemSize(npages << _PageShift)
  s.limit = s.base() + size*n
  s.initHeapBits(false)
  return s
}

Once initialized, the span can be handed to mcache.

mheap

runtime.mheap is the manager of Go's heap memory. At run time it exists as the global variable runtime.mheap_.

var mheap_ mheap

It manages every mspan that has been created, all the mcentrals, all the heaparenas, and a variety of other allocators. A simplified version of its structure follows.

type mheap struct {
    _ sys.NotInHeap

    lock mutex

    allspans []*mspan // all spans out there

    pagesInUse         atomic.Uintptr // pages of spans in stats mSpanInUse
    pagesSwept         atomic.Uint64  // pages swept this cycle
    pagesSweptBasis    atomic.Uint64  // pagesSwept to use as the origin of the sweep ratio

    arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena
    allArenas []arenaIdx
    sweepArenas []arenaIdx
    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [(cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize) % cpu.CacheLinePadSize]byte
    }

    pages            pageAlloc // page allocation data structure
    spanalloc              fixalloc // allocator for span*
    cachealloc             fixalloc // allocator for mcache*
    specialfinalizeralloc  fixalloc // allocator for specialfinalizer*
    specialprofilealloc    fixalloc // allocator for specialprofile*
    specialReachableAlloc  fixalloc // allocator for specialReachable
    specialPinCounterAlloc fixalloc // allocator for specialPinCounter
    arenaHintAlloc         fixalloc // allocator for arenaHints
}

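At run time mheap has four main jobs:

Initialize the heap
Allocate mspans
Free mspans
Grow the heap

The sections below cover these four in order.

Initialization

The heap is initialized during program bootstrap, which is also when the scheduler is initialized. The call order is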

schedinit() -> mallocinit() -> mheap_.init()

During initialization it is mainly responsible for initializing the individual allocators.

func (h *mheap) init() {
  h.spanalloc.init(unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
  h.cachealloc.init(unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
  h.specialfinalizeralloc.init(unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
  h.specialprofilealloc.init(unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)
  h.specialReachableAlloc.init(unsafe.Sizeof(specialReachable{}), nil, nil, &memstats.other_sys)
  h.specialPinCounterAlloc.init(unsafe.Sizeof(specialPinCounter{}), nil, nil, &memstats.other_sys)
  h.arenaHintAlloc.init(unsafe.Sizeof(arenaHint{}), nil, nil, &memstats.other_sys)

  h.spanalloc.zero = false
  for i := range h.central {
    h.central[i].mcentral.init(spanClass(i))
  }

  h.pages.init(&h.lock, &memstats.gcMiscSys, false)
}

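This includes mheap.spanalloc, the allocator that hands out mspans, mheap.pages, the page allocator, and the initialization of every mcentral.

Allocation

In mheap, mspan allocation is always done by the mheap.allocSpan method.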

func (h *mheap) allocSpan(npages uintptr, typ spanAllocType, spanclass spanClass) (s *mspan)

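If the request is small enough, that is npages < pageCachePages/4, it tries to satisfy it without taking the heap lock by using the caches of the local P. If the P's page cache is empty, it is refilled first.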

// If the cache is empty, refill it.
if c.empty() {
    lock(&h.lock)
    *c = h.pages.allocToCache()
    unlock(&h.lock)
}

Pages are then taken from the P's page cache, and the mspan itself is obtained through the mheap.tryAllocMSpan method.

pp := gp.m.p.ptr()
if !needPhysPageAlign && pp != nil && npages < pageCachePages/4 {
    c := &pp.pcache
    base, scav = c.alloc(npages)
    if base != 0 {
        s = h.tryAllocMSpan()
        if s != nil {
            goto HaveSpan
        }
    }
}

The code that takes an mspan from the P's cache is shown below. It pops the last mspan in the cache.

func (h *mheap) tryAllocMSpan() *mspan {
  pp := getg().m.p.ptr()
  // If we don't have a p or the cache is empty, we can't do
  // anything here.
  if pp == nil || pp.mspancache.len == 0 {
    return nil
  }
  // Pull off the last entry in the cache.
  s := pp.mspancache.buf[pp.mspancache.len-1]
  pp.mspancache.len--
  return s
}

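If the request is larger, or the lock-free path did not succeed, memory is allocated from the heap, and the heap lock must be held while doing so.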

lock(&h.lock)
if base == 0 {
    // Try to acquire a base address.
    base, scav = h.pages.alloc(npages)
    if base == 0 {
        var ok bool
        growth, ok = h.grow(npages)
        if !ok {
            unlock(&h.lock)
            return nil
        }
        base, scav = h.pages.alloc(npages)
        if base == 0 {
            throw("grew heap, but no adequate free space found")
        }
    }
}
if s == nil {
    // We failed to get an mspan earlier, so grab
    // one now that we have the heap lock.
    s = h.allocMSpanLocked()
}
unlock(&h.lock)

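It first uses pageAlloc.alloc to allocate enough pages, and if the heap is too small mheap.grow expands it. Once the pages are allocated and the P's local mspan cache is empty, the linked allocator mheap.spanalloc allocates 64 mspans into that cache (64 is exactly half the length of the cache array), and a usable mspan is then returned from the P's cache.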

func (h *mheap) allocMSpanLocked() *mspan {
  assertLockHeld(&h.lock)

  pp := getg().m.p.ptr()
  if pp == nil {
    // We don't have a p so just do the normal thing.
    return (*mspan)(h.spanalloc.alloc())
  }
  // Refill the cache if necessary.
  if pp.mspancache.len == 0 {
    const refillCount = len(pp.mspancache.buf) / 2
    for i := 0; i < refillCount; i++ {
      pp.mspancache.buf[i] = (*mspan)(h.spanalloc.alloc())
    }
    pp.mspancache.len = refillCount
  }
  // Pull off the last entry in the cache.
  s := pp.mspancache.buf[pp.mspancache.len-1]
  pp.mspancache.len--
  return s
}

Either path ends with a usable mspan, which is initialized and then returned.

HaveSpan:
  h.initSpan(s, typ, spanclass, base, npages)
  return s

Freeing

Since mspans are allocated by the linked allocator, it is also the linked allocator that frees them.

func (h *mheap) freeSpanLocked(s *mspan, typ spanAllocType) {
  assertLockHeld(&h.lock)
  // Mark the space as free.
  h.pages.free(s.base(), s.npages)
  s.state.set(mSpanDead)
  h.freeMSpanLocked(s)
}

First the page allocator mheap.pages marks the pages as free, then the mspan's state is set to mSpanDead, and finally the mspan is released through the mheap.spanalloc allocator.

func (h *mheap) freeMSpanLocked(s *mspan) {
  assertLockHeld(&h.lock)

  pp := getg().m.p.ptr()
  // First try to free the mspan directly to the cache.
  if pp != nil && pp.mspancache.len < len(pp.mspancache.buf) {
    pp.mspancache.buf[pp.mspancache.len] = s
    pp.mspancache.len++
    return
  }
  // Failing that (or if we don't have a p), just free it to
  // the heap.
  h.spanalloc.free(unsafe.Pointer(s))
}

If the P's cache is not full the mspan is put back into it for reuse; otherwise it is returned to mheap.spanalloc.
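The interplay between the per-P cache and the shared allocator can be modelled with a small stack. Everything here is hypothetical: span stands in for mspan, descAlloc stands in for mheap.spanalloc, and the buffer holds 8 entries only to keep the example short.

```go
package main

import "fmt"

// span is a placeholder for an mspan.
type span struct{ id int }

// descAlloc stands in for mheap.spanalloc and counts what it creates.
type descAlloc struct{ made int }

func (d *descAlloc) alloc() *span { d.made++; return &span{id: d.made} }

// spanCache models the per-P cache: a fixed array used as a LIFO stack.
type spanCache struct {
	buf [8]*span
	n   int
}

// get pops a cached span, refilling half the buffer when it is empty.
func (sc *spanCache) get(d *descAlloc) *span {
	if sc.n == 0 {
		refill := len(sc.buf) / 2
		for i := 0; i < refill; i++ {
			sc.buf[i] = d.alloc()
		}
		sc.n = refill
	}
	sc.n--
	return sc.buf[sc.n]
}

// put caches the span if there is room and reports whether it did.
func (sc *spanCache) put(s *span) bool {
	if sc.n < len(sc.buf) {
		sc.buf[sc.n] = s
		sc.n++
		return true
	}
	return false // the runtime would hand it back to spanalloc here
}

func main() {
	var sc spanCache
	d := &descAlloc{}
	s := sc.get(d)
	fmt.Println(s.id, d.made, sc.n) // 4 4 3
	fmt.Println(sc.put(s), sc.n)    // true 4
}
```

Refilling only half the buffer leaves room for spans that are freed shortly afterwards, so a burst of frees does not immediately spill back to the shared allocator.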

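Growing the heap

The page memory managed by heaparenas is not all requested up front; it is allocated only when it is needed. Heap growth is handled by the mheap.grow method. A simplified version follows.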

func (h *mheap) grow(npage uintptr) (uintptr, bool) {
  assertLockHeld(&h.lock)
  ask := alignUp(npage, pallocChunkPages) * pageSize
  totalGrowth := uintptr(0)
  end := h.curArena.base + ask
  nBase := alignUp(end, physPageSize)

  if nBase > h.curArena.end || end < h.curArena.base {
    av, asize := h.sysAlloc(ask, &h.arenaHints, true)
    if uintptr(av) == h.curArena.end {
      h.curArena.end = uintptr(av) + asize
    } else {
      // Switch to the new space.
      h.curArena.base = uintptr(av)
      h.curArena.end = uintptr(av) + asize
    }
    nBase = alignUp(h.curArena.base+ask, physPageSize)
  }
  ...
}

It first computes the required memory from npage and aligns it, then checks whether the current arena has enough room. If not, mheap.sysAlloc either extends the current arena or allocates a new one.
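The rounding in the first line of grow is easy to try by hand. In this sketch pageSize is the 8KB page mentioned earlier, while the value 512 for pallocChunkPages is an assumption about the page allocator's chunk granularity; askBytes is a hypothetical helper.

```go
package main

import "fmt"

const (
	pageSize         = 8 << 10 // 8KB runtime page
	pallocChunkPages = 512     // assumed chunk granularity, in pages
)

// alignUp rounds n up to a multiple of a, which must be a power of two.
func alignUp(n, a uintptr) uintptr { return (n + a - 1) &^ (a - 1) }

// askBytes rounds the page count up to a whole chunk and converts it to bytes.
func askBytes(npage uintptr) uintptr {
	return alignUp(npage, pallocChunkPages) * pageSize
}

func main() {
	for _, n := range []uintptr{1, 512, 513} {
		fmt.Printf("%d pages -> %d MB\n", n, askBytes(n)>>20)
	}
}
```

Under these assumptions even a one-page request grows the heap by 4MB, and 513 pages round up to 8MB, so heap growth happens in coarse steps rather than page by page.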

func (h *mheap) sysAlloc(n uintptr, hintList **arenaHint, register bool) (v unsafe.Pointer, size uintptr) {
  n = alignUp(n, heapArenaBytes)
  if hintList == &h.arenaHints {
    v = h.arena.alloc(n, heapArenaBytes, &gcController.heapReleased)
    if v != nil {
      size = n
      goto mapped
    }
  }
    ...
}

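It first tries to obtain memory from the pre-reserved space through the linear allocator mheap.arena. If that fails, it grows according to hintList, whose type is runtime.arenaHint; it records the address hints used when growing the heap.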

for *hintList != nil {
    hint := *hintList
    p := hint.addr
    v = sysReserve(unsafe.Pointer(p), n)
    if p == uintptr(v) {
        hint.addr = p
        size = n
        break
    }
    if v != nil {
        sysFreeOS(v, n)
    }
    *hintList = hint.next
    h.arenaHintAlloc.free(unsafe.Pointer(hint))
}

After the memory has been obtained, it is recorded in the two-dimensional arenas array.

for ri := arenaIndex(uintptr(v)); ri <= arenaIndex(uintptr(v)+size-1); ri++ {
    l2 := h.arenas[ri.l1()]
    var r *heapArena
    r = (*heapArena)(h.heapArenaAlloc.alloc(unsafe.Sizeof(*r), goarch.PtrSize, &memstats.gcMiscSys))
    atomic.StorepNoWB(unsafe.Pointer(&l2[ri.l2()]), unsafe.Pointer(r))
}

Finally the page allocator marks the new memory as ready.

// Update the page allocator's structures to make this
// space ready for allocation.
h.pages.grow(v, nBase-v)
totalGrowth += nBase - v

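Object allocation

When allocating memory for objects, Go sorts them into three kinds by size:

Tiny objects: smaller than 16B and pointer-free
Small objects: from 16B up to 32KB
Large objects: larger than 32KB

Each kind follows a different path during allocation. The function that allocates memory for objects is runtime.mallocgc, with the following signature.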

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer

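It takes just three parameters: the size, the type, and a boolean indicating whether the memory must be zeroed. It is the entry point for all Go object allocation; creating a pointer with the new function also ends up here when the value is heap-allocated. On success, the returned pointer is the address of the object. As mentioned in the mspan section, every mspan has a spanClass, which fixes the mspan's element size, and Go divides object sizes in the range [0, 32KB] into 68 classes, so Go's heap consists of mspan lists of different fixed sizes. To allocate an object, the runtime computes the spanClass from the object's size, finds the mspan list for that spanClass, and then looks for a usable mspan in it. This tiered approach is fairly effective at limiting memory fragmentation.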
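The three-way split can be written down as a small decision function. This is a sketch based on the thresholds stated in this article (16B and 32KB); the kind function is hypothetical and only names the path, it does not allocate anything.

```go
package main

import "fmt"

const (
	maxTinySize  = 16
	maxSmallSize = 32 << 10
)

// kind names the allocation path for an object of the given size.
// noscan is true when the object contains no pointers.
func kind(size uintptr, noscan bool) string {
	switch {
	case size > maxSmallSize:
		return "large"
	case noscan && size < maxTinySize:
		return "tiny"
	default:
		return "small"
	}
}

func main() {
	fmt.Println(kind(8, true))      // tiny
	fmt.Println(kind(8, false))     // small: it holds a pointer
	fmt.Println(kind(4096, true))   // small
	fmt.Println(kind(64<<10, true)) // large
}
```

Note that an 8-byte object containing a pointer is not eligible for the tiny path, because tiny blocks are never scanned by the garbage collector.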

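Tiny objects

All pointer-free tiny objects smaller than 16B are placed by the tiny allocator of the P into the same contiguous block of memory. In runtime.mcache, the tiny field records the base address of that block.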

type mcache struct {
  tiny       uintptr
  tinyoffset uintptr
  tinyAllocs uintptr
}

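The size limit for tiny objects is the constant runtime.maxTinySize, 16B, and the block that holds them has the same size. Typical objects stored here are small strings. Part of the code that allocates tiny objects is shown below.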

if size <= maxSmallSize {
    if noscan && size < maxTinySize {
      off := c.tinyoffset
      if off+size <= maxTinySize && c.tiny != 0 {
        x = unsafe.Pointer(c.tiny + off)
        c.tinyoffset = off + size
        c.tinyAllocs++
        mp.mallocing = 0
        releasem(mp)
        return x
      }

      // Allocate a new maxTinySize block.
      span = c.alloc[tinySpanClass]
      v := nextFreeFast(span)
      if v == 0 {
        v, span, shouldhelpgc = c.nextFree(tinySpanClass)
      }
      x = unsafe.Pointer(v)
      (*[2]uint64)(x)[0] = 0
      (*[2]uint64)(x)[1] = 0

      if (size < c.tinyoffset || c.tiny == 0) {
        c.tiny = uintptr(x)
        c.tinyoffset = size
      }
      size = maxTinySize

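If the current tiny block has enough room, that is off+size <= maxTinySize, the object is placed in it directly. Otherwise the allocator first looks for a free 16-byte slot in the span cached in mcache, and if there is none it requests an mspan from mcentral. Either way it ends up with a usable address. Finally, the new block replaces the old tiny block if it has more free space left.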
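The packing of several tiny objects into one 16-byte block can be modelled without real memory. In this sketch the tinyAlloc type is hypothetical, blocks are numbered rather than addressed, and alignment of the offset is left out, as it is in the excerpt above.

```go
package main

import "fmt"

const maxTinySize = 16

// tinyAlloc is a hypothetical model of the tiny fields of mcache.
type tinyAlloc struct {
	block  int     // current block number, 0 means none yet
	offset uintptr // first free byte inside the current block
	blocks int     // blocks requested so far
}

// alloc returns the block and offset where an object of size bytes lands.
// It assumes size is smaller than maxTinySize.
func (t *tinyAlloc) alloc(size uintptr) (block int, off uintptr) {
	if t.block != 0 && t.offset+size <= maxTinySize {
		off = t.offset
		t.offset += size
		return t.block, off
	}
	// Take a fresh 16-byte block; the object sits at its start.
	t.blocks++
	fresh := t.blocks
	// Keep whichever block has more room left for later objects.
	if size < t.offset || t.block == 0 {
		t.block = fresh
		t.offset = size
	}
	return fresh, 0
}

func main() {
	var ta tinyAlloc
	for _, size := range []uintptr{5, 5, 8, 4} {
		blk, off := ta.alloc(size)
		fmt.Printf("size %d -> block %d offset %d\n", size, blk, off)
	}
}
```

The first two objects share block 1, the 8-byte object does not fit and opens block 2, and the last object is packed behind it at offset 8.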

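Small objects

Most objects in a running Go program are small objects in the range [16B, 32KB]. Their allocation path is the most involved, yet it takes the least code. The relevant part is shown below.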

var sizeclass uint8
if size <= smallSizeMax-8 {
    sizeclass = size_to_class8[divRoundUp(size, smallSizeDiv)]
} else {
    sizeclass = size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]
}
size = uintptr(class_to_size[sizeclass])
spc := makeSpanClass(sizeclass, noscan)
span = c.alloc[spc]
v := nextFreeFast(span)
if v == 0 {
    v, span, shouldhelpgc = c.nextFree(spc)
}
x = unsafe.Pointer(v)
if needzero && span.needzero != 0 {
    memclrNoHeapPointers(x, size)
}

It first computes which spanClass to use from the object's size, then runtime.nextFreeFast tries to get free space from the mspan cached in mcache for that spanClass.

func nextFreeFast(s *mspan) gclinkptr {
  theBit := sys.TrailingZeros64(s.allocCache) // Is there a free object in the allocCache?
  if theBit < 64 {
    result := s.freeindex + uintptr(theBit)
    if result < s.nelems {
      freeidx := result + 1
      if freeidx%64 == 0 && freeidx != s.nelems {
        return 0
      }
      s.allocCache >>= uint(theBit + 1)
      s.freeindex = freeidx
      s.allocCount++
      return gclinkptr(result*s.elemsize + s.base())
    }
  }
  return 0
}

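mspan.allocCache records which slots of the span are in use. It divides the memory by object count rather than by byte size, which amounts to treating the mspan as an array of objects.

allocCache is a 64-bit number in which each bit corresponds to one slot: a 0 bit means the slot is occupied and a 1 bit means it is free. sys.TrailingZeros64(s.allocCache) counts the trailing zeros. A result of 64 means no free slot is cached; otherwise the offset of the free slot is computed, added to the base address of the mspan, and returned.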
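The bit trick is available outside the runtime through math/bits. The sketch below uses a hypothetical bitSpan type with at most 64 slots and returns slot indexes instead of addresses; it follows the same steps as nextFreeFast but omits the 64-slot boundary handling.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitSpan is a hypothetical span with at most 64 slots. A set bit in cache
// means the slot at freeindex+bit is free.
type bitSpan struct {
	cache     uint64
	freeindex uint
	nelems    uint
}

// next returns the index of the next free slot, or false if none is cached.
func (s *bitSpan) next() (uint, bool) {
	tz := uint(bits.TrailingZeros64(s.cache))
	if tz == 64 {
		return 0, false
	}
	idx := s.freeindex + tz
	if idx >= s.nelems {
		return 0, false
	}
	s.cache >>= tz + 1 // drop the bits that were just consumed
	s.freeindex = idx + 1
	return idx, true
}

func main() {
	// Slots 0, 1 and 4 are taken (bits clear); slots 2, 3 and 5 are free.
	s := &bitSpan{cache: 0b101100, nelems: 8}
	for {
		i, ok := s.next()
		if !ok {
			break
		}
		fmt.Println("free slot", i)
	}
}
```

The loop prints slots 2, 3 and 5. Finding a free slot costs one count-trailing-zeros operation and a shift, with no scan over the span.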

When the span in mcache has no room left, the request goes to mcentral. This is done by the mcache.nextFree method.

func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
  s = c.alloc[spc]
  shouldhelpgc = false
  freeIndex := s.nextFreeIndex()
  if freeIndex == s.nelems {
    c.refill(spc)
    shouldhelpgc = true
    s = c.alloc[spc]

    freeIndex = s.nextFreeIndex()
  }
  v = gclinkptr(freeIndex*s.elemsize + s.base())
  s.allocCount++
  return
}

Here mcache.refill requests a usable mspan from mcentral.

func (c *mcache) refill(spc spanClass) {
  ...
  s = mheap_.central[spc].mcentral.cacheSpan()
  ...
}

When mcentral.cacheSpan has no span to offer, mcentral.grow is called, which in turn requests a new mspan from mheap.

func (c *mcentral) grow() *mspan {
  ...
  s := mheap_.alloc(npages, c.spanclass)
  ...
  return s
}

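In the end, small-object allocation descends level by level: mcache first, then mcentral, and finally mheap. Allocating from mcache is cheapest because it is a cache local to the P and needs no lock. mcentral comes next. Going to mheap is the most expensive because mheap.alloc contends for the global heap lock.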
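The three-level descent can be illustrated with a toy model in which a span is just a count of free slots. All types here are hypothetical and heavily simplified: one size class, four slots per span, and ordinary mutexes standing in for the runtime's synchronization.

```go
package main

import (
	"fmt"
	"sync"
)

// heap is a global span source guarded by one lock.
type heap struct {
	mu     sync.Mutex
	grants int
}

func (h *heap) allocSpan() int {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.grants++
	return 4 // slots per span in this toy model
}

// central holds spare spans for one size class.
type central struct {
	mu    sync.Mutex
	spans []int
	heap  *heap
}

func (c *central) cacheSpan() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n := len(c.spans); n > 0 {
		s := c.spans[n-1]
		c.spans = c.spans[:n-1]
		return s
	}
	return c.heap.allocSpan() // nothing spare: grow from the heap
}

// cache is the per-P level; only its owner touches it, so it has no lock.
type cache struct {
	free    int // free slots left in the current span
	refills int
	central *central
}

func (c *cache) alloc() {
	if c.free == 0 {
		c.free = c.central.cacheSpan()
		c.refills++
	}
	c.free--
}

func main() {
	h := &heap{}
	c := &cache{central: &central{heap: h, spans: []int{4}}}
	for i := 0; i < 10; i++ {
		c.alloc()
	}
	fmt.Println("refills:", c.refills, "heap grants:", h.grants) // refills: 3 heap grants: 2
}
```

Ten allocations touch the central level three times and the heap only twice; every other allocation is served from the local cache without any locking.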

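Large objects

Large-object allocation is the simplest. If an object is larger than 32KB, a new mspan is requested directly from mheap to hold it. The relevant part of the code is shown below.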

shouldhelpgc = true
span = c.allocLarge(size, noscan)
span.freeindex = 1
span.allocCount = 1
size = span.elemsize
x = unsafe.Pointer(span.base())
if needzero && span.needzero != 0 {
    if noscan {
        delayedZeroing = true
    } else {
        memclrNoHeapPointers(x, size)
    }
}

mcache.allocLarge is responsible for requesting the memory for a large object from mheap.

func (c *mcache) allocLarge(size uintptr, noscan bool) *mspan {
  ...
  spc := makeSpanClass(0, noscan)
  s := mheap_.alloc(npages, spc)
  ...
  return s
}

As the code shows, large objects use a spanClass whose size class is 0, and in general each large object occupies one mspan of its own.
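Since a large object gets a span of its own, the span size is simply the object size rounded up to whole pages. The sketch below assumes the 8KB page size mentioned earlier; pagesFor is a hypothetical helper that mirrors the npages value passed to mheap.alloc.

```go
package main

import "fmt"

const (
	pageShift = 13 // 8KB pages
	pageSize  = 1 << pageShift
)

// pagesFor returns how many whole pages a large object of size bytes needs.
func pagesFor(size uintptr) uintptr {
	n := size >> pageShift
	if size&(pageSize-1) != 0 {
		n++ // a partial page still occupies a whole page
	}
	return n
}

func main() {
	for _, size := range []uintptr{32<<10 + 1, 40 << 10, 1 << 20} {
		n := pagesFor(size)
		fmt.Printf("%d bytes -> %d pages (%d bytes unused)\n", size, n, n*pageSize-size)
	}
}
```

An object one byte over 32KB needs five pages and leaves 8191 bytes unused, which is the price of not sharing the span with other objects.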

Miscellaneous

Memory statistics

The Go runtime exposes the function ReadMemStats, which reports memory usage at run time.

func ReadMemStats(m *MemStats) {
  _ = m.Alloc // nil check test before we switch stacks, see issue 61158
  stopTheWorld(stwReadMemStats)

  systemstack(func() {
    readmemstats_m(m)
  })

  startTheWorld()
}

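Using it is expensive, though. As the code shows, the world must be stopped (STW) before memory is inspected, and the pause may last anywhere from a few milliseconds to a few hundred milliseconds, so it is generally used only for debugging and troubleshooting. The runtime.MemStats struct records information about heap memory, stack memory, and GC.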

type MemStats struct {
    // Overall statistics
    Alloc uint64
    TotalAlloc uint64
    Sys uint64
    Lookups uint64
    Mallocs uint64
    Frees uint64

    // Heap memory statistics
    HeapAlloc uint64
    HeapSys uint64
    HeapIdle uint64
    HeapInuse uint64
    HeapReleased uint64
    HeapObjects uint64

    // Stack memory statistics
    StackInuse uint64
    StackSys uint64

    // Allocator component statistics
    MSpanInuse uint64
    MSpanSys uint64
    MCacheInuse uint64
    MCacheSys uint64
    BuckHashSys uint64

    // GC-related statistics
    GCSys uint64
    OtherSys uint64
    NextGC uint64
    LastGC uint64
    PauseTotalNs uint64
    PauseNs [256]uint64
    PauseEnd [256]uint64
    NumGC uint32
    NumForcedGC uint32
    GCCPUFraction float64
    EnableGC bool
    DebugGC bool

    BySize [61]struct {
        Size uint32
        Mallocs uint64
        Frees uint64
    }
}
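A typical use is to take a snapshot before and after a piece of work and compare a few fields. The sketch below uses only runtime.ReadMemStats and fields listed above; heapSnapshot and sink are hypothetical names introduced for the example, and the printed numbers vary from run to run.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the allocations reachable so they stay on the heap.
var sink [][]byte

// heapSnapshot reads the runtime statistics once. It stops the world, so it
// should not be called on a hot path.
func heapSnapshot() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m
}

func main() {
	before := heapSnapshot()
	for i := 0; i < 1000; i++ {
		sink = append(sink, make([]byte, 1024))
	}
	after := heapSnapshot()

	fmt.Println("new mallocs:", after.Mallocs-before.Mallocs)
	fmt.Println("bytes allocated:", after.TotalAlloc-before.TotalAlloc)
	fmt.Println("heap in use (KB):", after.HeapInuse>>10)
	fmt.Println("GC cycles so far:", after.NumGC)
}
```

Mallocs and TotalAlloc only ever grow, so their differences measure the work done between the two snapshots regardless of any garbage collection in between.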

NotInHeap

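The memory allocator obviously exists to allocate heap memory, but the heap is divided into two parts: heap memory needed by the Go runtime itself, and heap memory available to user code. That is why some structs contain an embedded field like this: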

_ sys.NotInHeap

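It indicates that values of the type are never allocated on the user heap. The field is especially common in the allocator's own components, for example runtime.mheap, the struct that represents the user heap.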

type mheap struct {
  _ sys.NotInHeap
}

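The real purpose of sys.NotInHeap is to avoid write barriers and so make the runtime more efficient; the user heap is subject to GC and therefore needs write barriers.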
