Memory

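Unlike traditional C/C++, Go is a garbage-collected language: in most cases the runtime allocates and frees memory automatically. The compiler decides whether an object lives on the stack or on the heap, so users generally do not need to manage memory themselves and can simply use it. Heap memory management in Go has two main components: the memory allocator, which hands out heap memory, and the garbage collector, which reclaims heap memory that is no longer in use. This article focuses on how the memory allocator works. Go's memory allocator was heavily influenced by Google's TCMalloc.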

Allocators

The Go runtime has two basic memory allocators: a linear allocator and a linked-list allocator.

Linear allocation

The linear allocator corresponds to the runtime.linearAlloc struct, shown below.

type linearAlloc struct {
  next   uintptr // next free byte
  mapped uintptr // one byte past end of mapped space
  end    uintptr // end of reserved space

  mapMemory bool // transition memory from Reserved to Ready if true
}

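This allocator reserves a contiguous region of memory from the operating system in advance. next points to the next free address and end points to the end of the reserved region. Roughly, it can be pictured as in the figure below.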

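Allocation with the linear allocator is easy to follow: it checks whether the remaining space can hold the requested size and, if so, advances the next field and returns the start address of the space that was free. The code is as follows.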

func (l *linearAlloc) alloc(size, align uintptr, sysStat *sysMemStat) unsafe.Pointer {
  p := alignUp(l.next, align)
  if p+size > l.end {
    return nil
  }
  l.next = p + size
  return unsafe.Pointer(p)
}

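This approach is fast and simple, but it has an obvious drawback: freed memory cannot be reused. Because next only tracks where the unused space begins, the allocator has no way of knowing about regions that were allocated and later freed, which wastes a lot of memory, as the figure below shows.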

For this reason linear allocation is not Go's main allocation strategy; it is used only on 32-bit machines, to pre-reserve memory.
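The bump-pointer behaviour described above can be tried outside the runtime. The following is a minimal sketch, not the runtime's code: bumpAllocator is a hypothetical type that hands out offsets into an imaginary 64-byte region instead of real addresses, and it leaves out the memory-mapping step the real allocator performs.

```go
package main

import "fmt"

// bumpAllocator is a hypothetical user-space analogue of linearAlloc:
// it hands out offsets into a fixed region and never reuses them.
type bumpAllocator struct {
	next uintptr // offset of the next free byte
	end  uintptr // one past the last usable byte
}

// alignUp rounds n up to a multiple of align (align must be a power of two).
func alignUp(n, align uintptr) uintptr {
	return (n + align - 1) &^ (align - 1)
}

// alloc returns the offset of a block of the given size, or ok=false
// when the remaining space is too small.
func (b *bumpAllocator) alloc(size, align uintptr) (off uintptr, ok bool) {
	p := alignUp(b.next, align)
	if p+size > b.end {
		return 0, false
	}
	b.next = p + size
	return p, true
}

func main() {
	b := &bumpAllocator{end: 64}
	first, _ := b.alloc(10, 8)  // starts at offset 0
	second, _ := b.alloc(10, 8) // 10 is rounded up to 16
	_, ok := b.alloc(48, 8)     // 32+48 exceeds 64, so this fails
	fmt.Println(first, second, ok) // prints "0 16 false"
}
```

Note that there is no free operation at all: once next has moved past a block, the sketch, like the linear allocator, cannot hand that space out again.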

Linked-list allocation

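The linked-list allocator corresponds to the runtime.fixalloc struct. The memory it hands out is not contiguous; free pieces are kept in a singly linked list. The allocator is made up of fixed-size memory chunks, each chunk is divided into fixed-size slots, and every allocation returns exactly one slot.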

type fixalloc struct {
  size   uintptr
  first  func(arg, p unsafe.Pointer) // called first time p is returned
  arg    unsafe.Pointer
  list   *mlink
  chunk  uintptr // use uintptr instead of unsafe.Pointer to avoid write barriers
  nchunk uint32  // bytes remaining in current chunk
  nalloc uint32  // size of new chunks in bytes
  inuse  uintptr // in-use bytes now
  stat   *sysMemStat
  zero   bool // zero allocations
}

type mlink struct {
  _    sys.NotInHeap
  next *mlink
}

These fields are not as self-explanatory as those of the linear allocator, so only the important ones are introduced here.

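size: the size of the slot returned by each allocation.
list: the head of the list of reusable slots; each slot is size bytes.
chunk: the address of the next free byte in the chunk currently in use.
nchunk: the number of bytes still available in the current chunk.
nalloc: the size of a chunk, at most 16 KB (16 KB rounded down to a multiple of size).
inuse: the total number of bytes currently in use.
zero: whether a slot is zeroed when it is reused.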

The linked-list allocator holds references to the current chunk and to the list of reusable slots. The chunk size is capped at 16 KB and is set during initialization.

const _FixAllocChunk = 16 << 10

func (f *fixalloc) init(size uintptr, first func(arg, p unsafe.Pointer), arg unsafe.Pointer, stat *sysMemStat) {
  if size > _FixAllocChunk {
    throw("runtime: fixalloc size too large")
  }
  if min := unsafe.Sizeof(mlink{}); size < min {
    size = min
  }

  f.size = size
  f.first = first
  f.arg = arg
  f.list = nil
  f.chunk = 0
  f.nchunk = 0
  f.nalloc = uint32(_FixAllocChunk / size * size)
  f.inuse = 0
  f.stat = stat
  f.zero = true
}

The chunks are laid out as in the figure below. The figure orders them by creation time, but their addresses are not actually contiguous.

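The size of each allocation is also fixed and is determined by fixalloc.size. On allocation, the allocator first checks for a reusable slot and uses it if there is one; otherwise it carves a slot from the current chunk, and if the current chunk does not have enough room left, it requests a new chunk. The logic corresponds to the code below.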

func (f *fixalloc) alloc() unsafe.Pointer {
  if f.size == 0 {
    print("runtime: use of FixAlloc_Alloc before FixAlloc_Init\n")
    throw("runtime: internal error")
  }

  if f.list != nil {
    v := unsafe.Pointer(f.list)
    f.list = f.list.next
    f.inuse += f.size
    if f.zero {
      memclrNoHeapPointers(v, f.size)
    }
    return v
  }
  if uintptr(f.nchunk) < f.size {
    f.chunk = uintptr(persistentalloc(uintptr(f.nalloc), 0, f.stat))
    f.nchunk = f.nalloc
  }

  v := unsafe.Pointer(f.chunk)
  if f.first != nil {
    f.first(f.arg, v)
  }
  f.chunk = f.chunk + f.size
  f.nchunk -= uint32(f.size)
  f.inuse += f.size
  return v
}

The advantage of the linked-list allocator is that freed memory can be reused. The unit of reuse is a fixed-size slot whose size is determined by fixalloc.size. When memory is freed, the allocator pushes the slot onto the head of the free list. The code is as follows.

func (f *fixalloc) free(p unsafe.Pointer) {
  f.inuse -= f.size
  v := (*mlink)(p)
  v.next = f.list
  f.list = v
}
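The alloc and free paths above boil down to "reuse a freed slot if there is one, otherwise carve a fresh slot, and request a new chunk when the current one is exhausted". The sketch below models only that bookkeeping. slotAllocator and its fields are hypothetical names, slots are identified by integer index rather than by address, and no real memory is involved.

```go
package main

import "fmt"

// slotAllocator is a hypothetical analogue of fixalloc. Freed slots go
// on a LIFO free list and are handed out again before the current
// chunk is touched.
type slotAllocator struct {
	slotsPerChunk int
	free          []int // stack of reusable slot indices
	nextFresh     int   // next never-used slot index
	limit         int   // end of the current chunk (exclusive)
	chunks        int   // number of chunks requested so far
	inuse         int   // slots currently handed out
}

// alloc returns a slot index, preferring previously freed slots.
func (a *slotAllocator) alloc() int {
	a.inuse++
	if n := len(a.free); n > 0 {
		s := a.free[n-1]
		a.free = a.free[:n-1]
		return s
	}
	if a.nextFresh == a.limit {
		// Current chunk is exhausted: account for a new one.
		a.chunks++
		a.limit += a.slotsPerChunk
	}
	s := a.nextFresh
	a.nextFresh++
	return s
}

// release puts a slot back on the free list for reuse.
func (a *slotAllocator) release(slot int) {
	a.inuse--
	a.free = append(a.free, slot)
}

func main() {
	a := &slotAllocator{slotsPerChunk: 2}
	s0, s1 := a.alloc(), a.alloc() // slots 0 and 1 from the first chunk
	a.release(s0)
	s2 := a.alloc() // reuses slot 0
	s3 := a.alloc() // first chunk is full, so a second chunk is started
	fmt.Println(s0, s1, s2, s3, a.chunks, a.inuse) // prints "0 1 0 2 2 3"
}
```

The real allocator stores the list link inside the freed slot itself (the mlink struct), which is why init raises size to at least the size of an mlink.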

Memory components

Go's memory allocator is built mainly from the following components: mspan, heaparena, mcache, mcentral, and mheap. They work in layers to manage the entire Go heap.

mspan

runtime.mspan is the basic unit of Go memory allocation. Its structure is as follows.

type mspan struct {
    next *mspan     // next span in list, or nil if none
    prev *mspan     // previous span in list, or nil if none

    startAddr uintptr // address of first byte of span aka s.base()
    npages    uintptr // number of pages in span
    freeindex uintptr

    spanclass             spanClass     // size class and noscan (uint8)
    needzero              uint8         // needs to be zeroed before allocation
    elemsize              uintptr       // computed from sizeclass or from npages
    limit                 uintptr       // end of data in span
    state                 mSpanStateBox // mSpanInUse etc; accessed atomically (get/set methods)

    nelems uintptr // number of object in the span.
    allocCache uint64
    allocCount            uint16        // number of allocated objects
    ...
}

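mspans are linked to one another through next and prev to form a doubly linked list; their memory addresses are not contiguous. Each mspan manages mspan.npages pages of runtime.pageSize bytes each, and a page is normally 8 KB. mspan.startAddr records the start address of those pages and mspan.limit records the end address of the memory in use. The element size elemsize of an mspan is fixed, so the number of elements it can hold is fixed as well. Objects are therefore laid out inside the mspan like an array, at indices in the range [0, nelems), and freeindex records the index at which to start looking for the next free slot. An mspan has three possible states.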

mSpanDead: the memory has been released.
mSpanInUse: allocated for the heap.
mSpanManual: allocated for manually managed memory, such as stacks.

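The element size of an mspan is determined by its spanClass. spanClass is a uint8: the upper 7 bits store the size class, a value from 0 to 67, and the lowest bit is the noscan flag, which is set when the objects contain no pointers.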

type spanClass uint8

func (sc spanClass) sizeclass() int8 {
  return int8(sc >> 1)
}

func (sc spanClass) noscan() bool {
  return sc&1 != 0
}
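To make the bit layout concrete, here is a small self-contained sketch that packs and unpacks a spanClass. The two accessor methods mirror the ones above; the body of makeSpanClass is written for this example and is only assumed to behave like the runtime function of the same name.

```go
package main

import "fmt"

type spanClass uint8

// makeSpanClass packs a size class (upper 7 bits) and the noscan flag
// (lowest bit) into a single byte. Illustrative reimplementation.
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass << 1)
	if noscan {
		sc |= 1
	}
	return sc
}

func (sc spanClass) sizeclass() int8 { return int8(sc >> 1) }
func (sc spanClass) noscan() bool    { return sc&1 != 0 }

func main() {
	sc := makeSpanClass(5, true)
	// 5<<1 | 1 == 11
	fmt.Println(uint8(sc), sc.sizeclass(), sc.noscan()) // prints "11 5 true"
}
```

Because every size class exists in a scan and a noscan variant, 68 size classes yield 136 distinct spanClass values.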

There are 68 size classes in total, and their values are stored as tables in runtime/sizeclasses.go. At run time, the size class of a spanClass is used to look up the object size in runtime.class_to_size and the number of pages per mspan in class_to_allocnpages.

class	max object size (bytes)	span size (bytes)	objects	tail waste (bytes)	max waste	min align (bytes)
1	8	8192	1024	0	87.50%	8
2	16	8192	512	0	43.75%	16
3	24	8192	341	8	29.24%	8
4	32	8192	256	0	21.88%	32
5	48	8192	170	32	31.52%	16
6	64	8192	128	0	23.44%	64
7	80	8192	102	32	19.07%	16
8	96	8192	85	32	15.95%	32
9	112	8192	73	16	13.56%	16
10	128	8192	64	0	11.72%	128
11	144	8192	56	128	11.82%	16
12	160	8192	51	32	9.73%	32
13	176	8192	46	96	9.59%	16
14	192	8192	42	128	9.25%	64
15	208	8192	39	80	8.12%	16
16	224	8192	36	128	8.15%	32
17	240	8192	34	32	6.62%	16
18	256	8192	32	0	5.86%	256
19	288	8192	28	128	12.16%	32
20	320	8192	25	192	11.80%	64
21	352	8192	23	96	9.88%	32
22	384	8192	21	128	9.51%	128
23	416	8192	19	288	10.71%	32
24	448	8192	18	128	8.37%	64
25	480	8192	17	32	6.82%	32
26	512	8192	16	0	6.05%	512
27	576	8192	14	128	12.33%	64
28	640	8192	12	512	15.48%	128
29	704	8192	11	448	13.93%	64
30	768	8192	10	512	13.94%	256
31	896	8192	9	128	15.52%	128
32	1024	8192	8	0	12.40%	1024
33	1152	8192	7	128	12.41%	128
34	1280	8192	6	512	15.55%	256
35	1408	16384	11	896	14.00%	128
36	1536	8192	5	512	14.00%	512
37	1792	16384	9	256	15.57%	256
38	2048	8192	4	0	12.45%	2048
39	2304	16384	7	256	12.46%	256
40	2688	8192	3	128	15.59%	128
41	3072	24576	8	0	12.47%	1024
42	3200	16384	5	384	6.22%	128
43	3456	24576	7	384	8.83%	128
44	4096	8192	2	0	15.60%	4096
45	4864	24576	5	256	16.65%	256
46	5376	16384	3	256	10.92%	256
47	6144	24576	4	0	12.48%	2048
48	6528	32768	5	128	6.23%	128
49	6784	40960	6	256	4.36%	128
50	6912	49152	7	768	3.37%	256
51	8192	8192	1	0	15.61%	8192
52	9472	57344	6	512	14.28%	256
53	9728	49152	5	512	3.64%	512
54	10240	40960	4	0	4.99%	2048
55	10880	32768	3	128	6.24%	128
56	12288	24576	2	0	11.45%	4096
57	13568	40960	3	256	9.99%	256
58	14336	57344	4	0	5.35%	2048
59	16384	16384	1	0	12.49%	8192
60	18432	73728	4	0	11.11%	2048
61	19072	57344	3	128	3.57%	128
62	20480	40960	2	0	6.87%	4096
63	21760	65536	3	256	6.25%	256
64	24576	24576	1	0	11.45%	8192
65	27264	81920	3	128	10.00%	128
66	28672	57344	2	0	4.91%	4096
67	32768	32768	1	0	12.50%	8192

The logic that computes these values is in the printComment function of runtime/mksizeclasses.go. The maximum waste ratio is calculated as follows.

float64((size-prevSize-1)*objects+tailWaste) / float64(spanSize)

For example, for class 2 the maximum waste ratio is:

((16-8-1)*512+0)/8192 = 0.4375
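The same formula can be checked against other rows of the table. The worst case for a class is an object one byte larger than the previous class's size, which is where the size-prevSize-1 term comes from. The helper below is a sketch written for this article (maxWaste is not a runtime function), and class 1 is assumed to have a previous size of 0.

```go
package main

import "fmt"

// maxWaste applies the formula from mksizeclasses.go to one table row.
// All arguments are in bytes except objects, which is a count.
func maxWaste(size, prevSize, objects, tailWaste, spanSize int) float64 {
	return float64((size-prevSize-1)*objects+tailWaste) / float64(spanSize)
}

func main() {
	fmt.Printf("class 1: %.2f%%\n", 100*maxWaste(8, 0, 1024, 0, 8192))  // 87.50%
	fmt.Printf("class 2: %.2f%%\n", 100*maxWaste(16, 8, 512, 0, 8192))  // 43.75%
	fmt.Printf("class 3: %.2f%%\n", 100*maxWaste(24, 16, 341, 8, 8192)) // 29.24%
}
```

The three results agree with the "max waste" column for classes 1 to 3.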

Class 0 is the spanClass used for objects larger than 32 KB; such a large object essentially gets an mspan of its own. The Go heap is therefore made up of lists of mspans with different fixed element sizes.

heaparena

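As mentioned above, an mspan is made up of pages, but it only holds a reference to the address of those pages and does not manage them. The pages are actually managed by runtime.heapArena. Each heaparena manages a range of pages; its size is determined by runtime.heapArenaBytes, which is 64 MB on most 64-bit platforms. bitmap records, for each word in the arena, whether the word holds a pointer; zeroedBase marks the offset from the arena base of the first byte that has not been used yet and is therefore still zeroed; and spans records which mspan owns each page.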

type heapArena struct {
  _ sys.NotInHeap
  bitmap [heapArenaBitmapWords]uintptr
  noMorePtrs [heapArenaBitmapWords / 8]uint8
  spans [pagesPerArena]*mspan
  pageInUse [pagesPerArena / 8]uint8
  pageMarks [pagesPerArena / 8]uint8
  pageSpecials [pagesPerArena / 8]uint8
  checkmarks *checkmarksMap
  zeroedBase uintptr
}

The logic that records which mspan owns which page is in the mheap.setSpans method, shown below.

func (h *mheap) setSpans(base, npage uintptr, s *mspan) {
  p := base / pageSize
  ai := arenaIndex(base)
  ha := h.arenas[ai.l1()][ai.l2()]
  for n := uintptr(0); n < npage; n++ {
    i := (p + n) % pagesPerArena
    if i == 0 {
      ai = arenaIndex(base + n*pageSize)
      ha = h.arenas[ai.l1()][ai.l2()]
    }
    ha.spans[i] = s
  }
}

In the Go heap, a two-dimensional array of heaparena pointers manages all page memory; see the mheap.arenas field.

type mheap struct {
  arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena
}

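On 64-bit Windows the first dimension of the array is 1 << 6 and the second is 1 << 20; on 64-bit Linux the first dimension is 1 and the second is 1 << 22. This two-dimensional array of heaparenas makes up the virtual address space of the Go runtime, which as a whole looks like the figure below.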

Neighbouring heaparenas are adjacent in the array, but the page memory they manage is not necessarily contiguous.
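The lookup that setSpans performs, from an address to an arena and then to a page slot inside that arena, is plain integer arithmetic. The sketch below uses the 64-bit Linux parameters mentioned above (64 MB arenas, 8 KB pages, a single L1 entry) and, as a simplifying assumption, ignores the base offset the runtime applies to addresses on some platforms. locate is a hypothetical helper, not a runtime function.

```go
package main

import "fmt"

const (
	pageSize       = 8 << 10                   // 8 KB pages
	heapArenaBytes = 64 << 20                  // 64 MB per arena
	pagesPerArena  = heapArenaBytes / pageSize // 8192 page slots per arena
	arenaL2Bits    = 22                        // assumed: 64-bit Linux layout
)

// locate returns the L1 index, the L2 index and the page slot within
// the arena for addr, assuming the address space starts at 0.
func locate(addr uint64) (l1, l2, page uint64) {
	ai := addr / heapArenaBytes
	l1 = ai >> arenaL2Bits
	l2 = ai & (1<<arenaL2Bits - 1)
	page = (addr / pageSize) % pagesPerArena
	return
}

func main() {
	// An address 100 bytes into the 6th page of the 4th arena.
	addr := uint64(3*heapArenaBytes + 5*pageSize + 100)
	fmt.Println(locate(addr)) // prints "0 3 5"
}
```

With these indices, arenas[l1][l2].spans[page] would be the entry that records which mspan owns the page containing addr.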

mcache

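mcache corresponds to the runtime.mcache struct and was already mentioned in the article on concurrent scheduling. Despite its name, it is bound to a processor P. An mcache is the per-P memory cache. It contains alloc, an array of mspans with a fixed length of 136, twice the number of size classes. It also contains the tiny-object cache: tiny points to the start of the current tiny-object block, tinyoffset is the offset of the free space from that start address, and tinyAllocs counts the tiny objects allocated. The stack cache stackcache is covered in the section on stack memory allocation.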

type mcache struct {
    _ sys.NotInHeap

    nextSample uintptr // trigger heap sample after allocating this many bytes
    scanAlloc  uintptr // bytes of scannable heap allocated
    tiny       uintptr
    tinyoffset uintptr
    tinyAllocs uintptr

    alloc [numSpanClasses]*mspan
    stackcache [_NumStackOrders]stackfreelist
    flushGen atomic.Uint32
}

At initialization, every entry of alloc in an mcache holds only the placeholder runtime.emptymspan, an mspan with no available memory.

func allocmcache() *mcache {
  var c *mcache
  systemstack(func() {
    lock(&mheap_.lock)
    c = (*mcache)(mheap_.cachealloc.alloc())
    c.flushGen.Store(mheap_.sweepgen)
    unlock(&mheap_.lock)
  })
  for i := range c.alloc {
    c.alloc[i] = &emptymspan
  }
  c.nextSample = nextSample()
  return c
}

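Only when memory actually needs to be allocated does the mcache request a new mspan from mcentral to replace the empty one. This is done by the mcache.refill method, which is reached only from the runtime.mallocgc function. Simplified code is shown below.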

func (c *mcache) refill(spc spanClass) {
  // Return the current cached span to the central lists.
  s := c.alloc[spc]

  // Get a new cached span from the central lists.
  s = mheap_.central[spc].mcentral.cacheSpan()
  if s == nil {
    throw("out of memory")
  }

  c.scanAlloc = 0

  c.alloc[spc] = s
}

The advantage of mcache is that allocation needs no global lock. When the cache runs out of memory, however, it has to go to mcentral, and that path still requires locking.

mcentral

runtime.mcentral manages all mspans in the heap that hold small objects; when an mcache requests memory, it is mcentral that provides it.

type mcentral struct {
    _         sys.NotInHeap
    spanclass spanClass
    partial [2]spanSet
    full    [2]spanSet
}

mcentral has few fields. spanclass indicates the kind of mspan it stores. partial and full are each a pair of spanSets: the former holds mspans that still have free space and the latter holds mspans that have none. mcentral is managed directly by mheap, and there are 136 mcentrals in total at run time.

type mheap struct {
    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [(cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize) % cpu.CacheLinePadSize]byte
    }
}

mcentral has two main jobs: handing an available mspan to mcache when memory is sufficient, and requesting a new mspan from mheap when it is not. Handing an mspan to mcache is done by the mcentral.cacheSpan method. It first looks for an available mspan in the swept set of the partial list.

// Try partial swept spans first.
sg := mheap_.sweepgen
if s = c.partialSwept(sg).pop(); s != nil {
    goto havespan
}

If none is found, it looks in the unswept set of the partial list.

for ; spanBudget >= 0; spanBudget-- {
    s = c.partialUnswept(sg).pop()
    if s == nil {
        break
    }
    if s, ok := sl.tryAcquire(s); ok {
        s.sweep(true)
        sweep.active.end(sl)
        goto havespan
    }
}

If that also fails, it searches the unswept set of the full list.

for ; spanBudget >= 0; spanBudget-- {
    s = c.fullUnswept(sg).pop()
    if s == nil {
        break
    }
    if s, ok := sl.tryAcquire(s); ok {
        s.sweep(true)
        freeIndex := s.nextFreeIndex()
        if freeIndex != s.nelems {
            s.freeindex = freeIndex
            sweep.active.end(sl)
            goto havespan
        }
        c.fullSwept(sg).push(s.mspan)
    }
}

If it still finds nothing, the mcentral.grow method requests a new mspan from mheap.

s = c.grow()
if s == nil {
    return nil
}

Under normal circumstances, an available mspan is returned one way or another.

havespan:
  freeByteBase := s.freeindex &^ (64 - 1)
  whichByte := freeByteBase / 8
  // Init alloc bits cache.
  s.refillAllocCache(whichByte)
  s.allocCache >>= s.freeindex % 64

  return s

Requesting an mspan from mheap amounts to calling the mheap.alloc method, which returns a new mspan.

func (c *mcentral) grow() *mspan {
  npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()])
  size := uintptr(class_to_size[c.spanclass.sizeclass()])

  s := mheap_.alloc(npages, c.spanclass)
  if s == nil {
    return nil
  }

  n := s.divideByElemSize(npages << _PageShift)
  s.limit = s.base() + size*n
  s.initHeapBits(false)
  return s
}

Once initialized, the mspan can be handed to mcache for use.

mheap

runtime.mheap is the manager of Go's heap memory. At run time it exists as the global variable runtime.mheap_.

var mheap_ mheap

It manages every mspan that has been created, every mcentral, every heaparena, and various other allocators. Its simplified structure is as follows.

type mheap struct {
    _ sys.NotInHeap

    lock mutex

    allspans []*mspan // all spans out there

    pagesInUse         atomic.Uintptr // pages of spans in stats mSpanInUse
    pagesSwept         atomic.Uint64  // pages swept this cycle
    pagesSweptBasis    atomic.Uint64  // pagesSwept to use as the origin of the sweep ratio

    arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena
    allArenas []arenaIdx
    sweepArenas []arenaIdx
    central [numSpanClasses]struct {
        mcentral mcentral
        pad      [(cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize) % cpu.CacheLinePadSize]byte
    }

    pages            pageAlloc // page allocation data structure
    spanalloc              fixalloc // allocator for span*
    cachealloc             fixalloc // allocator for mcache*
    specialfinalizeralloc  fixalloc // allocator for specialfinalizer*
    specialprofilealloc    fixalloc // allocator for specialprofile*
    specialReachableAlloc  fixalloc // allocator for specialReachable
    specialPinCounterAlloc fixalloc // allocator for specialPinCounter
    arenaHintAlloc         fixalloc // allocator for arenaHints
}

At run time, mheap has four jobs:

Initializing the heap
Allocating mspans
Freeing mspans
Growing the heap

Each of the four is described in turn below.

Initialization

The heap is initialized during program bootstrap, which is also when the scheduler is initialized. The call order is as follows.

schedinit() -> mallocinit() -> mheap_.init()

Initialization mainly sets up the individual allocators.

func (h *mheap) init() {
  h.spanalloc.init(unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
  h.cachealloc.init(unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
  h.specialfinalizeralloc.init(unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
  h.specialprofilealloc.init(unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)
  h.specialReachableAlloc.init(unsafe.Sizeof(specialReachable{}), nil, nil, &memstats.other_sys)
  h.specialPinCounterAlloc.init(unsafe.Sizeof(specialPinCounter{}), nil, nil, &memstats.other_sys)
  h.arenaHintAlloc.init(unsafe.Sizeof(arenaHint{}), nil, nil, &memstats.other_sys)

  h.spanalloc.zero = false
  for i := range h.central {
    h.central[i].mcentral.init(spanClass(i))
  }

  h.pages.init(&h.lock, &memstats.gcMiscSys, false)
}

This covers mheap.spanalloc, the allocator for mspan structures; mheap.pages, the allocator responsible for pages; and the initialization of every mcentral.

Allocation

In mheap, all mspan allocation goes through the mheap.allocSpan method.

func (h *mheap) allocSpan(npages uintptr, typ spanAllocType, spanclass spanClass) (s *mspan)

If the request is small enough, that is npages < pageCachePages/4, it tries to take the pages from the page cache of the local P without locking. If the P's cache is empty, it is refilled first.

// If the cache is empty, refill it.
if c.empty() {
    lock(&h.lock)
    *c = h.pages.allocToCache()
    unlock(&h.lock)
}

The pages are then taken from the P's cache, and the mspan structure itself is obtained by the mheap.tryAllocMSpan method.

pp := gp.m.p.ptr()
if !needPhysPageAlign && pp != nil && npages < pageCachePages/4 {
    c := &pp.pcache
    base, scav = c.alloc(npages)
    if base != 0 {
        s = h.tryAllocMSpan()
        if s != nil {
            goto HaveSpan
        }
    }
}

The code that takes an mspan from the P's cache is shown below; it tries to take the last mspan in the cache.

func (h *mheap) tryAllocMSpan() *mspan {
  pp := getg().m.p.ptr()
  // If we don't have a p or the cache is empty, we can't do
  // anything here.
  if pp == nil || pp.mspancache.len == 0 {
    return nil
  }
  // Pull off the last entry in the cache.
  s := pp.mspancache.buf[pp.mspancache.len-1]
  pp.mspancache.len--
  return s
}

If the request is large, memory is allocated from the heap, and the heap lock must be held while doing so.

lock(&h.lock)
if base == 0 {
    // Try to acquire a base address.
    base, scav = h.pages.alloc(npages)
    if base == 0 {
        var ok bool
        growth, ok = h.grow(npages)
        if !ok {
            unlock(&h.lock)
            return nil
        }
        base, scav = h.pages.alloc(npages)
        if base == 0 {
            throw("grew heap, but no adequate free space found")
        }
    }
}
if s == nil {
    // We failed to get an mspan earlier, so grab
    // one now that we have the heap lock.
    s = h.allocMSpanLocked()
}
unlock(&h.lock)

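pageAlloc.alloc is first used to allocate enough pages; if the heap does not have enough memory, mheap.grow extends it. Once the pages have been allocated and the P's mspan cache is empty, the linked-list allocator mheap.spanalloc allocates 64 mspans into that cache, 64 being half the length of the cache array. An available mspan is then returned from the P's cache.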

func (h *mheap) allocMSpanLocked() *mspan {
  assertLockHeld(&h.lock)

  pp := getg().m.p.ptr()
  if pp == nil {
    // We don't have a p so just do the normal thing.
    return (*mspan)(h.spanalloc.alloc())
  }
  // Refill the cache if necessary.
  if pp.mspancache.len == 0 {
    const refillCount = len(pp.mspancache.buf) / 2
    for i := 0; i < refillCount; i++ {
      pp.mspancache.buf[i] = (*mspan)(h.spanalloc.alloc())
    }
    pp.mspancache.len = refillCount
  }
  // Pull off the last entry in the cache.
  s := pp.mspancache.buf[pp.mspancache.len-1]
  pp.mspancache.len--
  return s
}

Either way an available mspan is eventually obtained; finally it is initialized and returned.

HaveSpan:
  h.initSpan(s, typ, spanclass, base, npages)
  return s
}

Freeing

Because mspans are allocated by the linked-list allocator, they are naturally freed by it as well.

func (h *mheap) freeSpanLocked(s *mspan, typ spanAllocType) {
  assertLockHeld(&h.lock)
  // Mark the space as free.
  h.pages.free(s.base(), s.npages)
  s.state.set(mSpanDead)
  h.freeMSpanLocked(s)
}

The page allocator mheap.pages first marks the given pages as free, the mspan's state is then set to mSpanDead, and finally the mheap.spanalloc allocator frees the mspan.

func (h *mheap) freeMSpanLocked(s *mspan) {
  assertLockHeld(&h.lock)

  pp := getg().m.p.ptr()
  // First try to free the mspan directly to the cache.
  if pp != nil && pp.mspancache.len < len(pp.mspancache.buf) {
    pp.mspancache.buf[pp.mspancache.len] = s
    pp.mspancache.len++
    return
  }
  // Failing that (or if we don't have a p), just free it to
  // the heap.
  h.spanalloc.free(unsafe.Pointer(s))
}

If the P's cache is not full, the mspan is put into the P's local cache for reuse; otherwise it is returned to the heap's mspan allocator.

Growth

The page memory managed by heaparenas is not all requested up front; it is allocated only when needed. Heap growth is handled by the mheap.grow method. Simplified code is shown below.

func (h *mheap) grow(npage uintptr) (uintptr, bool) {
  assertLockHeld(&h.lock)
  ask := alignUp(npage, pallocChunkPages) * pageSize
  totalGrowth := uintptr(0)
  end := h.curArena.base + ask
  nBase := alignUp(end, physPageSize)

  if nBase > h.curArena.end || end < h.curArena.base {
    av, asize := h.sysAlloc(ask, &h.arenaHints, true)
    if uintptr(av) == h.curArena.end {
      h.curArena.end = uintptr(av) + asize
    } else {
      // Switch to the new space.
      h.curArena.base = uintptr(av)
      h.curArena.end = uintptr(av) + asize
    }
    nBase = alignUp(h.curArena.base+ask, physPageSize)
  }
  ...
}

It first computes and aligns the amount of memory required for npage, then checks whether the current heaparena has enough. If not, mheap.sysAlloc requests more memory for the current heaparena or allocates a new heaparena.

func (h *mheap) sysAlloc(n uintptr, hintList **arenaHint, register bool) (v unsafe.Pointer, size uintptr) {
  n = alignUp(n, heapArenaBytes)
  if hintList == &h.arenaHints {
    v = h.arena.alloc(n, heapArenaBytes, &gcController.heapReleased)
    if v != nil {
      size = n
      goto mapped
    }
  }
    ...
}

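It first tries to obtain memory from the pre-reserved region using the linear allocator mheap.arena; if that fails, it grows according to hintList. hintList is of type runtime.arenaHint and records address information about where heaparenas may be extended.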

for *hintList != nil {
    hint := *hintList
    p := hint.addr
    v = sysReserve(unsafe.Pointer(p), n)
    if p == uintptr(v) {
        hint.addr = p
        size = n
        break
    }
    if v != nil {
        sysFreeOS(v, n)
    }
    *hintList = hint.next
    h.arenaHintAlloc.free(unsafe.Pointer(hint))
}

Once the memory has been obtained, the two-dimensional arenas array is updated.

for ri := arenaIndex(uintptr(v)); ri <= arenaIndex(uintptr(v)+size-1); ri++ {
    l2 := h.arenas[ri.l1()]
    var r *heapArena
    r = (*heapArena)(h.heapArenaAlloc.alloc(unsafe.Sizeof(*r), goarch.PtrSize, &memstats.gcMiscSys))
    atomic.StorepNoWB(unsafe.Pointer(&l2[ri.l2()]), unsafe.Pointer(r))
}

Finally, the page allocator marks this memory as ready.

// Update the page allocator's structures to make this
// space ready for allocation.
h.pages.grow(v, nBase-v)
totalGrowth += nBase - v

Object allocation

When allocating memory for an object, Go distinguishes three kinds of object by size:

Tiny objects (tiny): smaller than 16 B
Small objects (small): from 16 B up to 32 KB
Large objects (large): larger than 32 KB
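The three-way split can be written as a small decision function. This is a sketch for illustration only: classify is a hypothetical helper, the two constants are assumed to match the runtime's maxTinySize and maxSmallSize, and the noscan condition for tiny objects follows the allocation code in the tiny-object section.

```go
package main

import "fmt"

const (
	maxTinySize  = 16       // tiny objects are strictly smaller than this
	maxSmallSize = 32 << 10 // small objects are at most this large
)

// classify reports which allocation path an object of the given size
// would take. noscan means the object contains no pointers.
func classify(size uintptr, noscan bool) string {
	switch {
	case size > maxSmallSize:
		return "large"
	case noscan && size < maxTinySize:
		return "tiny"
	default:
		return "small"
	}
}

func main() {
	fmt.Println(classify(8, true))      // tiny
	fmt.Println(classify(8, false))     // small: it contains pointers
	fmt.Println(classify(32<<10, true)) // small: exactly 32 KB
	fmt.Println(classify(32<<10+1, true)) // large
}
```

Note that an 8-byte object that contains a pointer is treated as a small object, not a tiny one.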

Each of the three kinds follows different logic during allocation. The function that allocates memory for an object is runtime.mallocgc, whose signature is as follows.

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer

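There are only three parameters: the size, the type, and a boolean indicating whether the memory must be zeroed. This is the entry point for every Go object allocation; creating a pointer with the new function also ends up here. On success, the returned pointer is the address of the object. As described in the mspan section, each mspan has a spanClass, which fixes its element size. Go divides the range up to 32 KB into 68 size classes (class 0 being reserved for large objects), so Go memory consists of lists of mspans with different fixed element sizes. To allocate an object, the runtime computes the spanClass from the object's size, finds the mspans of that spanClass, and takes a free slot from an available mspan. This tiered approach greatly reduces memory fragmentation.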

Tiny objects

All pointer-free tiny objects smaller than 16 B are allocated by the P's tiny allocator into the same contiguous block of memory. In runtime.mcache, the tiny field records the base address of that block.

type mcache struct {
  tiny       uintptr
  tinyoffset uintptr
  tinyAllocs uintptr
}

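The size limit for tiny objects is set by the runtime.maxTinySize constant, 16 B, and the memory block used to hold tiny objects is the same size. Objects stored here are typically small strings and the like. The part of the code responsible for tiny-object allocation is as follows.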

if size <= maxSmallSize {
    if noscan && size < maxTinySize {
      off := c.tinyoffset
      if off+size <= maxTinySize && c.tiny != 0 {
        x = unsafe.Pointer(c.tiny + off)
        c.tinyoffset = off + size
        c.tinyAllocs++
        mp.mallocing = 0
        releasem(mp)
        return x
      }

      // Allocate a new maxTinySize block.
      span = c.alloc[tinySpanClass]
      v := nextFreeFast(span)
      if v == 0 {
        v, span, shouldhelpgc = c.nextFree(tinySpanClass)
      }
      x = unsafe.Pointer(v)
      (*[2]uint64)(x)[0] = 0
      (*[2]uint64)(x)[1] = 0

      if (size < c.tinyoffset || c.tiny == 0) {
        c.tiny = uintptr(x)
        c.tinyoffset = size
      }
      size = maxTinySize

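If the current tiny block has enough room for the object, that is off+size <= maxTinySize, it is used directly. Otherwise the allocator first tries to find free space in the mspan cached by mcache, and failing that requests an mspan from mcentral. Either way it ends up with a usable address for a new 16 B block; the new block then replaces the previous one as the current tiny block if it has more free space left.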
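The packing behaviour can be modelled without touching real memory. In the sketch below, tinyCache is a hypothetical type; blocks are identified by a counter instead of an address, and the alignment of the offset that the real allocator performs is left out, as it is in the excerpt above.

```go
package main

import "fmt"

const maxTinySize = 16

// tinyCache is a hypothetical model of the tiny, tinyoffset fields.
type tinyCache struct {
	block  int     // id of the current 16-byte block, 0 means none
	offset uintptr // first free byte inside the current block
	blocks int     // number of 16-byte blocks obtained so far
}

// alloc returns the block id and the offset at which the object is placed.
func (c *tinyCache) alloc(size uintptr) (block int, off uintptr) {
	if c.block != 0 && c.offset+size <= maxTinySize {
		off = c.offset
		c.offset += size
		return c.block, off
	}
	// The current block cannot hold the object: take a fresh block.
	c.blocks++
	fresh := c.blocks
	// Keep whichever block has more free space as the current one.
	if size < c.offset || c.block == 0 {
		c.block = fresh
		c.offset = size
	}
	return fresh, 0
}

func main() {
	c := &tinyCache{}
	fmt.Println(c.alloc(4)) // prints "1 0": first block, offset 0
	fmt.Println(c.alloc(8)) // prints "1 4": packed into the same block
	fmt.Println(c.alloc(8)) // prints "2 0": 12+8 > 16, so a new block
}
```

Several tiny objects therefore share one 16-byte slot, which is what keeps the per-object overhead low for very small allocations.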

Small objects

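Most objects in the Go runtime are small objects in the range [16 B, 32 KB]. Small-object allocation is the most involved process, yet it takes the least code. The part of the code responsible for it is as follows.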

var sizeclass uint8
if size <= smallSizeMax-8 {
    sizeclass = size_to_class8[divRoundUp(size, smallSizeDiv)]
} else {
    sizeclass = size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]
}
size = uintptr(class_to_size[sizeclass])
spc := makeSpanClass(sizeclass, noscan)
span = c.alloc[spc]
v := nextFreeFast(span)
if v == 0 {
    v, span, shouldhelpgc = c.nextFree(spc)
}
x = unsafe.Pointer(v)
if needzero && span.needzero != 0 {
    memclrNoHeapPointers(x, size)
}

The spanClass to use is first computed from the object's size; the mspan cached for that spanClass is fetched from mcache, and runtime.nextFreeFast then obtains a free slot from it.

func nextFreeFast(s *mspan) gclinkptr {
  theBit := sys.TrailingZeros64(s.allocCache) // Is there a free object in the allocCache?
  if theBit < 64 {
    result := s.freeindex + uintptr(theBit)
    if result < s.nelems {
      freeidx := result + 1
      if freeidx%64 == 0 && freeidx != s.nelems {
        return 0
      }
      s.allocCache >>= uint(theBit + 1)
      s.freeindex = freeidx
      s.allocCount++
      return gclinkptr(result*s.elemsize + s.base())
    }
  }
  return 0
}

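mspan.allocCache records which slots in the mspan are occupied by objects. The memory is divided slot by slot according to the number of objects rather than by raw size, which amounts to treating the mspan as an array of objects, as in the figure below.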

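allocCache is a 64-bit number in which each bit corresponds to one slot: a 0 bit means the slot is occupied by an object and a 1 bit means it is free. sys.TrailingZeros64(s.allocCache) counts the trailing zeros. A result of 64 means no free slot is available in the cached window; otherwise the offset of the free slot is computed, added to the base address of the mspan, and returned.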
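The trailing-zeros trick is easy to reproduce with the standard library's math/bits package, which offers the same operation as the runtime-internal sys.TrailingZeros64. The helper nextFree below is a hypothetical, stripped-down version of the lookup; it leaves out the bounds and refill checks that nextFreeFast performs.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextFree returns the index of the next free slot given freeindex and
// a 64-bit allocCache window (1 = free, 0 = in use), or -1 when the
// window holds no free slot.
func nextFree(freeindex int, allocCache uint64) int {
	tz := bits.TrailingZeros64(allocCache)
	if tz == 64 {
		return -1
	}
	return freeindex + tz
}

func main() {
	// Lowest bits first: slots 0..2 are in use, slot 3 is free.
	fmt.Println(nextFree(0, 0b1000)) // prints 3
	// Same window, but it starts at slot 64 of the span.
	fmt.Println(nextFree(64, 0b1000)) // prints 67
	// No bit set: nothing free in this window.
	fmt.Println(nextFree(0, 0)) // prints -1
}
```

The address of the slot then follows from the index as base + index*elemsize, exactly as in the last line of nextFreeFast.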

If the mcache does not have enough space, it requests more from mcentral. This is done by the mcache.nextFree method.

func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
  s = c.alloc[spc]
  shouldhelpgc = false
  freeIndex := s.nextFreeIndex()
  if freeIndex == s.nelems {
    c.refill(spc)
    shouldhelpgc = true
    s = c.alloc[spc]

    freeIndex = s.nextFreeIndex()
  }
  v = gclinkptr(freeIndex*s.elemsize + s.base())
  s.allocCount++
  return
}

Here mcache.refill requests an available mspan from mcentral.

func (c *mcache) refill(spc spanClass) {
  ...
  s = mheap_.central[spc].mcentral.cacheSpan()
  ...
}

When memory is insufficient, mcentral.cacheSpan has mcentral.grow extend it, which in turn requests a new mspan from mheap.

func (c *mcentral) grow() *mspan {
  ...
  s := mheap_.alloc(npages, c.spanclass)
  ...
  return s
}

Small-object allocation is therefore tiered: first mcache, then mcentral, and finally mheap. Allocating from mcache is the cheapest, since it is a P-local cache and needs no lock; mcentral comes next; requesting directly from mheap is the most expensive, because the mheap.alloc method contends for the global heap lock.
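The cost argument, a lock-free local fast path backed by a locked shared tier, can be illustrated with a toy two-level cache. Everything in the sketch is hypothetical: central stands in for the shared tiers (mcentral and mheap collapsed into one), localCache stands in for mcache, objects are plain integers, and the batch size of 4 is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// central is a hypothetical shared pool guarded by a lock.
type central struct {
	mu    sync.Mutex
	next  int // next unused object id
	locks int // how many times the lock was taken
}

// grab hands out a batch of n fresh object ids under the lock.
func (c *central) grab(n int) []int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.locks++
	batch := make([]int, n)
	for i := range batch {
		batch[i] = c.next
		c.next++
	}
	return batch
}

// localCache is owned by a single worker, so it needs no lock.
type localCache struct {
	free   []int
	shared *central
}

// alloc serves from the local batch and refills it only when empty.
func (l *localCache) alloc() int {
	if len(l.free) == 0 {
		l.free = l.shared.grab(4) // slow path: go to the shared tier
	}
	id := l.free[0]
	l.free = l.free[1:]
	return id
}

func main() {
	c := &central{}
	l := &localCache{shared: c}
	for i := 0; i < 10; i++ {
		l.alloc()
	}
	fmt.Println(c.locks) // prints 3: ten allocations, three lock acquisitions
}
```

Ten allocations touch the lock only three times; the larger the batch handed to the local cache, the rarer the expensive path becomes.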

Large objects

Large-object allocation is the simplest. If an object is larger than 32 KB, a new mspan is requested directly from mheap to hold it. The part of the code responsible is as follows.

shouldhelpgc = true
span = c.allocLarge(size, noscan)
span.freeindex = 1
span.allocCount = 1
size = span.elemsize
x = unsafe.Pointer(span.base())
if needzero && span.needzero != 0 {
    if noscan {
        delayedZeroing = true
    } else {
        memclrNoHeapPointers(x, size)
    }
}

Here mcache.allocLarge requests the memory for the large object from mheap.

func (c *mcache) allocLarge(size uintptr, noscan bool) *mspan {
  ...
  spc := makeSpanClass(0, noscan)
  s := mheap_.alloc(npages, spc)
  ...
  return s
}

As the code shows, large objects use a spanClass with size class 0, and a large object essentially gets an mspan of its own.

Miscellaneous

Memory statistics

The Go runtime exposes the ReadMemStats function, which reports statistics on the runtime's memory usage.

func ReadMemStats(m *MemStats) {
  _ = m.Alloc // nil check test before we switch stacks, see issue 61158
  stopTheWorld(stwReadMemStats)

  systemstack(func() {
    readmemstats_m(m)
  })

  startTheWorld()
}

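Using it is very costly, however. As the code shows, the world must be stopped (STW) before the memory state can be analyzed, and the STW pause can range from a few milliseconds to several hundred, so it is generally used only for debugging and troubleshooting. The runtime.MemStats struct records information about heap memory, stack memory, and GC.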

type MemStats struct {
    // Overall statistics
    Alloc uint64
    TotalAlloc uint64
    Sys uint64
    Lookups uint64
    Mallocs uint64
    Frees uint64

    // Heap memory statistics
    HeapAlloc uint64
    HeapSys uint64
    HeapIdle uint64
    HeapInuse uint64
    HeapReleased uint64
    HeapObjects uint64

    // Stack memory statistics
    StackInuse uint64
    StackSys uint64

    // Memory component statistics
    MSpanInuse uint64
    MSpanSys uint64
    MCacheInuse uint64
    MCacheSys uint64
    BuckHashSys uint64

    // GC-related statistics
    GCSys uint64
    OtherSys uint64
    NextGC uint64
    LastGC uint64
    PauseTotalNs uint64
    PauseNs [256]uint64
    PauseEnd [256]uint64
    NumGC uint32
    NumForcedGC uint32
    GCCPUFraction float64
    EnableGC bool
    DebugGC bool

    BySize [61]struct {
        Size uint32
        Mallocs uint64
        Frees uint64
    }
}
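A typical use of this API from application code looks like the sketch below. runtime.ReadMemStats and the MemStats fields are the real exported API; heapSnapshot is only a convenience wrapper introduced for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSnapshot reads the current memory statistics. Each call stops
// the world briefly, so it should not sit on a hot path.
func heapSnapshot() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m
}

func main() {
	m := heapSnapshot()
	fmt.Printf("HeapAlloc:   %d KiB\n", m.HeapAlloc/1024)
	fmt.Printf("HeapSys:     %d KiB\n", m.HeapSys/1024)
	fmt.Printf("HeapObjects: %d\n", m.HeapObjects)
	fmt.Printf("Live mallocs (Mallocs-Frees): %d\n", m.Mallocs-m.Frees)
	fmt.Printf("NumGC:       %d\n", m.NumGC)
}
```

The printed values depend on the program and the Go version, so no particular output should be expected.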

NotInHeap

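The memory allocator obviously exists to allocate heap memory, but the heap falls into two parts: heap memory the Go runtime needs for itself, and heap memory made available to users. This is why some structs contain an embedded field like the following.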

_ sys.NotInHeap

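It indicates that memory for the type is not allocated on the user heap. Such embedded fields are especially common in the memory allocation components, for example in runtime.mheap, the struct that represents the user heap.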

type mheap struct {
  _ sys.NotInHeap
}

The real purpose of sys.NotInHeap is to avoid write barriers and so improve runtime efficiency; the user heap needs them because it is subject to GC.
