
An introduction to how GC is implemented in golang

2019-11-29

Golang introduced a tri-color GC in 1.5. After several rounds of improvement, the GC pause time in the current 1.9 release can be made extremely short. Shorter pauses mean a shorter "maximum response time", which makes go even better suited to writing network services.

This article explains how go's tri-color GC is implemented by analyzing the golang source code.

Basic concepts

Memory layout

When the program starts, go reserves a block of contiguous virtual address space, laid out as follows:

This block is divided into 3 regions. On x64 their sizes are 512 MB, 16 GB and 512 GB respectively, and they serve the following purposes:

arena

The arena region is what we usually call the heap; all memory go allocates from the heap lives in this region.

bitmap

The bitmap region records which addresses in the arena region hold objects, and which words inside those objects contain pointers.

One byte (8 bits) in the bitmap region covers four pointer-sized words of the arena region, i.e. 2 bits per pointer-sized word.

So the size of the bitmap region is 512 GB / pointer size (8 bytes) / 4 = 16 GB.

The mapping of one bitmap byte to four pointer-sized arena words is shown below; each pointer-sized word gets two bits, indicating respectively whether scanning should continue and whether the word contains a pointer:

The byte-to-word correspondence between bitmap and arena starts from the end, i.e. the two grow outward in opposite directions as memory is allocated:

spans

The spans region records which span each page (Page) of the arena region belongs to; what a span is will be explained below.

One pointer (8 bytes) in the spans region corresponds to one page of the arena region (in go, one page = 8 KB).

So the size of the spans region is 512 GB / page size (8 KB) * pointer size (8 bytes) = 512 MB.
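
As a quick sanity check of the two calculations above, the following standalone snippet reproduces both sizes; it assumes the go1.9 x64 layout described here (512 GB arena, 8-byte pointers, 8 KB pages) and a 64-bit platform:

package main

import "fmt"

func main() {
    const (
        arenaSize = 512 << 30 // 512 GB arena, as described above
        ptrSize   = 8         // pointer size in bytes on x64
        pageSize  = 8 << 10   // 8 KB per page
    )
    const bitmapSize = arenaSize / ptrSize / 4       // 2 bits per pointer-sized word
    const spansSize = arenaSize / pageSize * ptrSize // 1 pointer per page
    fmt.Println(bitmapSize>>30, "GB") // prints 16 GB
    fmt.Println(spansSize>>20, "MB")  // prints 512 MB
}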

The mapping of one spans pointer to one arena page is shown below; unlike the bitmap, the correspondence starts from the beginning:

When are objects allocated from the heap?

Many articles and books about go mention that the compiler automatically decides which objects should live on the stack and which on the heap.

Simply put, if an object's contents may be accessed after the function that created it returns, the object is allocated on the heap.

Situations in which an object is heap-allocated include:

1. Returning a pointer to the object

2. Passing a pointer to the object to another function

3. Using the object in a closure when the object also needs to be modified

4. Using new

In C, returning a pointer to an object on the stack is extremely dangerous, but in go it is safe, because such an object is automatically allocated on the heap.
The process by which go decides whether an object should be heap-allocated is called "escape analysis".
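
A minimal illustration of case 1 above: building the snippet below with "go build -gcflags=-m" prints the compiler's escape analysis decisions (the type and function names are made up for this example):

package main

type point struct{ x, y int }

func newPoint() *point {
    p := point{1, 2}
    return &p // returning the object's pointer: the compiler reports "&p escapes to heap"
}

func main() {
    p := newPoint()
    p.x++
    println(p.x) // prints 2
}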

GC Bitmap

When marking, the GC needs to know which locations contain pointers; for example, the bitmap region described above covers the pointer information of the arena region.

Beyond that, the GC also needs to know which slots on the stacks contain pointers. Since stacks do not belong to the arena region, stack pointer information is kept in the function metadata.

In addition, when allocating an object the GC needs to set up the bitmap region according to the object's type; the source pointer information is kept in the type metadata.

In summary, go has the following GC bitmaps:

1. The bitmap region: covers the arena region, 2 bits per pointer-sized word

2. Function metadata: covers the function's stack frame, 1 bit per pointer-sized word (stored in stackmap.bytedata)

3. Type metadata: copied into the bitmap region when an object is allocated, 1 bit per pointer-sized word (stored in _type.gcdata)

Span

A span is a block used for allocating objects; the diagram below sketches a span's internal structure:

A span usually contains multiple elements of the same size, with one object per element, except when:

1. the span holds a large object, in which case the span has only one element

2. the span holds tiny pointer-free objects (tiny objects), in which case one element can hold multiple objects

A span keeps a freeindex marking where the search should start on the next allocation; freeindex advances after each allocation. All elements before freeindex are allocated; elements at or after freeindex may or may not be allocated.

A span may have some of its elements reclaimed after each GC; allocBits records which elements are allocated and which are free.

Allocation uses freeindex + allocBits to skip allocated elements and place objects in free ones. But consulting allocBits on every allocation would be slow, so a span also keeps an integer allocCache that caches the bitmap starting at freeindex, with the cached bits inverted relative to the original.
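
The following standalone sketch shows the idea behind freeindex + allocCache (this is not the runtime code; the values are made up). Because the cached bits are inverted, a set bit means "free", and finding the next free element is just counting trailing zeros:

package main

import (
    "fmt"
    "math/bits"
)

func main() {
    // Suppose freeindex is 3 and the cached (inverted) bitmap says the
    // elements at offsets 4, 5 and 7 from freeindex are free.
    freeindex := uintptr(3)
    var allocCache uint64 = 0xb0 // binary 10110000
    theBit := bits.TrailingZeros64(allocCache)
    fmt.Println("next free element:", freeindex+uintptr(theBit)) // 3 + 4 = 7
}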

gcmarkBits marks which objects survive during a GC; after each GC, gcmarkBits becomes the new allocBits.
Note that the memory for the span structure itself is allocated from the system; the spans and bitmap regions described above are only indexes.

Span size classes

Spans are divided by element size into 67 classes. The 66 standard classes are listed below (the special class 0 is described later):

// class  bytes/obj  bytes/span  objects  tail waste  max waste
//     1          8        8192     1024           0     87.50%
//     2         16        8192      512           0     43.75%
//     3         32        8192      256           0     46.88%
//     4         48        8192      170          32     31.52%
//     5         64        8192      128           0     23.44%
//     6         80        8192      102          32     19.07%
//     7         96        8192       85          32     15.95%
//     8        112        8192       73          16     13.56%
//     9        128        8192       64           0     11.72%
//    10        144        8192       56         128     11.82%
//    11        160        8192       51          32      9.73%
//    12        176        8192       46          96      9.59%
//    13        192        8192       42         128      9.25%
//    14        208        8192       39          80      8.12%
//    15        224        8192       36         128      8.15%
//    16        240        8192       34          32      6.62%
//    17        256        8192       32           0      5.86%
//    18        288        8192       28         128     12.16%
//    19        320        8192       25         192     11.80%
//    20        352        8192       23          96      9.88%
//    21        384        8192       21         128      9.51%
//    22        416        8192       19         288     10.71%
//    23        448        8192       18         128      8.37%
//    24        480        8192       17          32      6.82%
//    25        512        8192       16           0      6.05%
//    26        576        8192       14         128     12.33%
//    27        640        8192       12         512     15.48%
//    28        704        8192       11         448     13.93%
//    29        768        8192       10         512     13.94%
//    30        896        8192        9         128     15.52%
//    31       1024        8192        8           0     12.40%
//    32       1152        8192        7         128     12.41%
//    33       1280        8192        6         512     15.55%
//    34       1408       16384       11         896     14.00%
//    35       1536        8192        5         512     14.00%
//    36       1792       16384        9         256     15.57%
//    37       2048        8192        4           0     12.45%
//    38       2304       16384        7         256     12.46%
//    39       2688        8192        3         128     15.59%
//    40       3072       24576        8           0     12.47%
//    41       3200       16384        5         384      6.22%
//    42       3456       24576        7         384      8.83%
//    43       4096        8192        2           0     15.60%
//    44       4864       24576        5         256     16.65%
//    45       5376       16384        3         256     10.92%
//    46       6144       24576        4           0     12.48%
//    47       6528       32768        5         128      6.23%
//    48       6784       40960        6         256      4.36%
//    49       6912       49152        7         768      3.37%
//    50       8192        8192        1           0     15.61%
//    51       9472       57344        6         512     14.28%
//    52       9728       49152        5         512      3.64%
//    53      10240       40960        4           0      4.99%
//    54      10880       32768        3         128      6.24%
//    55      12288       24576        2           0     11.45%
//    56      13568       40960        3         256      9.99%
//    57      14336       57344        4           0      5.35%
//    58      16384       16384        1           0     12.49%
//    59      18432       73728        4           0     11.11%
//    60      19072       57344        3         128      3.57%
//    61      20480       40960        2           0      6.87%
//    62      21760       65536        3         256      6.25%
//    63      24576       24576        1           0     11.45%
//    64      27264       81920        3         128     10.00%
//    65      28672       57344        2           0      4.91%
//    66      32768       32768        1           0     12.50%

Take a span of class 1 as an example:
its elements are 8 bytes each, the span itself occupies one page (8 KB), and it holds 1024 objects in total.

When allocating, the object's size determines which span class to use;
for example, a 16-byte object uses a class 2 span, a 17-byte object uses a class 3 span, and a 32-byte object also uses a class 3 span.
As this example shows, 17-byte and 32-byte objects are both allocated from class 3 spans, so for some sizes allocation wastes a certain amount of space.
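
The following sketch reproduces that rounding with a hand-copied excerpt of the table above (classes 1 to 5 only); the real runtime uses precomputed lookup tables (size_to_class8 and friends) rather than a loop:

package main

import "fmt"

// classSizes is an excerpt of the bytes/obj column for classes 1-5.
var classSizes = []uintptr{8, 16, 32, 48, 64}

// roundUpToClass returns the element size actually allocated for a request.
func roundUpToClass(size uintptr) uintptr {
    for _, c := range classSizes {
        if size <= c {
            return c
        }
    }
    return 0 // would fall into a larger class than this excerpt covers
}

func main() {
    for _, sz := range []uintptr{16, 17, 32, 33} {
        fmt.Printf("request %2d bytes -> allocate %2d bytes\n", sz, roundUpToClass(sz))
    }
}

Running it shows that a 17-byte request allocates 32 bytes, matching the waste described above.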

You may have noticed that the largest element size above is 32 KB, so where are objects larger than 32 KB allocated?
Objects over 32 KB are called "large objects". Allocating a large object takes a special span directly from the heap;
this special span has class 0, contains exactly one large object, and its size is determined by the object's size.

The special class plus the 66 standard classes together make up the 67 span classes.

Where spans live

As mentioned in the previous article, P is a virtual resource: only one thread can access a given P at a time, so the data inside a P needs no locks.
For better allocation performance, each P holds a cache of spans (also called mcache), structured as follows:

Each P caches 67*2 = 134 spans, one per span class and scan-ness.

The difference between scan and noscan is:
if the object contains pointers, it is allocated from a scan span;
if the object contains no pointers, it is allocated from a noscan span.
The point of splitting spans into scan and noscan is that
when the GC scans objects, it can skip consulting the bitmap region for noscan spans when marking child objects, which greatly improves marking efficiency.

When allocating an object, a suitable span is obtained from the following places:

  • First from the P's cache (mcache): if there is a cached span that is not yet full, use it; this step needs no lock

  • Then from the global cache (mcentral): on success the span is installed into the P; this step needs a lock

  • Finally from mheap: the obtained span is installed into the global cache; this step needs a lock

Caching spans in each P resembles how CoreCLR caches an Allocation Context per thread:
both let most allocations proceed without a thread lock, improving allocation performance.

Handling object allocation

The allocation flow

When go allocates an object from the heap it calls the newobject function, which roughly works as follows:

It first checks whether a GC is running; if so, and the current G has allocated a certain amount of memory, the G must assist the GC with some work.
This mechanism is called GC Assist; it prevents allocation from outpacing the GC's reclamation.

It then decides whether the object is small or large. A large object is allocated straight from the heap via largeAlloc;
a small object goes through 3 stages to obtain a usable span, and the object is then allocated from that span:

  • First from the P's cache (mcache)

  • Then from the global cache (mcentral), which keeps lists of available spans

  • Finally from mheap, which also keeps free lists of spans; if all of these fail, allocate from the arena region

The detailed structures involved in these three stages are shown in the diagram below:

Data type definitions

The data types involved in allocation include:

p: as covered in the previous article, P is the virtual resource that runs go code
m: as covered in the previous article, M currently represents a system thread
g: as covered in the previous article, G is a goroutine
mspan: a block used for allocating objects
mcentral: the global mspan cache; there are 67*2 = 134 of them
mheap: the object that manages the heap; there is exactly one globally

Source code analysis

go calls the newobject function to allocate an object from the heap, so let's start there:

// implementation of new builtin
// compiler (both frontend and SSA backend) knows the signature
// of this function
func newobject(typ *_type) unsafe.Pointer {
    return mallocgc(typ.size, typ, true)
}

newobject calls the mallocgc function:

// Allocate an object of size bytes.
// Small objects are allocated from the per-P cache's free lists.
// Large objects (> 32 kB) are allocated straight from the heap.
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    if gcphase == _GCmarktermination {
        throw("mallocgc called with gcphase == _GCmarktermination")
    }

    if size == 0 {
        return unsafe.Pointer(&zerobase)
    }

    if debug.sbrk != 0 {
        align := uintptr(16)
        if typ != nil {
            align = uintptr(typ.align)
        }
        return persistentalloc(size, align, &memstats.other_sys)
    }

    // Decide whether we need to assist the GC
    // gcBlackenEnabled is set during the GC mark phase
    // assistG is the G to charge for this allocation, or nil if
    // GC is not currently active.
    var assistG *g
    if gcBlackenEnabled != 0 {
        // Charge the current user G for this allocation.
        assistG = getg()
        if assistG.m.curg != nil {
            assistG = assistG.m.curg
        }
        // Charge the allocation against the G. We'll account
        // for internal fragmentation at the end of mallocgc.
        assistG.gcAssistBytes -= int64(size)

        // How much work the G must assist with is determined by the allocation size
        // The exact algorithm is explained below in the collector section
        if assistG.gcAssistBytes < 0 {
            // This G is in debt. Assist the GC to correct
            // this before allocating. This must happen
            // before disabling preemption.
            gcAssistAlloc(assistG)
        }
    }

    // Increment the lock count of the M owning the current G, preventing this G from being preempted
    // Set mp.mallocing to keep from being preempted by GC.
    mp := acquirem()
    if mp.mallocing != 0 {
        throw("malloc deadlock")
    }
    if mp.gsignal == getg() {
        throw("malloc during signal")
    }
    mp.mallocing = 1

    shouldhelpgc := false
    dataSize := size
    // Get the local span cache (mcache) of the P owned by the current G's M
    // When an M acquires a P it stores the P's mcache in itself, so this returns getg().m.mcache
    c := gomcache()
    var x unsafe.Pointer
    noscan := typ == nil || typ.kind&kindNoPointers != 0
    // Check whether this is a small object; maxSmallSize is currently 32K
    if size <= maxSmallSize {
        // If the object contains no pointers and is smaller than 16 bytes, it gets special handling
        // This is an optimization for very small objects: the smallest span element is 8 bytes, so smaller objects would waste a lot of space
        // Very small objects can be packed together into an element of a "class 2 noscan" span (16 bytes per element)
        if noscan && size < maxTinySize {
            // Tiny allocator.
            //
            // Tiny allocator combines several tiny allocation requests
            // into a single memory block. The resulting memory block
            // is freed when all subobjects are unreachable. The subobjects
            // must be noscan (don't have pointers), this ensures that
            // the amount of potentially wasted memory is bounded.
            //
            // Size of the memory block used for combining (maxTinySize) is tunable.
            // Current setting is 16 bytes, which relates to 2x worst case memory
            // wastage (when all but one subobjects are unreachable).
            // 8 bytes would result in no wastage at all, but provides less
            // opportunities for combining.
            // 32 bytes provides more opportunities for combining,
            // but can lead to 4x worst case wastage.
            // The best case winning is 8x regardless of block size.
            //
            // Objects obtained from tiny allocator must not be freed explicitly.
            // So when an object will be freed explicitly, we ensure that
            // its size >= maxTinySize.
            //
            // SetFinalizer has a special case for objects potentially coming
            // from tiny allocator, it such case it allows to set finalizers
            // for an inner byte of a memory block.
            //
            // The main targets of tiny allocator are small strings and
            // standalone escaping variables. On a json benchmark
            // the allocator reduces number of allocations by ~12% and
            // reduces heap size by ~20%.
            off := c.tinyoffset
            // Align tiny pointer for required (conservative) alignment.
            if size&7 == 0 {
                off = round(off, 8)
            } else if size&3 == 0 {
                off = round(off, 4)
            } else if size&1 == 0 {
                off = round(off, 2)
            }
            if off+size <= maxTinySize && c.tiny != 0 {
                // The object fits into existing tiny block.
                x = unsafe.Pointer(c.tiny + off)
                c.tinyoffset = off + size
                c.local_tinyallocs++
                mp.mallocing = 0
                releasem(mp)
                return x
            }
            // Allocate a new maxTinySize block.
            span := c.alloc[tinySpanClass]
            v := nextFreeFast(span)
            if v == 0 {
                v, _, shouldhelpgc = c.nextFree(tinySpanClass)
            }
            x = unsafe.Pointer(v)
            (*[2]uint64)(x)[0] = 0
            (*[2]uint64)(x)[1] = 0
            // See if we need to replace the existing tiny block with the new one
            // based on amount of remaining free space.
            if size < c.tinyoffset || c.tiny == 0 {
                c.tiny = uintptr(x)
                c.tinyoffset = size
            }
            size = maxTinySize
        } else {
            // Otherwise allocate as an ordinary small object
            // First determine which span class the object's size should use
            var sizeclass uint8
            if size <= smallSizeMax-8 {
                sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
            } else {
                sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
            }
            size = uintptr(class_to_size[sizeclass])
            // Equals sizeclass * 2 + (noscan ? 1 : 0)
            spc := makeSpanClass(sizeclass, noscan)
            span := c.alloc[spc]
            // Try a fast allocation from this span
            v := nextFreeFast(span)
            if v == 0 {
                // Fast allocation failed; we may need to fetch a span from mcentral or mheap
                // If a new span was obtained from mcentral or mheap, shouldhelpgc will be true
                // When shouldhelpgc is true, whether to trigger a GC is checked further below
                v, span, shouldhelpgc = c.nextFree(spc)
            }
            x = unsafe.Pointer(v)
            if needzero && span.needzero != 0 {
                memclrNoHeapPointers(unsafe.Pointer(v), size)
            }
        }
    } else {
        // A large object is allocated straight from mheap; s here is a special span whose class is 0
        var s *mspan
        shouldhelpgc = true
        systemstack(func() {
            s = largeAlloc(size, needzero, noscan)
        })
        s.freeindex = 1
        s.allocCount = 1
        x = unsafe.Pointer(s.base())
        size = s.elemsize
    }

    // Set the arena's bitmap, recording which words contain pointers; the GC uses the bitmap to scan all reachable objects
    var scanSize uintptr
    if !noscan {
        // If allocating a defer+arg block, now that we've picked a malloc size
        // large enough to hold everything, cut the "asked for" size down to
        // just the defer header, so that the GC bitmap will record the arg block
        // as containing nothing at all (as if it were unused space at the end of
        // a malloc block caused by size rounding).
        // The defer arg areas are scanned as part of scanstack.
        if typ == deferType {
            dataSize = unsafe.Sizeof(_defer{})
        }
        // This function is very long; the interested can read it at
        // https://github.com/golang/go/blob/go1.9.2/src/runtime/mbitmap.go#L855
        // Although the code is long, what it sets up matches the bitmap region layout described above:
        // it sets the scan bit and the pointer bit from the type metadata; a set scan bit means scanning should continue, a set pointer bit means the word holds a pointer
        // Points worth noting:
        // - If a type contains pointers only at its beginning, e.g. [ptr, ptr, large non-pointer data],
        //   the scan bits of the trailing part will be 0, which greatly improves marking efficiency
        // - The scan bit of the second slot is special: it does not mean "continue scanning" but marks checkmark
        // What is checkmark?
        // - Because go's concurrent GC is complex, go needs a mechanism to verify that marking is correct,
        //   i.e. to check that every object that should have been marked actually was.
        //   That mechanism is checkmark: when it is enabled, go stops the world at the end of the mark phase and re-runs marking.
        //   The second slot's scan bit records whether the object was marked during the checkmark pass.
        // - One may notice the second slot requires an object of at least two pointers in size; what about objects exactly one pointer in size?
        //   Such objects fall into two cases:
        //   the object is itself a pointer: since it is exactly one pointer, the bitmap region need not be consulted, and the first slot serves as the checkmark
        //   the object is not a pointer: because of the tiny alloc mechanism, a pointer-free object of one pointer in size is allocated in a two-pointer span,
        //               so the bitmap region again need not be consulted, and as above the first slot serves as the checkmark
        heapBitsSetType(uintptr(x), size, dataSize, typ)
        if dataSize > typ.size {
            // Array allocation. If there are any
            // pointers, GC has to scan to the last
            // element.
            if typ.ptrdata != 0 {
                scanSize = dataSize - typ.size + typ.ptrdata
            }
        } else {
            scanSize = typ.ptrdata
        }
        c.local_scan += scanSize
    }

    // Memory barrier; x86 and x64 do not reorder stores, so this is only a compiler barrier (it is a ret in assembly)
    // Ensure that the stores above that initialize x to
    // type-safe memory and set the heap bits occur before
    // the caller can make x observable to the garbage
    // collector. Otherwise, on weakly ordered machines,
    // the garbage collector could follow a pointer to x,
    // but see uninitialized memory or stale heap bits.
    publicationBarrier()

    // If a GC is running, the newly allocated object must be marked "black" immediately so it is not reclaimed
    // Allocate black during GC.
    // All slots hold nil so no scanning is needed.
    // This may be racing with GC so do it atomically if there can be
    // a race marking the bit.
    if gcphase != _GCoff {
        gcmarknewobject(uintptr(x), size, scanSize)
    }

    // Race detector handling (detects data races between threads)
    if raceenabled {
        racemalloc(x, size)
    }

    // Memory sanitizer handling (detects dangling pointers and other memory problems)
    if msanenabled {
        msanmalloc(x, size)
    }

    // Allow the current G to be preempted again
    mp.mallocing = 0
    releasem(mp)

    // Debug tracing
    if debug.allocfreetrace != 0 {
        tracealloc(x, size, typ)
    }

    // Profiler recording
    if rate := MemProfileRate; rate > 0 {
        if size < uintptr(rate) && int32(size) < c.next_sample {
            c.next_sample -= int32(size)
        } else {
            mp := acquirem()
            profilealloc(mp, x, size)
            releasem(mp)
        }
    }

    // Subtract "actual allocated size - requested size" from gcAssistBytes, correcting it to the exact value
    if assistG != nil {
        // Account for internal fragmentation in the assist
        // debt now that we know it.
        assistG.gcAssistBytes -= int64(size - dataSize)
    }

    // If a new span was obtained earlier, check whether a background GC needs to start
    // The decision logic here (gcTrigger) is explained in detail below
    if shouldhelpgc {
        if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
            gcStart(gcBackgroundMode, t)
        }
    }

    return x
}

Next, let's look at how objects are allocated from a span. nextFreeFast is called first to attempt a fast allocation:

// nextFreeFast returns the next free object if one is quickly available.
// Otherwise it returns 0.
func nextFreeFast(s *mspan) gclinkptr {
    // Find the position of the first non-zero bit, i.e. which element is unallocated
    theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
    // Found an unallocated element
    if theBit < 64 {
        result := s.freeindex + uintptr(theBit)
        // The index must be less than the number of elements
        if result < s.nelems {
            // The next freeindex
            freeidx := result + 1
            // Needs special handling when divisible by 64 (see nextFree)
            if freeidx%64 == 0 && freeidx != s.nelems {
                return 0
            }
            // Update freeindex and allocCache (the high bits are 0; it is refilled once exhausted)
            s.allocCache >>= uint(theBit + 1)
            s.freeindex = freeidx
            // Return the element's address
            v := gclinkptr(result*s.elemsize + s.base())
            // Increment the count of allocated elements
            s.allocCount++
            return v
        }
    }
    return 0
}

If no unallocated element can be found quickly after freeindex, nextFree is called for more involved handling:

// nextFree returns the next free object from the cached span if one is available.
// Otherwise it refills the cache with a span with an available object and
// returns that object along with a flag indicating that this was a heavy
// weight allocation. If it is a heavy weight allocation the caller must
// determine whether a new GC cycle needs to be started or if the GC is active
// whether this goroutine needs to assist the GC.
func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
    // Find the next freeindex and update allocCache
    s = c.alloc[spc]
    shouldhelpgc = false
    freeIndex := s.nextFreeIndex()
    // If every element in the span is allocated, a new span must be obtained
    if freeIndex == s.nelems {
        // The span is full.
        if uintptr(s.allocCount) != s.nelems {
            println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
            throw("s.allocCount != s.nelems && freeIndex == s.nelems")
        }
        // Request a new span
        systemstack(func() {
            c.refill(spc)
        })
        // Fetch the newly requested span, and note that a GC check will be needed
        shouldhelpgc = true
        s = c.alloc[spc]

        freeIndex = s.nextFreeIndex()
    }

    if freeIndex >= s.nelems {
        throw("freeIndex is not valid")
    }

    // Return the element's address
    v = gclinkptr(freeIndex*s.elemsize + s.base())
    // Increment the count of allocated elements
    s.allocCount++
    if uintptr(s.allocCount) > s.nelems {
        println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
        throw("s.allocCount > s.nelems")
    }
    return
}

If the mcache's span of the given class is full, the refill function requests a new span:

// Gets a span that has a free object in it and assigns it
// to be the cached span for the given sizeclass. Returns this span.
func (c *mcache) refill(spc spanClass) *mspan {
    _g_ := getg()

    // Prevent the G from being preempted
    _g_.m.locks++
    // Return the current cached span to the central lists.
    s := c.alloc[spc]

    // Make sure every element of the current span is allocated
    if uintptr(s.allocCount) != s.nelems {
        throw("refill of span with free space remaining")
    }

    // Clear the span's incache attribute, unless it is the globally shared empty span (the default value of the span pointers in mcache)
    if s != &emptymspan {
        s.incache = false
    }

    // Request a new span from mcentral
    // Get a new cached span from the central lists.
    s = mheap_.central[spc].mcentral.cacheSpan()
    if s == nil {
        throw("out of memory")
    }

    if uintptr(s.allocCount) == s.nelems {
        throw("span has no free space")
    }

    // Install the new span into the mcache
    c.alloc[spc] = s
    // Allow the G to be preempted again
    _g_.m.locks--
    return s
}

Requesting a new span from mcentral goes through the cacheSpan function:
mcentral first tries to reuse an existing span from its internal lists, and requests one from mheap if reuse fails.

// Allocate a span to use in an MCache.
func (c *mcentral) cacheSpan() *mspan {
    // Have the current G assist with part of the sweep work
    // Deduct credit for this span allocation and sweep if necessary.
    spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
    deductSweepCredit(spanBytes, 0)

    // Lock the mcentral, since multiple M's (P's) may access it at the same time
    lock(&c.lock)
    traceDone := false
    if trace.enabled {
        traceGCSweepStart()
    }
    sg := mheap_.sweepgen
retry:
    // mcentral keeps two lists of spans
    // - nonempty means the span is known to have at least one unallocated element
    // - empty means the span may not have any unallocated element
    // The nonempty list is searched first
    // sweepgen increases by 2 on every GC
    // - sweepgen == global sweepgen: the span has been swept
    // - sweepgen == global sweepgen-1: the span is being swept
    // - sweepgen == global sweepgen-2: the span is waiting to be swept
    var s *mspan
    for s = c.nonempty.first; s != nil; s = s.next {
        // If the span is waiting to be swept, try to atomically change its sweepgen to global sweepgen-1
        if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
            // On success move the span to the empty list, sweep it, then jump to havespan
            c.nonempty.remove(s)
            c.empty.insertBack(s)
            unlock(&c.lock)
            s.sweep(true)
            goto havespan
        }
        // If this span is being swept by another thread, skip it
        if s.sweepgen == sg-1 {
            // the span is being swept by background sweeper, skip
            continue
        }
        // The span has already been swept
        // Spans on the nonempty list are known to have at least one unallocated element, so it can be used directly
        // we have a nonempty span that does not require sweeping, allocate from it
        c.nonempty.remove(s)
        c.empty.insertBack(s)
        unlock(&c.lock)
        goto havespan
    }

    // Search the empty list
    for s = c.empty.first; s != nil; s = s.next {
        // If the span is waiting to be swept, try to atomically change its sweepgen to global sweepgen-1
        if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
            // Move the span to the end of the empty list
            // we have an empty span that requires sweeping,
            // sweep it and see if we can free some space in it
            c.empty.remove(s)
            // swept spans are at the end of the list
            c.empty.insertBack(s)
            unlock(&c.lock)
            // Try to sweep it
            s.sweep(true)
            // After sweeping, check whether it now has unallocated objects; if so we can use it
            freeIndex := s.nextFreeIndex()
            if freeIndex != s.nelems {
                s.freeindex = freeIndex
                goto havespan
            }
            lock(&c.lock)
            // the span is still empty after sweep
            // it is already in the empty list, so just retry
            goto retry
        }
        // If this span is being swept by another thread, skip it
        if s.sweepgen == sg-1 {
            // the span is being swept by background sweeper, skip
            continue
        }
        // No span with unallocated objects was found
        // already swept empty span,
        // all subsequent ones must also be either swept or in process of sweeping
        break
    }
    if trace.enabled {
        traceGCSweepDone()
        traceDone = true
    }
    unlock(&c.lock)

    // No span with unallocated objects was found; one must be allocated from mheap
    // Once allocated it is added to the empty list
    // Replenish central list if empty.
    s = c.grow()
    if s == nil {
        return nil
    }
    lock(&c.lock)
    c.empty.insertBack(s)
    unlock(&c.lock)

    // At this point s is a non-empty span, queued at the end of the empty list,
    // c is unlocked.
havespan:
    if trace.enabled && !traceDone {
        traceGCSweepDone()
    }
    // Count the span's unallocated elements and add the count to mcentral.nmalloc
    // Add their total size to memstats.heap_live
    cap := int32((s.npages << _PageShift) / s.elemsize)
    n := cap - int32(s.allocCount)
    if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
        throw("span has no free objects")
    }
    // Assume all objects from this span will be allocated in the
    // mcache. If it gets uncached, we'll adjust this.
    atomic.Xadd64(&c.nmalloc, int64(n))
    usedBytes := uintptr(s.allocCount) * s.elemsize
    atomic.Xadd64(&memstats.heap_live, int64(spanBytes)-int64(usedBytes))
    // Trace handling
    if trace.enabled {
        // heap_live changed.
        traceHeapAlloc()
    }
    // If a GC is running, heap_live changed, so readjust the G assist-marking ratio
    // See the analysis of the revise function below for details
    if gcBlackenEnabled != 0 {
        // heap_live changed.
        gcController.revise()
    }
    // Set the span's incache attribute, meaning the span is now in an mcache
    s.incache = true
    // Update allocCache according to freeindex
    freeByteBase := s.freeindex &^ (64 - 1)
    whichByte := freeByteBase / 8
    // Init alloc bits cache.
    s.refillAllocCache(whichByte)

    // Adjust the allocCache so that s.freeindex corresponds to the low bit in
    // s.allocCache.
    s.allocCache >>= s.freeindex % 64

    return s
}

mcentral requests a new span from mheap through the grow function:

// grow allocates a new empty span from the heap and initializes it for c's size class.
func (c *mcentral) grow() *mspan {
    // From the mcentral's class, compute how big a span to request (divided by 8K = number of pages) and how many elements it can hold
    npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()])
    size := uintptr(class_to_size[c.spanclass.sizeclass()])
    n := (npages << _PageShift) / size

    // Request a new span from mheap, in units of pages (8K)
    s := mheap_.alloc(npages, c.spanclass, false, true)
    if s == nil {
        return nil
    }

    p := s.base()
    s.limit = p + size*n

    // Allocate and initialize the span's allocBits and gcmarkBits
    heapBitsForSpan(s.base()).initSpan(s)
    return s
}

The mheap function that allocates spans is alloc:

func (h *mheap) alloc(npage uintptr, spanclass spanClass, large bool, needzero bool) *mspan {
    // Call the alloc_m function on g0's stack
    // See the previous article for an explanation of systemstack
    // Don't do any operations that lock the heap on the G stack.
    // It might trigger stack growth, and the stack growth code needs
    // to be able to allocate heap.
    var s *mspan
    systemstack(func() {
        s = h.alloc_m(npage, spanclass, large)
    })

    if s != nil {
        if needzero && s.needzero != 0 {
            memclrNoHeapPointers(unsafe.Pointer(s.base()), s.npages<<_PageShift)
        }
        s.needzero = 0
    }
    return s
}

alloc calls alloc_m on g0's stack:

// Allocate a new span of npage pages from the heap for GC'd memory
// and record its size class in the HeapMap and HeapMapCache.
func (h *mheap) alloc_m(npage uintptr, spanclass spanClass, large bool) *mspan {
    _g_ := getg()
    if _g_ != _g_.m.g0 {
        throw("_mheap_alloc not on g0 stack")
    }
    // Lock the mheap; this lock is global
    lock(&h.lock)

    // To keep the heap from growing too fast, before allocating n pages we first sweep and reclaim at least n pages
    // It walks the busy list and then the busyLarge list to sweep; see the reclaim and reclaimList functions for details
    // To prevent excessive heap growth, before allocating n pages
    // we need to sweep and reclaim at least n pages.
    if h.sweepdone == 0 {
        // TODO(austin): This tends to sweep a large number of
        // spans in order to find a few completely free spans
        // (for example, in the garbage benchmark, this sweeps
        // ~30x the number of pages its trying to allocate).
        // If GC kept a bit for whether there were any marks
        // in a span, we could release these free spans
        // at the end of GC and eliminate this entirely.
        if trace.enabled {
            traceGCSweepStart()
        }
        h.reclaim(npage)
        if trace.enabled {
            traceGCSweepDone()
        }
    }

    // Add the mcache's local statistics to the global ones
    // transfer stats from cache to global
    memstats.heap_scan += uint64(_g_.m.mcache.local_scan)
    _g_.m.mcache.local_scan = 0
    memstats.tinyallocs += uint64(_g_.m.mcache.local_tinyallocs)
    _g_.m.mcache.local_tinyallocs = 0

    // Call allocSpanLocked to allocate the span; allocSpanLocked requires the mheap to already be locked
    s := h.allocSpanLocked(npage, &memstats.heap_inuse)
    if s != nil {
        // Record span info, because gc needs to be
        // able to map interior pointer to containing span.
        // Set the span's sweepgen = global sweepgen
        atomic.Store(&s.sweepgen, h.sweepgen)
        // Put it on the global span list; sweepSpans here has length 2
        // sweepSpans[h.sweepgen/2%2] holds the list of spans currently in use
        // sweepSpans[1-h.sweepgen/2%2] holds the list of spans waiting to be swept
        // Since sweepgen increases by 2 on every GC, the two lists swap each cycle
        h.sweepSpans[h.sweepgen/2%2].push(s) // Add to swept in-use list.
        // Initialize the span's fields
        s.state = _MSpanInUse
        s.allocCount = 0
        s.spanclass = spanclass
        if sizeclass := spanclass.sizeclass(); sizeclass == 0 {
            s.elemsize = s.npages << _PageShift
            s.divShift = 0
            s.divMul = 0
            s.divShift2 = 0
            s.baseMask = 0
        } else {
            s.elemsize = uintptr(class_to_size[sizeclass])
            m := &class_to_divmagic[sizeclass]
            s.divShift = m.shift
            s.divMul = m.mul
            s.divShift2 = m.shift2
            s.baseMask = m.baseMask
        }

        // update stats, sweep lists
        h.pagesInUse += uint64(npage)
        // The grow function above passes true, so when reached via grow, large is true here
        // Add the allocated span to the busy list, or to the busylarge list if it exceeds _MaxMHeapList pages (128 pages = 8K*128 = 1M)
        if large {
            memstats.heap_objects++
            mheap_.largealloc += uint64(s.elemsize)
            mheap_.nlargealloc++
            atomic.Xadd64(&memstats.heap_live, int64(npage<<_PageShift))
            // Swept spans are at the end of lists.
            if s.npages < uintptr(len(h.busy)) {
                h.busy[s.npages].insertBack(s)
            } else {
                h.busylarge.insertBack(s)
            }
        }
    }
    // If a GC is running, heap_live changed, so readjust the G assist-marking ratio
    // See the analysis of the revise function below for details
    // heap_scan and heap_live were updated.
    if gcBlackenEnabled != 0 {
        gcController.revise()
    }

    // Trace handling
    if trace.enabled {
        traceHeapAlloc()
    }

    // h.spans is accessed concurrently without synchronization
    // from other threads. Hence, there must be a store/store
    // barrier here to ensure the writes to h.spans above happen
    // before the caller can publish a pointer p to an object
    // allocated from s. As soon as this happens, the garbage
    // collector running on another processor could read p and
    // look up s in h.spans. The unlock acts as the barrier to
    // order these writes. On the read side, the data dependency
    // between p and the index in h.spans orders the reads.
    unlock(&h.lock)
    return s
}

Let's continue with the allocSpanLocked function:

// Allocates a span of the given size.  h must be locked.
// The returned span has been removed from the
// free list, but its state is still MSpanFree.
func (h *mheap) allocSpanLocked(npage uintptr, stat *uint64) *mspan {
    var list *mSpanList
    var s *mspan

    // Try to allocate from mheap's free lists
    // Free spans of fewer than _MaxMHeapList pages (128 pages = 1M) live on the free lists
    // Free spans with more pages live on the freelarge list
    // Try in fixed-size lists up to max.
    for i := int(npage); i < len(h.free); i++ {
        list = &h.free[i]
        if !list.isEmpty() {
            s = list.first
            list.remove(s)
            goto HaveSpan
        }
    }
    // If the free lists yield nothing, search the freelarge list
    // If that also fails, request a new span from the arena region, add it to freelarge, then search freelarge again
    // Best fit in list of large spans.
    s = h.allocLarge(npage) // allocLarge removed s from h.freelarge for us
    if s == nil {
        if !h.grow(npage) {
            return nil
        }
        s = h.allocLarge(npage)
        if s == nil {
            return nil
        }
    }

HaveSpan:
    // Mark span in use.
    if s.state != _MSpanFree {
        throw("MHeap_AllocLocked - MSpan not free")
    }
    if s.npages < npage {
        throw("MHeap_AllocLocked - bad npages")
    }
    // If the span has released pages (pages whose virtual-to-physical mapping was dropped), signal that they will be used again and update statistics
    if s.npreleased > 0 {
        sysUsed(unsafe.Pointer(s.base()), s.npages<<_PageShift)
        memstats.heap_released -= uint64(s.npreleased << _PageShift)
        s.npreleased = 0
    }

    // If the obtained span has more pages than requested,
    // split the extra pages into another span and put it back on a free list
    if s.npages > npage {
        // Trim extra and put it back in the heap.
        t := (*mspan)(h.spanalloc.alloc())
        t.init(s.base()+npage<<_PageShift, s.npages-npage)
        s.npages = npage
        p := (t.base() - h.arena_start) >> _PageShift
        if p > 0 {
            h.spans[p-1] = s
        }
        h.spans[p] = t
        h.spans[p+t.npages-1] = t
        t.needzero = s.needzero
        s.state = _MSpanManual // prevent coalescing with s
        t.state = _MSpanManual
        h.freeSpanLocked(t, false, false, s.unusedsince)
        s.state = _MSpanFree
    }
    s.unusedsince = 0

    // Fill in the spans region: record which addresses map to this mspan object
    p := (s.base() - h.arena_start) >> _PageShift
    for n := uintptr(0); n < npage; n++ {
        h.spans[p+n] = s
    }

    // Update statistics
    *stat += uint64(npage << _PageShift)
    memstats.heap_idle -= uint64(npage << _PageShift)

    //println("spanalloc", hex(s.start<<_PageShift))
    if s.inList() {
        throw("still in list")
    }
    return s
}

Next, the allocLarge function:

// allocLarge allocates a span of at least npage pages from the treap of large spans.
// Returns nil if no such span currently exists.
func (h *mheap) allocLarge(npage uintptr) *mspan {
    // Search treap for smallest span with >= npage pages.
    return h.freelarge.remove(npage)
}

freelarge has the type mTreap; its remove function searches the tree for the smallest span in it with at least npage pages and returns it:

// remove searches for, finds, removes from the treap, and returns the smallest
// span that can hold npages. If no span has at least npages return nil.
// This is slightly more complicated than a simple binary tree search
// since if an exact match is not found the next larger node is
// returned.
// If the last node inspected > npagesKey not holding
// a left node (a smaller npages) is the "best fit" node.
func (root *mTreap) remove(npages uintptr) *mspan {
    t := root.treap
    for t != nil {
        if t.spanKey == nil {
            throw("treap node with nil spanKey found")
        }
        if t.npagesKey < npages {
            t = t.right
        } else if t.left != nil && t.left.npagesKey >= npages {
            t = t.left
        } else {
            result := t.spanKey
            root.removeNode(t)
            return result
        }
    }
    return nil
}

The function that requests new spans from the arena region is mheap's grow function:

// Try to add at least npage pages of memory to the heap,
// returning whether it worked.
//
// h must be locked.
func (h *mheap) grow(npage uintptr) bool {
    // Ask for a big chunk, to reduce the number of mappings
    // the operating system needs to track; also amortizes
    // the overhead of an operating system mapping.
    // Allocate a multiple of 64kB.
    npage = round(npage, (64<<10)/_PageSize)
    ask := npage << _PageShift
    if ask < _HeapAllocChunk {
        ask = _HeapAllocChunk
    }

    // Request the memory through mheap.sysAlloc
    v := h.sysAlloc(ask)
    if v == nil {
        if ask > npage<<_PageShift {
            ask = npage << _PageShift
            v = h.sysAlloc(ask)
        }
        if v == nil {
            print("runtime: out of memory: cannot allocate ", ask, "-byte block (", memstats.heap_sys, " in use)\n")
            return false
        }
    }

    // Create a new span and add it to the free lists
    // Create a fake "in use" span and free it, so that the
    // right coalescing happens.
    s := (*mspan)(h.spanalloc.alloc())
    s.init(uintptr(v), ask>>_PageShift)
    p := (s.base() - h.arena_start) >> _PageShift
    for i := p; i < p+s.npages; i++ {
        h.spans[i] = s
    }
    atomic.Store(&s.sweepgen, h.sweepgen)
    s.state = _MSpanInUse
    h.pagesInUse += uint64(s.npages)
    h.freeSpanLocked(s, false, true, 0)
    return true
}

Let's continue with mheap's sysAlloc function:

// sysAlloc allocates the next n bytes from the heap arena. The
// returned pointer is always _PageSize aligned and between
// h.arena_start and h.arena_end. sysAlloc returns nil on failure.
// There is no corresponding free function.
func (h *mheap) sysAlloc(n uintptr) unsafe.Pointer {
    // strandLimit is the maximum number of bytes to strand from
    // the current arena block. If we would need to strand more
    // than this, we fall back to sysAlloc'ing just enough for
    // this allocation.
    const strandLimit = 16 << 20

    // If the committed part of the arena region is insufficient, call sysReserve to reserve more address space, then update arena_end
    // On linux, sysReserve calls mmap:
    // mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
    if n > h.arena_end-h.arena_alloc {
        // If we haven't grown the arena to _MaxMem yet, try
        // to reserve some more address space.
        p_size := round(n+_PageSize, 256<<20)
        new_end := h.arena_end + p_size // Careful: can overflow
        if h.arena_end <= new_end && new_end-h.arena_start-1 <= _MaxMem {
            // TODO: It would be bad if part of the arena
            // is reserved and part is not.
            var reserved bool
            p := uintptr(sysReserve(unsafe.Pointer(h.arena_end), p_size, &reserved))
            if p == 0 {
                // TODO: Try smaller reservation
                // growths in case we're in a crowded
                // 32-bit address space.
                goto reservationFailed
            }
            // p can be just about anywhere in the address
            // space, including before arena_end.
            if p == h.arena_end {
                // The new block is contiguous with
                // the current block. Extend the
                // current arena block.
                h.arena_end = new_end
                h.arena_reserved = reserved
            } else if h.arena_start <= p && p+p_size-h.arena_start-1 <= _MaxMem && h.arena_end-h.arena_alloc < strandLimit {
                // We were able to reserve more memory
                // within the arena space, but it's
                // not contiguous with our previous
                // reservation. It could be before or
                // after our current arena_used.
                //
                // Keep everything page-aligned.
                // Our pages are bigger than hardware pages.
                h.arena_end = p + p_size
                p = round(p, _PageSize)
                h.arena_alloc = p
                h.arena_reserved = reserved
            } else {
                // We got a mapping, but either
                //
                // 1) It's not in the arena, so we
                // can't use it. (This should never
                // happen on 32-bit.)
                //
                // 2) We would need to discard too
                // much of our current arena block to
                // use it.
                //
                // We haven't added this allocation to
                // the stats, so subtract it from a
                // fake stat (but avoid underflow).
                //
                // We'll fall back to a small sysAlloc.
                stat := uint64(p_size)
                sysFree(unsafe.Pointer(p), p_size, &stat)
            }
        }
    }

    // When enough space is reserved, only arena_alloc needs to advance
    if n <= h.arena_end-h.arena_alloc {
        // Keep taking from our reservation.
        p := h.arena_alloc
        sysMap(unsafe.Pointer(p), n, h.arena_reserved, &memstats.heap_sys)
        h.arena_alloc += n
        if h.arena_alloc > h.arena_used {
            h.setArenaUsed(h.arena_alloc, true)
        }

        if p&(_PageSize-1) != 0 {
            throw("misrounded allocation in MHeap_SysAlloc")
        }
        return unsafe.Pointer(p)
    }

    // Handling after a failed reservation
reservationFailed:
    // If using 64-bit, our reservation is all we have.
    if sys.PtrSize != 4 {
        return nil
    }

    // On 32-bit, once the reservation is gone we can
    // try to get memory at a location chosen by the OS.
    p_size := round(n, _PageSize) + _PageSize
    p := uintptr(sysAlloc(p_size, &memstats.heap_sys))
    if p == 0 {
        return nil
    }

    if p < h.arena_start || p+p_size-h.arena_start > _MaxMem {
        // This shouldn't be possible because _MaxMem is the
        // whole address space on 32-bit.
        top := uint64(h.arena_start) + _MaxMem
        print("runtime: memory allocated by OS (", hex(p), ") not in usable range [", hex(h.arena_start), ",", hex(top), ")\n")
        sysFree(unsafe.Pointer(p), p_size, &memstats.heap_sys)
        return nil
    }

    p += -p & (_PageSize - 1)
    if p+n > h.arena_used {
        h.setArenaUsed(p+n, true)
    }

    if p&(_PageSize-1) != 0 {
        throw("misrounded allocation in MHeap_SysAlloc")
    }
    return unsafe.Pointer(p)
}

That completes the object allocation flow; next we analyze how the GC marks and reclaims objects.

Handling object reclamation

The reclamation flow

Go's GC is a concurrent GC: most of its processing runs at the same time as ordinary go code, which makes go's GC flow fairly complex.

The GC has four phases:

Sweep Termination: sweep any spans not yet swept; a new GC cycle can only start once the previous cycle's sweep work has finished

Mark: scan all root objects and every object reachable from them, marking them so they are not reclaimed

Mark Termination: complete the marking and rescan part of the roots (requires STW)

Sweep: sweep the spans according to the mark results

The diagram below shows the complete GC flow, with the four phases color-coded:

During GC there are two kinds of background tasks (G's): background tasks for marking and background tasks for sweeping.

Background mark tasks are started when needed; the number that may work at the same time is about 25% of the number of P's, which is the basis for go's statement that 25% of the CPU is spent on GC.

One background sweep task is started at program launch and is woken when the sweep phase begins.

Currently the whole GC flow stops the world (STW) twice: first at the start of the Mark phase, and again at Mark Termination.

The first STW prepares the scanning of the roots and enables the Write Barrier and mutator assists.

The second STW rescans part of the roots and disables the Write Barrier and mutator assists.

Note that not all root scanning needs STW; for example, scanning the objects on a stack only requires stopping the G that owns that stack.

Starting with go 1.9, the write barrier is implemented as a Hybrid Write Barrier, which greatly reduces the time of the second STW.

GC trigger conditions

GC is triggered once certain conditions are met. The trigger conditions are:

  • gcTriggerAlways: trigger GC unconditionally

  • gcTriggerHeap: trigger GC when the amount of allocated memory reaches a threshold

  • gcTriggerTime: trigger GC when no GC has run for a certain amount of time

  • gcTriggerCycle: request a new GC cycle, skipped if one has already started; manually triggering GC with runtime.GC() uses this condition

The trigger conditions are checked in gctrigger's test function.

Of these, gcTriggerHeap and gcTriggerTime fire naturally. The check for gcTriggerHeap is:

return memstats.heap_live >= memstats.gc_trigger

How heap_live grows can be seen in the allocator analysis above; GC triggers once its value reaches gc_trigger. So how is gc_trigger decided?
gc_trigger is computed in the gcSetTriggerRatio function with the formula:

trigger = uint64(float64(memstats.heap_marked) * (1 + triggerRatio))

The size currently marked live, multiplied by 1 plus the coefficient triggerRatio, is the amount of allocation that triggers the next GC; for example, with 100 MB marked live and a triggerRatio of 0.7, the next GC starts once the heap reaches 170 MB.
triggerRatio is readjusted after every GC; it is computed by the endCycle function with the formula:

const triggerGain = 0.5
// Target heap growth ratio; the default is 1.0
goalGrowthRatio := float64(gcpercent) / 100
// Actual heap growth ratio, equal to total size / live size - 1
actualGrowthRatio := float64(memstats.heap_live)/float64(memstats.heap_marked) - 1
// Time spent in the GC mark phase (endCycle is called during Mark Termination)
assistDuration := nanotime() - c.markStartTime
// CPU utilization of the GC mark phase; the target value is 0.25
utilization := gcGoalUtilization
if assistDuration > 0 {
    // assistTime is the total time G's spent assisting with marking objects
    // (nanoseconds spent in mutator assists during this cycle)
    // Extra CPU utilization = total time assisting with marking / (mark phase duration * number of P's)
    utilization += float64(c.assistTime) / float64(assistDuration*int64(gomaxprocs))
}
// Trigger-ratio error = target growth ratio - current trigger ratio - utilization / target utilization * (actual growth ratio - current trigger ratio)
// Interpreting the parameters:
// the larger the actual growth ratio, the smaller the error; when it goes below 0 the next GC triggers earlier
// the higher the CPU utilization, the smaller the error; when it goes below 0 the next GC triggers earlier
// the larger the current trigger ratio, the smaller the error; when it goes below 0 the next GC triggers earlier
triggerError := goalGrowthRatio - memstats.triggerRatio - utilization/gcGoalUtilization*(actualGrowthRatio-memstats.triggerRatio)
// Adjust the trigger ratio by the error, applying only half of the error each time (incremental adjustment)
triggerRatio := memstats.triggerRatio + triggerGain*triggerError

The "target heap growth ratio" in the formula can be adjusted through the environment variable GOGC. The default is 100, and raising it makes GC trigger less often.
Setting GOGC=off turns GC off completely.

The check for gcTriggerTime is:

lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
return lastgc != 0 && t.now-lastgc > forcegcperiod

forcegcperiod is defined as 2 minutes, i.e. a GC is forced if none has run within 2 minutes.

Defining the three colors (black, grey, white)

The best explanation I have read of the "three colors" in tri-color GC is this article; I strongly recommend reading its explanation first.

The "three colors" can be understood simply as:

  • Black: the object has been marked in this GC cycle, and so have the objects it references

  • Grey: the object has been marked in this GC cycle, but the objects it references have not

  • White: the object has not been marked in this GC cycle

Internally go objects carry no color attribute; the three colors are just a description of their state:

a white object has a 0 bit in the gcmarkBits of the span it lives in,

a grey object has a 1 bit in the gcmarkBits of the span it lives in, and the object is on a mark queue,

a black object has a 1 bit in the gcmarkBits of the span it lives in, and the object has been taken off the mark queue and processed.

After a GC finishes, gcmarkBits is moved to allocBits and a fresh all-zero bitmap is allocated, so the black objects effectively become white again.

Write Barrier

Because go supports concurrent GC, GC scanning and go code can run at the same time. The problem this creates is that go code may change the object graph while the GC is scanning it.

For example: scanning starts with roots A and B, and B holds a pointer to C. The GC scans A first; then B hands C's pointer to A; when the GC then scans B, C will never get scanned.

To avoid this problem, go enables the Write Barrier during the GC mark phase.

With the Write Barrier enabled, when B hands C's pointer to A, the GC treats C's pointer as live in this round of scanning;

even if A later drops C, C is simply reclaimed in the next round.

The write barrier applies only to pointers and is enabled only during the GC mark phase; normally values are written directly to their destination.

go enabled the Hybrid Write Barrier in 1.9; its pseudocode is:

writePointer(slot, ptr):
    shade(*slot)
    if any stack is grey:
        shade(ptr)
    *slot = ptr

The hybrid write barrier shades both the "old pointer" and the "new pointer" of a pointer write.

The old pointer is shaded because other running threads may concurrently copy that pointer's value into a register or a stack-local variable,
and since copying a pointer into a register or stack-local variable does not pass through the write barrier, the pointer might otherwise go unmarked. Consider the following situation:

[go] b = obj
[go] oldx = nil
[gc] scan oldx...
[go] oldx = b.x // copy b.x into a local variable, bypassing the write barrier
[go] b.x = ptr // the write barrier should shade the old value of b.x
[gc] scan b...
If the write barrier did not shade the old value, oldx would never be scanned.

The new pointer is shaded because other running threads may move the pointer elsewhere. Consider the following situation:

[go] a = ptr
[go] b = obj
[gc] scan b...
[go] b.x = a // the write barrier should shade the new value of b.x
[go] a = nil
[gc] scan a...
If the write barrier did not shade the new value, ptr would never be scanned.

The hybrid write barrier lets the GC skip rescanning each G's stack after concurrent marking ends, which reduces the STW time of Mark Termination.
Besides the write barrier, every object newly allocated during a GC is immediately marked black, as seen in the mallocgc function above.

GC assists (mutator assist)

To keep the heap from growing too fast, if a concurrently running G allocates memory while a GC is in progress, that G is required to assist the GC with part of its work.
A G that runs concurrently with the GC is called a "mutator", and the "mutator assist" mechanism is how G's help the GC with part of its work.

Assist work comes in two kinds: marking (Mark) and sweeping (Sweep).
The trigger for assist marking can be seen in the mallocgc function above; when triggered, the G helps scan a "work amount" of objects, where the work amount is computed as:

debtBytes * assistWorkPerByte

meaning the allocated size multiplied by the coefficient assistWorkPerByte. assistWorkPerByte is computed in the revise function with the formula:

// Scan work still expected = total amount to scan - amount already scanned
scanWorkExpected := int64(memstats.heap_scan) - c.scanWork
if scanWorkExpected < 1000 {
    scanWorkExpected = 1000
}
// Distance to the GC trigger = the heap size that will trigger GC - the current heap size
// Note next_gc is computed differently from gc_trigger: next_gc equals heap_marked * (1 + gcpercent / 100)
heapDistance := int64(memstats.next_gc) - int64(atomic.Load64(&memstats.heap_live))
if heapDistance <= 0 {
    heapDistance = 1
}
// Assist scan amount per allocated byte = amount awaiting scan / distance to the GC trigger
c.assistWorkPerByte = float64(scanWorkExpected) / float64(heapDistance)
c.assistBytesPerWork = float64(heapDistance) / float64(scanWorkExpected)

Unlike assist marking, assist sweeping is only checked when requesting a new span, while assist marking is checked on every object allocation.
The trigger for assist sweeping can be seen in the cacheSpan function above; when triggered, the G helps reclaim a "work amount" of pages, computed as:

spanBytes * sweepPagesPerByte // not exactly identical; see the deductSweepCredit function for details

meaning the allocated size multiplied by the coefficient sweepPagesPerByte, which is computed in the gcSetTriggerRatio function with the formula:

// The current heap size
heapLiveBasis := atomic.Load64(&memstats.heap_live)
// Distance to the GC trigger = the heap size that will trigger the next GC - the current heap size
heapDistance := int64(trigger) - int64(heapLiveBasis)
heapDistance -= 1024 * 1024
if heapDistance < _PageSize {
    heapDistance = _PageSize
}
// Pages already swept
pagesSwept := atomic.Load64(&mheap_.pagesSwept)
// Pages not yet swept = pages in use - pages already swept
sweepDistancePages := int64(mheap_.pagesInUse) - int64(pagesSwept)
if sweepDistancePages <= 0 {
    mheap_.sweepPagesPerByte = 0
} else {
    // Pages to assist-sweep per allocated byte (of span) = unswept pages / distance to the GC trigger
    mheap_.sweepPagesPerByte = float64(sweepDistancePages) / float64(heapDistance)
}

Root objects

The first things to be marked in the GC mark phase are the "root objects"; every object reachable from the roots is considered live.
The roots include global variables, the variables on each G's stack, and so on; the GC scans the roots first and then everything reachable from them.
Scanning the roots covers a series of jobs, which are defined in the function:

Fixed Roots: special scan jobs

fixedRootFinalizers: scan the finalizer queue

fixedRootFreeGStacks: free the stacks of terminated G's

Flush Cache Roots: release all spans in the mcaches (requires STW)

Data Roots: scan the global variables in the data section (initialized read-write globals)

BSS Roots: scan the global variables in the bss section (zero-initialized globals)

Span Roots: scan the special objects in each span (the finalizer lists)

Stack Roots: scan each G's stack

The Mark phase performs the "Fixed Roots", "Data Roots", "BSS Roots", "Span Roots" and "Stack Roots" jobs.

The Mark Termination phase performs the "Fixed Roots" and "Flush Cache Roots" jobs.

The mark queue

The GC mark phase uses a "mark queue" to make sure every object reachable from the roots gets marked; the "grey" objects mentioned above are precisely the objects sitting in the mark queue.

For example, if the current roots are [A, B, C], scanning the roots puts them on the mark queue:

work queue: [A, B, C]

A background mark task takes A off the queue; if A references D, it puts D on the queue:

work queue: [B, C, D]

A background mark task takes B off the queue; if B also references D, D is skipped because its bit in gcmarkBits is already 1:

work queue: [C, D]

If concurrently running go code allocates an object E, E is marked immediately but does not enter the queue (it is known that E does not reference any other object yet).

Then the concurrently running go code stores an object F into a field of E; the write barrier marks F and puts it on the work queue:

work queue: [C, D, F]

A background mark task takes C off the queue; C references no other object, so nothing needs to be done:

work queue: [D, F]

A background mark task takes D off the queue; if D references X, it puts X on the queue:

work queue: [F, X]

A background mark task takes F off the queue; F references no other object, so nothing needs to be done.

A background mark task takes X off the queue; X references no other object, so nothing needs to be done.

Finally the mark queue is empty, marking is complete, and the live objects are [A, B, C, D, E, F, X].

The real situation is slightly more complicated than described above.

The mark queue is split into a global mark queue and a local mark queue per P, much like the run queues in the scheduler.

And after the mark queues become empty, the world still has to be stopped and the write barrier disabled, and only then is it checked again that the queues are empty.
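
As a toy illustration of the walkthrough above (this is not runtime code), the following sketch plays the grey queue by hand: the marked map stands in for gcmarkBits, and the queue holds grey objects whose children still need scanning:

package main

import "fmt"

type object struct {
    name     string
    children []*object
}

func mark(roots []*object) (live []string) {
    marked := map[*object]bool{} // stands in for gcmarkBits
    queue := []*object{}         // the grey work queue
    for _, r := range roots {    // scan the roots: mark and enqueue (grey)
        if !marked[r] {
            marked[r] = true
            queue = append(queue, r)
        }
    }
    for len(queue) > 0 {
        obj := queue[0]
        queue = queue[1:] // dequeued and scanned: the object turns black
        live = append(live, obj.name)
        for _, c := range obj.children {
            if !marked[c] { // already-marked children are skipped
                marked[c] = true
                queue = append(queue, c)
            }
        }
    }
    return
}

func main() {
    d := &object{name: "D"}
    a := &object{name: "A", children: []*object{d}}
    b := &object{name: "B", children: []*object{d}} // B also references D
    c := &object{name: "C"}
    fmt.Println(mark([]*object{a, b, c})) // prints [A B C D]
}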

Source code analysis

Triggering a GC enters through the gcStart function:

// gcStart transitions the GC from _GCoff to _GCmark (if
// !mode.stwMark) or _GCmarktermination (if mode.stwMark) by
// performing sweep termination and GC initialization.
//
// This may return without performing this transition in some cases,
// such as when called on a system stack or with locks held.
func gcStart(mode gcMode, trigger gcTrigger) {
    // Check whether the current G is preemptible; do not trigger GC when it is not
    // Since this is called from malloc and malloc is called in
    // the guts of a number of libraries that might be holding
    // locks, don't attempt to start GC in non-preemptible or
    // potentially unstable situations.
    mp := acquirem()
    if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
        releasem(mp)
        return
    }
    releasem(mp)
    mp = nil

    // Concurrently sweep any spans left unswept by the previous GC cycle
    // Pick up the remaining unswept/not being swept spans concurrently
    //
    // This shouldn't happen if we're being invoked in background
    // mode since proportional sweep should have just finished
    // sweeping everything, but rounding errors, etc, may leave a
    // few spans unswept. In forced mode, this is necessary since
    // GC can be forced at any point in the sweeping cycle.
    //
    // We check the transition condition continuously here in case
    // this G gets delayed in to the next GC cycle.
    for trigger.test() && gosweepone() != ^uintptr(0) {
        sweep.nbgsweep++
    }

    // Take the lock, then re-check that the gcTrigger condition still holds; do not trigger GC if it does not
    // Perform GC initialization and the sweep termination
    // transition.
    semacquire(&work.startSema)
    // Re-check transition condition under transition lock.
    if !trigger.test() {
        semrelease(&work.startSema)
        return
    }

    // Record whether this GC was forced; gcTriggerCycle is used by runtime.GC
    // For stats, check if this GC was forced by the user.
    work.userForced = trigger.kind == gcTriggerAlways || trigger.kind == gcTriggerCycle

    // Check whether a parameter forbidding concurrent GC was specified
    // In gcstoptheworld debug mode, upgrade the mode accordingly.
    // We do this after re-checking the transition condition so
    // that multiple goroutines that detect the heap trigger don't
    // start multiple STW GCs.
    if mode == gcBackgroundMode {
        if debug.gcstoptheworld == 1 {
            mode = gcForceMode
        } else if debug.gcstoptheworld == 2 {
            mode = gcForceBlockMode
        }
    }

    // Ok, we're doing it!  Stop everybody else
    semacquire(&worldsema)

    // Trace handling
    if trace.enabled {
        traceGCStart()
    }

    // Start the background mark tasks (G's)
    if mode == gcBackgroundMode {
        gcBgMarkStartWorkers()
    }

    // Reset mark-related state
    gcResetMarkState()

    // Reset parameters
    work.stwprocs, work.maxprocs = gcprocs(), gomaxprocs
    work.heap0 = atomic.Load64(&memstats.heap_live)
    work.pauseNS = 0
    work.mode = mode

    // Record the start time
    now := nanotime()
    work.tSweepTerm = now
    work.pauseStart = now
    
    // Stop all running G's and forbid them from running
    systemstack(stopTheWorldWithSema)
    
    // !!!!!!!!!!!!!!!!
    // The world is now stopped (STW)...
    // !!!!!!!!!!!!!!!!
    
    // Sweep the spans left unswept by the previous GC cycle, ensuring the previous cycle is complete
    // Finish sweep before we start concurrent scan.
    systemstack(func() {
        finishsweep_m()
    })
    // Clear sched.sudogcache and sched.deferpool
    // clearpools before we start the GC. If we wait they memory will not be
    // reclaimed until the next GC cycle.
    clearpools()

    // Increment the GC count
    work.cycles++
    
    // Check whether this is the concurrent GC mode
    if mode == gcBackgroundMode { // Do as much work concurrently as possible
        // Mark that a new GC cycle has started
        gcController.startCycle()
        work.heapGoal = memstats.next_gc

        // Set the GC phase in the global state to _GCmark
        // and then enable the write barrier
        // Enter concurrent mark phase and enable
        // write barriers.
        //
        // Because the world is stopped, all Ps will
        // observe that write barriers are enabled by
        // the time we start the world and begin
        // scanning.
        //
        // Write barriers must be enabled before assists are
        // enabled because they must be enabled before
        // any non-leaf heap objects are marked. Since
        // allocations are blocked until assists can
        // happen, we want enable assists as early as
        // possible.
        setGCPhase(_GCmark)

        // Reset the background mark task counters
        gcBgMarkPrepare() // Must happen before assist enable.

        // Compute the number of root-scanning jobs
        gcMarkRootPrepare()

        // Mark all tiny alloc objects waiting to be merged
        // Mark all active tinyalloc blocks. Since we're
        // allocating from these, they need to be black like
        // other allocations. The alternative is to blacken
        // the tiny block on every allocation from it, which
        // would slow down the tiny allocator.
        gcMarkTinyAllocs()

        // Enable GC assists
        // At this point all Ps have enabled the write
        // barrier, thus maintaining the no white to
        // black invariant. Enable mutator assists to
        // put back-pressure on fast allocating
        // mutators.
        atomic.Store(&gcBlackenEnabled, 1)

        // Record the time marking started
        // Assists and workers can start the moment we start
        // the world.
        gcController.markStartTime = now

        // Restart the world
        // The background mark tasks created earlier will start working; once they have all finished their work, the flow enters Mark Termination
        // Concurrent mark.
        systemstack(startTheWorldWithSema)
        
        // !!!!!!!!!!!!!!!
        // The world has been restarted...
        // !!!!!!!!!!!!!!!
        
        // Record how long the world was stopped, and when the mark phase began
        now = nanotime()
        work.pauseNS += now - work.pauseStart
        work.tMark = now
    } else {
        // Not the concurrent GC mode
        // Record the time Mark Termination began
        t := nanotime()
        work.tMark, work.tMarkTerm = t, t
        work.heapGoal = work.heap0

        // Skip the mark phase and run Mark Termination directly
        // All mark work will execute with the world stopped
        // (the mark phase sets work.markrootDone=true; if it is skipped the value stays false and Mark Termination performs all the work)
        // Mark Termination will restart the world
        // Perform mark termination. This will restart the world.
        gcMarkTermination(memstats.triggerRatio)
    }

    semrelease(&work.startSema)
}

Next we analyze the functions gcStart calls one by one; I suggest following along with the diagram in the "reclamation flow" section above.

The gcBgMarkStartWorkers function starts the background mark tasks, one per P:

// gcBgMarkStartWorkers prepares background mark worker goroutines.
// These goroutines will not run until the mark phase, but they must
// be started while the work is not stopped and from a regular G
// stack. The caller must hold worldsema.
func gcBgMarkStartWorkers() {
    // Background marking is performed by per-P G's. Ensure that
    // each P has a background GC G.
    for _, p := range &allp {
        if p == nil || p.status == _Pdead {
            break
        }
        // If it has been started already, do not start it again
        if p.gcBgMarkWorker == 0 {
            go gcBgMarkWorker(p)
            // After starting, wait for the task to signal the semaphore bgMarkReady before continuing
            notetsleepg(&work.bgMarkReady, -1)
            noteclear(&work.bgMarkReady)
        }
    }
}

Although a background mark task is started for every P here, only about 25% of them may work at the same time; this logic is in findRunnableGCWorker, which is called when an M fetches a G to run:

// findRunnableGCWorker returns the background mark worker for _p_ if it
// should be run. This must only be called when gcBlackenEnabled != 0.
func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
    if gcBlackenEnabled == 0 {
        throw("gcControllerState.findRunnable: blackening not enabled")
    }
    if _p_.gcBgMarkWorker == 0 {
        // The mark worker associated with this P is blocked
        // performing a mark transition. We can't run it
        // because it may be on some other run or wait queue.
        return nil
    }

    if !gcMarkWorkAvailable(_p_) {
        // No work to be done right now. This can happen at
        // the end of the mark phase when there are still
        // assists tapering off. Don't bother running a worker
        // now because it'll just return immediately.
        return nil
    }

    // Atomically decrement *ptr; return true if the result is still >= 0, otherwise roll back and return false
    decIfPositive := func(ptr *int64) bool {
        if *ptr > 0 {
            if atomic.Xaddint64(ptr, -1) >= 0 {
                return true
            }
            // We lost a race
            atomic.Xaddint64(ptr, +1)
        }
        return false
    }

    // Decrement dedicatedMarkWorkersNeeded; on success this worker runs in Dedicated mode.
    // dedicatedMarkWorkersNeeded is 25% of the P count, truncated toward zero.
    // See the startCycle function.
    if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
        // This P is now dedicated to marking until the end of
        // the concurrent mark phase.
        _p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
    } else {
        // Otherwise decrement fractionalMarkWorkersNeeded; on success this worker runs in Fractional mode.
        // If the 25% above has a fractional part (doesn't divide evenly), fractionalMarkWorkersNeeded is 1, otherwise 0.
        // See the startCycle function.
        // For example, 4 Ps run 1 Dedicated worker; 5 Ps run 1 Dedicated worker plus 1 Fractional worker.
        if !decIfPositive(&c.fractionalMarkWorkersNeeded) {
            // No more workers are need right now.
            return nil
        }

        // Use the time already spent by Fractional workers to check whether GC's CPU share would exceed the goal; don't start one if it would
        // This P has picked the token for the fractional worker.
        // Is the GC currently under or at the utilization goal?
        // If so, do more work.
        //
        // We used to check whether doing one time slice of work
        // would remain under the utilization goal, but that has the
        // effect of delaying work until the mutator has run for
        // enough time slices to pay for the work. During those time
        // slices, write barriers are enabled, so the mutator is running slower.
        // Now instead we do the work whenever we're under or at the
        // utilization work and pay for it by letting the mutator run later.
        // This doesn't change the overall utilization averages, but it
        // front loads the GC work so that the GC finishes earlier and
        // write barriers can be turned off sooner, effectively giving
        // the mutator a faster machine.
        //
        // The old, slower behavior can be restored by setting
        //  gcForcePreemptNS = forcePreemptNS.
        const gcForcePreemptNS = 0

        // TODO(austin): We could fast path this and basically
        // eliminate contention on c.fractionalMarkWorkersNeeded by
        // precomputing the minimum time at which it's worth
        // next scheduling the fractional worker. Then Ps
        // don't have to fight in the window where we've
        // passed that deadline and no one has started the
        // worker yet.
        //
        // TODO(austin): Shorter preemption interval for mark
        // worker to improve fairness and give this
        // finer-grained control over schedule?
        now := nanotime() - gcController.markStartTime
        then := now + gcForcePreemptNS
        timeUsed := c.fractionalMarkTime + gcForcePreemptNS
        if then > 0 && float64(timeUsed)/float64(then) > c.fractionalUtilizationGoal {
            // Nope, we'd overshoot the utilization goal
            atomic.Xaddint64(&c.fractionalMarkWorkersNeeded, +1)
            return nil
        }
        _p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode
    }

    // Arrange for the background mark worker to run
    // Run the background mark worker
    gp := _p_.gcBgMarkWorker.ptr()
    casgstatus(gp, _Gwaiting, _Grunnable)
    if trace.enabled {
        traceGoUnpark(gp, 0)
    }
    return gp
}
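
The decIfPositive helper above is a small optimistic-concurrency pattern: decrement, and roll back if the counter went negative because of a race. A minimal standalone sketch of the same idea (tryTake and quota are illustrative names, not runtime identifiers):

package main

import (
	"fmt"
	"sync/atomic"
)

// tryTake attempts to claim one unit from a shared quota counter.
// It mirrors decIfPositive: decrement optimistically, and if the
// result went negative we lost a race and must undo the decrement.
func tryTake(quota *int64) bool {
	if atomic.LoadInt64(quota) > 0 {
		if atomic.AddInt64(quota, -1) >= 0 {
			return true
		}
		// Lost a race with another goroutine; roll back.
		atomic.AddInt64(quota, +1)
	}
	return false
}

func main() {
	var quota int64 = 1          // e.g. one Dedicated worker slot
	fmt.Println(tryTake(&quota)) // true: slot claimed
	fmt.Println(tryTake(&quota)) // false: no slots left
}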

The gcResetMarkState function resets mark-related state:

// gcResetMarkState resets global state prior to marking (concurrent
// or STW) and resets the stack scan state of all Gs.
//
// This is safe to do without the world stopped because any Gs created
// during or after this will start out in the reset state.
func gcResetMarkState() {
    // This may be called during a concurrent phase, so make sure
    // allgs doesn't change.
    lock(&allglock)
    for _, gp := range allgs {
        gp.gcscandone = false  // set to true in gcphasework
        gp.gcscanvalid = false // stack has not been scanned
        gp.gcAssistBytes = 0
    }
    unlock(&allglock)

    work.bytesMarked = 0
    work.initialHeapLive = atomic.Load64(&memstats.heap_live)
    work.markrootDone = false
}

The stopTheWorldWithSema function stops the whole world; it must run on g0:

// stopTheWorldWithSema is the core implementation of stopTheWorld.
// The caller is responsible for acquiring worldsema and disabling
// preemption first and then should stopTheWorldWithSema on the system
// stack:
//
//  semacquire(&worldsema, 0)
//  m.preemptoff = "reason"
//  systemstack(stopTheWorldWithSema)
//
// When finished, the caller must either call startTheWorld or undo
// these three operations separately:
//
//  m.preemptoff = ""
//  systemstack(startTheWorldWithSema)
//  semrelease(&worldsema)
//
// It is allowed to acquire worldsema once and then execute multiple
// startTheWorldWithSema/stopTheWorldWithSema pairs.
// Other P's are able to execute between successive calls to
// startTheWorldWithSema and stopTheWorldWithSema.
// Holding worldsema causes any other goroutines invoking
// stopTheWorld to block.
func stopTheWorldWithSema() {
    _g_ := getg()

    // If we hold a lock, then we won't be able to stop another M
    // that is blocked trying to acquire the lock.
    if _g_.m.locks > 0 {
        throw("stopTheWorld: holding locks")
    }

    lock(&sched.lock)
    
    // Number of Ps that need to stop
    sched.stopwait = gomaxprocs
    
    // Set the gcwaiting flag; the scheduler parks when it sees it
    atomic.Store(&sched.gcwaiting, 1)
    
    // Preempt all running Gs
    preemptall()
    
    // Stop the current P
    // stop current P
    _g_.m.p.ptr().status = _Pgcstop // Pgcstop is only diagnostic.
    
    // Decrement the number of Ps left to stop (the current P counts as one)
    sched.stopwait--
    
    // Take back all Ps in _Psyscall status so they can't rejoin scheduling
    // try to retake all P's in Psyscall status
    for i := 0; i < int(gomaxprocs); i++ {
        p := allp[i]
        s := p.status
        if s == _Psyscall && atomic.Cas(&p.status, s, _Pgcstop) {
            if trace.enabled {
                traceGoSysBlock(p)
                traceProcStop(p)
            }
            p.syscalltick++
            sched.stopwait--
        }
    }
    
    // Keep all idle Ps from rejoining scheduling
    // stop idle P's
    for {
        p := pidleget()
        if p == nil {
            break
        }
        p.status = _Pgcstop
        sched.stopwait--
    }
    wait := sched.stopwait > 0
    unlock(&sched.lock)

    // If some Ps still need to stop, wait for them
    // wait for remaining P's to stop voluntarily
    if wait {
        for {
            // Loop: wait, then preempt all running Gs again
            // wait for 100us, then try to re-preempt in case of any races
            if notetsleep(&sched.stopnote, 100*1000) {
                noteclear(&sched.stopnote)
                break
            }
            preemptall()
        }
    }

    // Sanity checks
    // sanity checks
    bad := ""
    if sched.stopwait != 0 {
        bad = "stopTheWorld: not stopped (stopwait != 0)"
    } else {
        for i := 0; i < int(gomaxprocs); i++ {
            p := allp[i]
            if p.status != _Pgcstop {
                bad = "stopTheWorld: not stopped (status != _Pgcstop)"
            }
        }
    }
    if atomic.Load(&freezing) != 0 {
        // Some other thread is panicking. This can cause the
        // sanity checks above to fail if the panic happens in
        // the signal handler on a stopped thread. Either way,
        // we should halt this thread.
        lock(&deadlock)
        lock(&deadlock)
    }
    if bad != "" {
        throw(bad)
    }
    
    // At this point every running G has become runnable and no P can be acquired by an M.
    // In other words, all Go code (except the current goroutine) is stopped, and no new Go code can run.
}
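
The stop protocol above is essentially a countdown latch: stopwait starts at gomaxprocs, every P that parks decrements it, and the stopper sleeps on stopnote until the count reaches zero. A condensed sketch of that rendezvous using standard-library primitives (sync.WaitGroup stands in for the note; all names here are illustrative):

package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

func main() {
	const procs = 4
	var stopwait int32 = procs // Ps still running, like sched.stopwait
	var gcwaiting int32        // stop request flag, like sched.gcwaiting
	var done sync.WaitGroup
	done.Add(1)

	for i := 0; i < procs; i++ {
		go func() {
			// A real P notices gcwaiting inside the scheduler loop.
			for atomic.LoadInt32(&gcwaiting) == 0 {
				runtime.Gosched()
			}
			if atomic.AddInt32(&stopwait, -1) == 0 {
				done.Done() // the last P to stop wakes the stopper
			}
		}()
	}

	atomic.StoreInt32(&gcwaiting, 1) // request stop-the-world
	done.Wait()                      // like notetsleep on stopnote
	fmt.Println("world stopped")
}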

The finishsweep_m function sweeps the spans left unswept by the previous GC cycle, ensuring that cycle has fully finished:

// finishsweep_m ensures that all spans are swept.
//
// The world must be stopped. This ensures there are no sweeps in
// progress.
//
//go:nowritebarrier
func finishsweep_m() {
    // sweepone takes one unswept span and sweeps it
    // (analyzed in detail under the sweep phase below)
    // Sweeping must be complete before marking commences, so
    // sweep any unswept spans. If this is a concurrent GC, there
    // shouldn't be any spans left to sweep, so this should finish
    // instantly. If GC was forced before the concurrent sweep
    // finished, there may be spans to sweep.
    for sweepone() != ^uintptr(0) {
        sweep.npausesweep++
    }

    // After every span is swept, start a new markbit epoch.
    // This function is the key to how spans' gcmarkBits and allocBits are allocated and reused; the flow is:
    // - a span allocates gcmarkBits and allocBits
    // - the span finishes sweeping
    //   - the old allocBits are no longer used
    //   - gcmarkBits become the new allocBits
    //   - fresh gcmarkBits are allocated
    // - a new markbit epoch starts
    // - the span finishes sweeping, as above
    // - another markbit epoch starts
    //   - bitmaps from two epochs ago are no longer used and can be recycled
    nextMarkBitArenaEpoch()
}
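
The bitmap rotation described in these comments is easier to see on a toy span: after sweep, the mark bitmap simply becomes the allocation bitmap and a fresh, zeroed mark bitmap takes its place. A simplified sketch of the swap (this is not the runtime's allocator, which recycles bitmap memory by epoch as noted above):

package main

import "fmt"

// toySpan models just the two bitmaps of an mspan.
type toySpan struct {
	allocBits  []uint8 // which elements are allocated
	gcmarkBits []uint8 // which elements were marked live this cycle
}

// sweep frees unmarked elements implicitly: whatever was marked
// live becomes the new allocation set, and the mark set is reset.
func (s *toySpan) sweep() {
	s.allocBits = s.gcmarkBits                     // gcmarkBits become allocBits
	s.gcmarkBits = make([]uint8, len(s.allocBits)) // fresh, all zero
}

func main() {
	s := &toySpan{
		allocBits:  []uint8{0b00001111}, // 4 objects allocated
		gcmarkBits: []uint8{0b00000101}, // only 2 survived marking
	}
	s.sweep()
	fmt.Printf("alloc after sweep: %08b\n", s.allocBits[0]) // 00000101
}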

The clearpools function clears sched.sudogcache and sched.deferpool so that their memory can be reclaimed:

func clearpools() {
    // clear sync.Pools
    if poolcleanup != nil {
        poolcleanup()
    }

    // Clear central sudog cache.
    // Leave per-P caches alone, they have strictly bounded size.
    // Disconnect cached list before dropping it on the floor,
    // so that a dangling ref to one entry does not pin all of them.
    lock(&sched.sudoglock)
    var sg, sgnext *sudog
    for sg = sched.sudogcache; sg != nil; sg = sgnext {
        sgnext = sg.next
        sg.next = nil
    }
    sched.sudogcache = nil
    unlock(&sched.sudoglock)

    // Clear central defer pools.
    // Leave per-P pools alone, they have strictly bounded size.
    lock(&sched.deferlock)
    for i := range sched.deferpool {
        // disconnect cached list before dropping it on the floor,
        // so that a dangling ref to one entry does not pin all of them.
        var d, dlink *_defer
        for d = sched.deferpool[i]; d != nil; d = dlink {
            dlink = d.link
            d.link = nil
        }
        sched.deferpool[i] = nil
    }
    unlock(&sched.deferlock)
}
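
The "disconnect before dropping" idiom in clearpools is worth dwelling on: if the cached list were dropped as one chain, a single surviving reference to any node would keep the entire chain reachable. Severing the links first lets each node be collected independently. A small illustration of the idea (node is a hypothetical type):

package main

// node is a hypothetical cached object in an intrusive linked list.
type node struct {
	next *node
	buf  [1024]byte
}

// dropList discards a cached list safely: sever each link first so
// that a stray reference to one node cannot pin all the others.
func dropList(head *node) {
	var next *node
	for n := head; n != nil; n = next {
		next = n.next
		n.next = nil // without this, one live node keeps the rest alive
	}
}

func main() {
	head := &node{next: &node{next: &node{}}}
	leaked := head.next // imagine a dangling reference survives somewhere
	dropList(head)
	_ = leaked // only this one node stays reachable, not its tail
}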

startCycle marks the start of a new GC cycle:

// startCycle resets the GC controller's state and computes estimates
// for a new GC cycle. The caller must hold worldsema.
func (c *gcControllerState) startCycle() {
    c.scanWork = 0
    c.bgScanCredit = 0
    c.assistTime = 0
    c.dedicatedMarkTime = 0
    c.fractionalMarkTime = 0
    c.idleMarkTime = 0

    // Fake heap_marked if gc_trigger is small, so that later adjustments to triggerRatio aren't skewed
    // If this is the first GC cycle or we're operating on a very
    // small heap, fake heap_marked so it looks like gc_trigger is
    // the appropriate growth from heap_marked, even though the
    // real heap_marked may not have a meaningful value (on the
    // first cycle) or may be much smaller (resulting in a large
    // error response).
    if memstats.gc_trigger <= heapminimum {
        memstats.heap_marked = uint64(float64(memstats.gc_trigger) / (1 + memstats.triggerRatio))
    }

    // Recompute next_gc; note it is calculated differently from gc_trigger
    // Re-compute the heap goal for this cycle in case something
    // changed. This is the same calculation we use elsewhere.
    memstats.next_gc = memstats.heap_marked + memstats.heap_marked*uint64(gcpercent)/100
    if gcpercent < 0 {
        memstats.next_gc = ^uint64(0)
    }

    // Ensure there is at least 1MB between next_gc and heap_live
    // Ensure that the heap goal is at least a little larger than
    // the current live heap size. This may not be the case if GC
    // start is delayed or if the allocation that pushed heap_live
    // over gc_trigger is large or if the trigger is really close to
    // GOGC. Assist is proportional to this distance, so enforce a
    // minimum distance, even if it means going over the GOGC goal
    // by a tiny bit.
    if memstats.next_gc < memstats.heap_live+1024*1024 {
        memstats.next_gc = memstats.heap_live + 1024*1024
    }

    // Compute how many background mark workers may run at once.
    // dedicatedMarkWorkersNeeded is 25% of the P count, truncated toward zero.
    // fractionalMarkWorkersNeeded is 0 if that divides evenly, otherwise 1.
    // totalUtilizationGoal is the target number of Ps spent on GC (e.g. 1.25 Ps out of 5).
    // fractionalUtilizationGoal is the target for Fractional workers (e.g. 0.25 Ps out of 5).
    // Compute the total mark utilization goal and divide it among
    // dedicated and fractional workers.
    totalUtilizationGoal := float64(gomaxprocs) * gcGoalUtilization
    c.dedicatedMarkWorkersNeeded = int64(totalUtilizationGoal)
    c.fractionalUtilizationGoal = totalUtilizationGoal - float64(c.dedicatedMarkWorkersNeeded)
    if c.fractionalUtilizationGoal > 0 {
        c.fractionalMarkWorkersNeeded = 1
    } else {
        c.fractionalMarkWorkersNeeded = 0
    }

    // Reset the per-P GC assist time statistics
    // Clear per-P state
    for _, p := range &allp {
        if p == nil {
            break
        }
        p.gcAssistTime = 0
    }

    // Compute the GC assist parameters
    // (see the earlier analysis of the assistWorkPerByte formula)
    // Compute initial values for controls that are updated
    // throughout the cycle.
    c.revise()

    if debug.gcpacertrace > 0 {
        print("pacer: assist ratio=", c.assistWorkPerByte,
            " (scan ", memstats.heap_scan>>20, " MB in ",
            work.initialHeapLive>>20, "->",
            memstats.next_gc>>20, " MB)",
            " workers=", c.dedicatedMarkWorkersNeeded,
            "+", c.fractionalMarkWorkersNeeded, "\n")
    }
}
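
To make the worker-quota arithmetic concrete, here is a sketch reproducing the calculation for a few GOMAXPROCS values, assuming the 25% goal (gcGoalUtilization) described in this article; markWorkerQuota is an illustrative helper, not a runtime function:

package main

import "fmt"

const gcGoalUtilization = 0.25 // GC's target share of the CPUs

func markWorkerQuota(gomaxprocs int) (dedicated, fractionalNeeded int64, fractionalGoal float64) {
	total := float64(gomaxprocs) * gcGoalUtilization
	dedicated = int64(total)                    // whole Ps marking full-time
	fractionalGoal = total - float64(dedicated) // leftover fraction of a P
	if fractionalGoal > 0 {
		fractionalNeeded = 1
	}
	return
}

func main() {
	for _, p := range []int{1, 4, 5, 8} {
		d, fn, fg := markWorkerQuota(p)
		fmt.Printf("GOMAXPROCS=%d: dedicated=%d fractional=%d (goal %.2f P)\n", p, d, fn, fg)
	}
	// GOMAXPROCS=4: dedicated=1 fractional=0 (goal 0.00 P)
	// GOMAXPROCS=5: dedicated=1 fractional=1 (goal 0.25 P)
}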

The setGCPhase function updates the global holding the current GC phase and the globals that toggle the write barrier:

//go:nosplit
func setGCPhase(x uint32) {
    atomic.Store(&gcphase, x)
    writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
    writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
}

The gcBgMarkPrepare function resets the background mark worker counters:

// gcBgMarkPrepare sets up state for background marking.
// Mutator assists must not yet be enabled.
func gcBgMarkPrepare() {
    // Background marking will stop when the work queues are empty
    // and there are no more workers (note that, since this is
    // concurrent, this may be a transient state, but mark
    // termination will clean it up). Between background workers
    // and assists, we don't really know how many workers there
    // will be, so we pretend to have an arbitrarily large number
    // of workers, almost all of which are "waiting". While a
    // worker is working it decrements nwait. If nproc == nwait,
    // there are no workers.
    work.nproc = ^uint32(0)
    work.nwait = ^uint32(0)
}

The gcMarkRootPrepare function computes the number of root-scanning jobs:

// gcMarkRootPrepare queues root scanning jobs (stacks, globals, and
// some miscellany) and initializes scanning-related state.
//
// The caller must have call gcCopySpans().
//
// The world must be stopped.
//
//go:nowritebarrier
func gcMarkRootPrepare() {
    // Jobs that flush all spans out of the mcaches; only run during mark termination
    if gcphase == _GCmarktermination {
        work.nFlushCacheRoots = int(gomaxprocs)
    } else {
        work.nFlushCacheRoots = 0
    }

    // Helper computing how many blocks a byte range spans; rootBlockBytes is 256KB
    // Compute how many data and BSS root blocks there are.
    nBlocks := func(bytes uintptr) int {
        return int((bytes + rootBlockBytes - 1) / rootBlockBytes)
    }

    work.nDataRoots = 0
    work.nBSSRoots = 0

    // data and bss are scanned only once per GC cycle:
    // concurrent GC scans them in the background mark workers, not during mark termination;
    // non-concurrent GC scans them during mark termination.
    // Only scan globals once per cycle; preferably concurrently.
    if !work.markrootDone {
        // Compute the job count for scanning initialized globals (the data sections)
        for _, datap := range activeModules() {
            nDataRoots := nBlocks(datap.edata - datap.data)
            if nDataRoots > work.nDataRoots {
                work.nDataRoots = nDataRoots
            }
        }

        // Compute the job count for scanning zero-initialized globals (the bss sections)
        for _, datap := range activeModules() {
            nBSSRoots := nBlocks(datap.ebss - datap.bss)
            if nBSSRoots > work.nBSSRoots {
                work.nBSSRoots = nBSSRoots
            }
        }
    }

    // Finalizers in spans and the Gs' stacks are likewise scanned once per cycle
    // (same rule as above)
    if !work.markrootDone {
        // Compute the job count for scanning the finalizers in spans
        // On the first markroot, we need to scan span roots.
        // In concurrent GC, this happens during concurrent
        // mark and we depend on addfinalizer to ensure the
        // above invariants for objects that get finalizers
        // after concurrent mark. In STW GC, this will happen
        // during mark termination.
        //
        // We're only interested in scanning the in-use spans,
        // which will all be swept at this point. More spans
        // may be added to this list during concurrent GC, but
        // we only care about spans that were allocated before
        // this mark phase.
        work.nSpanRoots = mheap_.sweepSpans[mheap_.sweepgen/2%2].numBlocks()

        // Compute the job count for scanning each G's stack
        // On the first markroot, we need to scan all Gs. Gs
        // may be created after this point, but it's okay that
        // we ignore them because they begin life without any
        // roots, so there's nothing to scan, and any roots
        // they create during the concurrent phase will be
        // scanned during mark termination. During mark
        // termination, allglen isn't changing, so we'll scan
        // all Gs.
        work.nStackRoots = int(atomic.Loaduintptr(&allglen))
    } else {
        // We've already scanned span roots and kept the scan
        // up-to-date during concurrent mark.
        work.nSpanRoots = 0

        // The hybrid barrier ensures that stacks can't
        // contain pointers to unmarked objects, so on the
        // second markroot, there's no need to scan stacks.
        work.nStackRoots = 0

        if debug.gcrescanstacks > 0 {
            // Scan stacks anyway for debugging.
            work.nStackRoots = int(atomic.Loaduintptr(&allglen))
        }
    }

    // Compute the total job count.
    // Background mark workers atomically increment markrootNext to pick which job to do.
    // Building a lock-free queue out of a plain counter like this is quite clever, even though the Go engineers find it awkward (see the markroot analysis below).
    work.markrootNext = 0
    work.markrootJobs = uint32(fixedRootCount + work.nFlushCacheRoots + work.nDataRoots + work.nBSSRoots + work.nSpanRoots + work.nStackRoots)
}
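
markrootNext/markrootJobs form, in effect, a lock-free work queue over the indexes [0, markrootJobs): each worker claims the next job with an atomic increment. A minimal standalone sketch of the same dispatch scheme (the job bodies here just print):

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var next uint32 // like work.markrootNext
	const jobs = 10 // like work.markrootJobs
	var wg sync.WaitGroup

	for w := 0; w < 3; w++ { // three workers drain the queue
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for {
				job := atomic.AddUint32(&next, 1) - 1 // claim an index
				if job >= jobs {
					return // queue drained
				}
				fmt.Printf("worker %d ran job %d\n", id, job)
			}
		}(w)
	}
	wg.Wait()
}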

The gcMarkTinyAllocs function marks all tiny alloc blocks waiting for more objects to be merged in:

// gcMarkTinyAllocs greys all active tiny alloc blocks.
//
// The world must be stopped.
func gcMarkTinyAllocs() {
    for _, p := range &allp {
        if p == nil || p.status == _Pdead {
            break
        }
        c := p.mcache
        if c == nil || c.tiny == 0 {
            continue
        }
        // Mark the tiny block in each P's mcache
        // (as seen in mallocgc above, tiny is the block currently accepting merged objects)
        _, hbits, span, objIndex := heapBitsForObject(c.tiny, 0, 0)
        gcw := &p.gcw
        // Mark the object alive and add it to the mark queue (the object turns grey)
        greyobject(c.tiny, 0, 0, hbits, span, gcw, objIndex)
        // gcBlackenPromptly says whether per-P queues are disabled; if so, flush the mark work to the global queue
        if gcBlackenPromptly {
            gcw.dispose()
        }
    }
}

The startTheWorldWithSema function restarts the world:

func startTheWorldWithSema() {
    _g_ := getg()
    
    // Disable preemption of this G
    _g_.m.locks++        // disable preemption because it can be holding p in a local var
    
    // Poll network events (fd readable/writable/error) and add the corresponding Gs to the run queue
    gp := netpoll(false) // non-blocking
    injectglist(gp)
    
    // Decide whether a gc helper must be started
    add := needaddgcproc()
    lock(&sched.lock)
    
    // If a gomaxprocs change was requested, resize the number of Ps.
    // procresize returns the list of Ps that have runnable work.
    procs := gomaxprocs
    if newprocs != 0 {
        procs = newprocs
        newprocs = 0
    }
    p1 := procresize(procs)
    
    // Clear the GC waiting flag
    sched.gcwaiting = 0
    
    // Wake sysmon if it is waiting
    if sched.sysmonwait != 0 {
        sched.sysmonwait = 0
        notewakeup(&sched.sysmonnote)
    }
    unlock(&sched.lock)
    
    // Wake the Ps that have runnable work
    for p1 != nil {
        p := p1
        p1 = p1.link.ptr()
        if p.m != 0 {
            mp := p.m.ptr()
            p.m = 0
            if mp.nextp != 0 {
                throw("startTheWorld: inconsistent mp->nextp")
            }
            mp.nextp.set(p)
            notewakeup(&mp.park)
        } else {
            // Start M to run P.  Do not start another M below.
            newm(nil, p)
            add = false
        }
    }
    
    // If there are idle Ps and no spinning Ms, wake or create an M
    // Wakeup an additional proc in case we have excessive runnable goroutines
    // in local queues or in the global queue. If we don't, the proc will park itself.
    // If we have lots of excessive work, resetspinning will unpark additional procs as necessary.
    if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 {
        wakep()
    }
    
    // Start the gc helper
    if add {
        // If GC could have used another helper proc, start one now,
        // in the hope that it will be available next time.
        // It would have been even better to start it before the collection,
        // but doing so requires allocating memory, so it's tricky to
        // coordinate. This lazy approach works out in practice:
        // we don't mind if the first couple gc rounds don't have quite
        // the maximum number of procs.
        newm(mhelpgc, nil)
    }
    
    // Re-allow preemption
    _g_.m.locks--
    
    // If preemption of this G was requested, restore the request
    if _g_.m.locks == 0 && _g_.preempt { // restore the preemption request in case we've cleared it in newstack
        _g_.stackguard0 = stackPreempt
    }
}

After the world restarts, every M resumes scheduling; scheduling prefers the findRunnableGCWorker function described above, so from then on roughly 25% of the Ps run background mark workers.
The background mark worker function is gcBgMarkWorker:

func gcBgMarkWorker(_p_ *p) {
    gp := getg()
    
    // Struct used to reacquire the P after parking
    type parkInfo struct {
        m      muintptr // Release this m on park.
        attach puintptr // If non-nil, attach to this p on park.
    }
    // We pass park to a gopark unlock function, so it can't be on
    // the stack (see gopark). Prevent deadlock from recursively
    // starting GC by disabling preemption.
    gp.m.preemptoff = "GC worker init"
    park := new(parkInfo)
    gp.m.preemptoff = ""
    
    // Record the current M and disable preemption
    park.m.set(acquirem())
    // Record the current P (the P to attach to)
    park.attach.set(_p_)
    
    // Tell gcBgMarkStartWorkers this worker is ready
    // Inform gcBgMarkStartWorkers that this worker is ready.
    // After this point, the background mark worker is scheduled
    // cooperatively by gcController.findRunnable. Hence, it must
    // never be preempted, as this would put it into _Grunnable
    // and put it on a run queue. Instead, when the preempt flag
    // is set, this puts itself into _Gwaiting to be woken up by
    // gcController.findRunnable at the appropriate time.
    notewakeup(&work.bgMarkReady)
    
    for {
        // Put the current G to sleep
        // Go to sleep until woken by gcController.findRunnable.
        // We can't releasem yet since even the call to gopark
        // may be preempted.
        gopark(func(g *g, parkp unsafe.Pointer) bool {
            park := (*parkInfo)(parkp)
            
            // Re-allow preemption
            // The worker G is no longer running, so it's
            // now safe to allow preemption.
            releasem(park.m.ptr())
            
            // Attach to the P:
            // store this G in the P's gcBgMarkWorker field, for the next findRunnableGCWorker.
            // If the attach fails, don't park (the worker exits).
            // If the worker isn't attached to its P,
            // attach now. During initialization and after
            // a phase change, the worker may have been
            // running on a different P. As soon as we
            // attach, the owner P may schedule the
            // worker, so this must be done after the G is
            // stopped.
            if park.attach != 0 {
                p := park.attach.ptr()
                park.attach.set(nil)
                // cas the worker because we may be
                // racing with a new worker starting
                // on this P.
                if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
                    // The P got a new worker.
                    // Exit this worker.
                    return false
                }
            }
            return true
        }, unsafe.Pointer(park), "GC worker (idle)", traceEvGoBlock, 0)
        
        // Check that the P's gcBgMarkWorker is still this G; if not, end this worker
        // Loop until the P dies and disassociates this
        // worker (the P may later be reused, in which case
        // it will get a new worker) or we failed to associate.
        if _p_.gcBgMarkWorker.ptr() != gp {
            break
        }
        
        // Disable preemption
        // Disable preemption so we can use the gcw. If the
        // scheduler wants to preempt us, we'll stop draining,
        // dispose the gcw, and then preempt.
        park.m.set(acquirem())
        
        if gcBlackenEnabled == 0 {
            throw("gcBgMarkWorker: blackening not enabled")
        }
        
        // Record the start time
        startTime := nanotime()
        
        decnwait := atomic.Xadd(&work.nwait, -1)
        if decnwait == work.nproc {
            println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
            throw("work.nwait was > work.nproc")
        }
        
        // Switch to g0
        systemstack(func() {
            // Set the G's status to waiting so its stack can be scanned (two mark workers may scan each other's stacks)
            // Mark our goroutine preemptible so its stack
            // can be scanned. This lets two mark workers
            // scan each other (otherwise, they would
            // deadlock). We must not modify anything on
            // the G stack. However, stack shrinking is
            // disabled for mark workers, so it is safe to
            // read from the G stack.
            casgstatus(gp, _Grunning, _Gwaiting)
            
            // Dispatch on the mark worker's mode
            switch _p_.gcMarkWorkerMode {
            default:
                throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
            case gcMarkWorkerDedicatedMode:
                // In this mode the P marks exclusively.
                // Mark until preempted, crediting background scan work to reduce GC assists and wake waiting Gs.
                gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
                // When preempted, kick every G in the local run queue to the global run queue
                if gp.preempt {
                    // We were preempted. This is
                    // a useful signal to kick
                    // everything out of the run
                    // queue so it can run
                    // somewhere else.
                    lock(&sched.lock)
                    for {
                        gp, _ := runqget(_p_)
                        if gp == nil {
                            break
                        }
                        globrunqput(gp)
                    }
                    unlock(&sched.lock)
                }
                // Resume marking until no work remains, again crediting background scan work
                // Go back to draining, this time
                // without preemption.
                gcDrain(&_p_.gcw, gcDrainNoBlock|gcDrainFlushBgCredit)
            case gcMarkWorkerFractionalMode:
                // In this mode the P marks part-time.
                // Mark until preempted, crediting background scan work.
                gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
            case gcMarkWorkerIdleMode:
                // In this mode the P marks only while idle.
                // Mark until preempted or a threshold is reached, crediting background scan work.
                gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
            }
            
            // Restore the G's status to running
            casgstatus(gp, _Gwaiting, _Grunning)
        })
        
        // If per-P mark queues have been disabled, flush to the global queue
        // If we are nearing the end of mark, dispose
        // of the cache promptly. We must do this
        // before signaling that we're no longer
        // working so that other workers can't observe
        // no workers and no work while we have this
        // cached, and before we compute done.
        if gcBlackenPromptly {
            _p_.gcw.dispose()
        }
        
        // Accumulate the time used
        // Account for time.
        duration := nanotime() - startTime
        switch _p_.gcMarkWorkerMode {
        case gcMarkWorkerDedicatedMode:
            atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
            atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
        case gcMarkWorkerFractionalMode:
            atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
            atomic.Xaddint64(&gcController.fractionalMarkWorkersNeeded, 1)
        case gcMarkWorkerIdleMode:
            atomic.Xaddint64(&gcController.idleMarkTime, duration)
        }
        
        // Was this the last worker and did we run out
        // of work?
        incnwait := atomic.Xadd(&work.nwait, +1)
        if incnwait > work.nproc {
            println("runtime: p.gcMarkWorkerMode=", _p_.gcMarkWorkerMode,
                "work.nwait=", incnwait, "work.nproc=", work.nproc)
            throw("work.nwait > work.nproc")
        }
        
        // Check whether all background mark workers are done and no more work remains
        // If this worker reached a background mark completion
        // point, signal the main GC goroutine.
        if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
            // Detach from the P
            // Make this G preemptible and disassociate it
            // as the worker for this P so
            // findRunnableGCWorker doesn't try to
            // schedule it.
            _p_.gcBgMarkWorker.set(nil)
            
            // Allow preemption again
            releasem(park.m.ptr())
            
            // Prepare to enter mark termination
            gcMarkDone()
            
            // Reattach to the P before parking.
            // Preemption was allowed above, so we may be on a different P by now.
            // If reattaching fails, this worker ends.
            // Disable preemption and prepare to reattach
            // to the P.
            //
            // We may be running on a different P at this
            // point, so we can't reattach until this G is
            // parked.
            park.m.set(acquirem())
            park.attach.set(_p_)
        }
    }
}

The gcDrain function does the marking:

// gcDrain scans roots and objects in work buffers, blackening grey
// objects until all roots and work buffers have been drained.
//
// If flags&gcDrainUntilPreempt != 0, gcDrain returns when g.preempt
// is set. This implies gcDrainNoBlock.
//
// If flags&gcDrainIdle != 0, gcDrain returns when there is other work
// to do. This implies gcDrainNoBlock.
//
// If flags&gcDrainNoBlock != 0, gcDrain returns as soon as it is
// unable to get more work. Otherwise, it will block until all
// blocking calls are blocked in gcDrain.
//
// If flags&gcDrainFlushBgCredit != 0, gcDrain flushes scan work
// credit to gcController.bgScanCredit every gcCreditSlack units of
// scan work.
//
//go:nowritebarrier
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
    if !writeBarrier.needed {
        throw("gcDrain phase incorrect")
    }
    
    gp := getg().m.curg
    
    // Whether to return when the preempt flag is seen
    preemptible := flags&gcDrainUntilPreempt != 0
    
    // Whether to block waiting for work when there is none
    blocking := flags&(gcDrainUntilPreempt|gcDrainIdle|gcDrainNoBlock) == 0
    
    // Whether to credit background scan work, to reduce GC assists and wake waiting Gs
    flushBgCredit := flags&gcDrainFlushBgCredit != 0
    
    // Whether to do only a bounded amount of work
    idle := flags&gcDrainIdle != 0
    
    // Record the initial scan work count
    initScanWork := gcw.scanWork
    
    // After idleCheckThreshold (100000) units of scan work, check whether to return
    // idleCheck is the scan work at which to perform the next
    // idle check with the scheduler.
    idleCheck := initScanWork + idleCheckThreshold
    
    // If root scanning is unfinished, scan roots first
    // Drain root marking jobs.
    if work.markrootNext < work.markrootJobs {
        // If preemptible is set, loop until preempted
        for !(preemptible && gp.preempt) {
            // Take one job off the root-scan queue (atomic increment)
            job := atomic.Xadd(&work.markrootNext, +1) - 1
            if job >= work.markrootJobs {
                break
            }
            // Run the root scanning job
            markroot(gcw, job)
            // In idle mode, return if there is other work to do
            if idle && pollWork() {
                goto done
            }
        }
    }
    
    // The roots are now in the mark queue; drain it.
    // If preemptible is set, loop until preempted.
    // Drain heap marking jobs.
    for !(preemptible && gp.preempt) {
        // If the global mark queue is empty, move some local work onto it
        // (move wbuf2 if it's non-empty, otherwise half of wbuf1)
        // Try to keep work available on the global queue. We used to
        // check if there were waiting workers, but it's better to
        // just keep work available than to make workers wait. In the
        // worst case, we'll do O(log(_WorkbufSize)) unnecessary
        // balances.
        if work.full == 0 {
            gcw.balance()
        }
        
        // Get an object from the local mark queue, falling back to the global queue
        var b uintptr
        if blocking {
            // Blocking get
            b = gcw.get()
        } else {
            // Non-blocking get
            b = gcw.tryGetFast()
            if b == 0 {
                b = gcw.tryGet()
            }
        }
        
        // Nothing available: the mark queue is empty, exit the loop
        if b == 0 {
            // work barrier reached or tryGet failed.
            break
        }
        
        // Scan the object we got
        scanobject(b, gcw)
        
        // Once enough scan work has accumulated (gcCreditSlack is 2000)
        // Flush background scan work credit to the global
        // account if we've accumulated enough locally so
        // mutator assists can draw on it.
        if gcw.scanWork >= gcCreditSlack {
            // Add the local scan work to the global counter
            atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
            // Reduce GC assist work and wake waiting Gs
            if flushBgCredit {
                gcFlushBgCredit(gcw.scanWork - initScanWork)
                initScanWork = 0
            }
            idleCheck -= gcw.scanWork
            gcw.scanWork = 0
            
            // In idle mode, once the check threshold is hit, look for other work (Gs) and exit the loop if there is any
            if idle && idleCheck <= 0 {
                idleCheck += idleCheckThreshold
                if pollWork() {
                    break
                }
            }
        }
    }
    
    // In blocking mode, write barriers are not allowed after this
    // point because we must preserve the condition that the work
    // buffers are empty.
    
done:
    // Flush the remaining local scan work to the global counter
    // Flush remaining scan work credit.
    if gcw.scanWork > 0 {
        atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
        // Reduce GC assist work and wake waiting Gs
        if flushBgCredit {
            gcFlushBgCredit(gcw.scanWork - initScanWork)
        }
        gcw.scanWork = 0
    }
}
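
The gcCreditSlack batching in gcDrain is a classic contention-avoidance trick: accumulate scan work in a worker-local counter and flush it to the shared atomic only when a slack threshold is crossed. A reduced sketch of the pattern (the 2000 threshold is from the article; everything else is illustrative):

package main

import (
	"fmt"
	"sync/atomic"
)

const creditSlack = 2000 // flush threshold, like gcCreditSlack

var globalScanWork int64 // shared counter, like gcController.scanWork

// scanMany simulates a worker doing n units of scan work, flushing
// its local count to the global counter only every creditSlack units.
func scanMany(n int) {
	local := int64(0)
	for i := 0; i < n; i++ {
		local++ // one unit of scan work
		if local >= creditSlack {
			atomic.AddInt64(&globalScanWork, local)
			local = 0
		}
	}
	if local > 0 { // flush the remainder, as gcDrain does after its loop
		atomic.AddInt64(&globalScanWork, local)
	}
}

func main() {
	scanMany(5000)
	fmt.Println(atomic.LoadInt64(&globalScanWork)) // 5000
}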

The markroot function runs a single root-scanning job:

// markroot scans the i'th root.
//
// Preemption must be disabled (because this uses a gcWork).
//
// nowritebarrier is only advisory here.
//
//go:nowritebarrier
func markroot(gcw *gcWork, i uint32) {
    // Work out which kind of job the index i corresponds to
    // (the Go engineers consider this approach a bit ridiculous, per the TODO below)
    // TODO(austin): This is a bit ridiculous. Compute and store
    // the bases in gcMarkRootPrepare instead of the counts.
    baseFlushCache := uint32(fixedRootCount)
    baseData := baseFlushCache + uint32(work.nFlushCacheRoots)
    baseBSS := baseData + uint32(work.nDataRoots)
    baseSpans := baseBSS + uint32(work.nBSSRoots)
    baseStacks := baseSpans + uint32(work.nSpanRoots)
    end := baseStacks + uint32(work.nStackRoots)

    // Note: if you add a case here, please also update heapdump.go:dumproots.
    switch {
    // Flush all spans out of an mcache; requires STW
    case baseFlushCache <= i && i < baseData:
        flushmcache(int(i - baseFlushCache))

    // Scan initialized globals (the data sections).
    // Only the block for index i is scanned, passing the bitmap that records where its pointers live.
    case baseData <= i && i < baseBSS:
        for _, datap := range activeModules() {
            markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-baseData))
        }

    // Scan zero-initialized globals (the bss sections).
    // Only the block for index i is scanned, passing the bitmap that records where its pointers live.
    case baseBSS <= i && i < baseSpans:
        for _, datap := range activeModules() {
            markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-baseBSS))
        }

    // Scan the finalizer queue
    case i == fixedRootFinalizers:
        // Only do this once per GC cycle since we don't call
        // queuefinalizer during marking.
        if work.markrootDone {
            break
        }
        for fb := allfin; fb != nil; fb = fb.alllink {
            cnt := uintptr(atomic.Load(&fb.cnt))
            scanblock(uintptr(unsafe.Pointer(&fb.fin[0])), cnt*unsafe.Sizeof(fb.fin[0]), &finptrmask[0], gcw)
        }

    // Free the stacks of dead Gs
    case i == fixedRootFreeGStacks:
        // Only do this once per GC cycle; preferably
        // concurrently.
        if !work.markrootDone {
            // Switch to the system stack so we can call
            // stackfree.
            systemstack(markrootFreeGStacks)
        }

    // Scan the special objects in each span (finalizer lists)
    case baseSpans <= i && i < baseStacks:
        // mark MSpan.specials
        markrootSpans(gcw, int(i-baseSpans))

    // Scan each G's stack
    default:
        // Get the G whose stack needs scanning
        // the rest is scanning goroutine stacks
        var gp *g
        if baseStacks <= i && i < end {
            gp = allgs[i-baseStacks]
        } else {
            throw("markroot: bad index")
        }

        // Record when the G was first observed blocked
        // remember when we've first observed the G blocked
        // needed only to output in traceback
        status := readgstatus(gp) // We are not in a scan state
        if (status == _Gwaiting || status == _Gsyscall) && gp.waitsince == 0 {
            gp.waitsince = work.tstart
        }

        // Switch to g0 (we might end up scanning our own stack)
        // scang must be done on the system stack in case
        // we're trying to scan our own stack.
        systemstack(func() {
            // Check whether the stack being scanned is our own
            // If this is a self-scan, put the user G in
            // _Gwaiting to prevent self-deadlock. It may
            // already be in _Gwaiting if this is a mark
            // worker or we're in mark termination.
            userG := getg().m.curg
            selfScan := gp == userG && readgstatus(userG) == _Grunning
            
            // If scanning our own stack, switch to waiting to avoid self-deadlock
            if selfScan {
                casgstatus(userG, _Grunning, _Gwaiting)
                userG.waitreason = "garbage collection scan"
            }
            
            // Scan the G's stack
            // TODO: scang blocks until gp's stack has
            // been scanned, which may take a while for
            // running goroutines. Consider doing this in
            // two phases where the first is non-blocking:
            // we scan the stacks we can and ask running
            // goroutines to scan themselves; and the
            // second blocks.
            scang(gp, gcw)
            
            // If we scanned our own stack, switch back to running
            if selfScan {
                casgstatus(userG, _Gwaiting, _Grunning)
            }
        })
    }
}

The scang function is responsible for scanning a G's stack:

// scang blocks until gp's stack has been scanned.
// It might be scanned by scang or it might be scanned by the goroutine itself.
// Either way, the stack scan has completed when scang returns.
func scang(gp *g, gcw *gcWork) {
    // Invariant; we (the caller, markroot for a specific goroutine) own gp.gcscandone.
    // Nothing is racing with us now, but gcscandone might be set to true left over
    // from an earlier round of stack scanning (we scan twice per GC).
    // We use gcscandone to record whether the scan has been done during this round.

    // Mark the scan as not yet done
    gp.gcscandone = false

    // See http://golang.org/cl/21503 for justification of the yield delay.
    const yieldDelay = 10 * 1000
    var nextYield int64

    // Loop until the scan completes
    // Endeavor to get gcscandone set to true,
    // either by doing the stack scan ourselves or by coercing gp to scan itself.
    // gp.gcscandone can transition from false to true when we're not looking
    // (if we asked for preemption), so any time we lock the status using
    // castogscanstatus we have to double-check that the scan is still not done.
loop:
    for i := 0; !gp.gcscandone; i++ {
        // Check the G's current status
        switch s := readgstatus(gp); s {
        default:
            dumpgstatus(gp)
            throw("stopg: invalid status")

        // The G is dead; there is nothing to scan
        case _Gdead:
            // No stack.
            gp.gcscandone = true
            break loop

        // The G's stack is being copied; try again next iteration
        case _Gcopystack:
        // Stack being switched. Go around again.

        // The G isn't running; first keep it from running
        case _Grunnable, _Gsyscall, _Gwaiting:
            // Claim goroutine by setting scan bit.
            // Racing with execution or readying of gp.
            // The scan bit keeps them from running
            // the goroutine until we're done.
            if castogscanstatus(gp, s, s|_Gscan) {
                // If the atomic status switch succeeds, scan its stack
                if !gp.gcscandone {
                    scanstack(gp, gcw)
                    gp.gcscandone = true
                }
                // Restore the G's status and exit the loop
                restartg(gp)
                break loop
            }

        // The G is scanning itself; wait for it to finish
        case _Gscanwaiting:
        // newstack is doing a scan for us right now. Wait.

        // The G is running
        case _Grunning:
            // Goroutine running. Try to preempt execution so it can scan itself.
            // The preemption handler (in newstack) does the actual scan.

            // If a preemption request is already pending, it will take care of the scan for us
            // Optimization: if there is already a pending preemption request
            // (from the previous loop iteration), don't bother with the atomics.
            if gp.preemptscan && gp.preempt && gp.stackguard0 == stackPreempt {
                break
            }

            // Preempt the G; on success the G scans itself
            // Ask for preemption and self scan.
            if castogscanstatus(gp, _Grunning, _Gscanrunning) {
                if !gp.gcscandone {
                    gp.preemptscan = true
                    gp.preempt = true
                    gp.stackguard0 = stackPreempt
                }
                casfrom_Gscanstatus(gp, _Gscanrunning, _Grunning)
            }
        }

        // The first round waits 10 microseconds, later rounds 5 microseconds (yieldDelay is in nanoseconds)
        if i == 0 {
            nextYield = nanotime() + yieldDelay
        }
        if nanotime() < nextYield {
            procyield(10)
        } else {
            osyield()
            nextYield = nanotime() + yieldDelay/2
        }
    }

    // Scan done; cancel the preempt-scan request
    gp.preemptscan = false // cancel scan request if no longer needed
}

Once preemptscan is set, a successful preemption calls scanstack from the preemption handler (in newstack) so the G scans its own stack.
The function that scans a stack is scanstack:

// scanstack scans gp's stack, greying all pointers found on the stack.
//
// scanstack is marked go:systemstack because it must not be preempted
// while using a workbuf.
//
//go:nowritebarrier
//go:systemstack
func scanstack(gp *g, gcw *gcWork) {
    if gp.gcscanvalid {
        return
    }

    if readgstatus(gp)&_Gscan == 0 {
        print("runtime:scanstack: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", hex(readgstatus(gp)), "\n")
        throw("scanstack - bad status")
    }

    switch readgstatus(gp) &^ _Gscan {
    default:
        print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
        throw("mark - bad status")
    case _Gdead:
        return
    case _Grunning:
        print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
        throw("scanstack: goroutine not stopped")
    case _Grunnable, _Gsyscall, _Gwaiting:
        // ok
    }

    if gp == getg() {
        throw("can't scan our own stack")
    }
    mp := gp.m
    if mp != nil && mp.helpgc != 0 {
        throw("can't scan gchelper stack")
    }

    // Shrink the stack if not much of it is being used. During
    // concurrent GC, we can do this during concurrent mark.
    if !work.markrootDone {
        shrinkstack(gp)
    }

    // Scan the stack.
    var cache pcvalueCache
    scanframe := func(frame *stkframe, unused unsafe.Pointer) bool {
        // scanframeworker looks up the function info from the code address (pc),
        // finds stackmap.bytedata inside it, which records where pointers live on the function's stack,
        // then calls scanblock to scan the frame; the function's arguments are scanned the same way
        scanframeworker(frame, &cache, gcw)
        return true
    }
    // Enumerate every call frame, invoking scanframe on each
    gentraceback(^uintptr(0), ^uintptr(0), 0, gp, 0, nil, 0x7fffffff, scanframe, nil, 0)
    // Enumerate every deferred call frame, invoking scanframe on each
    tracebackdefers(gp, scanframe, nil)
    gp.gcscanvalid = true
}

The scanblock function is a general-purpose scanner used for both globals and stack space; unlike scanobject, its bitmap has to be passed in explicitly:

// scanblock scans b as scanobject would, but using an explicit
// pointer bitmap instead of the heap bitmap.
//
// This is used to scan non-heap roots, so it does not update
// gcw.bytesMarked or gcw.scanWork.
//
//go:nowritebarrier
func scanblock(b0, n0 uintptr, ptrmask *uint8, gcw *gcWork) {
    // Use local copies of original parameters, so that a stack trace
    // due to one of the throws below shows the original block
    // base and extent.
    b := b0
    n := n0

    arena_start := mheap_.arena_start
    arena_used := mheap_.arena_used

    // Walk the addresses to scan
    for i := uintptr(0); i < n; {
        // Find the corresponding byte in the bitmap
        // Find bits for the next word.
        bits := uint32(*addb(ptrmask, i/(sys.PtrSize*8)))
        if bits == 0 {
            i += sys.PtrSize * 8
            continue
        }
        // Walk the bits of the byte
        for j := 0; j < 8 && i < n; j++ {
            // If this word holds a pointer
            if bits&1 != 0 {
                // Mark the object at that address alive and add it to the mark queue (it turns grey)
                // Same work as in scanobject; see comments there.
                obj := *(*uintptr)(unsafe.Pointer(b + i))
                if obj != 0 && arena_start <= obj && obj < arena_used {
                    // Find the span and bitmap entry for the object
                    if obj, hbits, span, objIndex := heapBitsForObject(obj, b, i); obj != 0 {
                        // Mark the object alive and add it to the mark queue (it turns grey)
                        greyobject(obj, b, i, hbits, span, gcw, objIndex)
                    }
                }
            }
            // Move to the next bit, i.e. the next pointer-sized word
            bits >>= 1
            i += sys.PtrSize
        }
    }
}
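
The index arithmetic in scanblock deserves a note: with 1 bit per pointer-sized word, byte offset i maps to byte i/(PtrSize*8) of the mask and bit (i/PtrSize)%8 inside it. A tiny sketch of reading such a mask (hasPointer is an illustrative helper, not a runtime function):

package main

import "fmt"

const ptrSize = 8 // bytes per pointer on amd64

// hasPointer reports whether the word at byte offset i is marked
// as a pointer in a 1-bit-per-word mask, as scanblock reads it.
func hasPointer(ptrmask []uint8, i uintptr) bool {
	b := ptrmask[i/(ptrSize*8)]      // one mask byte covers 8 words = 64 bytes
	return b>>((i/ptrSize)%8)&1 != 0 // select the bit for this word
}

func main() {
	mask := []uint8{0b00000101} // words 0 and 2 hold pointers
	for i := uintptr(0); i < 3*ptrSize; i += ptrSize {
		fmt.Printf("offset %2d: pointer=%v\n", i, hasPointer(mask, i))
	}
}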

greyobject marks an object alive and adds it to the mark queue (the object turns grey):

// obj is the start of an object with mark mbits.
// If it isn't already marked, mark it and enqueue into gcw.
// base and off are for debugging only and could be removed.
//go:nowritebarrierrec
func greyobject(obj, base, off uintptr, hbits heapBits, span *mspan, gcw *gcWork, objIndex uintptr) {
    // obj should be start of allocation, and so must be at least pointer-aligned.
    if obj&(sys.PtrSize-1) != 0 {
        throw("greyobject: obj not pointer-aligned")
    }
    mbits := span.markBitsForIndex(objIndex)

    if useCheckmark {
        // checkmark is a mechanism that verifies all reachable objects were correctly marked; debugging only
        if !mbits.isMarked() {
            printlock()
            print("runtime:greyobject: checkmarks finds unexpected unmarked object obj=", hex(obj), "\n")
            print("runtime: found obj at *(", hex(base), "+", hex(off), ")\n")

            // Dump the source (base) object
            gcDumpObject("base", base, off)

            // Dump the object
            gcDumpObject("obj", obj, ^uintptr(0))

            getg().m.traceback = 2
            throw("checkmark found unmarked object")
        }
        if hbits.isCheckmarked(span.elemsize) {
            return
        }
        hbits.setCheckmarked(span.elemsize)
        if !hbits.isCheckmarked(span.elemsize) {
            throw("setCheckmarked and isCheckmarked disagree")
        }
    } else {
        if debug.gccheckmark > 0 && span.isFree(objIndex) {
            print("runtime: marking free object ", hex(obj), " found at *(", hex(base), "+", hex(off), ")\n")
            gcDumpObject("base", base, off)
            gcDumpObject("obj", obj, ^uintptr(0))
            getg().m.traceback = 2
            throw("marking free object")
        }

        // If the object's bit in its span's gcmarkBits is already 1, skip it
        // If marked we have nothing to do.
        if mbits.isMarked() {
            return
        }
        
        // Set the object's bit in its span's gcmarkBits to 1
        // mbits.setMarked() // Avoid extra call overhead with manual inlining.
        atomic.Or8(mbits.bytep, mbits.mask)
        
        // If the object is known to hold no pointers (its span class is noscan), it needn't go on the mark queue
        // If this is a noscan object, fast-track it to black
        // instead of greying it.
        if span.spanclass.noscan() {
            gcw.bytesMarked += uint64(span.elemsize)
            return
        }
    }

    // Put the object on the mark queue:
    // try the local queue first; if that fails, move part of the local queue to the global queue and retry the local one
    // Queue the obj for scanning. The PREFETCH(obj) logic has been removed but
    // seems like a nice optimization that can be added back in.
    // There needs to be time between the PREFETCH and the use.
    // Previously we put the obj in an 8 element buffer that is drained at a rate
    // to give the PREFETCH time to do its work.
    // Use of PREFETCHNTA might be more appropriate than PREFETCH
    if !gcw.putFast(obj) {
        gcw.put(obj)
    }
}
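
The manually inlined setMarked above is just an atomic OR of one bit in the span's gcmarkBits. The runtime does this with its private atomic.Or8 on bytes; since sync/atomic exposes no 8-bit OR, the sketch below emulates the same bit-set on 32-bit words with a CAS loop (toy bitmap, illustrative names):

package main

import (
	"fmt"
	"sync/atomic"
)

// orUint32 atomically sets mask bits in *addr via a CAS loop,
// standing in for the runtime-internal atomic.Or8.
func orUint32(addr *uint32, mask uint32) {
	for {
		old := atomic.LoadUint32(addr)
		if atomic.CompareAndSwapUint32(addr, old, old|mask) {
			return
		}
	}
}

// setMarked marks object objIndex in a toy mark bitmap, mirroring
// mbits.setMarked: locate the word and bit, then atomically OR it in.
func setMarked(gcmarkBits []uint32, objIndex uintptr) {
	orUint32(&gcmarkBits[objIndex/32], 1<<(objIndex%32))
}

func main() {
	bits := make([]uint32, 2)
	setMarked(bits, 5)
	setMarked(bits, 33)
	fmt.Printf("%032b %032b\n", bits[1], bits[0])
}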

Once gcDrain has scanned the roots, it starts draining the mark queue, calling scanobject on each object taken from it:

// scanobject scans the object starting at b, adding pointers to gcw.
// b must point to the beginning of a heap object or an oblet.
// scanobject consults the GC bitmap for the pointer mask and the
// spans for the size of the object.
//
//go:nowritebarrier
func scanobject(b uintptr, gcw *gcWork) {
    // Note that arena_used may change concurrently during
    // scanobject and hence scanobject may encounter a pointer to
    // a newly allocated heap object that is *not* in
    // [start,used). It will not mark this object; however, we
    // know that it was just installed by a mutator, which means
    // that mutator will execute a write barrier and take care of
    // marking it. This is even more pronounced on relaxed memory
    // architectures since we access arena_used without barriers
    // or synchronization, but the same logic applies.
    arena_start := mheap_.arena_start
    arena_used := mheap_.arena_used

    // Find the bits for b and the size of the object at b.
    //
    // b is either the beginning of an object, in which case this
    // is the size of the object to scan, or it points to an
    // oblet, in which case we compute the size to scan below.
    // Get the object's bitmap
    hbits := heapBitsForAddr(b)
    
    // Get the span the object lives in
    s := spanOfUnchecked(b)
    
    // Get the object's size
    n := s.elemsize
    if n == 0 {
        throw("scanobject n == 0")
    }

    // Very large objects (over maxObletBytes, 128KB) are split for scanning:
    // at most 128KB is scanned per call
    if n > maxObletBytes {
        // Large object. Break into oblets for better
        // parallelism and lower latency.
        if b == s.base() {
            // It's possible this is a noscan object (not
            // from greyobject, but from other code
            // paths), in which case we must *not* enqueue
            // oblets since their bitmaps will be
            // uninitialized.
            if s.spanclass.noscan() {
                // Bypass the whole scan.
                gcw.bytesMarked += uint64(n)
                return
            }

            // Enqueue the other oblets to scan later.
            // Some oblets may be in b's scalar tail, but
            // these will be marked as "no more pointers",
            // so we'll drop out immediately when we go to
            // scan those.
            for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
                if !gcw.putFast(oblet) {
                    gcw.put(oblet)
                }
            }
        }

        // Compute the size of the oblet. Since this object
        // must be a large object, s.base() is the beginning
        // of the object.
        n = s.base() + s.elemsize - b
        if n > maxObletBytes {
            n = maxObletBytes
        }
    }

    // Scan the pointers inside the object
    var i uintptr
    for i = 0; i < n; i += sys.PtrSize {
        // Get the corresponding bits
        // Find bits for this word.
        if i != 0 {
            // Avoid needless hbits.next() on last iteration.
            hbits = hbits.next()
        }
        // Load bits once. See CL 22712 and issue 16973 for discussion.
        bits := hbits.bits()
        
        // Check the scan bit to decide whether to keep scanning; note the second word's scan bit is the checkmark
        // During checkmarking, 1-word objects store the checkmark
        // in the type bit for the one word. The only one-word objects
        // are pointers, or else they'd be merged with other non-pointer
        // data into larger allocations.
        if i != 1*sys.PtrSize && bits&bitScan == 0 {
            break // no more pointers in this object
        }
        
        // Check the pointer bit; if this word isn't a pointer, move on
        if bits&bitPointer == 0 {
            continue // not a pointer
        }

        // Load the pointer's value
        // Work here is duplicated in scanblock and above.
        // If you make changes here, make changes there too.
        obj := *(*uintptr)(unsafe.Pointer(b + i))

        // If the pointer lands in the arena, call greyobject to mark the object and enqueue it
        // At this point we have extracted the next potential pointer.
        // Check if it points into heap and not back at the current object.
        if obj != 0 && arena_start <= obj && obj < arena_used && obj-b >= n {
            // Mark the object.
            if obj, hbits, span, objIndex := heapBitsForObject(obj, b, i); obj != 0 {
                greyobject(obj, b, i, hbits, span, gcw, objIndex)
            }
        }
    }
    
    // Account for the bytes marked and the scan work done
    gcw.bytesMarked += uint64(n)
    gcw.scanWork += int64(i)
}
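
The oblet splitting is easiest to see with numbers: a 512KB object at base b is scanned as b..b+128KB now, while b+128KB, b+256KB and b+384KB are queued as separate oblets. A sketch of just the queueing arithmetic (splitOblets is illustrative; the real code pushes the oblets onto the gcWork queue):

package main

import "fmt"

const maxObletBytes = 128 << 10 // 128KB, as in scanobject

// splitOblets returns how many bytes to scan now and the oblet start
// addresses to enqueue, mirroring scanobject's large-object path.
func splitOblets(base, size uintptr) (scanNow uintptr, oblets []uintptr) {
	for o := base + maxObletBytes; o < base+size; o += maxObletBytes {
		oblets = append(oblets, o)
	}
	scanNow = maxObletBytes
	if size < maxObletBytes {
		scanNow = size
	}
	return
}

func main() {
	now, rest := splitOblets(0x1000, 512<<10)
	fmt.Printf("scan %dKB now, %d oblets queued\n", now>>10, len(rest))
	// scan 128KB now, 3 oblets queued
}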

When every background mark worker has drained the mark queue, gcMarkDone runs to prepare the transition to mark termination.
In concurrent GC, gcMarkDone executes twice: the first pass disables the per-P mark queues and restarts the background mark workers; the second pass enters mark termination.

// gcMarkDone transitions the GC from mark 1 to mark 2 and from mark 2
// to mark termination.
//
// This should be called when all mark work has been drained. In mark
// 1, this includes all root marking jobs, global work buffers, and
// active work buffers in assists and background workers; however,
// work may still be cached in per-P work buffers. In mark 2, per-P
// caches are disabled.
//
// The calling context must be preemptible.
//
// Note that it is explicitly okay to have write barriers in this
// function because completion of concurrent mark is best-effort
// anyway. Any work created by write barriers here will be cleaned up
// by mark termination.
func gcMarkDone() {
top:
    semacquire(&work.markDoneSema)

    // Re-check transition condition under transition lock.
    if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
        semrelease(&work.markDoneSema)
        return
    }

    // Temporarily disallow starting new background mark workers
    // Disallow starting new workers so that any remaining workers
    // in the current mark phase will drain out.
    //
    // TODO(austin): Should dedicated workers keep an eye on this
    // and exit gcDrain promptly?
    atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, -0xffffffff)
    atomic.Xaddint64(&gcController.fractionalMarkWorkersNeeded, -0xffffffff)

    // Check whether the per-P mark queues have been disabled yet
    if !gcBlackenPromptly {
        // Not disabled yet: disable them, then restart the background mark workers
        // Transition from mark 1 to mark 2.
        //
        // The global work list is empty, but there can still be work
        // sitting in the per-P work caches.
        // Flush and disable work caches.

        // Disable the per-P mark queues
        // Disallow caching workbufs and indicate that we're in mark 2.
        gcBlackenPromptly = true

        // Prevent completion of mark 2 until we've flushed
        // cached workbufs.
        atomic.Xadd(&work.nwait, -1)

        // GC is set up for mark 2. Let Gs blocked on the
        // transition lock go while we flush caches.
        semrelease(&work.markDoneSema)

        // Push everything in the per-P mark queues onto the global mark queue
        systemstack(func() {
            // Flush all currently cached workbufs and
            // ensure all Ps see gcBlackenPromptly. This
            // also blocks until any remaining mark 1
            // workers have exited their loop so we can
            // start new mark 2 workers.
            forEachP(func(_p_ *p) {
                _p_.gcw.dispose()
            })
        })

        // For debugging
        // Check that roots are marked. We should be able to
        // do this before the forEachP, but based on issue
        // #16083 there may be a (harmless) race where we can
        // enter mark 2 while some workers are still scanning
        // stacks. The forEachP ensures these scans are done.
        //
        // TODO(austin): Figure out the race and fix this
        // properly.
        gcMarkRootCheck()

        // Allow new background mark workers to start again
        // Now we can start up mark 2 workers.
        atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 0xffffffff)
        atomic.Xaddint64(&gcController.fractionalMarkWorkersNeeded, 0xffffffff)

        // If it's certain there is no more work, jump straight back to the top of the function,
        // which then counts as the second call
        incnwait := atomic.Xadd(&work.nwait, +1)
        if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
            // This loop will make progress because
            // gcBlackenPromptly is now true, so it won't
            // take this same "if" branch.
            goto top
        }
    } else {
        // Record the start time of mark termination and of the STW pause
        // Transition to mark termination.
        now := nanotime()
        work.tMarkTerm = now
        work.pauseStart = now
        
        // Prevent this G from being preempted
        getg().m.preemptoff = "gcing"
        
        // Stop all running Gs and forbid them from running
        systemstack(stopTheWorldWithSema)
        
        // !!!!!!!!!!!!!!!!
        // The world is now stopped (STW)...
        // !!!!!!!!!!!!!!!!
        
        // The gcphase is _GCmark, it will transition to _GCmarktermination
        // below. The important thing is that the wb remains active until
        // all marking is complete. This includes writes made by the GC.
        
        // Record that root marking is complete; this affects the handling in gcMarkRootPrepare
        // Record that one root marking pass has completed.
        work.markrootDone = true
        
        // Disable GC assists and background mark workers
        // Disable assists and background workers. We must do
        // this before waking blocked assists.
        atomic.Store(&gcBlackenEnabled, 0)
        
        // Wake all Gs that were blocked on GC assists
        // Wake all blocked assists. These will run when we
        // start the world again.
        gcWakeAllAssists()
        
        // Likewise, release the transition lock. Blocked
        // workers and assists will run when we start the
        // world again.
        semrelease(&work.markDoneSema)
        
        // Compute the heap size needed to trigger the next GC
        // endCycle depends on all gcWork cache stats being
        // flushed. This is ensured by mark 2.
        nextTriggerRatio := gcController.endCycle()
        
        // Enter the mark termination phase; this will restart the world
        // Perform mark termination. This will restart the world.
        gcMarkTermination(nextTriggerRatio)
    }
}

The gcMarkTermination function carries out the mark termination phase:

func gcMarkTermination(nextTriggerRatio float64) {
    // World is stopped.
    // Start marktermination which includes enabling the write barrier.
    // Disable GC assists and background mark workers
    atomic.Store(&gcBlackenEnabled, 0)
    
    // Re-enable the local mark queues (for the next GC)
    gcBlackenPromptly = false
    
    // Set the current GC phase to mark termination, which keeps the write barrier enabled
    setGCPhase(_GCmarktermination)

    // Record the start time
    work.heap1 = memstats.heap_live
    startTime := nanotime()

    // Prevent this G from being preempted
    mp := acquirem()
    mp.preemptoff = "gcing"
    _g_ := getg()
    _g_.m.traceback = 2
    
    // Set the G's status to waiting so that its stack can be scanned
    gp := _g_.m.curg
    casgstatus(gp, _Grunning, _Gwaiting)
    gp.waitreason = "garbage collection"

    // Switch to g0
    // Run gc on the g0 stack. We do this so that the g stack
    // we're currently running on will no longer change. Cuts
    // the root set down a bit (g0 stacks are not scanned, and
    // we don't need to scan gc's internal state).  We also
    // need to switch to g0 so we can shrink the stack.
    systemstack(func() {
        // Perform marking inside the STW pause
        gcMark(startTime)
        
        // Must return immediately: the outer G's stack may have been moved,
        // so its variables must not be touched after this point
        // Must return immediately.
        // The outer function's stack may have moved
        // during gcMark (it shrinks stacks, including the
        // outer function's stack), so we must not refer
        // to any of its variables. Return back to the
        // non-system stack to pick up the new addresses
        // before continuing.
    })

    // Switch to g0 again
    systemstack(func() {
        work.heap2 = work.bytesMarked
        
        // If checkmark is enabled, verify that every reachable object was marked
        if debug.gccheckmark > 0 {
            // Run a full stop-the-world mark using checkmark bits,
            // to check that we didn't forget to mark anything during
            // the concurrent mark process.
            gcResetMarkState()
            initCheckmarks()
            gcMark(startTime)
            clearCheckmarks()
        }

        // Set the current GC phase to off and disable the write barrier
        // marking is complete so we can turn the write barrier off
        setGCPhase(_GCoff)
        
        // Wake the background sweep worker; it will start running once STW ends
        gcSweep(work.mode)

        // For debugging
        if debug.gctrace > 1 {
            startTime = nanotime()
            // The g stacks have been scanned so
            // they have gcscanvalid==true and gcworkdone==true.
            // Reset these so that all stacks will be rescanned.
            gcResetMarkState()
            finishsweep_m()

            // Still in STW but gcphase is _GCoff, reset to _GCmarktermination
            // At this point all objects will be found during the gcMark which
            // does a complete STW mark and object scan.
            setGCPhase(_GCmarktermination)
            gcMark(startTime)
            setGCPhase(_GCoff) // marking is done, turn off wb.
            gcSweep(work.mode)
        }
    })

    // Set the G's status back to running
    _g_.m.traceback = 0
    casgstatus(gp, _Gwaiting, _Grunning)

    // Trace handling
    if trace.enabled {
        traceGCDone()
    }

    // all done
    mp.preemptoff = ""

    if gcphase != _GCoff {
        throw("gc done but gcphase != _GCoff")
    }

    // Update the heap size that triggers the next GC (gc_trigger)
    // Update GC trigger and pacing for the next cycle.
    gcSetTriggerRatio(nextTriggerRatio)

    // Update timing records
    // Update timing memstats
    now := nanotime()
    sec, nsec, _ := time_now()
    unixNow := sec*1e9 + int64(nsec)
    work.pauseNS += now - work.pauseStart
    work.tEnd = now
    atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
    atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
    memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
    memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
    memstats.pause_total_ns += uint64(work.pauseNS)

    // Update CPU usage records
    // Update work.totaltime.
    sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
    // We report idle marking time below, but omit it from the
    // overall utilization here since it's "free".
    markCpu := gcController.assistTime + gcController.dedicatedMarkTime + gcController.fractionalMarkTime
    markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
    cycleCpu := sweepTermCpu + markCpu + markTermCpu
    work.totaltime += cycleCpu

    // Compute overall GC CPU utilization.
    totalCpu := sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs)
    memstats.gc_cpu_fraction = float64(work.totaltime) / float64(totalCpu)

    // Reset sweep state
    // Reset sweep state.
    sweep.nbgsweep = 0
    sweep.npausesweep = 0

    // Count forced GC runs
    if work.userForced {
        memstats.numforcedgc++
    }

    // Bump the GC cycle count, then wake the Gs waiting on sweep
    // Bump GC cycle count and wake goroutines waiting on sweep.
    lock(&work.sweepWaiters.lock)
    memstats.numgc++
    injectglist(work.sweepWaiters.head.ptr())
    work.sweepWaiters.head = 0
    unlock(&work.sweepWaiters.lock)

    // For heap profiling
    // Finish the current heap profiling cycle and start a new
    // heap profiling cycle. We do this before starting the world
    // so events don't leak into the wrong cycle.
    mProf_NextCycle()

    // Restart the world
    systemstack(startTheWorldWithSema)

    // !!!!!!!!!!!!!!!
    // The world has been restarted...
    // !!!!!!!!!!!!!!!

    // For heap profiling
    // Flush the heap profile so we can start a new cycle next GC.
    // This is relatively expensive, so we don't do it with the
    // world stopped.
    mProf_Flush()

    // Move the mark queue's work buffers to the free list so they can be reclaimed
    // Prepare workbufs for freeing by the sweeper. We do this
    // asynchronously because it can take non-trivial time.
    prepareFreeWorkbufs()

    // Free unused stack spans
    // Free stack spans. This must be done between GC cycles.
    systemstack(freeStackSpans)

    // For debugging
    // Print gctrace before dropping worldsema. As soon as we drop
    // worldsema another cycle could start and smash the stats
    // we're trying to print.
    if debug.gctrace > 0 {
        util := int(memstats.gc_cpu_fraction * 100)

        var sbuf [24]byte
        printlock()
        print("gc ", memstats.numgc,
            " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ",
            util, "%: ")
        prev := work.tSweepTerm
        for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
            if i != 0 {
                print("+")
            }
            print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
            prev = ns
        }
        print(" ms clock, ")
        for i, ns := range []int64{sweepTermCpu, gcController.assistTime, gcController.dedicatedMarkTime + gcController.fractionalMarkTime, gcController.idleMarkTime, markTermCpu} {
            if i == 2 || i == 3 {
                // Separate mark time components with /.
                print("/")
            } else if i != 0 {
                print("+")
            }
            print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
        }
        print(" ms cpu, ",
            work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
            work.heapGoal>>20, " MB goal, ",
            work.maxprocs, " P")
        if work.userForced {
            print(" (forced)")
        }
        print("\n")
        printunlock()
    }

    semrelease(&worldsema)
    // Careful: another GC cycle may start now.

    // Allow the current G to be preempted again
    releasem(mp)
    mp = nil

    // With concurrent sweep, the current M keeps running (it returns to gcBgMarkWorker and then sleeps);
    // with blocking sweep, yield so that queued finalizers get a chance to run
    // now that gc is done, kick off finalizer thread if needed
    if !concurrentSweep {
        // give the queued finalizers, if any, a chance to run
        Gosched()
    }
}
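
With gctrace enabled (GODEBUG=gctrace=1), the print block above emits one line per GC cycle. An illustrative line (the numbers here are invented, not real output) might look like:

gc 12 @3.402s 2%: 0.11+2.5+0.18 ms clock, 0.46+0.61/4.8/1.2+0.74 ms cpu, 11->12->6 MB, 14 MB goal, 8 P

Matching it against the code: the three clock figures are the sweep termination STW, concurrent mark, and mark termination STW durations; the cpu figures are sweep termination, assist, dedicated/fractional/idle mark, and mark termination; the three heap figures are heap0 (live heap at cycle start), heap1 (at the start of mark termination) and heap2 (bytes marked).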

The gcSweep function wakes the background sweep worker.
The background sweep worker itself is started by the gcenable function, which is called at program startup:

func gcSweep(mode gcMode) {
    if gcphase != _GCoff {
        throw("gcSweep being done but phase is not GCoff")
    }

    // Increment sweepgen; the two sweepSpans queues swap roles, and every span becomes an "unswept" span
    lock(&mheap_.lock)
    mheap_.sweepgen += 2
    mheap_.sweepdone = 0
    if mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
        // We should have drained this list during the last
        // sweep phase. We certainly need to start this phase
        // with an empty swept list.
        throw("non-empty swept list")
    }
    mheap_.pagesSwept = 0
    unlock(&mheap_.lock)

    // For a non-concurrent GC, do all the sweeping here (inside STW)
    if !_ConcurrentSweep || mode == gcForceBlockMode {
        // Special case synchronous sweep.
        // Record that no proportional sweeping has to happen.
        lock(&mheap_.lock)
        mheap_.sweepPagesPerByte = 0
        unlock(&mheap_.lock)
        // Sweep all spans eagerly.
        for sweepone() != ^uintptr(0) {
            sweep.npausesweep++
        }
        // Free workbufs eagerly.
        prepareFreeWorkbufs()
        for freeSomeWbufs(false) {
        }
        // All "free" events for this mark/sweep cycle have
        // now happened, so we can make this profile cycle
        // available immediately.
        mProf_NextCycle()
        mProf_Flush()
        return
    }

    // Wake the background sweep worker
    // Background sweep.
    lock(&sweep.lock)
    if sweep.parked {
        sweep.parked = false
        ready(sweep.g, 0, true)
    }
    unlock(&sweep.lock)
}
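
The sweepgen/2%2 indexing used above deserves a closer look. Below is a minimal standalone sketch (only the index arithmetic is borrowed from the runtime) showing how bumping sweepgen by 2 flips which of the two sweepSpans queues is "swept" and which is "unswept", without moving a single span:

package main

import "fmt"

func main() {
    // mheap_.sweepgen rises by 2 each GC cycle.
    for _, sweepgen := range []uint32{2, 4, 6} {
        swept := sweepgen / 2 % 2 // queue of swept in-use spans
        unswept := 1 - swept      // queue of spans still to sweep
        fmt.Printf("sweepgen=%d swept queue=%d unswept queue=%d\n",
            sweepgen, swept, unswept)
    }
}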

The background sweep worker runs the bgsweep function:

func bgsweep(c chan int) {
    sweep.g = getg()

    // Wait to be woken
    lock(&sweep.lock)
    sweep.parked = true
    c <- 1
    goparkunlock(&sweep.lock, "GC sweep wait", traceEvGoBlock, 1)

    // Sweep loop
    for {
        // Sweep one span, then yield to the scheduler (only a little work at a time)
        for gosweepone() != ^uintptr(0) {
            sweep.nbgsweep++
            Gosched()
        }
        // Free some unused mark queue buffers back to the heap
        for freeSomeWbufs(true) {
            Gosched()
        }
        // If sweeping isn't finished, keep looping
        lock(&sweep.lock)
        if !gosweepdone() {
            // This can happen if a GC runs between
            // gosweepone returning ^0 above
            // and the lock being acquired.
            unlock(&sweep.lock)
            continue
        }
        // Otherwise park the background sweep worker; the current M goes back to scheduling
        sweep.parked = true
        goparkunlock(&sweep.lock, "GC sweep wait", traceEvGoBlock, 1)
    }
}
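
The shape of bgsweep, one small unit of work followed by a yield, can be imitated in ordinary Go. A toy sketch (goparkunlock is runtime-internal, so a plain loop with runtime.Gosched stands in for the park/wake handshake, and sweepOne here is a stand-in counter, not the runtime's sweepone):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    pages := 5
    sweepOne := func() bool {
        if pages == 0 {
            return false // nothing left, like sweepone returning ^uintptr(0)
        }
        pages--
        return true
    }
    // Do one unit, then let other goroutines run: bgsweep's rhythm.
    for sweepOne() {
        runtime.Gosched()
    }
    fmt.Println("sweep done")
}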

The gosweepone function takes a single span from sweepSpans and sweeps it:

//go:nowritebarrier
func gosweepone() uintptr {
    var ret uintptr
    // Switch to g0
    systemstack(func() {
        ret = sweepone()
    })
    return ret
}

The sweepone function is as follows:

// sweeps one span
// returns number of pages returned to heap, or ^uintptr(0) if there is nothing to sweep
//go:nowritebarrier
func sweepone() uintptr {
    _g_ := getg()
    sweepRatio := mheap_.sweepPagesPerByte // For debugging

    // Prevent this G from being preempted
    // increment locks to ensure that the goroutine is not preempted
    // in the middle of sweep thus leaving the span in an inconsistent state for next GC
    _g_.m.locks++
    
    // Check whether sweeping is already done
    if atomic.Load(&mheap_.sweepdone) != 0 {
        _g_.m.locks--
        return ^uintptr(0)
    }
    
    // Update the number of concurrently running sweepers
    atomic.Xadd(&mheap_.sweepers, +1)

    npages := ^uintptr(0)
    sg := mheap_.sweepgen
    for {
        // Pop a span from sweepSpans
        s := mheap_.sweepSpans[1-sg/2%2].pop()
        // Break out of the loop once everything has been swept
        if s == nil {
            atomic.Store(&mheap_.sweepdone, 1)
            break
        }
        // Skip the span if another M has already swept it
        if s.state != mSpanInUse {
            // This can happen if direct sweeping already
            // swept this span, but in that case the sweep
            // generation should always be up-to-date.
            if s.sweepgen != sg {
                print("runtime: bad span s.state=", s.state, " s.sweepgen=", s.sweepgen, " sweepgen=", sg, "\n")
                throw("non in-use span in unswept list")
            }
            continue
        }
        // Atomically bump the span's sweepgen; failure means another M has started sweeping it, so skip
        if s.sweepgen != sg-2 || !atomic.Cas(&s.sweepgen, sg-2, sg-1) {
            continue
        }
        // Sweep this span, then break out of the loop
        npages = s.npages
        if !s.sweep(false) {
            // Span is still in-use, so this returned no
            // pages to the heap and the span needs to
            // move to the swept in-use list.
            npages = 0
        }
        break
    }

    // Update the number of concurrently running sweepers
    // Decrement the number of active sweepers and if this is the
    // last one print trace information.
    if atomic.Xadd(&mheap_.sweepers, -1) == 0 && atomic.Load(&mheap_.sweepdone) != 0 {
        if debug.gcpacertrace > 0 {
            print("pacer: sweep done at heap size ", memstats.heap_live>>20, "MB; allocated ", (memstats.heap_live-mheap_.sweepHeapLiveBasis)>>20, "MB during sweep; swept ", mheap_.pagesSwept, " pages at ", sweepRatio, " pages/byte\n")
        }
    }
    // Allow the G to be preempted again
    _g_.m.locks--
    // Return the number of pages swept
    return npages
}
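
The sweepgen comparisons in sweepone form a small state machine. Relative to mheap_.sweepgen (call it h): s.sweepgen == h-2 means the span needs sweeping, h-1 means it is being swept, and h means it has been swept. A minimal sketch of the lock-free claim, simplified to a single local variable rather than a real mspan:

package main

import (
    "fmt"
    "sync/atomic"
)

func main() {
    var h uint32 = 6 // mheap_.sweepgen after this cycle bumped it by 2
    spanGen := h - 2 // the span has not been swept this cycle yet

    // The CAS from h-2 to h-1 is how concurrent sweepers claim a span
    // without taking a lock; a loser simply moves on to the next span.
    if atomic.CompareAndSwapUint32(&spanGen, h-2, h-1) {
        fmt.Println("claimed the span, sweeping...")
        atomic.StoreUint32(&spanGen, h) // publish it as fully swept
    }
    fmt.Println("span gen:", spanGen, "heap gen:", h)
}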

The span's sweep function sweeps a single span:

// Sweep frees or collects finalizers for blocks not marked in the mark phase.
// It clears the mark bits in preparation for the next GC round.
// Returns true if the span was returned to heap.
// If preserve=true, don't return it to heap nor relink in MCentral lists;
// caller takes care of it.
//TODO go:nowritebarrier
func (s *mspan) sweep(preserve bool) bool {
    // It's critical that we enter this function with preemption disabled,
    // GC must not start while we are in the middle of this function.
    _g_ := getg()
    if _g_.m.locks == 0 && _g_.m.mallocing == 0 && _g_ != _g_.m.g0 {
        throw("MSpan_Sweep: m is not locked")
    }
    sweepgen := mheap_.sweepgen
    if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
        print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
        throw("MSpan_Sweep: bad span state")
    }

    if trace.enabled {
        traceGCSweepSpan(s.npages * _PageSize)
    }

    // Count the pages swept
    atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))

    spc := s.spanclass
    size := s.elemsize
    res := false

    c := _g_.m.mcache
    freeToHeap := false

    // The allocBits indicate which unmarked objects don't need to be
    // processed since they were free at the end of the last GC cycle
    // and were not allocated since then.
    // If the allocBits index is >= s.freeindex and the bit
    // is not marked then the object remains unallocated
    // since the last GC.
    // This situation is analogous to being on a freelist.

    // Handle finalizers stored in the special records: if the corresponding object
    // is no longer live, mark it live to prevent collection, then move the finalizer to the run queue
    // Unlink & free special records for any objects we're about to free.
    // Two complications here:
    // 1. An object can have both finalizer and profile special records.
    //    In such case we need to queue finalizer for execution,
    //    mark the object as live and preserve the profile special.
    // 2. A tiny object can have several finalizers setup for different offsets.
    //    If such object is not marked, we need to queue all finalizers at once.
    // Both 1 and 2 are possible at the same time.
    specialp := &s.specials
    special := *specialp
    for special != nil {
        // A finalizer can be set for an inner byte of an object, find object beginning.
        objIndex := uintptr(special.offset) / size
        p := s.base() + objIndex*size
        mbits := s.markBitsForIndex(objIndex)
        if !mbits.isMarked() {
            // This object is not marked and has at least one special record.
            // Pass 1: see if it has at least one finalizer.
            hasFin := false
            endOffset := p - s.base() + size
            for tmp := special; tmp != nil && uintptr(tmp.offset) < endOffset; tmp = tmp.next {
                if tmp.kind == _KindSpecialFinalizer {
                    // Stop freeing of object if it has a finalizer.
                    mbits.setMarkedNonAtomic()
                    hasFin = true
                    break
                }
            }
            // Pass 2: queue all finalizers _or_ handle profile record.
            for special != nil && uintptr(special.offset) < endOffset {
                // Find the exact byte for which the special was setup
                // (as opposed to object beginning).
                p := s.base() + uintptr(special.offset)
                if special.kind == _KindSpecialFinalizer || !hasFin {
                    // Splice out special record.
                    y := special
                    special = special.next
                    *specialp = special
                    freespecial(y, unsafe.Pointer(p), size)
                } else {
                    // This is profile record, but the object has finalizers (so kept alive).
                    // Keep special record.
                    specialp = &special.next
                    special = *specialp
                }
            }
        } else {
            // object is still live: keep special record
            specialp = &special.next
            special = *specialp
        }
    }

    // For debugging
    if debug.allocfreetrace != 0 || raceenabled || msanenabled {
        // Find all newly freed objects. This doesn't have to
        // efficient; allocfreetrace has massive overhead.
        mbits := s.markBitsForBase()
        abits := s.allocBitsForIndex(0)
        for i := uintptr(0); i < s.nelems; i++ {
            if !mbits.isMarked() && (abits.index < s.freeindex || abits.isMarked()) {
                x := s.base() + i*s.elemsize
                if debug.allocfreetrace != 0 {
                    tracefree(unsafe.Pointer(x), size)
                }
                if raceenabled {
                    racefree(unsafe.Pointer(x), size)
                }
                if msanenabled {
                    msanfree(unsafe.Pointer(x), size)
                }
            }
            mbits.advance()
            abits.advance()
        }
    }

    // Count the objects freed
    // Count the number of free objects in this span.
    nalloc := uint16(s.countAlloc())
    if spc.sizeclass() == 0 && nalloc == 0 {
        // If the span's size class is 0 (large object) and its object is no longer live, free the span back to the heap
        s.needzero = 1
        freeToHeap = true
    }
    nfreed := s.allocCount - nalloc
    if nalloc > s.allocCount {
        print("runtime: nelems=", s.nelems, " nalloc=", nalloc, " previous allocCount=", s.allocCount, " nfreed=", nfreed, "\n")
        throw("sweep increased allocation count")
    }

    // Set the new allocCount
    s.allocCount = nalloc

    // Check whether the span has no unallocated elements left
    wasempty := s.nextFreeIndex() == s.nelems

    // Reset freeindex; the next allocation searches from 0
    s.freeindex = 0 // reset allocation index to start of span.
    if trace.enabled {
        getg().m.p.ptr().traceReclaimed += uintptr(nfreed) * s.elemsize
    }

    // gcmarkBits becomes the new allocBits,
    // then a fresh all-zero gcmarkBits is allocated;
    // at the next allocation, allocBits tells which elements are unallocated
    // gcmarkBits becomes the allocBits.
    // get a fresh cleared gcmarkBits in preparation for next GC
    s.allocBits = s.gcmarkBits
    s.gcmarkBits = newMarkBits(s.nelems)

    // Refill allocCache starting from freeindex
    // Initialize alloc bits cache.
    s.refillAllocCache(0)

    // If the span has no more live objects, update its sweepgen to the latest;
    // below, the span will be handed to mcentral or mheap
    // We need to set s.sweepgen = h.sweepgen only when all blocks are swept,
    // because of the potential for a concurrent free/SetFinalizer.
    // But we need to set it before we make the span available for allocation
    // (return it to heap or mcentral), because allocation code assumes that a
    // span is already swept if available for allocation.
    if freeToHeap || nfreed == 0 {
        // The span must be in our exclusive ownership until we update sweepgen,
        // check for potential races.
        if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
            print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
            throw("MSpan_Sweep: bad span state after sweep")
        }
        // Serialization point.
        // At this point the mark bits are cleared and allocation ready
        // to go so release the span.
        atomic.Store(&s.sweepgen, sweepgen)
    }

    if nfreed > 0 && spc.sizeclass() != 0 {
        // Hand the span to mcentral; res records whether that succeeded
        c.local_nsmallfree[spc.sizeclass()] += uintptr(nfreed)
        res = mheap_.central[spc].mcentral.freeSpan(s, preserve, wasempty)
        // freeSpan updates sweepgen
        // MCentral_FreeSpan updates sweepgen
    } else if freeToHeap {
        // Free the span back to mheap
        // Free large span to heap

        // NOTE(rsc,dvyukov): The original implementation of efence
        // in CL 22060046 used SysFree instead of SysFault, so that
        // the operating system would eventually give the memory
        // back to us again, so that an efence program could run
        // longer without running out of memory. Unfortunately,
        // calling SysFree here without any kind of adjustment of the
        // heap data structures means that when the memory does
        // come back to us, we have the wrong metadata for it, either in
        // the MSpan structures or in the garbage collection bitmap.
        // Using SysFault here means that the program will run out of
        // memory fairly quickly in efence mode, but at least it won't
        // have mysterious crashes due to confused memory reuse.
        // It should be possible to switch back to SysFree if we also
        // implement and then call some kind of MHeap_DeleteSpan.
        if debug.efence > 0 {
            s.limit = 0 // prevent mlookup from finding this span
            sysFault(unsafe.Pointer(s.base()), size)
        } else {
            mheap_.freeSpan(s, 1)
        }
        c.local_nlargefree++
        c.local_largefree += size
        res = true
    }
    
    // If the span went to neither mcentral nor mheap, it is still in use
    if !res {
        // Put the still-in-use span on the "swept" queue of sweepSpans
        // The span has been swept and is still in-use, so put
        // it on the swept in-use list.
        mheap_.sweepSpans[sweepgen/2%2].push(s)
    }
    return res
}
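
The refillAllocCache call above restores the allocCache trick described in the span section: allocCache caches the inverted allocBits starting from freeindex, so a set bit means "free" and the next free element is one count-trailing-zeros away. A standalone sketch of that search (nextFree is a made-up helper for illustration, not the runtime's nextFreeFast):

package main

import (
    "fmt"
    "math/bits"
)

// nextFree returns the index of the next free element, plus the advanced
// cache and freeindex, mimicking how the allocator consumes allocCache.
func nextFree(allocCache uint64, freeindex uintptr) (idx uintptr, newCache uint64, newFreeindex uintptr) {
    tz := uintptr(bits.TrailingZeros64(allocCache))
    idx = freeindex + tz
    newCache = allocCache >> (tz + 1) // drop the bits we just consumed
    return idx, newCache, idx + 1
}

func main() {
    // allocBits = 0b00000111: elements 0..2 are allocated.
    // allocCache holds the inverted bits, so bits 3 and up are set (free).
    cache := ^uint64(0x07)
    idx, cache, freeindex := nextFree(cache, 0)
    fmt.Println("next free element:", idx) // 3
    idx, _, _ = nextFree(cache, freeindex)
    fmt.Println("then:", idx) // 4
}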

As bgsweep and the allocator shown earlier suggest, sweeping is deliberately lazy:
in practice a new GC cycle may need to start before the previous cycle's sweeping has finished,
so every GC cycle begins by completing the previous cycle's sweep work (the Sweep Termination phase).
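
The cycle counters that gcMarkTermination maintains (memstats.numgc, pause_ns and friends) are exposed through the public API, which makes this behaviour easy to observe. A small sketch using only the standard runtime package:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    runtime.GC() // blocks until the collection it triggers has completed

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Println("completed GC cycles:", m.NumGC)
    fmt.Println("total STW pause:", m.PauseTotalNs, "ns")
    // PauseNs is a circular buffer: (NumGC+255)%256 indexes the most
    // recent pause, matching the pause_ns updates seen above.
    fmt.Println("most recent pause:", m.PauseNs[(m.NumGC+255)%256], "ns")
}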

That covers the whole GC flow. To finish, here is the implementation of the write barrier function writebarrierptr:

// NOTE: Really dst *unsafe.Pointer, src unsafe.Pointer,
// but if we do that, Go inserts a write barrier on *dst = src.
//go:nosplit
func writebarrierptr(dst *uintptr, src uintptr) {
    if writeBarrier.cgo {
        cgoCheckWriteBarrier(dst, src)
    }
    if !writeBarrier.needed {
        *dst = src
        return
    }
    if src != 0 && src < minPhysPageSize {
        systemstack(func() {
            print("runtime: writebarrierptr *", dst, " = ", hex(src), "\n")
            throw("bad pointer in write barrier")
        })
    }
    // Mark the pointer
    writebarrierptr_prewrite1(dst, src)
    // Store the pointer into the destination
    *dst = src
}
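
For context on when writebarrierptr actually runs: the compiler inserts the barrier at pointer stores that it cannot prove stay on the stack. A hedged sketch of two stores of the kind that typically get barriered while writeBarrier.needed is set (whether any particular store is barriered ultimately depends on the compiler's escape analysis):

package main

type node struct{ next *node }

var head *node

//go:noinline
func push(n *node) {
    n.next = head // pointer write into a heap object: barriered
    head = n      // pointer write to a global: barriered
}

func main() {
    push(&node{})
    push(&node{})
}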

The writebarrierptr_prewrite1 function is as follows:

// writebarrierptr_prewrite1 invokes a write barrier for *dst = src
// prior to the write happening.
//
// Write barrier calls must not happen during critical GC and scheduler
// related operations. In particular there are times when the GC assumes
// that the world is stopped but scheduler related code is still being
// executed, dealing with syscalls, dealing with putting gs on runnable
// queues and so forth. This code cannot execute write barriers because
// the GC might drop them on the floor. Stopping the world involves removing
// the p associated with an m. We use the fact that m.p == nil to indicate
// that we are in one these critical section and throw if the write is of
// a pointer to a heap object.
//go:nosplit
func writebarrierptr_prewrite1(dst *uintptr, src uintptr) {
    mp := acquirem()
    if mp.inwb || mp.dying > 0 {
        releasem(mp)
        return
    }
    systemstack(func() {
        if mp.p == 0 && memstats.enablegc && !mp.inwb && inheap(src) {
            throw("writebarrierptr_prewrite1 called with mp.p == nil")
        }
        mp.inwb = true
        gcmarkwb_m(dst, src)
    })
    mp.inwb = false
    releasem(mp)
}

The gcmarkwb_m function is as follows:

func gcmarkwb_m(slot *uintptr, ptr uintptr) {
    if writeBarrier.needed {
        // Note: This turns bad pointer writes into bad
        // pointer reads, which could be confusing. We avoid
        // reading from obviously bad pointers, which should
        // take care of the vast majority of these. We could
        // patch this up in the signal handler, or use XCHG to
        // combine the read and the write. Checking inheap is
        // insufficient since we need to track changes to
        // roots outside the heap.
        //
        // Note: profbuf.go omits a barrier during signal handler
        // profile logging; that's safe only because this deletion barrier exists.
        // If we remove the deletion barrier, we'll have to work out
        // a new way to handle the profile logging.
        if slot1 := uintptr(unsafe.Pointer(slot)); slot1 >= minPhysPageSize {
            if optr := *slot; optr != 0 {
                // Mark the old pointer
                shade(optr)
            }
        }
        // TODO: Make this conditional on the caller's stack color.
        if ptr != 0 && inheap(ptr) {
            // Mark the new pointer
            shade(ptr)
        }
    }
}

The shade function is as follows:

// Shade the object if it isn't already.
// The object is not nil and known to be in the heap.
// Preemption must be disabled.
//go:nowritebarrier
func shade(b uintptr) {
    if obj, hbits, span, objIndex := heapBitsForObject(b, 0, 0); obj != 0 {
        gcw := &getg().m.p.ptr().gcw
        // Mark the object as live and add it to the mark queue (the object turns grey)
        greyobject(obj, 0, 0, hbits, span, gcw, objIndex)
        // If the local mark queues have been disabled, flush to the global mark queue
        if gcphase == _GCmarktermination || gcBlackenPromptly {
            // Ps aren't allowed to cache work during mark
            // termination.
            gcw.dispose()
        }
    }
}
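
To tie shade back to the tri-color model: greying an object means setting its mark bit and queueing it for scanning, and scanning it (following its references) is what blackens it. Below is a toy model of that mark loop, using plain Go values instead of heap words, so it illustrates the idea rather than the runtime's algorithm:

package main

import "fmt"

type object struct {
    name   string
    refs   []*object
    marked bool
}

// shade makes an object grey: mark it and put it on the grey queue,
// loosely mirroring greyobject plus the gcw mark queue.
func shade(obj *object, grey *[]*object) {
    if obj != nil && !obj.marked {
        obj.marked = true
        *grey = append(*grey, obj)
    }
}

func main() {
    c := &object{name: "c"}
    b := &object{name: "b", refs: []*object{c}}
    a := &object{name: "a", refs: []*object{b}}
    dead := &object{name: "dead"} // reachable from no root

    var grey []*object
    shade(a, &grey) // the root set
    for len(grey) > 0 {
        obj := grey[len(grey)-1]
        grey = grey[:len(grey)-1]
        for _, ref := range obj.refs {
            shade(ref, &grey) // scanning obj blackens it
        }
    }

    // Sweep: whatever is still unmarked is garbage.
    for _, obj := range []*object{a, b, c, dead} {
        fmt.Println(obj.name, "marked:", obj.marked)
    }
}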

This concludes the walkthrough of how GC is implemented in golang.
