sync.Mutex

Locks are an important synchronization primitive in operating systems. Go provides two implementations in its standard library: mutex and read-write lock, corresponding to:

  • sync.Mutex, mutex lock: read-read exclusive, read-write exclusive, write-write exclusive
  • sync.RWMutex, read-write lock: read-read shared, read-write exclusive, write-write exclusive

Their usage is very common: protecting shared memory so that concurrent goroutines access and modify it in an orderly way, as in the following example:

go
import (
	"fmt"
	"sync"
)

func main() {
	var i int
	var wg sync.WaitGroup
	var mu sync.Mutex

	for range 10 {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			// The whole read-modify-write must be one critical
			// section; splitting it into separate locked reads and
			// writes would still allow lost updates.
			i++
			mu.Unlock()
		}()
	}

	wg.Wait()
	fmt.Println(i)
}

Without lock protection, the output of this function may differ on every run and is unpredictable. In most scenarios we clearly don't want that. The example is trivial, and you may already be proficient with locks, yet not know how Go implements them internally. The implementation is not complex, and this article explains it in detail.

Locker

Before we begin, let's look at a type sync.Locker, which is an interface defined by Go:

go
// A Locker represents an object that can be locked and unlocked.
type Locker interface {
	Lock()
	Unlock()
}

The methods it provides are simple and easy to understand: lock and unlock. However, because Go interfaces are satisfied implicitly rather than declared explicitly, many people may never have used this type directly. It is mentioned here only in passing; both locks discussed below implement this interface.

Mutex

The mutex Mutex type definition is located in the sync/mutex.go file. It's a struct type:

go
type Mutex struct {
	state int32
	sema  uint32
}

Field definitions:

  • state, field representing the lock's state
  • sema, i.e., semaphore, which will be explained later

Let's first talk about state:

go
const (
    mutexLocked = 1 << iota // mutex is locked
    mutexWoken
    mutexStarving
)

state is a 32-bit integer type. The low 3 bits are used to represent the three states above. These three states are not independent; they can coexist.

  • mutexLocked=1, locked
  • mutexWoken=2, awakened
  • mutexStarving=4, starvation mode

The high 29 bits record how many goroutines are waiting for the lock, so in theory at most 2^29-1 goroutines can wait on a mutex at once. In practice that many goroutines is unrealistic: even at 2KB each (the initial stack size), creating them would require roughly 1TB of memory.

+-----------------------------------+---------------+------------+-------------+
|              waiter               | mutexStarving | mutexWoken | mutexLocked |
+-----------------------------------+---------------+------------+-------------+
|              29 bits              | 1 bit         | 1 bit      | 1 bit       |
+-----------------------------------+---------------+------------+-------------+

A mutex has two operation modes: normal mode and starvation mode. In normal mode, goroutines join the blocking wait queue in arrival order, i.e., FIFO. This is the common case and also when performance is best, since everyone acquires the lock in turn without issues.

Starvation mode handles the unusual case. "Starvation" here means a waiting goroutine cannot acquire the lock for a long time and stays blocked; it does not mean the mutex itself is starved. So when does a goroutine starve? The Go team gives an example: an earlier-arriving goroutine blocks because it cannot acquire the mutex. Later, when the lock is released and that goroutine is awakened, another goroutine that happens to be running at that moment also tries to grab the lock (it cuts in line). Since the latter is already running (holding a CPU time slice), it has a high chance of winning the race. In extreme cases there may be many such goroutines, and the just-awakened goroutine may never acquire the lock (endless queue-jumping): it arrived first yet can never obtain it.

go
const (
	starvationThresholdNs = 1e6
)

To avoid this, Go sets a wait threshold, starvationThresholdNs. If a goroutine still has not acquired the lock after more than 1ms, the mutex enters starvation mode. In starvation mode, ownership of the mutex is handed directly to the goroutine at the front of the wait queue; newly arriving goroutines do not try to grab the lock and instead join the back of the queue (those already in line get the lock first; queue-jumpers go to the back). When a goroutine acquires the lock, if it is the last waiter or its wait time was under 1ms, it switches the mutex back to normal mode. This design prevents some goroutines from being unable to acquire the lock for a long time and "starving to death".

TryLock

The mutex provides two methods for locking:

  • Lock(), acquire lock in blocking mode
  • TryLock(), acquire lock in non-blocking mode

Let's first look at the TryLock code, as its implementation is simpler:

go
func (m *Mutex) TryLock() bool {
	old := m.state
	if old&(mutexLocked|mutexStarving) != 0 {
		return false
	}

	if !atomic.CompareAndSwapInt32(&m.state, old, old|mutexLocked) {
		return false
	}

	return true
}

It starts with a check: if the lock is already held or in starvation mode (i.e., many goroutines are waiting for it), the current goroutine cannot take the lock. Otherwise, it uses a CAS operation to try to set the mutexLocked bit. If the CAS returns false, another goroutine grabbed the lock in the meantime and the attempt fails; otherwise the lock is acquired. From this code you can see that a TryLock() caller is itself a line-cutter: it grabs the lock regardless of whether other goroutines are waiting (old may be non-zero).

Lock

Below is the Lock code. It also uses CAS operation to try to directly hold the lock, but it's more "honest". It only directly holds the lock when no goroutines are blocking and waiting (old=0).

go
func (m *Mutex) Lock() {
    // Fast path: grab unlocked mutex.
    if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
       return
    }
    // Slow path (outlined so that the fast path can be inlined)
    m.lockSlow()
}

If it finds goroutines blocking and waiting, it "honestly" queues at the back and enters the lockSlow spinning flow to wait for the lock (the core of the mutex). First, some variables are prepared:

go
func (m *Mutex) lockSlow() {
    var waitStartTime int64
    starving := false
    awoke := false
    iter := 0
    old := m.state

  • waitStartTime: used to record when waiting started, to check whether to enter starvation mode.
  • starving: indicates whether the current goroutine has been unable to obtain the lock for more than 1ms.
  • awoke: marks whether the current goroutine has been awakened.
  • iter: counter, records the number of spin iterations.
  • old: gets the current mutex state.

Then it enters a for loop to determine whether the current goroutine can enter spinning state.

Spinning is a multi-thread synchronization mechanism, also known as busy-waiting. When a thread doesn't hold the lock, it doesn't directly suspend and switch thread context but enters idle spinning. During this process, it continuously occupies CPU time slices. If lock contention is low or the lock hold time is very short, doing this can indeed avoid frequent thread context switches and effectively improve performance. However, it's not a panacea. In Go, abusing spinning can lead to the following dangerous consequences:

  • High CPU usage: too many spinning goroutines consume large amounts of CPU time, especially when the lock is held for a long time.
  • Affects goroutine scheduling: The total number of processor P is limited. If many spinning goroutines occupy P, other goroutines executing user code cannot be scheduled in time.
  • Cache coherency issues: The busy-wait characteristic of spin locks causes threads to repeatedly read the lock state in cache. If spinning goroutines run on different cores and the lock state isn't updated to global memory in time, goroutines read inaccurate lock states. Frequent cache coherency synchronization also significantly reduces performance.

So not all goroutines can enter spinning state. It must pass the following strict judgment:

go
for {
    if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
        if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
        atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
            awoke = true
        }
        runtime_doSpin()
        iter++
        old = m.state
        continue
    }
    ...
}

Conditions are as follows:

  1. Current lock is already held and cannot be in starvation mode. Otherwise, it means some goroutines have been unable to obtain the lock for a long time, so directly enter blocking flow.

  2. Enter runtime.sync_runtime_canSpin judgment flow:

    go
     const (
    	active_spin = 4
    )
    func sync_runtime_canSpin(i int) bool {
    	if i >= active_spin || ncpu <= 1 || gomaxprocs <= sched.npidle.Load()+sched.nmspinning.Load()+1 {
    		return false
    	}
    	if p := getg().m.p.ptr(); !runqempty(p) {
    		return false
    	}
    	return true
    }
  3. The spin count is less than runtime.active_spin, 4 by default; more iterations would waste resources.

  4. The CPU core count is greater than 1; spinning on a single core is pointless, since the lock holder cannot run while we spin.

  5. gomaxprocs is greater than the number of idle Ps plus spinning Ms plus 1; otherwise nearly every processor is idle or already spinning, there is little real contention, and blocking directly is cheaper.

  6. The current P's local run queue must be empty; otherwise there are other user tasks waiting to execute and spinning would delay them.

If spinning is allowed, it calls runtime.sync_runtime_doSpin to spin, which executes the PAUSE instruction 30 times (on amd64).

go
const (
	active_spin_cnt = 30
)

func sync_runtime_doSpin() {
	procyield(active_spin_cnt)
}
asm
TEXT runtime·procyield(SB),NOSPLIT,$0-0
	MOVL	cycles+0(FP), AX
again:
	PAUSE
	SUBL	$1, AX
	JNZ	again
	RET

If spinning cannot proceed, there are only two outcomes: successfully obtain the lock or enter the wait queue and block. However, before that, there are many things to handle:

  1. If not in starvation mode, try to acquire the lock:
go
new := old
if old&mutexStarving == 0 {
	new |= mutexLocked
}
  2. If the lock is already occupied or now in starvation mode, increment the waiting goroutine count:
go
if old&(mutexLocked|mutexStarving) != 0 {
	new += 1 << mutexWaiterShift
}
  3. If the current goroutine is already starving and the lock is still occupied, switch the mutex into starvation mode:
go
if starving && old&mutexLocked != 0 {
	new |= mutexStarving
}
  4. If the current goroutine was awakened during spinning, clear the mutexWoken flag, since it is about to either take the lock or block again:
go
if awoke {
	new &^= mutexWoken
}

Then start trying to update the lock state through CAS. If update fails, directly start the next iteration:

go
if atomic.CompareAndSwapInt32(&m.state, old, new) {
    ...
} else {
    ...
}

If update succeeds, start the following judgment:

  1. Original state is not starvation mode, and no goroutine occupies the lock. Then the current goroutine can directly hold the lock, exit the flow, and continue executing user code.

    go
    if old&(mutexLocked|mutexStarving) == 0 {
    		break
    }
  2. If acquiring the lock failed, record the wait time. queueLifo is true when the goroutine has already waited before (waitStartTime != 0), in which case it is placed at the front of the queue (last-in-first-out); otherwise it joins the tail (first-in-first-out).

    go
    queueLifo := waitStartTime != 0
    if waitStartTime == 0 {
    	waitStartTime = runtime_nanotime()
    }
  3. Try to acquire semaphore, enter runtime.semacquire1 function. If semaphore can be acquired, directly return without blocking. Otherwise, call runtime.gopark to suspend the current goroutine and wait for semaphore release.

    go
    runtime_SemacquireMutex(&m.sema, queueLifo, 1)
  4. Reaching this step means the semaphore has been obtained, either immediately or after being awakened from blocking. Recompute starving; then, if the mutex is now in starvation mode, take the lock directly: delta sets mutexLocked and removes one waiter, and if the wait was short or this is the last waiter, it also clears mutexStarving.

    go
    starving = starving || runtime_nanotime()-waitStartTime > starvationThresholdNs
    old = m.state
    if old&mutexStarving != 0 {
    	delta := int32(mutexLocked - 1<<mutexWaiterShift)
    	if !starving || old>>mutexWaiterShift == 1 {
    		delta -= mutexStarving
    	}
    	atomic.AddInt32(&m.state, delta)
    	break
    }
  5. If not in starvation mode, reset iter and restart spinning flow.

    go
    awoke = true
    iter = 0

At this point, the locking flow ends. The whole process is quite complex, using both spinning wait and semaphore blocking wait methods, balancing performance and fairness, suitable for most lock contention scenarios.

Unlock

The unlock flow is much simpler. It first tries a fast unlock by subtracting the mutexLocked bit. If the resulting state new is 0, there are no waiting goroutines and no flags set, so the unlock is complete and it can return directly.

go
func (m *Mutex) Unlock() {
	new := atomic.AddInt32(&m.state, -mutexLocked)
	if new != 0 {
		m.unlockSlow(new)
	}
}

Otherwise, it needs to enter the unlockSlow flow:

  1. First check whether already unlocked:

    go
    if (new+mutexLocked)&mutexLocked == 0 {
    	fatal("sync: unlock of unlocked mutex")
    }
  2. If in starvation mode, directly release semaphore to complete unlock. In starvation mode, the current unlocking goroutine directly transfers lock ownership to the next waiting goroutine.

    go
    if new&mutexStarving == 0 {
    	...
    } else {
    	runtime_Semrelease(&m.sema, true, 1)
    }
  3. If not in starvation mode, enter normal unlock flow:

    1. If no goroutines are waiting, or the lock has already been re-acquired, a woken goroutine already exists, or the mutex has entered starvation mode, there is nothing to do and it returns directly:

      go
      if old>>mutexWaiterShift == 0 || old&(mutexLocked|mutexWoken|mutexStarving) != 0 {
          return
      }
    2. Otherwise, decrement the waiter count, set the mutexWoken flag, and release the semaphore to wake the next waiting goroutine:

      go
      new = (old - 1<<mutexWaiterShift) | mutexWoken
      if atomic.CompareAndSwapInt32(&m.state, old, new) {
          runtime_Semrelease(&m.sema, false, 1)
          return
      }
      old = m.state

Finally, the unlock flow ends.

RWMutex

The read-write mutex RWMutex type definition is located in the sync/rwmutex.go file. Its implementation is also based on mutex.

go
type RWMutex struct {
	w           Mutex        // held if there are pending writers
	writerSem   uint32       // semaphore for writers to wait for completing readers
	readerSem   uint32       // semaphore for readers to wait for completing writers
	readerCount atomic.Int32 // number of pending readers
	readerWait  atomic.Int32 // number of departing readers
}

Field definitions:

  • w, a mutex. When writer goroutine holds this mutex, other writer goroutines and reader goroutines are blocked.
  • writerSem, write semaphore, used to block writer goroutines waiting for reader goroutines. Writer goroutine acquires semaphore, reader goroutine releases semaphore.
  • readerSem, read semaphore, used to block reader goroutines waiting for writer goroutines. Reader goroutine acquires semaphore, writer goroutine releases semaphore.
  • readerCount, core field, the entire read-write lock depends on it to maintain state.
  • readerWait, indicates the number of reader goroutines that need to wait when writer goroutine is blocked.

The general principle of read-write lock is: through mutex to make writer goroutines mutually exclusive, through two semaphores writerSem and readerSem to make read-write mutually exclusive, read-read shared.

readerCount

Since this readerCount changes quite a bit and is very important, it's singled out for discussion. It roughly summarizes to the following states:

  1. 0: No active reader goroutines and no active writer goroutine; the lock is idle.
  2. -rwmutexMaxReaders: A writer goroutine holds the lock and there are no active readers.
  3. -rwmutexMaxReaders+N: A writer holds (or is acquiring) the write lock, with N readers involved; readers that arrive after the writer must block until the writer releases the write lock.
  4. N-rwmutexMaxReaders: numerically the same as the previous state, but here the N readers already held read locks when the writer arrived, so the writer blocks until those readers release.
  5. N: There are currently N active reader goroutines (N read locks held) and no active writer.

rwmutexMaxReaders is a constant, 1 << 30, the upper bound on the number of concurrent readers. Subtracting it from readerCount drives the count negative, which is how readers detect a pending or active writer.

go
const rwmutexMaxReaders = 1 << 30

readerCount is the most complex part of the entire read-write lock; once its transitions are understood, the whole workflow becomes clear.

TryLock

As usual, let's first look at the simplest TryLock():

go
func (rw *RWMutex) TryLock() bool {
	if !rw.w.TryLock() {
		return false
	}
	if !rw.readerCount.CompareAndSwap(0, -rwmutexMaxReaders) {
		rw.w.Unlock()
		return false
	}
	return true
}

It first tries the mutex's TryLock(); if that fails, it returns immediately. It then uses a CAS to update readerCount from 0 to -rwmutexMaxReaders: 0 means no reader goroutines are working, while -rwmutexMaxReaders marks that a writer now holds the lock. If the CAS fails, it releases the mutex and returns false; if it succeeds, it returns true.

Lock

Next is Lock(), its implementation is also very simple.

go
func (rw *RWMutex) Lock() {
	rw.w.Lock()
	r := rw.readerCount.Add(-rwmutexMaxReaders) + rwmutexMaxReaders
	if r != 0 && rw.readerWait.Add(r) != 0 {
		runtime_SemacquireRWMutex(&rw.writerSem, false, 0)
	}
}

First it competes with other writer goroutines until it holds the mutex, then it atomically adds -rwmutexMaxReaders to readerCount and adds rwmutexMaxReaders back (non-atomically) to the returned value:

go
r = rw.readerCount.Add(-rwmutexMaxReaders) + rwmutexMaxReaders

Break it into two steps to look at:

  1. This is to notify other reader goroutines that a writer goroutine is now trying to hold the lock, as already covered in the TryLock section.

    go
    rw.readerCount.Add(-rwmutexMaxReaders)
  2. Then add rwmutexMaxReaders to get r, which represents the number of reader goroutines currently working.

    go
    r = rw.readerCount.Add(-rwmutexMaxReaders) + rwmutexMaxReaders

Then, if r is non-zero, readers are still working, so r is added to readerWait. If that result is also non-zero, some readers have not yet finished, and the writer enters the runtime_SemacquireRWMutex flow to acquire the writerSem semaphore, which is released by readers. If the semaphore can be obtained immediately, the readers have already finished; otherwise the writer joins the blocking queue and waits (this semaphore logic is essentially the same as in the mutex).

Unlock

Next is Unlock(), which releases the write lock.

go
func (rw *RWMutex) Unlock() {
    r := rw.readerCount.Add(rwmutexMaxReaders)
    if r >= rwmutexMaxReaders {
       fatal("sync: Unlock of unlocked RWMutex")
    }

    for i := 0; i < int(r); i++ {
       runtime_Semrelease(&rw.readerSem, false, 0)
    }

    rw.w.Unlock()
}

Its flow is as follows:

  1. As mentioned earlier, during locking, readerCount is updated to negative value. Here adding rwmutexMaxReaders means no writer goroutines are now working, and the obtained value is the number of reader goroutines blocking and waiting.

    go
    r := rw.readerCount.Add(rwmutexMaxReaders)
  2. If r is greater than or equal to rwmutexMaxReaders, readerCount was non-negative before the addition, meaning the write lock was never held; this is an Unlock of an unlocked RWMutex:

    go
    if r >= rwmutexMaxReaders {
    	fatal("sync: Unlock of unlocked RWMutex")
    }
  3. Release semaphore readerSem, wake up waiting reader goroutines:

    go
    for i := 0; i < int(r); i++ {
    	runtime_Semrelease(&rw.readerSem, false, 0)
    }
  4. Finally release mutex, wake up waiting writer goroutines:

    go
    rw.w.Unlock()

Write lock release completed.

TryRLock

Next, let's look at the read lock part. This is the TryRLock code:

go
func (rw *RWMutex) TryRLock() bool {
	for {
		c := rw.readerCount.Load()
		if c < 0 {
			return false
		}
		if rw.readerCount.CompareAndSwap(c, c+1) {
			return true
		}
	}
}

It only does two things:

  1. Check if writer goroutines are working. If yes, lock fails.

    go
    c := rw.readerCount.Load()
    if c < 0 {
    	return false
    }
  2. Try to increment readerCount by 1. If update succeeds, lock succeeds:

    go
    if rw.readerCount.CompareAndSwap(c, c+1) {
    	return true
    }
  3. Otherwise, continue looping judgment until exit.

As you can see, readerCount is maintained almost entirely by the write-lock side. That is why the write lock was discussed first: the complex core logic lives there.

RLock

RLock logic is even simpler:

go
func (rw *RWMutex) RLock() {
	if rw.readerCount.Add(1) < 0 {
		runtime_SemacquireRWMutexR(&rw.readerSem, false, 0)
	}
}

It increments readerCount by 1. If the new value is still negative, a writer goroutine is working, so the current goroutine enters the readerSem semaphore blocking flow and waits in the queue.

RUnlock

RUnlock is just as simple and easy to understand:

go
func (rw *RWMutex) RUnlock() {
    if r := rw.readerCount.Add(-1); r < 0 {
       rw.rUnlockSlow(r)
    }
}

func (rw *RWMutex) rUnlockSlow(r int32) {
	if r+1 == 0 || r+1 == -rwmutexMaxReaders {
		fatal("sync: RUnlock of unlocked RWMutex")
	}
	if rw.readerWait.Add(-1) == 0 {
		runtime_Semrelease(&rw.writerSem, false, 1)
	}
}

It first decrements readerCount, meaning one fewer active reader. If the result is greater than or equal to 0, the read lock is released directly, since no writer currently holds the mutex. If it is negative, a writer has already taken the mutex and is waiting for all current readers to finish, so it enters the rUnlockSlow flow:

  1. If the original readerCount was 0 (the lock is idle) or -rwmutexMaxReaders (a writer holds the lock with no readers left to wait for, i.e., all read locks were already released), there is no read lock to release, so this is an RUnlock of an unlocked RWMutex and fatal is raised:

    go
    if r+1 == 0 || r+1 == -rwmutexMaxReaders {
    	fatal("sync: RUnlock of unlocked RWMutex")
    }
  2. If there are active reader goroutines, decrement readerWait by one. If current reader goroutine is the last active reader, release writerSem semaphore, wake up waiting writer goroutines:

    go
    if rw.readerWait.Add(-1) == 0 {
    	runtime_Semrelease(&rw.writerSem, false, 1)
    }

Read lock release flow ends.

Semaphore

The semaphore inside the mutex is a simple uint32 integer, representing semaphore acquisition and release through atomically decrementing by one and incrementing by one. The structure responsible for maintaining semaphores at runtime is runtime.semaRoot, whose type definition is located in the runtime/sema.go file. semaRoot uses a balanced binary tree (treap) to organize and manage semaphores. Each node in the tree represents a semaphore, node type is *sudog, which is a doubly linked list maintaining the wait queue for corresponding semaphores. Nodes maintain uniqueness through *sudog.elem (semaphore address) and maintain tree balance through *sudog.ticket field.

go
type semaRoot struct {
	lock  mutex
	treap *sudog        // root of balanced tree of unique waiters.
	nwait atomic.Uint32 // Number of waiters. Read w/o the lock.
}

The semaphore tree semaRoot depends on a lower-level mutex runtime.mutex to ensure its concurrency safety.

go
var semtable semTable

// Prime to not correlate with any user patterns.
const semTabSize = 251

type semTable [semTabSize]struct {
	root semaRoot
    // Used for memory alignment, improving performance
	pad  [cpu.CacheLinePadSize - unsafe.Sizeof(semaRoot{})]byte
}

semaRoot instances are stored at runtime in a global semTable. It looks like a fixed-length array holding the roots of multiple semaphore trees, but operationally it behaves as a hash table. Each element in the table contains a semaRoot plus padding bytes (pad) used for cache-line alignment, which avoids false sharing between adjacent entries. semTabSize fixes the table length at 251; a prime is chosen to reduce hash collisions and improve distribution.

go
func (t *semTable) rootFor(addr *uint32) *semaRoot {
	return &t[(uintptr(unsafe.Pointer(addr))>>3)%semTabSize].root
}

The rootFor method plays the role of the hash function. It takes a uint32 pointer addr (the semaphore address) and returns the pointer to the semaRoot for that address. The code converts addr to a uintptr and shifts it right by 3 bits, discarding the low bits, which carry little information because of address alignment; taking the result modulo semTabSize then maps it into the table. Once the semaRoot is found, the balanced tree is searched for the *sudog wait queue of that semaphore.

Acquire

Acquiring semaphore, corresponding implementation is runtime.semacquire1 function:

go
func semacquire1(addr *uint32, lifo bool, profile semaProfileFlags, skipframes int, reason waitReason)

It receives the following parameters:

  • addr, semaphore address
  • lifo, affects the queue order: false (the default) means first-in-first-out; true means last-in-first-out and is set when the goroutine's recorded wait time is non-zero (it has already blocked at least once)
  • profile, flags for lock performance analysis
  • skipframes, number of skipped stack frames
  • reason, reason for blocking

Below briefly describes the entire semaphore acquisition flow:

  1. Check goroutine state. If current goroutine is not the goroutine being scheduled, directly throw exception:

    go
    gp := getg()
    if gp != gp.m.curg {
    	throw("semacquire not on the G stack")
    }
  2. Check if semaphore can be acquired, and try to acquire semaphore through non-blocking method. If can acquire, can directly return:

    go
    for {
    	v := atomic.Load(addr)
    	if v == 0 {
    		return false
    	}
    	if atomic.Cas(addr, v, v-1) {
    		return true
    	}
    }
  3. If the semaphore cannot be acquired non-blockingly, enter the normal acquisition loop. First, get a *sudog from the cache via acquireSudog(); this structure represents a blocked, waiting goroutine:

    go
    s := acquireSudog()
  4. Then get semaphore tree from global table:

    go
    root := semtable.rootFor(addr)
  5. Enter a loop: lock the semaphore tree and check once more whether the semaphore can be acquired. If not, enqueue the *sudog in the tree and call gopark to suspend; when awakened, repeat this process until the semaphore is obtained:

    go
    for {
    	lockWithRank(&root.lock, lockRankRoot)
    	root.nwait.Add(1)
    	if cansemacquire(addr) {
    		root.nwait.Add(-1)
    		unlock(&root.lock)
    		break
    	}
    	root.queue(addr, s, lifo)
    	goparkunlock(&root.lock, reason, traceBlockSync, 4+skipframes)
    	if s.ticket != 0 || cansemacquire(addr) {
    		break
    	}
    }
  6. Finally, when awakened, release *sudog, return it to cache:

    go
    releaseSudog(s)

Release

Release semaphore, wake up blocking wait goroutines. This functionality is implemented by runtime.semrelease1 function:

go
func semrelease1(addr *uint32, handoff bool, skipframes int)

It receives the following parameters:

  • addr, semaphore address
  • handoff, indicates whether to directly switch the G currently being scheduled by P to the awakened G, only true in starvation mode
  • skipframes, number of skipped stack frames

Below briefly describes the entire release process:

  1. Get semaphore tree, then increment semaphore by one, indicating releasing one semaphore:

    go
    root := semtable.rootFor(addr)
    atomic.Xadd(addr, 1)
  2. If waiting goroutine count is 0, directly return:

    go
    if root.nwait.Load() == 0 {
    	return
    }
  3. Lock the semaphore tree and check again whether there are waiting goroutines:

    go
    lockWithRank(&root.lock, lockRankRoot)
    if root.nwait.Load() == 0 {
    	unlock(&root.lock)
    	return
    }
  4. Get a blocking wait goroutine from semaphore tree, nwait decrements by one, then release semaphore lock:

    go
    s, t0, tailtime := root.dequeue(addr)
    if s != nil {
    	root.nwait.Add(-1)
    }
    unlock(&root.lock)
  5. Check if semaphore can be acquired:

    go
    if handoff && cansemacquire(addr) {
    	s.ticket = 1
    }
  6. The readyWithTime function makes the awakened goroutine G the next goroutine that P will run, i.e., it sets p.runnext = g:

    go
    readyWithTime(s, 5+skipframes)
  7. If handoff is true, goyield detaches the current semaphore-releasing goroutine G from its M, appends it to the tail of the P's local run queue, and starts a new scheduling round, so the awakened goroutine gets scheduled immediately:

    go
    if s.ticket == 1 && getg().m.locks == 0 {
    	goyield()
    }

These are the semaphore acquisition and release flows. Go uses semaphores in more places than mutexes alone; they are covered here because they are most closely tied to the mutex. The source even notes in a comment:

// Asynchronous semaphore for sync.Mutex.

After understanding semaphores, looking back at mutex locks becomes clearer.

TIP

The dequeue and enqueue operations of the semaRoot semaphore tree involve self-balancing logic that is fairly complex to implement. Delving into it is beyond the scope of this article, so it is omitted here; if you are interested, explore the source code yourself.

Summary

The mutex sync.Mutex makes goroutines wait through two mechanisms: spinning and semaphores. Spinning is non-blocking but must be strictly limited because it consumes CPU; semaphore waits block the goroutine and avoid unnecessary resource consumption. To make competition fairer, Go distinguishes normal mode from starvation mode so that goroutines contend for the lock in a more balanced way. Compared with a low-level lock like runtime.mutex, sync.Mutex, as a user-facing lock, is designed with more real-world usage scenarios in mind.

The read-write lock sync.RWMutex builds on sync.Mutex for writer-writer exclusion and adds two semaphores on top to implement reader-writer exclusion and reader-reader sharing, supporting a wider range of concurrency scenarios.

Although lock implementation appears complex, once you understand Mutex principles, learning other synchronization tools in sync standard library becomes much easier.
