Strings
In Go, a string is essentially an immutable, read-only byte sequence. Here, "byte sequence" means that the underlying data of a string consists of bytes arranged in order, and that these bytes occupy a contiguous region of memory.
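As a minimal sketch of what "byte sequence" means in practice, the following prints the raw bytes of a short string. The helper name hexBytes and the sample text are illustrative assumptions, not part of the original article.

```go
package main

import "fmt"

// hexBytes returns the bytes of s as space-separated hexadecimal pairs.
// (Hypothetical helper, used only for this illustration.)
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	// \u00e9 is the letter é, which is outside ASCII and takes two bytes in UTF-8.
	s := "h\u00e9llo"
	fmt.Println(len(s))      // 6: the length counts bytes, not the 5 characters
	fmt.Println(hexBytes(s)) // 68 c3 a9 6c 6c 6f
}
```

The string holds six bytes even though it displays as five characters, which is the distinction the rest of this section keeps returning to.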
Literals
As mentioned earlier, there are two ways to express string literals: regular strings and raw strings.
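The difference between the two literal forms can be sketched by writing the same characters both ways and comparing the results. The constant names below are assumptions chosen for the illustration.

```go
package main

import "fmt"

const (
	regular = "a\tb" // the escape \t becomes a single tab byte
	raw     = `a\tb` // the backslash and the letter t are kept as two bytes
)

func main() {
	fmt.Println(len(regular), len(raw)) // 3 4
	fmt.Println(raw == "a\\tb")         // true: an escaped backslash gives the same text
}
```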
Regular Strings
Regular strings are enclosed in double quotes (""), support escape sequences, and cannot span multiple lines. Here are some regular strings:
"This is a regular string\n"
"abcdefghijlmn\nopqrst\t\\uvwxyz"
Output:
This is a regular string
abcdefghijlmn
opqrst	\uvwxyz
Raw Strings
Raw strings are enclosed in backticks, do not interpret escape sequences, and may span multiple lines. Every character in a raw string is kept as-is, including newlines and indentation.
`This is a raw string, newline
	tab indentation, \t tab character is invalid, newline
"This is a regular string"
end
`
Output:
This is a raw string, newline
	tab indentation, \t tab character is invalid, newline
"This is a regular string"
end
Access
Since a string is essentially a byte sequence, the index expression str[i] returns the byte at index i, counting from 0. The syntax is the same as for slices. For example, to access the first byte of a string:
func main() {
	str := "this is a string"
	fmt.Println(str[0])
}
The output is the byte's numeric value, not the character:
116
Slicing a string:
func main() {
	str := "this is a string"
	// Slicing a string already yields a string, so no conversion is needed.
	fmt.Println(str[0:4])
}
Output:
this
Attempting to modify a string element:
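func main() {
	str := "this is a string"
	str[0] = 'a' // does not compile: string bytes cannot be assigned to
	fmt.Println(str)
}
The compiler rejects this with an error such as:
main.go:7:2: cannot assign to str[0] (value of type byte)
Although you cannot modify the bytes of a string, you can assign a new string to the variable: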
func main() {
	str := "this is a string"
	str = "that is a string"
	fmt.Println(str)
}
Output:
that is a string
Conversion
Strings can be converted to byte slices, and byte slices can be converted back to strings. Examples:
func main() {
	str := "this is a string"
	// Explicit type conversion to byte slice
	bytes := []byte(str)
	fmt.Println(bytes)
	// Explicit type conversion to string
	fmt.Println(string(bytes))
}
String content is read-only and immutable, but byte slices can be modified.
func main() {
	str := "this is a string"
	fmt.Println(&str) // address of the variable str
	bytes := []byte(str)
	// Modify the byte slice (96, 97, 98, 99 are the bytes of "`abc")
	bytes = append(bytes, 96, 97, 98, 99)
	// Assign the result back to the variable
	str = string(bytes)
	fmt.Println(str)
}
After a string is converted to a byte slice, the two are completely independent: Go allocates new memory for the byte slice and copies the string's bytes into it, so modifying the byte slice does not affect the original string. This is done for memory safety.
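A minimal sketch of this copy behavior: the function below changes the first byte of a copy and leaves the original string untouched. The name replaceFirst is a hypothetical helper introduced only for this illustration.

```go
package main

import "fmt"

// replaceFirst returns a new string equal to s with its first byte set to c.
// (Hypothetical helper; it works on a copy and never touches s itself.)
func replaceFirst(s string, c byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // allocates a new slice and copies the bytes of s
	b[0] = c       // modifies only the copy
	return string(b)
}

func main() {
	original := "this is a string"
	changed := replaceFirst(original, 'T')
	fmt.Println(original) // this is a string
	fmt.Println(changed)  // This is a string
}
```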
Because of this copy, converting a large string or byte slice can be expensive. You can use the unsafe package to achieve a zero-copy conversion (unsafe.StringData and unsafe.SliceData require Go 1.20 or later), but you take on the safety risks yourself; in particular, the resulting byte slice must not be modified. In the following example, b1 and s1 point to the same underlying data:
func main() {
	s1 := "hello world"
	b1 := unsafe.Slice(unsafe.StringData(s1), len(s1))
	fmt.Printf("%p %p", unsafe.StringData(s1), unsafe.SliceData(b1))
}
Output (the exact address will vary):
0xe27bb2 0xe27bb2
Length
The length of a string is not the number of characters but the number of bytes. For text made up only of ASCII characters, each character takes exactly one byte, so the byte length and the character count happen to be equal. Use the built-in function len to get the string length:
func main() {
	str := "this is a string" // 16 characters
	str2 := "这是一个字符串"         // 7 characters
	fmt.Println(len(str), len(str2))
}
Output:
16 21
The Chinese string looks shorter than the English one, but its length is greater. This is because in UTF-8, the encoding Go uses for string literals, a Chinese character usually occupies 3 bytes, while an ASCII letter occupies only 1 byte. This can be seen by printing the first element of each string:
func main() {
	str := "this is a string"
	str2 := "这是一个字符串"
	fmt.Println(string(str[0]))
	fmt.Println(string(str2[0]))
	fmt.Println(string(str2[0:3]))
}
Output:
t // letter t
è // the first byte (0xE8) of the Chinese character treated as a code point on its own, which is the letter è (U+00E8)
这 // the complete Chinese character, made of the first three bytes
Copy
As with slices, copying a string is really copying its bytes, using the built-in function copy:
func main() {
	var dst, src string
	src = "this is a string"
	dstBytes := make([]byte, len(src))
	copy(dstBytes, src)
	dst = string(dstBytes)
	fmt.Println(src, dst)
}
You can also use the strings.Clone function (Go 1.18 or later), which likewise returns a fresh copy of the bytes:
func main() {
	var dst, src string
	src = "this is a string"
	dst = strings.Clone(src)
	fmt.Println(src, dst)
}
Concatenation
String concatenation uses the + operator:
func main() {
	str := "this is a string"
	str = str + " that is a int"
	fmt.Println(str)
}
You can also convert to a byte slice and then append elements:
func main() {
	str := "this is a string"
	bytes := []byte(str)
	// A string can be appended to a byte slice directly with the ... syntax.
	bytes = append(bytes, " that is a int"...)
	str = string(bytes)
	fmt.Println(str)
}
Both approaches allocate and copy data each time they run, so they perform poorly when used repeatedly, for example inside a loop. They are fine for occasional use, but when performance matters, use strings.Builder:
func main() {
	// The zero value of strings.Builder is ready to use.
	var builder strings.Builder
	builder.WriteString("this is a string ")
	builder.WriteString("that is a int")
	fmt.Println(builder.String())
}
Output:
this is a string that is a int
Traversal
As mentioned at the beginning of this article, strings in Go are read-only byte sequences, so the basic unit of a string is the byte, not the character. This matters most when traversing strings. For example:
func main() {
	str := "hello world!"
	for i := 0; i < len(str); i++ {
		fmt.Printf("%d,%x,%s\n", str[i], str[i], string(str[i]))
	}
}
The example prints each byte in decimal, in hexadecimal, and as a character:
104,68,h
101,65,e
108,6c,l
108,6c,l
111,6f,o
32,20,
119,77,w
111,6f,o
114,72,r
108,6c,l
100,64,d
33,21,!
Since the characters in the example are all ASCII, each needs only one byte, so every byte happens to correspond to one character. If non-ASCII characters are included, the result is different:
func main() {
	str := "hello 世界!"
	for i := 0; i < len(str); i++ {
		fmt.Printf("%d,%x,%s\n", str[i], str[i], string(str[i]))
	}
}
In UTF-8 a Chinese character usually occupies 3 bytes, so the output looks like this:
104,68,h
101,65,e
108,6c,l
108,6c,l
111,6f,o
32,20,
228,e4,ä
184,b8,¸
150,96,
231,e7,ç
149,95,
140,8c,
33,21,!
Traversing by bytes splits each Chinese character into pieces, which produces garbled text. Go has built-in support for UTF-8. To handle this situation, use the rune type, which represents a single Unicode code point. A for range loop over a string decodes it one rune at a time. For example:
func main() {
	str := "hello 世界!"
	for _, r := range str {
		fmt.Printf("%d,%x,%s\n", r, r, string(r))
	}
}
Output:
104,68,h
101,65,e
108,6c,l
108,6c,l
111,6f,o
32,20,
19990,4e16,世
30028,754c,界
33,21,!
rune is an alias for int32. Unicode code points range from 0x0000 to 0x10FFFF, so the value of a code point fits in three bytes, and the UTF-8 encoding of a single code point takes at most 4 bytes. An int32 is therefore large enough to hold any rune. Converting the string to []rune and then traversing it works on the same principle:
func main() {
	str := "hello 世界!"
	runes := []rune(str)
	for i := 0; i < len(runes); i++ {
		fmt.Println(string(runes[i]))
	}
}
You can also use the functions in the unicode/utf8 package:
func main() {
	str := "hello 世界!"
	for i := 0; i < len(str); {
		// Decode the rune that starts at byte offset i and get its width in bytes.
		r, width := utf8.DecodeRuneInString(str[i:])
		fmt.Println(string(r))
		i += width
	}
}
The outputs of these two examples are the same.
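To tie the byte view and the rune view together, here is a small sketch that counts characters with utf8.RuneCountInString and records the byte offset at which each character starts. The helper name runeOffsets is an assumption made for this illustration; it relies on the fact that the index produced by for range over a string is a byte offset, not a character position.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the starting byte offset of each character in s.
// (Hypothetical helper, used only for this illustration.)
func runeOffsets(s string) []int {
	offsets := make([]int, 0, utf8.RuneCountInString(s))
	for i := range s { // i is the byte offset of the current rune
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	str := "hello 世界!"
	fmt.Println(len(str))                    // 13 bytes
	fmt.Println(utf8.RuneCountInString(str)) // 9 characters
	fmt.Println(runeOffsets(str))            // [0 1 2 3 4 5 6 9 12]
}
```

The offsets jump from 6 to 9 to 12 because each of the two Chinese characters occupies three bytes.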
TIP
For more details about strings, see the Go blog article "Strings, bytes, runes and characters in Go".
