Under the hood, Go represents a string as a read-only sequence of bytes. This means you can index into a string much as you would index into a slice.
A byte slice is a slice whose element type is byte. In a string, those bytes normally hold the UTF-8 encoding of Unicode code points.
Strings are immutable, Unicode-compliant, and (by convention) UTF-8 encoded.
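To make the two properties above concrete, here is a minimal sketch. The helper names byteAndRuneCount and replaceFirstByte are made up for this illustration; they are not part of the standard library. It shows that len counts bytes rather than characters, and that changing a string means working on a copy:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (hypothetical helper) returns the length of s in bytes
// and in Unicode code points.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

// replaceFirstByte (hypothetical helper) returns a copy of s with its first
// byte replaced by b. Strings are immutable, so the change is made on a
// []byte copy and the original string is left untouched.
func replaceFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s) // the conversion copies the bytes
	buf[0] = b
	return string(buf)
}

func main() {
	nBytes, nRunes := byteAndRuneCount("Señor")
	fmt.Println(nBytes, nRunes) // 6 5

	// Writing s[0] = 'J' on a string does not compile, hence the helper.
	fmt.Println(replaceFirstByte("Hello", 'J')) // Jello
}
```

The byte count is 6 because ñ takes two bytes in UTF-8, which is the theme of the rest of this post.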
Accessing the individual bytes of a string
As mentioned above, a string behaves like a read-only slice of bytes, so we can access every individual byte in a string:
package main
import (
	"fmt"
)
// printBytes prints each byte of s in hexadecimal.
func printBytes(s string) {
	fmt.Printf("Bytes: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
}
func main() {
	name := "Hello World"
	fmt.Printf("String: %s\n", name)
	printBytes(name)
}
This outputs:
String: Hello World
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64
We print the bytes of the string "Hello World" by looping over it with the built-in len() function. len() returns the number of bytes in the string; we use that number as the loop bound and access the byte at each index. Each byte is printed in hexadecimal using the %x format verb.
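As an aside, the fmt package can produce the same hexadecimal dump without an explicit loop. The sketch below relies on the space flag in the "% x" verb, which puts a space between the bytes of a string; the helper name hexBytes is made up for this example:

```go
package main

import "fmt"

// hexBytes (hypothetical helper) returns the bytes of s in hexadecimal,
// separated by spaces.
func hexBytes(s string) string {
	// The space flag in "% x" inserts a space between each byte.
	return fmt.Sprintf("% x", s)
}

func main() {
	fmt.Println(hexBytes("Hello World")) // 48 65 6c 6c 6f 20 57 6f 72 6c 64
}
```

The explicit loop is still the better teaching tool here, since it shows that s[i] really is a single byte.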
Accessing individual characters of a string
Let's modify the above program a little bit to print the characters of the string.
package main
import (
	"fmt"
)
func printBytes(s string) {
	fmt.Printf("Bytes: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
}
func printChars(s string) {
	fmt.Printf("Characters: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%c ", s[i])
	}
}
func main() {
	name := "Hello World"
	fmt.Printf("String: %s\n", name)
	printChars(name)
	fmt.Printf("\n")
	printBytes(name)
}
String: Hello World
Characters: H e l l o   W o r l d
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64
The logic remains the same as above, but this time notice the %c format verb, which printChars uses to print each byte of the string as a character.
In UTF-8, a code point can occupy more than one byte, so this way of accessing characters is not well suited: it assumes that every code point occupies exactly one byte. A better approach is to use runes, but first let's see how the byte-wise version fails on a non-ASCII string:
package main
import (
	"fmt"
)
func printBytes(s string) {
	fmt.Printf("Bytes: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
}
// printChars prints each byte of s as if it were a whole character.
func printChars(s string) {
	fmt.Printf("Characters: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%c ", s[i])
	}
}
func main() {
	testString := "Señor"
	fmt.Printf("String: %s\n", testString)
	printChars(testString)
	fmt.Printf("\n")
	printBytes(testString)
}
This outputs:
String: Señor
Characters: S e Ã ± o r
Bytes: 53 65 c3 b1 6f 72
Notice that the program breaks: it prints Ã ± instead of ñ. The reason is that the Unicode code point of ñ is U+00F1, and its UTF-8 encoding occupies two bytes, c3 and b1. We are printing characters on the assumption that each code point is one byte long, which is wrong.
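To see how many bytes a given character needs, here is a small sketch using utf8.RuneLen from the standard unicode/utf8 package. The wrapper name encodedLen and the sample characters are just choices for this illustration:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen (hypothetical helper) returns how many bytes the UTF-8
// encoding of r occupies.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	for _, r := range []rune{'S', 'ñ', '世'} {
		// %U prints the code point, "% x" prints the encoded bytes.
		fmt.Printf("%c %U takes %d byte(s): % x\n", r, r, encodedLen(r), string(r))
	}
}
```

For ñ this reports two bytes, c3 and b1, which are exactly the two values the byte-wise loop printed as separate characters.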
Rune
A rune represents a single Unicode code point, which is roughly what we think of as a character. It is a built-in type in Go, an alias for int32, so a rune value is a 32-bit integer holding a Unicode code point.
package main
import (
	"fmt"
)
func printBytes(s string) {
	fmt.Printf("Bytes: ")
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
}
func printChars(s string) {
	fmt.Printf("Characters: ")
	runes := []rune(s)
	for i := 0; i < len(runes); i++ {
		fmt.Printf("%c ", runes[i])
	}
}
func main() {
	testString := "Señor"
	fmt.Printf("String: %s\n", testString)
	printChars(testString)
	fmt.Printf("\n")
	printBytes(testString)
}
String: Señor
Characters: S e ñ o r
Bytes: 53 65 c3 b1 6f 72
In this example, the string is converted to a slice of runes using []rune(s). We then loop over the slice and display the characters. This works because each rune holds a whole code point, no matter how many bytes its UTF-8 encoding takes.
Accessing specific characters in a string
Now we have seen how to access all the characters of a string. Let's see how to access a single index of the string. Remember that a string in Go can be indexed like a slice of bytes, so we can read the byte at a specific index just as we would for a slice or an array, without looping through the string or converting it to a slice of runes.
package main
import (
	"fmt"
)
func main() {
	testString := "Hello String"
	fmt.Println(testString[2])
	fmt.Println(testString[1])
	fmt.Println(testString[4])
}
108
101
111
This prints the numeric value of the byte at each index. For ASCII characters such as these, that value is also the character's Unicode code point.
Trying to access an index that is equal to or greater than the string's length triggers an index out of range panic, since the index falls outside the bytes available in the string.
That was swift: all we did was declare the string and specify the index we wanted. But this is not quite what we are after; we still need the actual character, not its numeric value.
To get the character, we convert the byte to a string using the built-in conversion string():
package main
import (
	"fmt"
)
func main() {
	testString := "Hello String"
	fmt.Println(string(testString[2]))
	fmt.Println(string(testString[1]))
	fmt.Println(string(testString[4]))
}
l
e
o
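Indexing by byte works nicely for ASCII text, but it has the same multi-byte problem we saw with Señor, and an out-of-range index panics. As a rough sketch of one way around both issues, the helper below (charAt is a name invented for this example) indexes by code point and reports failure with a boolean instead of panicking:

```go
package main

import "fmt"

// charAt (hypothetical helper) returns the n-th character (code point) of s
// as a string. The boolean is false when n is out of range.
func charAt(s string, n int) (string, bool) {
	runes := []rune(s) // index by code point rather than by byte
	if n < 0 || n >= len(runes) {
		return "", false
	}
	return string(runes[n]), true
}

func main() {
	if c, ok := charAt("Señor", 2); ok {
		fmt.Println(c) // ñ
	}
	if _, ok := charAt("Señor", 10); !ok {
		fmt.Println("index out of range")
	}
}
```

The conversion to []rune walks the whole string on every call, so this is fine for occasional lookups but not something to put in a hot loop.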
A simple program to check if a string begins with a lower case letter or an upper case letter
Using what we know about accessing string values, we are going to write a small Go program that reports whether a given string begins with a lower-case or an upper-case letter.
Declare the package and a function that checks whether the string has a lower-case letter at the beginning.
There is nothing to check if the parameter is an empty string, so the function tests for that first and returns false if the string is empty.
Next is the actual work. Characters in Go are numeric values, so two comparisons are enough to test whether a value falls within a range. Here we check whether the first byte of the string parameter lies in the range of lower-case letters, 'a' to 'z'.
package main
import (
	"fmt" // used by main further down
)
// startsWithLowerCase reports whether the string has a lower-case letter at the beginning.
func startsWithLowerCase(str string) bool {
	if len(str) == 0 {
		return false
	}
	c := str[0]
	return 'a' <= c && c <= 'z'
}
The startsWithUpperCase function also compares the first byte of the string parameter against a range, but this time the range is the upper-case letters, 'A' to 'Z'. Add this function to your program:
// startsWithUpperCase reports whether the string has an upper-case letter at the beginning.
func startsWithUpperCase(str string) bool {
	if len(str) == 0 {
		return false
	}
	c := str[0]
	return 'A' <= c && c <= 'Z'
}
It's time to wrap up and test our program. Declare the main function; inside it, declare your test string and call both functions, passing testString as the argument. We want to report our results properly, so we use fmt.Printf to format the report and print it to the console.
func main() {
	testString := "Hello String"
	fmt.Printf("'%s' begins with upper-case letter? %t \n",
		testString,
		startsWithUpperCase(testString))
	fmt.Printf("'%s' begins with lower-case letter? %t \n",
		testString,
		startsWithLowerCase(testString))
}
'Hello String' begins with upper-case letter? true
'Hello String' begins with lower-case letter? false
Cool, right? You have just written an enterprise-grade program. Yes, really: startsWithLowerCase uses the same logic as a helper in Go's time package, whose purpose is to prevent matching strings like "Month" when looking for "Mon".
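One caveat: the byte comparison above only recognises ASCII letters, so a string such as "Ñandú" would be reported as starting with neither case. If you need to handle non-ASCII letters, a possible variant is sketched below. It is not what the time package does; it is an assumption-laden alternative built on utf8.DecodeRuneInString and the unicode package, with the function names startsWithUpper and startsWithLower invented for this example:

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// startsWithUpper (hypothetical variant) reports whether the first code
// point of s is an upper-case letter according to Unicode.
func startsWithUpper(s string) bool {
	// For an empty string this yields utf8.RuneError, which is not a letter.
	r, _ := utf8.DecodeRuneInString(s)
	return unicode.IsUpper(r)
}

// startsWithLower (hypothetical variant) reports whether the first code
// point of s is a lower-case letter according to Unicode.
func startsWithLower(s string) bool {
	r, _ := utf8.DecodeRuneInString(s)
	return unicode.IsLower(r)
}

func main() {
	fmt.Println(startsWithUpper("Ñandú"))        // true
	fmt.Println(startsWithLower("ñandú"))        // true
	fmt.Println(startsWithUpper("Hello String")) // true
	fmt.Println(startsWithUpper(""))             // false
}
```

For plain ASCII input both versions agree, so the simpler byte comparison remains a reasonable choice when you control the input.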
Conclusion
With this deep dive into accessing the values held in Go strings, you're ready to take over the world. But before that, remember there's only one way to learn to develop Go programs: write a lot of code. Keep coding, and taking over the world is only a matter of time.
Thank you for reading. I'm Azeez Lukman, and this is a developer's journey of building something awesome every day. Let's meet on Twitter, LinkedIn, GitHub, and anywhere else @robogeeek95.
Top comments (4)
Iterating over a string will normally give you the runes within that string, so the conversion wouldn't be necessary, see this playground snippet as an example play.golang.org/p/LkdB8zO4Cu_d.
Also from Effective Go,
You're right Andrew. Thanks for pointing that out.
In addition to that, there's an even better way to access runes in a string:
which gives:
Yeah, %c would format it as a char, so you can avoid casting it to a string like I did with my fmt.Println call. Very cool #TIL