Note: Each chapter of Programming for Lovers comprises two parts. First, the “core text” presents critical concepts at a high level, avoiding language-specific details. The core text is followed by “code alongs,” where you will apply what you have learned while learning the specifics of the language syntax.
Code along video
Beneath the video, we provide learning objectives, code setup, and a detailed summary of the topics covered in the code along.
At the bottom of the page, you will have the opportunity to validate your work via auto-graded assessments that evaluate the functions covered in the code along.
Although we suggest completing the code along with us, you can find completed code from the code along in our course code repository.
Learning objectives
In this lesson, we return to a computational problem introduced in the main text: finding the reverse complement of a DNA string. This problem models the biological task of finding the strand that is complementary to a given strand of DNA.
Reverse Complement Problem
Input: A DNA string pattern.
Output: The reverse complement of pattern.
We saw the power of modularity to solve this problem, since we can reduce finding a reverse complement to two problems: reversing a string, and taking the complementary nucleotide at each position. At the level of pseudocode, this corresponds to calling Reverse() and Complement() subroutines as follows.
ReverseComplement(pattern)
    pattern ← Reverse(pattern)
    pattern ← Complement(pattern)
    return pattern
We also saw that we could simplify this function by calling Reverse() on the output of Complement() in a single line, leading to the following one-line function.
ReverseComplement(pattern)
    return Reverse(Complement(pattern))
In this lesson, we will implement these functions, and along the way, we will explore the basics of working with strings in Go.
Setup
Create a folder called strings in your go/src directory and create a text file called main.go in the go/src/strings folder. We will edit main.go, which should have the following starter code.
package main

import (
    "fmt"
)

func main() {
    fmt.Println("Strings.")
}
Code along summary
Converting symbols and integers to strings
Say that we print the string conversion of the symbol 'A', as follows.
package main

import (
    "fmt"
)

func main() {
    fmt.Println("Strings.")
    fmt.Println(string('A'))
}
After saving main.go, navigate into go/src/strings from the command line, compile your code by executing the command go build, and run your code by executing either ./strings (Mac) or strings.exe (Windows). As you might expect, you will see A printed to the console.
Let’s now add a third print statement to our code. When you compile and run this code, you might expect 45 to be the final line printed, but instead - is printed!
package main

import (
    "fmt"
)

func main() {
    fmt.Println("Strings.")
    fmt.Println(string('A'))
    fmt.Println(string(45))
}
The reason why Go produces this odd behavior is that it interprets the type cast string(45) as accessing the element at position 45 of an array called the “ASCII character table”, which in this case is a hyphen. This is because Go thinks about symbols as integers, a fact that we saw in the previous chapter.
Note: To access the previous command in the command line, you can hit the up arrow key (↑). As a result, if you want to re-compile and re-run your code, you can typically hit the up arrow key twice to see go build appear, and after executing this command, hit the up arrow key twice again to see the ./strings (or strings.exe) command appear.
Fortunately, Go includes a package called strconv that makes converting from numbers to strings (and vice-versa) more natural. In the case of converting an integer to a string, we can use the function strconv.Itoa(), which belongs to this package. The following will print 45 as its last line, as you might like to verify.
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    fmt.Println(string('A'))
    fmt.Println(strconv.Itoa(45)) //this prints 45
}
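The two kinds of conversion can be compared side by side. The following is a minimal sketch; the helper names CodePointToString and DecimalString are our own, chosen only for illustration, and are not part of Go or of the course code.

```go
package main

import (
	"fmt"
	"strconv"
)

// CodePointToString is a hypothetical helper. It returns the symbol whose
// character code is n. Writing string(rune(n)) makes the intent explicit.
func CodePointToString(n int) string {
	return string(rune(n))
}

// DecimalString is a hypothetical helper. It returns the base-10 digits of n.
func DecimalString(n int) string {
	return strconv.Itoa(n)
}

func main() {
	fmt.Println(CodePointToString(45)) // prints -
	fmt.Println(CodePointToString(65)) // prints A
	fmt.Println(DecimalString(45))     // prints 45
	fmt.Println(int('A'))              // prints 65
}
```

The last line runs the conversion in the other direction, showing the integer that Go stores for the symbol 'A'.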
Converting strings to integers
We might also wish to convert a string containing an integer into a variable of type integer. (This will arise later in the course when we start taking parameters from the user, which are by default represented as strings.) To do so, we can use the function strconv.Atoi(). A first attempt at calling it is shown below.
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    fmt.Println(string('A'))
    fmt.Println(strconv.Itoa(45)) //this prints 45

    j := strconv.Atoi("23")
    fmt.Println(j)
}
However, we could imagine the following scenario, in which the input to strconv.Atoi() is not the string representation of an integer.
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    fmt.Println(string('A'))
    fmt.Println(strconv.Itoa(45)) //this prints 45

    j := strconv.Atoi("Hi") // wait: "Hi" is not an integer!
    fmt.Println(j)
}
Were we to compile the above code, we would obtain a compiler error, and the same is true of the earlier version that passed "23". The reason is that Go protects itself from this type of misbehavior by requiring that strconv.Atoi() returns two variables, not one: an integer as well as a variable having a special error type. Assigning the result to the single variable j is therefore not allowed. The strconv.Atoi() function exemplifies a situation in which Go will report an error as an additional variable if it fails at some task.
Without delving into the details of error reporting, we will know that the conversion has gone well when the variable of type error is equal to the special default value nil. We can therefore check whether there was an error by checking whether this second variable is equal to nil, passing to a panic() statement if not.
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    fmt.Println(string('A'))
    fmt.Println(strconv.Itoa(45)) //this prints 45

    j, err := strconv.Atoi("Hi") // wait: "Hi" is not an integer!
    // if this goes OK, then err will equal nil
    if err != nil {
        panic(err) // will produce automatically generated error reporting and halt program
    }
    fmt.Println(j) // if we reach this line, then the conversion went well
}
When we compile and run this code, we see an error message get printed because "Hi" is not an integer. To fix this, we should instead pass the string representation of an integer to strconv.Atoi(), which will cause err to equal nil as desired.
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    fmt.Println(string('A'))
    fmt.Println(strconv.Itoa(45)) //this prints 45

    j, err := strconv.Atoi("23")
    // if this goes OK, then err will equal nil
    if err != nil {
        panic(err) // will produce automatically generated error reporting and halt program
    }
    fmt.Println(j) // will print 23
}
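Calling panic() is not the only possible response to a non-nil error. As a sketch of an alternative, the hypothetical helper ParseOrDefault below (our own name, not something provided by strconv) returns a fallback value when the conversion fails, and main() also prints the error itself so that we can see what it contains.

```go
package main

import (
	"fmt"
	"strconv"
)

// ParseOrDefault is a hypothetical helper. It tries to parse s as an integer
// and returns fallback instead of panicking if the conversion fails.
func ParseOrDefault(s string, fallback int) int {
	n, err := strconv.Atoi(s)
	if err != nil {
		return fallback
	}
	return n
}

func main() {
	fmt.Println(ParseOrDefault("23", 0)) // prints 23
	fmt.Println(ParseOrDefault("Hi", 0)) // prints 0

	// The error variable carries a description of what went wrong.
	_, err := strconv.Atoi("Hi")
	fmt.Println(err) // prints strconv.Atoi: parsing "Hi": invalid syntax
}
```

Whether to halt the program or fall back to a default depends on the situation; in this lesson, halting with panic() keeps the code short.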
Parsing decimals
We can also parse decimal variables from strings with strconv.ParseFloat(). This function takes two inputs: the string to be converted, and an integer (32 or 64) that indicates the number of bits of memory to devote to the resulting variable. We will say more about memory later, but for now, we note that the float64 variable earns its name from the fact that it takes 64 bits of memory, which is the size that we request when calling ParseFloat(). As with Atoi(), ParseFloat() takes the string representation of a number as input and returns the parsed decimal variable as well as an error variable.
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    // ...

    pi, err2 := strconv.ParseFloat("3.14", 64)
    if err2 != nil {
        panic(err2)
    }
    fmt.Println(pi)
}
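To illustrate the effect of the second input, here is a small sketch. ParseDecimal is a hypothetical wrapper of our own that always requests 64 bits; main() then calls strconv.ParseFloat() directly with 32 to show that the parsed value is rounded to the coarser precision, even though the returned variable still has type float64.

```go
package main

import (
	"fmt"
	"strconv"
)

// ParseDecimal is a hypothetical wrapper that parses s using 64 bits.
func ParseDecimal(s string) (float64, error) {
	return strconv.ParseFloat(s, 64)
}

func main() {
	pi, err := ParseDecimal("3.14")
	if err != nil {
		panic(err)
	}
	fmt.Println(pi) // prints 3.14

	// Input that is not a number produces a non-nil error, as with Atoi().
	_, err = ParseDecimal("pi")
	fmt.Println(err != nil) // prints true

	// With 32 bits, the value is rounded to float32 precision, so it
	// no longer matches the 64-bit representation of 0.1.
	approx, err := strconv.ParseFloat("0.1", 32)
	if err != nil {
		panic(err)
	}
	fmt.Println(approx == 0.1) // prints false
}
```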
String concatenation
In Go, string concatenation is represented by the + operator. That is, given two strings s and t, s+t will be a new string comprising the symbols of s, immediately followed by the symbols of t. When we compile and run the following code, we will see that u has the value "HiLovers".
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    // ...

    s := "Hi"
    t := "Lovers"
    u := s + t
    fmt.Println(u)
}
Note: If we wanted to have a space in u between the two constituent words, we could concatenate a space symbol using u := s + " " + t.
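Concatenation only applies to strings, so attaching a number to a string requires one of the conversions covered earlier. The sketch below combines the two ideas; Label is a hypothetical helper of our own, and "chr" is just an example prefix.

```go
package main

import (
	"fmt"
	"strconv"
)

// Label is a hypothetical helper that builds a string such as "chr7"
// from a prefix and an integer.
func Label(prefix string, n int) string {
	// prefix + n would not compile, because n is not a string.
	return prefix + strconv.Itoa(n)
}

func main() {
	fmt.Println(Label("chr", 7))       // prints chr7
	fmt.Println("Hi" + " " + "Lovers") // prints Hi Lovers
}
```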
Strings are (kinda) arrays of symbols
One way of thinking about a string is as an array of symbols, in particular symbols of type byte. In this way, we can access the first and last symbols of our string u using the notation u[0] and u[len(u)-1], respectively. Because a byte is an integer, printing these symbols directly shows their integer codes.
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    // ...

    s := "Hi"
    t := "Lovers"
    u := s + t
    fmt.Println(u)

    //accessing individual symbols
    fmt.Println(u[0])        // prints 72, the integer code of H
    fmt.Println(u[len(u)-1]) // prints 115, the integer code of s
}
Accessing a symbol of a string produces a value having type byte, and so if we wanted to convert this symbol into a string, then we could type cast it accordingly.
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    // ...

    s := "Hi"
    t := "Lovers"
    u := s + t
    fmt.Println(u)

    //accessing individual symbols
    fmt.Println(string(u[0]))        // prints H
    fmt.Println(string(u[len(u)-1])) // prints s
}
Finally, we can test the values of individual symbols. The following code will check whether t[2] is equal to the symbol 'v' (it is).
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    // ...

    s := "Hi"
    t := "Lovers"
    u := s + t
    fmt.Println(u)

    //accessing individual symbols
    fmt.Println(string(u[0]))        // prints H
    fmt.Println(string(u[len(u)-1])) // prints s

    if t[2] == 'v' {
        fmt.Println("The symbol at position 2 of t is v.")
    }
}
Note: Strings and symbols are “case sensitive”. As a result, if we were to change the condition of the if statement above to t[2] == 'V', then it would evaluate to false and we would not enter the if block.
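The "kinda" in the heading of this section matters once a string contains symbols outside the basic English alphabet, because indexing and len() count bytes, not symbols. The sketch below, which uses a hypothetical helper of our own called CountBytesAndSymbols, compares the two counts. For DNA strings, every symbol occupies a single byte, so the distinction does not arise.

```go
package main

import "fmt"

// CountBytesAndSymbols is a hypothetical helper. It returns the number of
// bytes in s along with the number of symbols found by ranging over s.
func CountBytesAndSymbols(s string) (int, int) {
	symbols := 0
	for range s {
		symbols++
	}
	return len(s), symbols
}

func main() {
	numBytes, numSymbols := CountBytesAndSymbols("Lovers")
	fmt.Println(numBytes, numSymbols) // prints 6 6

	// "caf\u00e9" ends with an accented e, which occupies two bytes.
	numBytes, numSymbols = CountBytesAndSymbols("caf\u00e9")
	fmt.Println(numBytes, numSymbols) // prints 5 4
}
```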
Implementing ReverseComplement()
We are now ready to return to implementing our ReverseComplement() function. As mentioned previously, we can quickly subcontract the job to two subroutines: Complement() and Reverse(). (In the main text, we asked you to write pseudocode for each of these subroutines as an exercise.)
func ReverseComplement(dna string) string {
    return Reverse(Complement(dna))
}
STOP: Say that we replaced the return statement of this function with return Complement(Reverse(dna)). Would the output of ReverseComplement() be the same?
Implementing Complement() and introducing switch statements
Here is one potential implementation of Complement(), which uses a chain of else if statements to complement each individual symbol of a given DNA string, and panics if some symbol is not a symbol from the DNA alphabet.
//Complement takes as input a string of DNA symbols.
//It panics if any symbol is not A, C, G, or T.
//Otherwise, it returns the string whose i-th symbol is the
//complementary nucleotide of the i-th symbol of the input string.
func Complement(dna string) string {
    for i, symbol := range dna {
        if symbol == 'A' {
            dna[i] = 'T'
        } else if symbol == 'C' {
            dna[i] = 'G'
        } else if symbol == 'G' {
            dna[i] = 'C'
        } else if symbol == 'T' {
            dna[i] = 'A'
        } else {
            panic("Invalid symbol in string given to Complement().")
        }
    }
    return dna
}
Whenever we have so many possibilities covered by else if statements, it can be worthwhile to transition these statements to a switch statement, which takes actions based on different cases, or values, that some variable (in this case symbol) can have. A switch statement may also have a default case that indicates what action to take if none of the cases are met. The following implementation of Complement() illustrates the general syntax of a switch statement.
//Complement takes as input a string of DNA symbols.
//It panics if any symbol is not A, C, G, or T.
//Otherwise, it returns the string whose i-th symbol is the
//complementary nucleotide of the i-th symbol of the input string.
func Complement(dna string) string {
    for i, symbol := range dna {
        switch symbol {
        case 'A':
            dna[i] = 'T'
        case 'C':
            dna[i] = 'G'
        case 'G':
            dna[i] = 'C'
        case 'T':
            dna[i] = 'A'
        default:
            panic("Invalid symbol in string given to Complement().")
        }
    }
    return dna
}
Unfortunately, when we compile our code, we obtain an error: cannot assign to dna[i] (value of type byte). The reason why is explained by the technical implementation of strings; in Go, strings are read-only slices of bytes. This fact means that, unlike some other languages, Go will not allow us to edit individual symbols of a string once we have created it.
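The read-only behavior can be seen in isolation with a short sketch. WithFirstSymbol is a hypothetical helper of our own that copies the bytes of a string into an editable slice, changes one of them, and converts the result back to a string; the original string is left untouched.

```go
package main

import "fmt"

// WithFirstSymbol is a hypothetical helper. It returns a copy of s whose
// first symbol has been replaced by b, leaving s itself unchanged.
func WithFirstSymbol(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	symbols := []byte(s) // an editable copy of the bytes of s
	symbols[0] = b
	return string(symbols)
}

func main() {
	s := "Hi"
	// s[0] = 'Y' would not compile, because strings are read-only.
	t := WithFirstSymbol(s, 'Y')
	fmt.Println(s) // prints Hi
	fmt.Println(t) // prints Yi
}
```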
Although we cannot edit individual symbols of a string, we can nevertheless change a string once we have created it. For example, if we begin with s := "Hi" and wish to change it, we can use the statement s = "Yo". We will use this fact to update our implementation of Complement() by declaring an empty string dna2 and concatenating one symbol at a time to it. In the below function, we use double quotation marks around each symbol because the concatenation operation must be applied to two strings.
//Complement takes as input a string of DNA symbols.
//It panics if any symbol is not A, C, G, or T.
//Otherwise, it returns the string whose i-th symbol is the
//complementary nucleotide of the i-th symbol of the input string.
func Complement(dna string) string {
    var dna2 string // default value is ""
    for _, symbol := range dna {
        switch symbol {
        case 'A':
            dna2 = dna2 + "T"
        case 'C':
            dna2 = dna2 + "G"
        case 'G':
            dna2 = dna2 + "C"
        case 'T':
            dna2 = dna2 + "A"
        default:
            panic("Invalid symbol in string given to Complement().")
        }
    }
    return dna2
}
As with arithmetical expressions, we can replace statements of the form dna2 = dna2 + "T" with the shorthand dna2 += "T".
//Complement takes as input a string of DNA symbols.
//It panics if any symbol is not A, C, G, or T.
//Otherwise, it returns the string whose i-th symbol is the
//complementary nucleotide of the i-th symbol of the input string.
func Complement(dna string) string {
    var dna2 string // default value is ""
    for _, symbol := range dna {
        switch symbol {
        case 'A':
            dna2 += "T"
        case 'C':
            dna2 += "G"
        case 'G':
            dna2 += "C"
        case 'T':
            dna2 += "A"
        default:
            panic("Invalid symbol in string given to Complement().")
        }
    }
    return dna2
}
We can now test our Complement() function on a short input string by compiling and running our code with the following added to func main(). (To avoid a compiler error, you will need to briefly comment out your ReverseComplement() function because we have yet to implement Reverse().)
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    // ...

    dna := "ACCGAT"
    fmt.Println(Complement(dna)) // should print "TGGCTA"
}
For the precocious: optimizing Complement()
Our current implementation of Complement() is now correct, but it suffers from an efficiency flaw that will prevent it from being useful on large inputs. In particular, each time that we call a statement like dna2 += "T", because dna2 is read-only, a new string is created that is equal to dna2 + "T", and the variable dna2 is updated to be equal to this string. For long input strings dna, as we progress in the for loop, the string being copied gets longer and longer, slowing down our code.
This Go quirk can be frustrating to newcomers because our current implementation of Complement() is so intuitive. Yet we can address this issue using a common workaround. Since a string is a read-only slice of byte variables, rather than declaring dna2 as a string, we will declare it as a slice of byte variables, which will allow us to edit the slice. We can then use the built-in function string() to type cast this slice of bytes to a string.
//Complement takes as input a string of DNA symbols.
//It panics if any symbol is not A, C, G, or T.
//Otherwise, it returns the string whose i-th symbol is the
//complementary nucleotide of the i-th symbol of the input string.
func Complement(dna string) string {
    dna2 := make([]byte, len(dna))
    for i, symbol := range dna {
        switch symbol {
        case 'A':
            dna2[i] = 'T'
        case 'C':
            dna2[i] = 'G'
        case 'G':
            dna2[i] = 'C'
        case 'T':
            dna2[i] = 'A'
        default:
            panic("Invalid symbol in string given to Complement().")
        }
    }
    return string(dna2)
}
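The byte slice is not the only way to avoid repeated copying. As a sketch of an alternative that is not covered in the code along, Go's standard strings package provides a Builder type that accumulates output efficiently. The function name ComplementBuilder below is our own, chosen to keep it distinct from Complement().

```go
package main

import (
	"fmt"
	"strings"
)

// ComplementBuilder is a hypothetical variant of Complement() that
// accumulates its output in a strings.Builder instead of a byte slice.
func ComplementBuilder(dna string) string {
	var b strings.Builder
	b.Grow(len(dna)) // reserve enough space for the whole output up front
	for _, symbol := range dna {
		switch symbol {
		case 'A':
			b.WriteByte('T')
		case 'C':
			b.WriteByte('G')
		case 'G':
			b.WriteByte('C')
		case 'T':
			b.WriteByte('A')
		default:
			panic("Invalid symbol in string given to ComplementBuilder().")
		}
	}
	return b.String()
}

func main() {
	fmt.Println(ComplementBuilder("ACCGAT")) // prints TGGCTA
}
```

Both approaches build the output in a single editable buffer; the byte slice version has the advantage of needing no additional package.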
Implementing Reverse()
We are now ready to implement Reverse(), a function that takes a string as input and that returns the result of reversing all of the input string’s symbols.
Exercise: Before we continue, practice what you have learned by attempting to implement Reverse() yourself.
We could range a counter variable i starting at either the left or the right side of the input string s; we will choose the left side because it will allow us to use the range keyword. Also, unlike Complement(), our Reverse() function should work for an arbitrary input string, not just a string comprising DNA symbols.
Our implementation of Reverse(), which uses the less efficient approach of building a new string rev through repeated concatenations, is shown below. We also need to be careful with ranging. Letting n denote the length of s, we want to set rev[0] equal to s[n-1], rev[1] equal to s[n-2], rev[2] equal to s[n-3], and so on. For an arbitrary i, we set rev[i] equal to s[n-i-1].
//Reverse takes as input a string and returns the string
//formed by reversing all the symbols of the input string.
func Reverse(s string) string {
    rev := ""
    for i := range s {
        rev += string(s[len(s)-i-1])
    }
    return rev
}
Note: This function offers a simple illustration of the need for strong programmers to have strong foundational quantitative skills. The control flow of Reverse() is straightforward, but appreciating how to establish a formula for which index of s to consider requires a mathematics education that is based on noticing patterns and solving problems as opposed to rote memorization. We leave this topic of conversation to another time. For now, we will note that even though this is a course about programming computers, strong programmers use pencil and paper or the electronic equivalent (here, to write out the indices that we are considering at each point in time) to help themselves notice patterns and solve problems.
We can also write a more memory-efficient version of Reverse() that uses the trick of first generating a slice of byte variables, and then after editing this slice, converting it to a string.
//Reverse takes as input a string and returns the string
//formed by reversing all the symbols of the input string.
func Reverse(pattern string) string {
    n := len(pattern)
    rev := make([]byte, n)
    for i := range pattern {
        rev[i] = pattern[n-i-1]
    }
    return string(rev)
}
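One caveat for truly arbitrary input: reversing byte by byte works when every symbol occupies one byte, as is the case for DNA strings, but it scrambles symbols that occupy several bytes, such as accented letters. The following sketch shows one possible way to handle such strings by reversing a slice of Go's rune type instead. ReverseRunes is a hypothetical name of our own, and this variant goes beyond what the code along covers.

```go
package main

import "fmt"

// ReverseRunes is a hypothetical variant of Reverse() that reverses
// symbols (runes) rather than bytes, swapping pairs from the outside in.
func ReverseRunes(s string) string {
	symbols := []rune(s)
	n := len(symbols)
	for i := 0; i < n/2; i++ {
		symbols[i], symbols[n-i-1] = symbols[n-i-1], symbols[i]
	}
	return string(symbols)
}

func main() {
	fmt.Println(ReverseRunes("ACCGAT")) // prints TAGCCA

	// "caf\u00e9" ends with an accented e that occupies two bytes.
	fmt.Println(ReverseRunes("caf\u00e9") == "\u00e9fac") // prints true
}
```

The swap uses the same index formula as before: position i is paired with position n-i-1.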
Putting it all together, and a final point about modularity
Since we have already written ReverseComplement(), we are now ready to test it in addition to our function Reverse(). We can do so by compiling and running our program after adding the following code in func main().
package main

import (
    "fmt"
    "strconv"
)

func main() {
    fmt.Println("Strings.")
    // ...

    dna := "ACCGAT"
    fmt.Println(Complement(dna))        // should print "TGGCTA"
    fmt.Println(Reverse(dna))           // should print "TAGCCA"
    fmt.Println(ReverseComplement(dna)) // should print "ATCGGT"
}
Testing our functions in this way illustrates one more benefit of writing modular code, which is that it is easy to test. By passing the work of reverse complementing a string to two subroutines, we can test and debug our code by first testing each of these subroutines, so that once these functions have been tested, we can be nearly certain that ReverseComplement() is correct.
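The idea of testing each subroutine separately can be automated with a small table of inputs and expected outputs. The sketch below repeats the byte slice versions of the three functions from this lesson so that it runs on its own; the example type and the CountFailures helper are hypothetical names of our own, and the expected strings other than those for "ACCGAT" were worked out by hand for this illustration.

```go
package main

import "fmt"

// Complement, Reverse, and ReverseComplement repeat the lesson's versions.
func Complement(dna string) string {
	dna2 := make([]byte, len(dna))
	for i, symbol := range dna {
		switch symbol {
		case 'A':
			dna2[i] = 'T'
		case 'C':
			dna2[i] = 'G'
		case 'G':
			dna2[i] = 'C'
		case 'T':
			dna2[i] = 'A'
		default:
			panic("Invalid symbol in string given to Complement().")
		}
	}
	return string(dna2)
}

func Reverse(pattern string) string {
	n := len(pattern)
	rev := make([]byte, n)
	for i := range pattern {
		rev[i] = pattern[n-i-1]
	}
	return string(rev)
}

func ReverseComplement(dna string) string {
	return Reverse(Complement(dna))
}

// example is a hypothetical type pairing an input with its expected outputs.
type example struct {
	input, complement, reverse, reverseComplement string
}

// CountFailures is a hypothetical helper. It checks every function against
// every example, prints any mismatch, and returns the number of mismatches.
func CountFailures(examples []example) int {
	failures := 0
	check := func(name, input, got, want string) {
		if got != want {
			fmt.Printf("%s(%q) = %q, want %q\n", name, input, got, want)
			failures++
		}
	}
	for _, ex := range examples {
		check("Complement", ex.input, Complement(ex.input), ex.complement)
		check("Reverse", ex.input, Reverse(ex.input), ex.reverse)
		check("ReverseComplement", ex.input, ReverseComplement(ex.input), ex.reverseComplement)
	}
	return failures
}

func main() {
	examples := []example{
		{"ACCGAT", "TGGCTA", "TAGCCA", "ATCGGT"},
		{"GATTACA", "CTAATGT", "ACATTAG", "TGTAATC"},
		{"A", "T", "A", "T"},
		{"", "", "", ""},
	}
	fmt.Println("failures:", CountFailures(examples)) // prints failures: 0
}
```

If a mismatch is reported for Complement() or Reverse(), we know where to look before suspecting ReverseComplement().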
Check your work from the code along
We provide autograders in the window below (or via a direct link) allowing you to check your work for the following functions:
Complement()
Reverse()
ReverseComplement()