## DEV Community

Marco A.

Posted on • Updated on

# Playing with JavaScript performances and DNA

Biology is an interesting world and bioinformatic is where computer science meets biology,
Today I will describe a simple yet interesting bioinformatic problem from an algorithmic prospective.: Calculate the reverse complement of DNA bases using Javascript.

In this article, I am using a bioinformatic problem because is fun and interesting, but I will be mostly talking about JavaScript performance.
We will

• Start describing how DNA works (with some big simplifications… Eih! I am not a biologist!),
• Propose some implementations, and then
• try to archive the best time performance, comparing the time for completing the task.

heads-up: A basic knowledge of JavaScript language is required to follow along.

### What’s the reverse complement?

Before explaining it, bear with me for a small tour of how DNA looks like.

The DNA helix is composed of two strands like in the image above.
A strand is a long sequence of this for letters ATGC ( each letter is a specific nucleotide Adenine, Thymidine, Guanidine, Cytidine ) in some order.
There is a specific relation between what is the first strand and what there is on the second strand: for each A in the first sequence there is a T on the other strand and vice versa, and for each G a C will be on the other strand.

The conversion from map DNA strand to complementary strang would be something like:

'A': 'T',
'G': 'C',
'T': 'A',
'C': 'G'

Here is an example:

I often hear these two sequences named 5’ to 3’’ ( 3’ end) and the second string is named 3’ to 5’ (5’’ end). The direction of reading is in both from 5’ to 3’’ and this means that a sequence is read from left to right but the other one (the complementary) is read from right to left.

In most of the file formats and web APIs that I worked with since the complementary DNA strand can be calculated from the first strand sequence, only one DNA strand is provided ( 3’ end) and is up to us to calculate the complementary.

## Now, we have enough for our small challenge:

### How can I generate a complementary strand?

Given an input:
`TTATACGACTCACTATAGGGAGACTTAAGAAG`
The expected output should look like this:
`CTTCTTAAGTCTCCCTATAGTGAGTCGTATAA`

Remember: we are reading the complementary in reverse order so the DNA sequence starts TT the complementary will end with AA.

Input:

TT ATACGACTCACTATAGGGAGACTTAAGAAG
Output:

CTTCTTAAGTCTCCCTATAGTGAGTCGTAT AA

Ok, let make the code talks for us:

``````const map = (sequence) => {
const map = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
return sequence.split('').reverse().map(bp => map[bp]).join("")
}
``````

This is “easy” to read and the steps are :

We take the input
`“TACGA”`
We separate each char and create an array
`[ ‘T’ , ’A’ , ’C’ , ’G’ , ’A’]`
Then map each char into his complementary
`[ ‘A’ , ’T’ , ’G’ , ’C’ , ’T’]`
We reverse
`[ ‘T’ , ’C’ , ’G’ , ’T’ , ’A’]`
And the join in a string
`“TCGTA”`

## That’s it… right?

In most cases, yes, but today we are a bit more stubborn and we will try to find the best performance time for this job.
Why? Well even a small bacterial DNA can range in size anywhere from 130 kbp to over 14 Mbp ( a bp is a single letter/Nucleotide) so being fast could be important.

Ok, we have the rules, now let me present our players :

Player1:
We just saw the map implementation, let call map,

``````const map = (sequence) => {
const map = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
return sequence.split('')
.reverse()
.map(bp => map[bp])
.join("")
}
``````

Player2: for loop and with if:

``````const forLoop = (sequence) => {
let complement = ''
for (let idx = 0; idx < sequence.length; idx++) {
if (sequence[idx] === 'A') {
complement = 'T' + complement
} else if (sequence[idx] === 'T') {
complement = 'A' + complement
} else if (sequence[idx] === 'G') {
complement = 'C' + complement
} else if (sequence[idx] === 'C') {
complement = 'G' + complement
}
}
return complement
}
``````

Player3: A for with a switch case:

``````const forSwitch = (sequence) => {
let complement = '';

for (let idx = 0, sL = sequence.length; idx < sL; idx++) {
switch (sequence[idx]) {
case 'A':
complement = 'T' + complement
break;
case 'T':
complement = 'A' + complement
break;
case 'G':
complement = 'C' + complement
break;
case 'C':
complement = 'G' + complement
break;
}
}
return complement
}
``````

We will run these implementations ( and some small variations, github for more details ), 10000 times on a 35752 long DNA sequence and record the best time, the worst time, and the overall average time.

This graph is not that easy to read, let me provide a table ordered by

Code Average Best Worst
For (optimized) with switch case 0.9446 0.4836 99258,00
For with multiple if 21564,00 0.5540 867263,00
For (optimized) with each if 11737,00 0.6480 98886,00
For with dictionary 15038,00 11097,00 83742,00
ForEach with dictionary 23381,00 17202,00 70510,00
Big Map with regular expression 29884,00 23477,00 103878,00
Map with dictionary 34595,00 26937,00 137978,00
Replace with dictionary 237074,00 51751,00 3951461,00

It looks like “replace with dictionary” is the worst in timing, and “optimized switch case” is the best.

Wrapping up,
In this implementation I can see that:

• The regular expression and the dictionary are slower than if and switch case
• For is the faster loop
• switch case wins on if else if
• The optimized of ‘for loop’ give some small improvements

### Bonus, (what optimized for means):

Maybe you already noted the ‘switch case’ implementation. During my review of this topic, I fell on this website ( https://browserdiet.com/ ) and learned something interesting about the for loop that I didn’t know.

``````for ([initialization]; [condition]; [final-expression]){
Statement
}
``````

Every time a ‘statement’ gets executed the condition block runs again.
This sounds clear, but also sequence.length will re-calculate each interaction, consuming more time, and this is bad!

And there is a simple solution,

We can instance a variable with the value of sequence.length
in the initialization block:

``````for (let idx = 0; idx < sequence.length; idx++) {
// sequence.length is calculated every interaction
}

for (let idx = 0, sL = sequence.length; idx < sL; idx++) {
// sequence.length is calculated only 1 time
}
``````

Thanks for taking the time to read this article and let me know any feedback, have a great day!

Tony B

Depending on how much memory you want to throw at the problem, you could also cache all the reversed combinations of X length DNA strings you come across, making it much faster over time.

``````
const mapDNA = (() => {
const map = {}
const lookup = (s) => {
const x = map[s];
if (x) return x;
let r = ''
for (let c of s) {
switch (c) {
case 'T': r = 'A' + r; break;
case 'A': r = 'T' + r; break;
case 'G': r = 'C' + r; break;
default: r = 'G' + r;
}
}
return map[s] = r;
}

return s => {
const cacheSize = 8;
let result = '';
for (let i = 0, l = s.length; i < l; i += cacheSize)
result = lookup(s.slice(i, i + cacheSize)) + result;

return result;
}
})();
``````

Marco A.

This is an interesting point of view. Thanks!

My main concern on this idea is how hard/easy you can to find repetition in a DNA sequence.

BTW I think is worth trying to run some comparisons.

Thanks for sharing