Roberto Preste

Posted on • Originally published at Medium

Phred quality score

Next-generation sequencing (NGS) techniques have brought new insights into -omics data analysis, mostly thanks to their reliability in detecting biological variants. This reliability is usually measured using a value called the Phred quality score (or Q score).

The Phred score of a base is an integer value that represents the estimated probability of an error in base calling. Mathematically, a Q score is logarithmically related to the base-calling error probability P, and can be calculated using the following formula:

$$Q = -10 \log_{10} P$$

In practice, a quality score of 20 means that there is a 1 in 100 chance that the base is incorrect; a quality score of 40 means that the chance that the base is called incorrectly is 1 in 10000.
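As a minimal sketch of the formula above, the conversion can be written in both directions in plain Python. The function names `error_prob_to_phred` and `phred_to_error_prob` are made up for this illustration and do not come from any library:

```python
import math


def error_prob_to_phred(p: float) -> float:
    """Return Q = -10 * log10(P) for an error probability 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("P must be in the interval (0, 1]")
    return -10 * math.log10(p)


def phred_to_error_prob(q: float) -> float:
    """Invert the formula: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Print error probability and accuracy for a few common Q scores
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q={q}: P={p:g}, accuracy={(1 - p) * 100:g}%")
```

Note that real Q scores are stored as integers, so a value computed from an arbitrary probability would normally be rounded before being written to a file.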

The Phred score is inversely related to the error probability, and therefore directly related to the base call accuracy: a higher Q score means a more reliable base call. Here is a useful table which shows this simple relationship:

| Phred quality score | Probability of incorrect base call | Base call accuracy |
|---|---|---|
| 10 | 1 in 10 | 90% |
| 20 | 1 in 100 | 99% |
| 30 | 1 in 1000 | 99.9% |
| 40 | 1 in 10000 | 99.99% |

In FASTQ files, Phred quality scores are usually represented using ASCII characters, so that the quality score of each base can be specified using a single character: the ASCII code of the character is the Q score plus a fixed offset (the ASCII_BASE). While older Illumina data used an ASCII_BASE of 64, nowadays the ASCII_BASE 33 table is the standard for NGS data:

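| Q score | ASCII char | Q score | ASCII char | Q score | ASCII char | Q score | ASCII char |
|---|---|---|---|---|---|---|---|
| 0 | `!` | 11 | `,` | 22 | `7` | 32 | `A` |
| 1 | `"` | 12 | `-` | 23 | `8` | 33 | `B` |
| 2 | `#` | 13 | `.` | 24 | `9` | 34 | `C` |
| 3 | `$` | 14 | `/` | 25 | `:` | 35 | `D` |
| 4 | `%` | 15 | `0` | 26 | `;` | 36 | `E` |
| 5 | `&` | 16 | `1` | 27 | `<` | 37 | `F` |
| 6 | `'` | 17 | `2` | 28 | `=` | 38 | `G` |
| 7 | `(` | 18 | `3` | 29 | `>` | 39 | `H` |
| 8 | `)` | 19 | `4` | 30 | `?` | 40 | `I` |
| 9 | `*` | 20 | `5` | 31 | `@` | 41 | `J` |
| 10 | `+` | 21 | `6` | | | | |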
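The table does not need to be memorised, since each character is simply the Q score shifted by the ASCII_BASE. A minimal sketch of the encoding direction, where `phred_to_char` is a hypothetical helper name chosen for this example:

```python
def phred_to_char(q: int, offset: int = 33) -> str:
    """Encode one Phred score as a single ASCII character."""
    return chr(q + offset)


# Rebuild the ASCII_BASE 33 lookup table for Q = 0..41
table = {q: phred_to_char(q) for q in range(42)}
print(table[0], table[20], table[40])  # ! 5 I

# The same score under the older ASCII_BASE 64 convention
print(phred_to_char(40, offset=64))  # h
```

Keeping the offset as a parameter makes it easy to handle older ASCII_BASE 64 data with the same function.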

Even though there are lots of Python, Biopython and stand-alone tools for dealing with Phred quality scores, a simple command to convert an ASCII character to its corresponding quality score is the following (from the terminal):

```shell
python3 -c 'print(ord("<ASCII>")-33)'
```

Or, when working in a Python 3 session:

```python
print(ord("<ASCII>")-33)
```

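In both cases, just replace `<ASCII>` with the actual ASCII character and that will do the trick. Characters that clash with the quoting need some care: a double quote has to be escaped as `\"`, and in the terminal version a single quote would end the shell string, so the Python session is the simpler option for it.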
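The same idea extends from a single character to the whole quality line of a FASTQ record. The following is only a sketch: `decode_quality` is a hypothetical helper, and the example string is made up for illustration rather than taken from real data:

```python
from typing import List


def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string into a list of Phred scores."""
    scores = [ord(ch) - offset for ch in quality]
    # A negative score means a character sits below the chosen offset,
    # which usually points to the wrong ASCII_BASE being used
    if min(scores, default=0) < 0:
        raise ValueError("character below the chosen offset; wrong encoding?")
    return scores


example = "II5?!"  # made-up quality string, not real data
print(decode_quality(example))  # [40, 40, 20, 30, 0]
```

For real pipelines, a dedicated parser such as the ones mentioned above is usually the safer choice, since it also handles record structure and validation.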