DEV Community


Phred quality score

By Roberto Preste, originally published at Medium

Next-generation sequencing (NGS) techniques have brought new insights into -omics data analysis, largely thanks to their reliability in detecting biological variants. The reliability of each individual base call is usually expressed as a value called the Phred quality score (or Q score).

The Phred score of a base is an integer that encodes the estimated probability that the base was called incorrectly. Mathematically, Q is logarithmically related to the base-calling error probability P and can be calculated with the following formula:

Q = -10 log10 P
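The formula can be inverted to recover the error probability from a score: P = 10^(-Q/10). A minimal Python sketch of both directions follows; the function names are hypothetical, and rounding Q to the nearest integer is an assumption that mirrors the integer scores stored in sequencing files.

```python
import math


def phred_from_error_prob(p: float) -> int:
    """Q = -10 * log10(P), rounded to the nearest integer (assumption)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10 * math.log10(p))


def error_prob_from_phred(q: float) -> float:
    """Inverse of the formula above: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


print(phred_from_error_prob(0.01))    # 20
print(phred_from_error_prob(0.0001))  # 40
print(error_prob_from_phred(30))
```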

In practice, a quality score of 20 means there is a 1 in 100 chance that the base is incorrect; a quality score of 40 means the chance that the base is called incorrectly is 1 in 10000.

Because of the minus sign in the formula, the Phred score is inversely related to the error probability: the lower P is, the higher Q is, so a higher Q score means a more accurate, more reliable base call (accuracy = 1 - P). Here is a useful table that shows this relationship:

| Phred quality score | Probability of incorrect base call | Base call accuracy |
|---|---|---|
| 10 | 1 in 10 | 90% |
| 20 | 1 in 100 | 99% |
| 30 | 1 in 1000 | 99.9% |
| 40 | 1 in 10000 | 99.99% |
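Each row of the table follows directly from the formula. The short sketch below recomputes the rows; the helper name `phred_row` is hypothetical and only meant to show the arithmetic (odds of error = 10^(Q/10), accuracy = 1 - 10^(-Q/10)).

```python
def phred_row(q: int) -> tuple[int, str, str]:
    """Return (Q, error odds as '1 in N', accuracy as a percentage)."""
    one_in = round(10 ** (q / 10))            # P = 1 / one_in
    accuracy = 100 * (1 - 10 ** (-q / 10))    # accuracy = 1 - P
    return q, f"1 in {one_in}", f"{accuracy:g}%"


for q in (10, 20, 30, 40):
    print(*phred_row(q), sep="\t")
```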

In FASTQ files, Phred quality scores are encoded as ASCII characters, so that the quality score of each base takes up exactly one character. Older Illumina data used an ASCII offset of 64 (Phred+64), but nowadays an offset of 33 (Phred+33) has been almost universally adopted for NGS data, meaning each score Q is stored as the character whose ASCII code is Q + 33:

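| Q score | ASCII char | Q score | ASCII char | Q score | ASCII char | Q score | ASCII char |
|---|---|---|---|---|---|---|---|
| 0 | `!` | 11 | `,` | 22 | `7` | 32 | `A` |
| 1 | `"` | 12 | `-` | 23 | `8` | 33 | `B` |
| 2 | `#` | 13 | `.` | 24 | `9` | 34 | `C` |
| 3 | `$` | 14 | `/` | 25 | `:` | 35 | `D` |
| 4 | `%` | 15 | `0` | 26 | `;` | 36 | `E` |
| 5 | `&` | 16 | `1` | 27 | `<` | 37 | `F` |
| 6 | `'` | 17 | `2` | 28 | `=` | 38 | `G` |
| 7 | `(` | 18 | `3` | 29 | `>` | 39 | `H` |
| 8 | `)` | 19 | `4` | 30 | `?` | 40 | `I` |
| 9 | `*` | 20 | `5` | 31 | `@` | 41 | `J` |
| 10 | `+` | 21 | `6` | | | | |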
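Since the encoding is just a fixed offset into the ASCII table, a whole quality string can be decoded (or re-encoded) one character at a time, and the table above can be regenerated rather than memorized. A minimal sketch, assuming the offset is known in advance (33 for current data, 64 for older Illumina data); the function names are hypothetical:

```python
from collections.abc import Iterable


def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string into a list of Phred scores."""
    scores = [ord(ch) - offset for ch in qual]
    if any(q < 0 for q in scores):
        # A negative score usually means the wrong offset was assumed.
        raise ValueError(f"character below offset {offset}: wrong encoding?")
    return scores


def encode_quality(scores: Iterable[int], offset: int = 33) -> str:
    """Convert Phred scores back into a FASTQ quality string."""
    return "".join(chr(q + offset) for q in scores)


print(decode_quality("II5+!"))               # [40, 40, 20, 10, 0]
print(encode_quality([40, 40, 20, 10, 0]))   # II5+!
print(decode_quality("hhT", offset=64))      # [40, 40, 20]
print(encode_quality(range(42)))             # the characters for Q 0 to 41, as in the table
```

The same scores map to different characters under the two offsets, which is why the encoding of a file has to be known before its qualities are interpreted.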

Although many Python packages (Biopython among them) and stand-alone tools can handle Phred quality scores, a simple way to convert an ASCII character to its corresponding quality score from the terminal is the following:

```bash
python3 -c 'print(ord("<ASCII>")-33)'
```

Or, when working in a Python3 session:

```python
print(ord("<ASCII>")-33)
```

In both cases, just replace <ASCII> with the actual character: for example, `ord("I") - 33` returns 40. If the character is itself a quote (`"` or `'`), it has to be escaped.
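The same conversion scales to whole files. Below is a minimal, standard-library-only sketch that reads four-line FASTQ records and reports the per-base scores and mean quality of each read. It assumes Phred+33 encoding and simple records without line-wrapped sequences; the function name and the sample reads are made up for illustration, and a dedicated library such as Biopython is the more robust choice for real data.

```python
import io
from statistics import mean
from typing import Iterator, TextIO


def read_fastq_qualities(handle: TextIO, offset: int = 33) -> Iterator[tuple[str, list[int]]]:
    """Yield (header without '@', Phred scores) for each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of input
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        handle.readline()  # sequence line (not needed here)
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\r\n")
        yield header[1:].rstrip("\r\n"), [ord(ch) - offset for ch in qual]


# Hypothetical two-read input standing in for an open .fastq file
sample = io.StringIO("@read1\nACGTN\n+\nII5+!\n@read2\nGGCC\n+\n?5+#\n")

for read_id, scores in read_fastq_qualities(sample):
    print(read_id, scores, f"mean Q = {mean(scores):.1f}")
```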
