Massimo Artizzu

Posted on Sep 4, 2021 • Edited on Sep 8, 2021

Let's develop a QR Code Generator, part VIII: different sizes

#javascript #tutorial #qrcode

At this point in the series, we know how to creare a QR Code with numeric, alphanumeric, ISO-8859-1 or Kanji data. But we've created only version 2 QR Codes, meaning that our content is quite limited in size. So let's see how to overcome this.

As we've said in part 1, the version of the code gives its size, as the code matrix will be a square of 17 + version * 4 modules. Part of this square is occupied by either fixed patterns, or reserved space for format information.

Let's have a look at what we're talking about:

So we have:

three finder patterns (in green), as 8×8 module patterns (including separator lines): these are fixed;
alignment patterns (in blue), 5×5 module patterns that vary in number (they are n² - 3, where n depends on the version);
timing patterns (in red): lines that connect the finder patterns, and as such their length depends on the QR Code version;
a dark module (in olive, fixed);
error level and mask information (in purple): two 15-module sequences (fixed);
version format information (in orange); two 6×3 areas adjacent to the top right and bottom left finder patterns (fixed, but present only from version 7 and above).

The content of last two areas have to be generated, but we don't know how to do it for the second one yet (we'll see that later). The main point of variability is the number of alignment patterns in the matrix.

Placing the alignment patterns

As we said, there are n² - 3 alignment patterns in a QR Code, except for version 1 that has no such patterns. The -3 part is because they'd be placed over the finder patterns, as you can see in the figure above. But how do we know this n?

Basically, if v is the version number, it's n = floor(v / 7) + 2, so versions 2 to 6 have 2² - 3 = 1 alignment patterns, versions 7-13 have 3² - 3 = 6, versions 14-20 have 4² - 3 = 13 and so on.

Now the question is: how do we place them in the QR Code's matrix?

As we can realize from the previous figure, they are placed on the vertexes of a grid, and this grid is symmetrical relatively to its main diagonal. This means we only need to know the position of one set of its tracks (either the rows or the columns).

For example, a version 2 QR Code has its alignment patterns on tracks 6 and 18 (nevermind we can only see one); while a version 7 has them on tracks 6, 22 and 38. These values are 0-based (the first row and column have index 0), and refer to the center of the pattern.

Here's the algorithm:

the first track is always 6;
the last track is always 6 from the end;
for the tracks in the middle, do the following:
- get the difference between the last and the first tracks above, then divide by n - 1;
- take the even number greater or equal to the quotient above;
- place the middle tracks counting from the last one.

We need to take an even number because, as you might see from the figure from before, the alignment patterns must match the timing patterns, i.e. you can't have an alignment pattern placed on an odd row/column.

Example: for version 22, n is Math.floor(22 / 7) + 2, so it's 5. A version 22 QR Code is 17 + 22 * 4 = 105 modules wide, so the fifth and last track is 98. For the other two:

the difference between the last and first track is 92;
92 / 4 = 23, so the next even number is 24;
therefore, the fourth track is 98 - 24 = 74, the third is 74 - 24 = 50 and the second is 50 - 24 = 26.

In code

The following function basically executes the above steps:



function getAlignmentTracks(version) {
  if (version === 1) {
    return [];
  }
  const intervals = Math.floor(version / 7) + 1;
  const distance = 4 * version + 4; // between first and last pattern
  const step = Math.ceil(distance / intervals / 2) * 2;
  return [6].concat(Array.from(
    { length: intervals },
    (_, index) => distance + 6 - (intervals - 1 - index) * step)
  );
}

Note that the pure function above should be used with just 40 different values, so we can safely memoize it, or precompute all the values beforehand and store them in a constant array. Or even copy a table from around the web (e.g. this well known library).

How many codewords?

Once the alignment pattern matter is solved, we can get to know how much actual space there is in a QR Code, i.e. codewords that can be used to store data and error correction information.

As we've said, version 1 has no alignment pattern, so the amount of available modules is:

21² (441, where 21 is the size of the QR Code)
- 3⋅8⋅8 (192, for 3 finder patterns)
- 2⋅5 (10, the timing patterns)
- 1 (the dark module)
- 2⋅15 (30, the error level and mask information)

for a total of 208, i.e. 26 codewords.

For larger versions, we have to compute this (let v the version number and n the number of alignment pattern coordinates):

v² (total modules)
- 3⋅8⋅8 (finder patterns)
- (n² - 3)⋅5 (alignment patterns)
- 2⋅(4‍v + 1) (timing patterns)
+ 2⋅(n - 2)⋅5 (readding the intersection of alignment and timing patterns)
- 1 (dark module)
- 2⋅3⋅6 (format data, only if v > 6)

In code

We just need to do the above:



function getAvailableModules(version) {
  if (version === 1) {
    return 21 * 21 - 3 * 8 * 8 - 2 * 15 - 1 - 2 * 5;
  }
  const alignmentCount = Math.floor(version / 7) + 2;
  return (version * 4 + 17) ** 2
    - 3 * 8 * 8
    - (alignmentCount ** 2 - 3) * 5 * 5
    - 2 * (version * 4 + 1)
    + (alignmentCount - 2) * 5 * 2
    - 2 * 15
    - 1
    - (version > 6 ? 2 * 3 * 6 : 0);
}

You can simplify the above return statement or let the compiler do that for you (I got down to 16 * (version + 4) ** 2 - (5 * alignmentCount - 1) ** 2 - (version > 6 ? 172 : 136)).

Just like getAlignmentTracks, also this function can be memoized/use to precomputation/replaced with a table.

How many data codewords?

The main question is, though, is get to know how many of those codewords are reserved for data - and conversely how many for error correction.

The problem here is that I haven't found, nor derived, any exact formula to determine that. Remember the error correction table we've seen in part 1 and 3?

Level	Letter	Data recovery
Low	L	~7%
Medium	M	~15%
Quartile	Q	~25%
High	H	~30%

But we just can't take those percentages and derive the amount of error correction codewords back. The original specification reports this formula (from the Reed-Solomon error correction algorithm):

e + 2‍t ≤ d - p

where:

e = number of erasures (i.e. single errors at known locations);
t = number of errors (i.e. recoverable codewords);
d = number of error correction codewords;
p = number of misdecode protection codewords (generally 0, except for smaller QR Codes),

meaning that d error correction codewords can correct at most d/2 unreadable codewords.

But other than that, it just reports a table where we can just take the amount of error correction codewords, and that's it (you can get it from here, for example). If you compute the "recovery capacity" for each version and error level, you'll see those percentages being 2-3% off the values from the table.

For example, our case of a version 2 QR Code with quartile error level has 22 error correction codewords, meaning a recovery capacity of 11… which is exactly 25% of all the codewords. But it's a rare case.

If you take a version 6 QR Code, still with quartile error correction level, it can recover at most 4*24/2 = 48 codewords out of 172, which is ~27.9%. If you reserve only 88 codewords for error correction instead of 96, you'd have a recovery capacity of ~25.5% (closer to 25%) and 8 more codewords for data. I don't know why they chose otherwise.

Anyway, let's see how to structure a QR Code for larger versions, because it's not as straightforward as it was for version 2…

Codeword blocks

As version grows, the number of total codewords grows too (more or less quadratically). The spec developers decided it was wiser to split the message into several blocks of varying amounts of codewords. Each block has its own data and error correction codewords.

Moreover, not every block has the same amount of codewords, but they're divided in two groups instead: one with blocks of n codewords, the other with block with n + 1 codewords. But for every block the number of error correction codewords is the same, so it's the number of data codewords that has a difference of 1 between blocks of different groups.

Splitting the total set of codewords into blocks happens as soon as version 3, while you'd get two groups in version 5. The main goal is having the number of error correction codewords in each block to be at most 30, while splitting into groups is just for parity.

But let's cut to the point, and see the actual table:

Version and EC level	EC codewords/block	Group 1 blocks	Data codewords in G1 blocks	Group 2 blocks	Data codewords in G2 blocks
1-L	7	1	19
1-M	10	1	16
1-Q	13	1	13
1-H	17	1	9
2-L	10	1	34
2-M	16	1	28
2-Q	22	1	22
2-H	28	1	16
3-L	15	1	55
3-M	26	1	44
3-Q	18	2	17
3-H	22	2	13
4-L	20	1	80
4-M	18	2	32
4-Q	26	2	24
4-H	16	4	9
5-L	26	1	108
5-M	24	2	43
5-Q	18	2	15	2	16
5-H	22	2	11	2	12
6-L	18	2	68
6-M	16	4	27
6-Q	24	4	19
6-H	28	4	15
7-L	20	2	78
7-M	18	4	31
7-Q	18	2	14	4	15
7-H	26	4	13	1	14
8-L	24	2	97
8-M	22	2	38	2	39
8-Q	22	4	18	2	19
8-H	26	4	14	2	15
9-L	30	2	116
9-M	22	3	36	2	37
9-Q	20	4	16	4	17
9-H	24	4	12	4	13
10-L	18	2	68	2	69
10-M	26	4	43	1	44
10-Q	24	6	19	2	20
10-H	28	6	15	2	16
11-L	20	4	81
11-M	30	1	50	4	51
11-Q	28	4	22	4	23
11-H	24	3	12	8	13
12-L	24	2	92	2	93
12-M	22	6	36	2	37
12-Q	26	4	20	6	21
12-H	28	7	14	4	15
13-L	26	4	107
13-M	22	8	37	1	38
13-Q	24	8	20	4	21
13-H	22	12	11	4	12
14-L	30	3	115	1	116
14-M	24	4	40	5	41
14-Q	20	11	16	5	17
14-H	24	11	12	5	13
15-L	22	5	87	1	88
15-M	24	5	41	5	42
15-Q	30	5	24	7	25
15-H	24	11	12	7	13
16-L	24	5	98	1	99
16-M	28	7	45	3	46
16-Q	24	15	19	2	20
16-H	30	3	15	13	16
17-L	28	1	107	5	108
17-M	28	10	46	1	47
17-Q	28	1	22	15	23
17-H	28	2	14	17	15
18-L	30	5	120	1	121
18-M	26	9	43	4	44
18-Q	28	17	22	1	23
18-H	28	2	14	19	15
19-L	28	3	113	4	114
19-M	26	3	44	11	45
19-Q	26	17	21	4	22
19-H	26	9	13	16	14
20-L	28	3	107	5	108
20-M	26	3	41	13	42
20-Q	30	15	24	5	25
20-H	28	15	15	10	16
21-L	28	4	116	4	117
21-M	26	17	42
21-Q	28	17	22	6	23
21-H	30	19	16	6	17
22-L	28	2	111	7	112
22-M	28	17	46
22-Q	30	7	24	16	25
22-H	24	34	13
23-L	30	4	121	5	122
23-M	28	4	47	14	48
23-Q	30	11	24	14	25
23-H	30	16	15	14	16
24-L	30	6	117	4	118
24-M	28	6	45	14	46
24-Q	30	11	24	16	25
24-H	30	30	16	2	17
25-L	26	8	106	4	107
25-M	28	8	47	13	48
25-Q	30	7	24	22	25
25-H	30	22	15	13	16
26-L	28	10	114	2	115
26-M	28	19	46	4	47
26-Q	28	28	22	6	23
26-H	30	33	16	4	17
27-L	30	8	122	4	123
27-M	28	22	45	3	46
27-Q	30	8	23	26	24
27-H	30	12	15	28	16
28-L	30	3	117	10	118
28-M	28	3	45	23	46
28-Q	30	4	24	31	25
28-H	30	11	15	31	16
29-L	30	7	116	7	117
29-M	28	21	45	7	46
29-Q	30	1	23	37	24
29-H	30	19	15	26	16
30-L	30	5	115	10	116
30-M	28	19	47	10	48
30-Q	30	15	24	25	25
30-H	30	23	15	25	16
31-L	30	13	115	3	116
31-M	28	2	46	29	47
31-Q	30	42	24	1	25
31-H	30	23	15	28	16
32-L	30	17	115
32-M	28	10	46	23	47
32-Q	30	10	24	35	25
32-H	30	19	15	35	16
33-L	30	17	115	1	116
33-M	28	14	46	21	47
33-Q	30	29	24	19	25
33-H	30	11	15	46	16
34-L	30	13	115	6	116
34-M	28	14	46	23	47
34-Q	30	44	24	7	25
34-H	30	59	16	1	17
35-L	30	12	121	7	122
35-M	28	12	47	26	48
35-Q	30	39	24	14	25
35-H	30	22	15	41	16
36-L	30	6	121	14	122
36-M	28	6	47	34	48
36-Q	30	46	24	10	25
36-H	30	2	15	64	16
37-L	30	17	122	4	123
37-M	28	29	46	14	47
37-Q	30	49	24	10	25
37-H	30	24	15	46	16
38-L	30	4	122	18	123
38-M	28	13	46	32	47
38-Q	30	48	24	14	25
38-H	30	42	15	32	16
39-L	30	20	117	4	118
39-M	28	40	47	7	48
39-Q	30	43	24	22	25
39-H	30	10	15	67	16
40-L	30	19	118	6	119
40-M	28	18	47	31	48
40-Q	30	34	24	34	25
40-H	30	20	15	61	16

To read these values: a version 38 QR Code with high error correction level has its data codewords split in two groups. The first group has 42 blocks of 15 codewords each, and the second has 32 blocks of 16 codewords. For each of these blocks, there's a error correction block of 30 codewords.

As a practical example, let's create a QR Code for the string https://en.wikipedia.org/wiki/QR_code#Error_correction (still byte content, for simplicity), adding a quartile error correction level. We need at least a version 5 QR Code for that.

According to the table above, we'll need to split the data codewords into 2 blocks of 15 codewords, then other 2 blocks of 16 codewords each (for 62 codewords in total for data). Using the getData function from the last part, we get:



> getData('https://en.wikipedia.org/wiki/QR_code#Error_correction', 8, 62)
< Uint8Array(62) [67, 102, 135, 71, 71, 7, 51, 162, 242, 246, 86, 226, 231, 118, 150, 182, 151, 6, 86, 70, 150, 18, 230, 247, 38, 114, 247, 118, 150, 182, 146, 245, 21, 37, 246, 54, 246, 70, 82, 52, 87, 39, 38, 247, 37, 246, 54, 247, 39, 38, 86, 55, 70, 150, 246, 224, 236, 17, 236, 17, 236, 17]

These codewords should be split like this:

Block	Data codewords
G1-B1	`67 102 135 71 71 7 51 162 242 246 86 226 231 118 150`
G1-B2	`182 151 6 86 70 150 18 230 247 38 114 247 118 150 182`
G2-B1	`146 245 21 37 246 54 246 70 82 52 87 39 38 247 37 246`
G2-B2	`54 247 39 38 86 55 70 150 246 224 236 17 236 17 236 17`

In the next part, we'll see how to actually place all these information (plus the error correction, and something more) inside the QR Code matrix. See you soon! 👋

DEV Community

Let's develop a QR Code Generator, part VIII: different sizes

Placing the alignment patterns

In code

How many codewords?

In code

How many data codewords?

Codeword blocks

Top comments (0)