DEV Community

Cover image for Let's develop a QR Code Generator, part IX: structuring larger versions
Massimo Artizzu
Massimo Artizzu

Posted on

Let's develop a QR Code Generator, part IX: structuring larger versions

The cover image will make sense later, I swear! πŸ˜…

In the last part, we came to know how to split our data and error correction codewords for QR Codes of larger versions. But how can we choose the right version for our content?

QR Code capacity

The answer lies in that big table we've seen. Thanks to that, we can get to know how many codewords are reserved for data for a given version and error correction level and, given the encoding mode, compute the maximum length of the content we can write.

Let's have a look at the columns of that table:

  1. Number of error correction codewords per block
  2. Number of blocks in Group 1
  3. Number of data codewords in blocks of Group 1
  4. Number of blocks in Group 2
  5. Number of data codewords in blocks of Group 2

Let's recall that the data codewords of a QR Code are split into blocks, and each block belong to group 1 or 2 depending on their size. For each data block there's an error correction block.

We also know that:

  • the value in (5) is just the value in (3) plus 1;
  • the value in (3) is actually the number of data codewords divided by (2) + (4) (i.e., the total number of blocks), rounded down to the previous integer;
  • the number of data codewords is the total number of codewords minus the number of error correction codewords;
  • the number of error correction codewords is (1) multiplied by the number of blocks;
  • (4) is actually the number of data codewords modulo the number of blocks.

In order to get the total number of codewords, we can use our function getAvailableModules from part 8 and divide the result by 8 (or shift to the right by 3).

In the end, for every version and error level, we just need two values:

  • the number of error correction codewords per block;
  • the number of blocks.

In the end, this should be our table:

const EC_TABLE = [
  { L: [7, 1],   M: [10, 1],  Q: [13, 1],  H: [17, 1] },
  { L: [10, 1],  M: [16, 1],  Q: [22, 1],  H: [28, 1] },
  { L: [15, 1],  M: [26, 1],  Q: [18, 2],  H: [22, 2] },
  { L: [20, 1],  M: [18, 2],  Q: [26, 2],  H: [16, 4] },
  { L: [26, 1],  M: [24, 2],  Q: [18, 4],  H: [22, 4] },
  { L: [18, 2],  M: [16, 4],  Q: [24, 4],  H: [28, 4] },
  { L: [20, 2],  M: [18, 4],  Q: [18, 6],  H: [26, 5] },
  { L: [24, 2],  M: [22, 4],  Q: [22, 6],  H: [26, 6] },
  { L: [30, 2],  M: [22, 5],  Q: [20, 8],  H: [24, 8] },
  { L: [18, 4],  M: [26, 5],  Q: [24, 8],  H: [28, 8] },
  { L: [20, 4],  M: [30, 5],  Q: [28, 8],  H: [24, 11] },
  { L: [24, 4],  M: [22, 8],  Q: [26, 10], H: [28, 11] },
  { L: [26, 4],  M: [22, 9],  Q: [24, 12], H: [22, 16] },
  { L: [30, 4],  M: [24, 9],  Q: [20, 16], H: [24, 16] },
  { L: [22, 6],  M: [24, 10], Q: [30, 12], H: [24, 18] },
  { L: [24, 6],  M: [28, 10], Q: [24, 17], H: [30, 16] },
  { L: [28, 6],  M: [28, 11], Q: [28, 16], H: [28, 19] },
  { L: [30, 6],  M: [26, 13], Q: [28, 18], H: [28, 21] },
  { L: [28, 7],  M: [26, 14], Q: [26, 21], H: [26, 25] },
  { L: [28, 8],  M: [26, 16], Q: [30, 20], H: [28, 25] },
  { L: [28, 8],  M: [26, 17], Q: [28, 23], H: [30, 25] },
  { L: [28, 9],  M: [28, 17], Q: [30, 23], H: [24, 34] },
  { L: [30, 9],  M: [28, 18], Q: [30, 25], H: [30, 30] },
  { L: [30, 10], M: [28, 20], Q: [30, 27], H: [30, 32] },
  { L: [26, 12], M: [28, 21], Q: [30, 29], H: [30, 35] },
  { L: [28, 12], M: [28, 23], Q: [28, 34], H: [30, 37] },
  { L: [30, 12], M: [28, 25], Q: [30, 34], H: [30, 40] },
  { L: [30, 13], M: [28, 26], Q: [30, 35], H: [30, 42] },
  { L: [30, 14], M: [28, 28], Q: [30, 38], H: [30, 45] },
  { L: [30, 15], M: [28, 29], Q: [30, 40], H: [30, 48] },
  { L: [30, 16], M: [28, 31], Q: [30, 43], H: [30, 51] },
  { L: [30, 17], M: [28, 33], Q: [30, 45], H: [30, 54] },
  { L: [30, 18], M: [28, 35], Q: [30, 48], H: [30, 57] },
  { L: [30, 19], M: [28, 37], Q: [30, 51], H: [30, 60] },
  { L: [30, 19], M: [28, 38], Q: [30, 53], H: [30, 63] },
  { L: [30, 20], M: [28, 40], Q: [30, 56], H: [30, 66] },
  { L: [30, 21], M: [28, 43], Q: [30, 59], H: [30, 70] },
  { L: [30, 22], M: [28, 45], Q: [30, 62], H: [30, 74] },
  { L: [30, 24], M: [28, 47], Q: [30, 65], H: [30, 77] },
  { L: [30, 25], M: [28, 49], Q: [30, 68], H: [30, 81] }
];
Enter fullscreen mode Exit fullscreen mode

Using this table, we can compute the amount of codewords reserved for data:

function getDataCodewords(version, errorLevel) {
  const totalCodewords = getAvailableModules(version) >> 3;
  const [blocks, ecBlockSize] = EC_TABLE[version - 1][errorLevel];
  return totalCodewords - blocks * ecBlockSize;
}
Enter fullscreen mode Exit fullscreen mode

This is not enough, though, because part of these data codewords are reserved for:

  • the encoding mode block;
  • the content length. While the former always take 4 bits/modules, the latter is variable in bit length, so we'll use the function getLengthBits that we've created back in part 2.

In the end we have a certain amount of available bits, but as we've seen in part 7, each encoding mode uses those bits differently.

Let's imagine we have 4 different functions (one for each encoding mode) that, given a certain amount of bits, returns the length of the content that can be cointained in those bits for a certain encoding mode:

const capacityFnMap = {
  [0b0001]: getNumericCapacity,
  [0b0010]: getAlphanumericCapacity,
  [0b0100]: getByteCapacity,
  [0b1000]: getKanjiCapacity
};
Enter fullscreen mode Exit fullscreen mode

We'll end up with something like this:

function getCapacity(version, errorLevel, encodingMode) {
  const dataCodewords = getDataCodewords(version, errorLevel);
  const lengthBits = getLengthBits(encodingMode, version);
  const availableBits = (dataCodewords << 3) - lengthBits - 4;
  return capacityFnMap[encodingMode](availableBits);
}
Enter fullscreen mode Exit fullscreen mode

Again, this is a pure function that we can memoize, but we can also precompute a table that we can use later.

Numeric mode capacity

As we've seen in part 7, we can store 3 digits in 10 bits, two digits in 7 and one digit in 4. Se we need to compute the bits modulo 10 and add the remainder digits at the end:

function getNumericCapacity(availableBits) {
  const remainderBits = availableBits % 10;
  return Math.floor(availableBits / 10) * 3 +
    (remainderBits > 6 ? 2 : remainderBits > 3 ? 1 : 0);
}
Enter fullscreen mode Exit fullscreen mode

Alphanumeric mode capacity

Similarly to numeric mode, we can store two characters in 11 bits and one in 6:

function getAlphanumericCapacity(availableBits) {
  return Math.floor(availableBits / 11) * 2 +
    (availableBits % 11 > 5 ? 1 : 0);
}
Enter fullscreen mode Exit fullscreen mode

Byte mode capacity

This is easy, as 1 character = 8 bits, flat.

function getByteCapacity(availableBits) {
  return availableBits >> 3;
}
Enter fullscreen mode Exit fullscreen mode

Kanji mode capacity

This is also easy, as each pictogram needs 13 bits:

function getKanjiCapacity(availableBits) {
  return Math.floor(availableBits / 13);
}
Enter fullscreen mode Exit fullscreen mode

The best QR Code

Now we've got everything to know which version we must choose for our content: we aim for the smallest version and highest error correction possible. The only additional complexity may come from the fact that we want a certain minimum error correction level.

For example, if we have a 54-digit long number (like the 10th perfect number), we could use a version 2 QR Code with medium error correction (as getCapacity(2, 'M') === 63), but if we want a high correction we have to use version 3 (since getCapacity(3, 'H') === 58).

So we can use something like this:

function getVersionAndErrorLevel(encodingMode, contentLength, minErrorLevel = 'L') {
  // The error levels we're going to consider
  const errorLevels = 'HQML'.slice(0, 'HQML'.indexOf(minErrorLevel) + 1);
  for (let version = 1; version <= 40; version++) {
    // You can iterate over strings in JavaScript 😁
    for (const errorLevel of errorLevels) {
      const capacity = getCapacity(version, errorLevel, encodingMode);
      if (capacity >= contentLength) {
        return [version, errorLevel];
      }
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

If it doesn't return anything, it means the content is too long.

Shuffling the codewords!

Let's suppose we have to encode… a snippet of JavaScript code, for a change:

['give you up','let you down','run around and desert you'].map(x=>'Never gonna '+x)
Enter fullscreen mode Exit fullscreen mode

It's 83 bytes long, but we want a QR Code with quartile error correction at minimum. We get getVersionAndErrorLevel(0b0100, 83, 'Q') === [7, 'Q'], so we're going to need a version 7 QR Code.

We also know that getDataCodewords(7, 'Q') === 88, and we'll have to split these 88 codewords reserved for data into 2 blocks of 14 codewords (group 1), then other 4 blocks of 15 codewords each (group 2). Using the getData function from the last part, we get:

> getData(snippet, 8, 88)
< Uint8Array(88)Β [69, 53, 178, 118, 118, 151, 102, 82, 7, 150, 247, 82, 7, 87, 2, 114, 194, 118, 198, 87, 66, 7, 150, 247, 82, 6, 70, 247, 118, 226, 114, 194, 119, 39, 86, 226, 6, 23, 38, 247, 86, 230, 66, 6, 22, 230, 66, 6, 70, 87, 54, 87, 39, 66, 7, 150, 247, 82, 117, 210, 230, 214, 23, 2, 135, 131, 211, 226, 116, 230, 87, 102, 87, 34, 6, 118, 246, 230, 230, 18, 2, 114, 183, 130, 144, 236, 17, 236]
Enter fullscreen mode Exit fullscreen mode

These codewords should be split like this (hex values):

Block Bytes
G1-B1 45 35 B2 76 76 97 66 52 07 96 F7 52 07 57
G1-B2 02 72 C2 76 C6 57 42 07 96 F7 52 06 46 F7
G2-B1 76 E2 72 C2 77 27 56 E2 06 17 26 F7 56 E6 42
G2-B2 06 16 E6 42 06 46 57 36 57 27 42 07 96 F7 52
G2-B3 75 D2 E6 D6 17 02 87 83 D3 E2 74 E6 57 66 57
G2-B4 22 06 76 F6 E6 E6 12 02 72 B7 82 90 EC 11 EC

Now, instead of placing them one after the other, we take the first codewords from each block (first from group 1, then group 2), then the second codewords, and so on, until the 15th codewords, which are followed by the 16th codewords of the blocks of group 2. In short, we need to interleave the blocks. In the end, we'll end up with this sequence:

45 02 76 06 75 22 35 72 ... 57 EC 57 F7 E6 F7 66 11 42 52 57 EC
Enter fullscreen mode Exit fullscreen mode

In code

We can either modify getData, or keep it as it as but we'll need another helper function to reorder the codewords we got. This function should take:

  • the codewords returned from getData;
  • the number of blocks we should use to split the data.

Something like this:

function reorderData(data, blocks) {
  /** Codewords in data blocks (in group 1) */
  const blockSize = Math.floor(data.length / blocks);
  /** Blocks in group 1 */
  const group1 = blocks - data.length % blocks;
  /** Starting index of each block inside `data` */
  const blockStartIndexes = Array.from(
    { length: blocks },
    (_, index) => index < group1
      ? blockSize * index
      : (blockSize + 1) * index - group1
  );
  return Uint8Array.from(data, (_, index) => {
    /** Index of the codeword inside the block */
    const blockOffset = Math.floor(index / blocks);
    /** Index of the block to take the codeword from
      If we're at the end (`blockOffset === blockSize`), then we take
      only from the blocks of group 2 */
    const blockIndex = (index % blocks)
      + (blockOffset === blockSize ? group1 : 0);
    /** Index of the codeword inside `data` */
    const codewordIndex = blockStartIndexes[blockIndex] + blockOffset;
    return data[codewordIndex];
  });
}
Enter fullscreen mode Exit fullscreen mode

This function is supposed to be used like this:

const rawData = getData(snippet, 8, 88);
const orderedData = reorderData(rawData, 6);
Enter fullscreen mode Exit fullscreen mode

Error correction

The error correction part is similar to the data part, in that also error correction codewords are split into blocks. It's just a little easier because all the error correction blocks have the same size.

So, for a 7-Q QR Code, the table above says we have 18 codewords for each error correction block. These blocks are computed using the respective data block. So, the first error correction block is comprised by the error correction codewords for the codewords of the first data block of group 1. Basically, it's this:

const rawData = getData(snippet, 8, 88);
const firstBlock = rawData.subarray(0, 14);
// => 69 53 178 118 118 151 102 82 7 150 247 82 7 87
const firstECBlock = getEDC(firstBlock, 14 + 18);
// => 63 102 26 192 65 106 117 90 107 88 138 42 103 127 227 86 189 1
Enter fullscreen mode Exit fullscreen mode

The final part consists in interleaving the error correction blocks, and we're done.

In code

Given the instruction above, we can come up with the following helper function that wraps and replaces the old getEDC:

function getECData(data, blocks, ecBlockSize) {
  /** Codewords in data blocks (in group 1) */
  const dataBlockSize = Math.floor(data.length / blocks);
  /** Blocks in group 1 */
  const group1 = blocks - data.length % blocks;
  const ecData = new Uint8Array(ecBlockSize * blocks);
  for (let offset = 0; offset < blocks; offset++) {
    const start = offset < group1
      ? dataBlockSize * offset
      : (dataBlockSize + 1) * offset - group1;
    const end = start + dataBlockSize + (offset < group1 ? 0 : 1);
    const dataBlock = data.subarray(start, end);
    const ecCodewords = getEDC(dataBlock, dataBlock.length + ecBlockSize);
    // Interleaving the EC codewords: we place one every `blocks`
    ecCodewords.forEach((codeword, index) => {
      ecData[index * blocks + offset] = codeword;
    });
  }
  return ecData;
}
Enter fullscreen mode Exit fullscreen mode

For our example, we should get the following result:

const rawData = getData(snippet, 8, 88);
const ecData = getECData(rawData, 6, 18);
// => 63 55 231 201 50 250 102 104 ... 7 15 1 181 202 64 199 23
Enter fullscreen mode Exit fullscreen mode

for a total of 6*18 = 108 error correction codewords.

Wrapping everything up

So we have all that we need for data and error correction:

function getCodewords(content, minErrorLevel = 'L') {
  const encodingMode = getEncodingMode(content);
  const [version, errorLevel] = getVersionAndErrorLevel(
    encodingMode,
    content.length,
    minErrorLevel
  );
  const lengthBits = getLengthBits(encodingMode, version);

  const dataCodewords = getDataCodewords(version, errorLevel);
  const [ecBlockSize, blocks] = EC_TABLE[version - 1][errorLevel];

  const rawData = getData(content, lengthBits, dataCodewords);
  const data = reorderData(rawData, blocks);
  const ecData = getECData(rawData, blocks, ecBlockSize);

  const codewords = new Uint8Array(data.length + ecData.length);
  codewords.set(data, 0);
  codewords.set(ecData, data.length);

  return {
    codewords,
    version,
    errorLevel,
    encodingMode
  };
}
Enter fullscreen mode Exit fullscreen mode

The function above should return the codewords - both data and error correction - ready to be placed in our QR Code! πŸ™Œ

And we're… not done?

Unfortunately there's still a small step to do, and we're going to see it in the next part. We have to fix the functions that return the sequence of module placements in the matrix and that actually place the modules, then also add the format information areas.

See you then and happy coding! πŸ‘‹πŸŽ‰

Top comments (0)