Lucian Gruia for Ciklum CZ&SK

Posted on Mar 7, 2022 • Originally published at cngroup.dk

Blockchain main concepts

#blockchain #programming

Every day we hear about blockchain, and lately several related concepts have become buzzwords. Let’s look at what they actually mean.

Merkle tree

A Merkle tree is a fundamental concept of blockchain. It is a data structure that allows efficient and secure verification of content in a large body of data, helping to verify both the consistency and the content of the data. [1] It summarises all the transactions in a block by creating a fingerprint of the entire set of transactions, which allows a user to verify whether a transaction is included in a block. [2]

A Merkle tree is created by repeatedly hashing pairs of nodes until there is only one hash left, known as the root hash or Merkle root.
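The pairwise hashing described above can be sketched in a few lines. This is a minimal illustration, assuming double SHA-256 as the hash (the choice Bitcoin makes) and Bitcoin's convention of duplicating the last hash when a level has an odd number of entries; the function names are hypothetical, and real Bitcoin code also deals with binary transaction serialisation and byte order, which are skipped here.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    # Double SHA-256 (assumption: any collision-resistant hash illustrates the idea).
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    level = [sha256d(tx) for tx in transactions]  # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        # Hash each adjacent pair to build the next level up.
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


txs = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
print(merkle_root(txs).hex())
```

Changing any single transaction changes its leaf hash, and that change propagates up to a different root.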

Using a Merkle tree significantly reduces the amount of data that a trusted authority has to store for verification purposes. It also makes verifying data consistency efficient: to check that one transaction belongs to a block, we only need the sibling hashes along its path to the root (roughly log2(n) hashes for n transactions) rather than the full data, which reduces the computational effort. Also, it is preferable to creating a single hash over all transactions, because the risk of collisions increases proportionally with the number of transactions. [3]
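The efficiency argument above is easiest to see with an inclusion proof. The sketch below is an illustration under stated assumptions: single SHA-256, the last hash duplicated on odd-sized levels, and hypothetical helper names (`build_levels`, `merkle_proof`, `verify_proof`). A verifier holding only the root checks one transaction using a handful of sibling hashes instead of the whole block.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    # Every level of the tree, from the leaf hashes up to the single root hash.
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # odd count: duplicate the last hash
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    # For each level below the root: the sibling hash and whether it sits on the left.
    proof = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1
        sibling_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sibling_hash, sibling < index))
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    # Recompute the path from the leaf to the root using only the sibling hashes.
    current = h(leaf)
    for sibling_hash, sibling_is_left in proof:
        current = h(sibling_hash + current) if sibling_is_left else h(current + sibling_hash)
    return current == root


txs = [f"tx{i}".encode() for i in range(8)]
root = build_levels(txs)[-1][0]
proof = merkle_proof(txs, 5)
print(len(proof))  # 3 sibling hashes instead of 8 transactions
print(verify_proof(b"tx5", proof, root))  # True
```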

An example of a transaction is shown below. Instead of storing the hashes of all transactions, we hash pairs of hashes, then pairs of those results, and so on, until we reach the root hash.

After we calculate the hash of the entire block, we check it against the difficulty, which is set by the blockchain’s algorithm.

The important role of the Merkle tree is that it greatly compresses the amount of data we have to store. Otherwise, every node of a blockchain would have to keep a copy of every single transaction that has ever occurred and compare them line by line to validate the integrity of the data.

In effect, this separates the proof of the data from the data itself.

Hashing

A hash function converts an input of any length into an output of a fixed length. [4] The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. [5]

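Note that a hash function is not injective: because inputs can have any length while outputs have a fixed length, different inputs must sometimes produce the same output. It is also not necessarily surjective. It always returns the same output for the same input, but for a cryptographic hash function, knowing only the output, it is computationally infeasible to work out the original input.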

Source: https://en.wikipedia.org/wiki/Hash_function
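The properties above can be checked quickly with SHA-256 from Python's standard `hashlib` module. This is a small illustration (the helper name `sha256_hex` is hypothetical): inputs of very different lengths give digests of the same length, the same input always gives the same digest, and a one-letter change gives an unrelated digest.

```python
import hashlib


def sha256_hex(text: str) -> str:
    # SHA-256 always produces 256 bits, shown here as 64 hexadecimal characters.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)
print(len(short_digest), len(long_digest))  # 64 64
print(sha256_hex("blockchain") == sha256_hex("blockchain"))  # True: deterministic
print(sha256_hex("blockchain"))
print(sha256_hex("Blockchain"))  # one changed letter, completely different digest
```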

Nonce value

Miners try to find a block hash that satisfies the difficulty. In a simplified form, the difficulty can be set as the number n of zeros that the hash of a block must start with; the more zeros required, the higher the difficulty. Miners have all the transaction data, but they do not know one value, the so-called nonce. They have to find the specific nonce value that makes the block’s hash start with the required number of zeros. Because the number space is enormous and hash outputs are unpredictable, the only way to find the nonce value is by guessing, trying values until one works.
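A minimal sketch of this nonce search, assuming the simplified leading-zeros difficulty described above and a block represented as a plain string (a real block header is a fixed binary layout hashed with double SHA-256). The block contents and the `mine` name are hypothetical. Trying nonces in order is as good as trying them at random, because the hash outputs are unpredictable either way.

```python
import hashlib


def mine(block_data: str, difficulty: int) -> tuple[int, str]:
    # Brute-force search: try nonce = 0, 1, 2, ... until the SHA-256 hex digest
    # of the block data plus the nonce starts with `difficulty` zeros.
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


# Hypothetical block contents, for illustration only.
nonce, digest = mine("prev_hash=abc|merkle_root=def|timestamp=1646611200", difficulty=4)
print(nonce, digest)
```

Each extra required zero multiplies the expected number of attempts by 16, which is why difficulty grows so quickly with the number of zeros.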

Let’s finally see what a block is

The block is the building block of any blockchain.

It has 2 main components:

Block header
Block body

The Block header has 6 sub-components:

Block version: does not matter in most cases, but it can signal which protocol changes the block supports.
Merkle tree root hash: encodes the block’s transaction data in a secure manner. It enables nodes to quickly check the current block for integrity. [6] Put simply, it serves the same role as the published hash of an archive file downloaded from an internet repository.
Previous block hash: the hash of the previous block. Without it, there would be no link between blocks and no chronological order.
nBits: the encoded target threshold; a block is considered valid only if its hash is under this target. The lower the target, the harder it is to generate a block. Generating a block is more like a lottery: each attempt produces a 256-bit hash, effectively a random number between 0 and 2^256 - 1, and the hash is considered valid if it is under the target. Please note that as the target decreases, a valid hash must start with more zeros, so the difficulty increases.
Nonce: the value that miners vary (increment) while searching for a valid hash in PoW (Proof of Work).
Timestamp: this not only adds some randomness to the block hash, but also makes it difficult for an adversary to manipulate the blockchain. A timestamp is considered valid if it is greater than the median timestamp of the previous 11 blocks.
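Two of the header rules above can be sketched as code: decoding nBits into a numeric target, and checking the median-of-11 timestamp rule. This is an illustration, not a consensus implementation. It assumes the Bitcoin compact format (one exponent byte followed by three mantissa bytes), ignores the format's sign bit, treats a hash equal to the target as valid, and uses hypothetical function names. The example nBits value 0x1d00ffff is the one in Bitcoin's genesis block.

```python
def target_from_nbits(nbits: int) -> int:
    # Compact format: target = mantissa * 256 ** (exponent - 3).
    exponent = nbits >> 24
    mantissa = nbits & 0x007FFFFF  # 0x00800000 is a sign bit; ignored in this sketch
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def hash_meets_target(block_hash: int, nbits: int) -> bool:
    # The block hash, read as a 256-bit integer, must not exceed the decoded target.
    return block_hash <= target_from_nbits(nbits)


def timestamp_is_valid(timestamp: int, previous_timestamps: list[int]) -> bool:
    # Must be greater than the median of the last 11 timestamps
    # (assumes at least one previous timestamp is available).
    last_11 = sorted(previous_timestamps[-11:])
    return timestamp > last_11[len(last_11) // 2]


print(f"{target_from_nbits(0x1d00ffff):064x}")  # 00000000ffff followed by 52 zeros
print(timestamp_is_valid(106, list(range(100, 111))))  # True: the median is 105
```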

So, based on these components of the block header, this is how blocks are linked:

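In this scenario, if a hacker wants to attack a specific block, they will have to alter not only that block but also every block that follows it, because each following block stores the hash of its predecessor. They would also have to do this faster than the rest of the network, as other miners are constantly checking hashes and following the longest chain of blocks.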
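The linking can be illustrated with a minimal hash chain. This sketch assumes each block holds only some data and the previous block's hash, and it leaves out the nonce and proof of work entirely; the `Block`, `build_chain`, and `first_broken_link` names are hypothetical. Tampering with one block breaks the link stored in the next one, so an attacker must rewrite every later block as well.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    data: str
    prev_hash: str

    def block_hash(self) -> str:
        # The hash covers the previous block's hash, which chains the blocks together.
        return hashlib.sha256(f"{self.prev_hash}|{self.data}".encode()).hexdigest()


def build_chain(items: list[str]) -> list[Block]:
    chain = [Block(items[0], "0" * 64)]  # genesis block: no real predecessor
    for item in items[1:]:
        chain.append(Block(item, chain[-1].block_hash()))
    return chain


def first_broken_link(chain: list[Block]):
    # Index of the first block whose stored prev_hash no longer matches, or None.
    for i in range(1, len(chain)):
        if chain[i].prev_hash != chain[i - 1].block_hash():
            return i
    return None


chain = build_chain(["genesis", "tx batch 1", "tx batch 2", "tx batch 3"])
print(first_broken_link(chain))  # None
chain[1].data = "tx batch 1 (tampered)"
print(first_broken_link(chain))  # 2
```

With proof of work added, each rewritten block would also have to be re-mined, which is what makes the attack so expensive.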

Wallets

A wallet is a program that allows users to buy, sell, or monitor balances. This only makes sense where value is stored in the blockchain’s digital ledger. [7]

Wallets use a public/private key pair to authenticate users: the private key signs transactions, and anyone holding the public key can verify those signatures.
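To make the key-pair idea concrete, here is a toy sketch of ECDSA signing and verification over secp256k1, the elliptic curve Bitcoin uses, written with only the Python standard library (3.8+ for modular inverses via `pow`). Beyond the curve constants, everything is an illustrative assumption: the function names are hypothetical, the per-signature nonce comes from `secrets` rather than a deterministic scheme, and the arithmetic is not constant-time. It is not suitable for a real wallet, which should rely on a vetted cryptography library.

```python
import hashlib
import secrets

# secp256k1 domain parameters: y^2 = x^3 + 7 over the prime field of size P.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)


def point_add(p, q):
    # Elliptic-curve point addition; None represents the point at infinity.
    if p is None:
        return q
    if q is None:
        return p
    if p[0] == q[0] and (p[1] + q[1]) % P == 0:
        return None
    if p == q:
        slope = 3 * p[0] * p[0] * pow(2 * p[1], -1, P) % P
    else:
        slope = (q[1] - p[1]) * pow((q[0] - p[0]) % P, -1, P) % P
    x = (slope * slope - p[0] - q[0]) % P
    return (x, (slope * (p[0] - x) - p[1]) % P)


def scalar_mult(k, point):
    # Double-and-add: computes k * point.
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def generate_keys():
    private_key = secrets.randbelow(N - 1) + 1  # secret number in [1, N - 1]
    return private_key, scalar_mult(private_key, G)  # public key is a curve point


def message_hash(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(private_key: int, message: bytes):
    z = message_hash(message)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh random nonce for every signature
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * private_key) % N
        if r != 0 and s != 0:
            return r, s


def verify(public_key, message: bytes, signature) -> bool:
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    point = point_add(scalar_mult(message_hash(message) * w % N, G),
                      scalar_mult(r * w % N, public_key))
    return point is not None and point[0] % N == r


priv, pub = generate_keys()
sig = sign(priv, b"Alice pays Bob 5")
print(verify(pub, b"Alice pays Bob 5", sig))    # True
print(verify(pub, b"Alice pays Bob 500", sig))  # False
```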

There are 3 types of wallets available for storing keys and tracking transactions on the blockchain:

software: mobile, desktop or web apps
hardware: private keys are stored on dedicated hardware devices, such as USB devices
paper: the key pair is generated by software and then printed; funds can be received by sending them to the address on the paper

How a transaction is executed

Transactions are part of the block body and are the most important component of a blockchain. All the other components exist to help keep transactions safe.

Transactions are data structures that encode the transfer of value between participants in the blockchain system. The process of transaction verification and recording is immediate and permanent. [8] The transaction is approved through a process known as consensus.

A transaction is committed in 4 stages:

Initiation of the transaction proposal. At the initial stage, the transaction is created and signed by the owner.
Broadcasting. At this stage, the transaction is broadcast to the network.
The transaction is verified. Once the transaction is broadcast to the network, other authorised nodes verify it. If the transaction is valid, it is added to a block; if not, nodes reject it. [8]
The transaction is committed. Finally, the block is added to the blockchain, and the transaction is committed.
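The four stages can be sketched as a toy simulation. It assumes a simple account-balance model (Bitcoin itself tracks unspent transaction outputs instead), skips the signature check from stage 1 entirely, and uses hypothetical `Transaction`, `Node`, and `broadcast` names. Each node keeps its own copy of the balances and verifies every proposal independently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transaction:  # stage 1: the proposal (signing omitted in this sketch)
    sender: str
    receiver: str
    amount: int


@dataclass
class Node:
    balances: dict
    mempool: list = field(default_factory=list)

    def receive(self, tx: Transaction) -> bool:
        # Stage 3 (verification): reject non-positive amounts and overspending,
        # counting transactions from the same sender that are still pending.
        pending = sum(t.amount for t in self.mempool if t.sender == tx.sender)
        if tx.amount <= 0 or self.balances.get(tx.sender, 0) < pending + tx.amount:
            return False
        self.mempool.append(tx)
        return True

    def commit_block(self) -> list:
        # Stage 4 (commit): move verified transactions into a block and apply them.
        block = list(self.mempool)
        for tx in block:
            self.balances[tx.sender] -= tx.amount
            self.balances[tx.receiver] = self.balances.get(tx.receiver, 0) + tx.amount
        self.mempool.clear()
        return block


def broadcast(tx: Transaction, nodes: list) -> list:
    # Stage 2: every node receives the proposal and verifies it on its own.
    return [node.receive(tx) for node in nodes]


genesis = {"alice": 10, "bob": 0}
nodes = [Node(dict(genesis)) for _ in range(3)]
print(broadcast(Transaction("alice", "bob", 4), nodes))  # [True, True, True]
print(broadcast(Transaction("alice", "bob", 7), nodes))  # [False, False, False]
blocks = [node.commit_block() for node in nodes]
print(nodes[0].balances)  # {'alice': 6, 'bob': 4}
```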

Mining

Mining is the process of recording new transactions on the blockchain ledger. When two users make a transaction, it is not recorded on the ledger until a miner puts it in a block, and that only happens once the miner has found a nonce value that produces a valid hash (one under the target encoded in nBits).

For example, if the difficulty is 4, a hash is valid only if it starts with four zeros.
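The "number of leading zeros" rule and the "hash under a numeric target" rule from the block header section describe the same check. A small sketch of that equivalence, assuming the hex digest is read as a big-endian integer (a simplification) and treating a hash equal to the target as valid; the helper names are hypothetical.

```python
import hashlib


def meets_leading_zeros(hash_hex: str, difficulty: int) -> bool:
    # Simplified rule: the 64-character hex digest starts with `difficulty` zeros.
    return hash_hex.startswith("0" * difficulty)


def target_for(difficulty: int) -> int:
    # Largest 256-bit value whose 64-digit hex form still starts with `difficulty` zeros.
    return 16 ** (64 - difficulty) - 1


def meets_target(hash_hex: str, target: int) -> bool:
    # Numeric rule: the hash, read as an integer, must not exceed the target.
    return int(hash_hex, 16) <= target


for text in ["block A", "block B", "block C"]:
    digest = hashlib.sha256(text.encode()).hexdigest()
    # Both rules agree: d leading hex zeros means the value is below 16 ** (64 - d).
    assert meets_leading_zeros(digest, 4) == meets_target(digest, target_for(4))
```

The numeric form is more flexible, because the target can be adjusted in much finer steps than whole hex digits.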

As described above, a hash function always generates the same output for the same input. So miners have to change something in the content of the block to get a different hash, and that something is the nonce value. They keep trying values until they find one that produces a hash meeting the target. Also, note that the timestamp keeps changing while the miner searches for (mines) the nonce value.

A miner who creates a block receives a monetary reward, which gives miners an incentive to invest effort in this process.

Mining is not essential for a blockchain to exist. It is a way of validating transactions used especially in public blockchains; private blockchains can use customised rules to validate transactions.

Longest chain rule

Adding a new block to the blockchain takes a lot of effort, because the block has to be mined. As a rule, nodes always select the longer chain over the shorter one.

Adopting the longest chain rule allows every node on the network to agree on what the blockchain looks like. [9] In this way, nodes agree on the same transaction history, which means that nodes acting independently can maintain a globally shared view of the ledger.

Let’s assume there are 100 blocks on the chain and a malicious node corrupts block 23 in its local copy. Changing block 23 also breaks the link stored in block 24, and therefore in every block after it, so if this node tries to broadcast its blockchain to the network, the other nodes will reject it, because they already hold longer valid chains locally.

However, the longest chain is not necessarily the chain that required the most energy to create. [9] This can happen when nodes have to compare 2 versions of the blockchain that span different difficulty periods. In this case, nodes select the one with the most cumulative chainwork (the total number of hashes that are expected to have been necessary to produce the chain).
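A minimal sketch of choosing a chain by cumulative work rather than by block count. It assumes each chain is described only by the list of its blocks' targets, and it estimates each block's work as 2^256 / (target + 1), the expected number of hashes needed to find a hash at or below the target. The function names, chain names, and the two targets are hypothetical.

```python
def block_work(target: int) -> int:
    # Expected number of hash attempts to find a hash <= target.
    return 2**256 // (target + 1)


def chainwork(targets: list[int]) -> int:
    return sum(block_work(t) for t in targets)


def select_chain(chains: dict[str, list[int]]) -> str:
    # Pick the chain with the most cumulative work, not the one with the most blocks.
    return max(chains, key=lambda name: chainwork(chains[name]))


easy = 2**240  # hypothetical easy target: about 2**16 hashes per block
hard = 2**230  # hypothetical hard target: about 2**26 hashes per block
chains = {"long_easy": [easy] * 100, "short_hard": [hard] * 10}
print(select_chain(chains))  # short_hard
```

Here the 10-block chain wins: roughly 671 million expected hashes against roughly 6.5 million for the 100-block chain.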