Sahil Sojitra

Posted on Jul 11, 2023

From Bits to Blockchain: The Inner Workings of Ethereum Transactions Explained

#ethereum #security #blockchain #beginners

Transactions in Ethereum are like messages that cause changes on the blockchain. They are created by user accounts and processed by the Ethereum network. Transactions are what make things happen and execute contracts on Ethereum.

Imagine Ethereum as a machine with different states. Transactions are what drive this machine and make it change from one state to another. Contracts, which are important parts of Ethereum, cannot work by themselves. Ethereum doesn't run automatically; everything starts with a transaction.

In this blog, we will explain transactions, how they work, and their details. While some of this information is more relevant to those building wallet apps or dealing with transactions at a technical level, you don't need to worry about it if you use existing wallet applications. However, you might still find the details interesting!

The Structure of a Transaction

A transaction in Ethereum has a specific structure when it's sent over the network. However, different clients and applications may add their own extra information when storing and processing the transaction internally.

Here are the main components of a transaction:

Nonce: A sequence number assigned by the sender's account to prevent duplicate transactions.
Gas price: The amount of ether (in wei) that the sender is willing to pay for each unit of gas used to process the transaction.
Gas limit: The maximum amount of gas the sender is willing to use for this transaction.
Recipient: The Ethereum address of the intended receiver of the transaction.
Value: The amount of ether (in wei) to be sent to the recipient.
Data: A variable-length payload containing binary information.
v, r, s: Three components used for the digital signature of the sender's account.

The transaction's structure is serialized using a special encoding scheme called Recursive Length Prefix (RLP). Ethereum uses big-endian integers to represent numbers, and the length of each field is identified using RLP's length prefix.

It's important to note that the field labels (e.g., to, gas limit) mentioned here are for clarification purposes and are not part of the serialized transaction data itself. RLP doesn't include field delimiters or labels. Anything beyond the specified length belongs to the next field in the structure.
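To make the "length prefix, no labels" idea concrete, here is a minimal RLP encoder sketch for Node.js. The helper names (`intToBytes`, `lengthPrefix`, `rlpEncode`) are my own, strings are assumed to be even-length hex, and the example encodes only the six unsigned fields listed above (signature values, and any chain identifier used by newer signing schemes, are left out). Treat it as an illustration, not a replacement for a tested library.

```javascript
// Big-endian bytes of a non-negative integer, with no leading zeros (0 -> empty).
function intToBytes(value) {
  let n = BigInt(value);
  const bytes = [];
  while (n > 0n) {
    bytes.unshift(Number(n & 0xffn));
    n >>= 8n;
  }
  return Buffer.from(bytes);
}

// Length prefix: one byte for payloads up to 55 bytes, otherwise a byte
// giving the size of the length, followed by the length itself.
function lengthPrefix(length, offset) {
  if (length <= 55) return Buffer.from([offset + length]);
  const lengthBytes = intToBytes(length);
  return Buffer.concat([Buffer.from([offset + 55 + lengthBytes.length]), lengthBytes]);
}

// Encodes integers, hex strings, byte buffers, and (nested) arrays of those.
function rlpEncode(item) {
  if (Array.isArray(item)) {
    const payload = Buffer.concat(item.map(rlpEncode));
    return Buffer.concat([lengthPrefix(payload.length, 0xc0), payload]);
  }
  let bytes;
  if (typeof item === "number" || typeof item === "bigint") bytes = intToBytes(item);
  else if (typeof item === "string") bytes = Buffer.from(item.replace(/^0x/, ""), "hex");
  else bytes = Buffer.from(item);
  // A single byte below 0x80 is its own encoding.
  if (bytes.length === 1 && bytes[0] < 0x80) return bytes;
  return Buffer.concat([lengthPrefix(bytes.length, 0x80), bytes]);
}

// Field order matters because no labels are transmitted.
const unsignedTx = [
  49,                                           // nonce
  10n ** 10n,                                   // gas price: 10 gwei, in wei
  21000,                                        // gas limit
  "0xB0920c523d582040f2BCB1bD7FB1c7C1ECEbdB34", // recipient
  10n ** 16n,                                   // value: 0.01 ether, in wei
  "0x",                                         // data: empty
];
console.log(rlpEncode(unsignedTx).toString("hex"));
```

A decoder reads the same prefixes in reverse: each prefix says how many bytes belong to the current field, and whatever follows belongs to the next one.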

While this is the structure of the transmitted transaction, most software representations and user interfaces add extra information derived from the transaction or the blockchain.

For example, you might notice that the sender's address ("from" data) is not explicitly included in the transaction. That's because the sender's public key can be derived from the ECDSA signature components (v, r, s), and the address can be derived from the public key. The "from" field you see in transaction visualizations is added by the software for clarity. Other additional information, such as the block number and transaction ID, is often added by client software but is not part of the original transaction message itself.

The Transaction Nonce

The nonce is an important component of a transaction, but it is often misunderstood. For an externally owned account, it equals the number of transactions sent from that address; for a contract account, it counts the contracts that account has created. Each account's current nonce is kept as part of its state, so it always matches the count of confirmed transactions originating from that address.

The nonce serves two purposes: maintaining the order of transactions and preventing duplication. Let's consider examples for each scenario:

Transaction Order: Imagine you want to send two transactions: one for 6 ether and another for 8 ether. You send the 6-ether transaction first, thinking it's more important, and then the 8-ether transaction. However, if your account doesn't have enough funds for both, one transaction will fail. Since transactions can reach nodes in different orders, it's uncertain which one will be accepted. But with the nonce, the first transaction will have a specific nonce value (let's say 3), while the second transaction will have the next nonce value (4). So, the second transaction will be ignored until the previous nonces are processed, ensuring the desired order of execution.
Duplication Protection: Suppose you send a payment of 2 ether to someone for a product. Without a nonce, a second transaction with the same amount to the same address would appear identical to the first one. This means anyone on the network could replay your transaction multiple times, potentially draining your funds. With the nonce included in the transaction data, every transaction is unique, even when you send the same amount of ether to the same recipient several times, because the nonce increments each time. This safeguards your payments from being replayed by others.

To summarize, the nonce is crucial in account-based protocols like Ethereum, unlike the "Unspent Transaction Output" (UTXO) mechanism used in Bitcoin. It maintains transaction order and prevents unauthorized duplication, enhancing the security and integrity of transactions.

Keeping Track of Nonces

The nonce represents the count of confirmed transactions originating from an account. You can obtain the nonce by querying the blockchain through a web3 interface.

>web3.eth.getTransactionCount("0x9e713963a92c02317a681b9bb3065a8249de124f")
49

The nonce is a zero-based counter, meaning the first transaction has nonce 0. In this example, we have a transaction count of 49, meaning nonces 0 through 48 have been seen. The next transaction’s nonce will need to be 49.

Your wallet software keeps track of nonces for each managed address. If you're building your own wallet or application, you assign the next nonce when creating a new transaction. However, the pending transactions may not be included in the nonce count immediately. It's important to wait until the pending and confirmed counts are equal before relying on the nonce from the getTransactionCount call. Once the counts align, you can start tracking the nonce in your application until each transaction is confirmed.

>web3.eth.getTransactionCount("0x9e713963a92c02317a681b9bb3065a8249de124f", "pending")
40
>web3.eth.sendTransaction({from: "0x9e713963a92c02317a681b9bb3065a8249de124f",
  to: "0xB0920c523d582040f2BCB1bD7FB1c7C1ECEbdB34", value: web3.utils.toWei("0.01", "ether")});
>web3.eth.getTransactionCount("0x9e713963a92c02317a681b9bb3065a8249de124f", "pending")
41
>web3.eth.sendTransaction({from: "0x9e713963a92c02317a681b9bb3065a8249de124f",
  to: "0xB0920c523d582040f2BCB1bD7FB1c7C1ECEbdB34", value: web3.utils.toWei("0.01", "ether")});
>web3.eth.getTransactionCount("0x9e713963a92c02317a681b9bb3065a8249de124f", "pending")
41
>web3.eth.sendTransaction({from: "0x9e713963a92c02317a681b9bb3065a8249de124f",
  to: "0xB0920c523d582040f2BCB1bD7FB1c7C1ECEbdB34", value: web3.utils.toWei("0.01", "ether")});
>web3.eth.getTransactionCount("0x9e713963a92c02317a681b9bb3065a8249de124f", "pending")
41

When we sent the first transaction, the transaction count increased to 41, indicating a pending transaction. However, when we quickly sent two more transactions, the getTransactionCount call didn't include them. It still counted only one pending transaction, even though we would expect all three to be pending in the network's memory pool (mempool). If we wait a few seconds to let network communications settle, the getTransactionCount call will return the expected number. But in the meantime, while there are multiple pending transactions, relying on getTransactionCount may not be helpful.

When developing an application that creates transactions, it's important not to rely on getTransactionCount for pending transactions. Only when the pending and confirmed counts are the same (all outstanding transactions are confirmed) can you trust the getTransactionCount result and start tracking the nonce. Afterward, you should keep track of the nonce in your application until each transaction is confirmed.
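One way to follow this advice is to sync with the node once and then hand out nonces locally. The sketch below is a hypothetical helper (the `NonceTracker` name and its methods are my own, not part of web3.js); it assumes you seed it with the confirmed transaction count at a moment when nothing is pending.

```javascript
// Hypothetical helper: hands out nonces locally after a one-time sync.
class NonceTracker {
  constructor(confirmedCount) {
    // Seed with getTransactionCount once the pending and confirmed counts agree.
    this.next = confirmedCount;
    this.inFlight = new Set();
  }

  // Reserve the next nonce for a transaction that is about to be signed.
  allocate() {
    const nonce = this.next++;
    this.inFlight.add(nonce);
    return nonce;
  }

  // Call once the transaction that used this nonce has been confirmed.
  confirm(nonce) {
    this.inFlight.delete(nonce);
  }

  get pendingCount() {
    return this.inFlight.size;
  }
}

const tracker = new NonceTracker(40);
console.log(tracker.allocate(), tracker.allocate(), tracker.allocate()); // 40 41 42
```

Because the counter lives in your application, three transactions sent in quick succession get three distinct nonces even while the node still reports a stale pending count.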

Gaps in Nonces, Duplicate Nonces and Confirmation

It's crucial to keep track of nonces when creating transactions programmatically, especially if you're doing so from multiple independent processes simultaneously.

The Ethereum network processes transactions based on their nonce in a sequential manner. If you send a transaction with nonce 0 and then another with nonce 2, the second transaction will not be included in any blocks. It will be stored in the mempool, and the network will wait for the missing nonce (1 in this case) to appear. Nodes assume that the missing nonce is delayed and that the transaction with nonce 2 arrived out of order.

To resolve this, you need to send a valid transaction with the missing nonce (1) to allow both transactions (1 and 2) to be processed and included in blocks. If there's a gap in the nonce sequence, subsequent transactions will be stuck, waiting for the missing nonce to be filled. It's important to note that once a transaction with the missing nonce is validated, all the subsequent transactions with higher nonces become valid. It's not possible to undo or recall a transaction.

On the other hand, if you accidentally duplicate a nonce by sending two transactions with the same nonce but different recipients or values, one of them will be confirmed, and the other will be rejected. The confirmation of the transaction will be determined by the order in which they reach the first validating node, making it somewhat random.
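The gap and duplicate rules can be sketched as a small function over one sender's pending transactions. This is a simplified model under stated assumptions: the name `selectIncludable` is hypothetical, "first seen wins" stands in for whatever policy a real node applies to same-nonce transactions, and balances and gas are ignored entirely.

```javascript
// `accountNonce` is the sender's next expected nonce;
// `pending` lists that sender's transactions in arrival order.
function selectIncludable(accountNonce, pending) {
  const dropped = [];
  const waiting = new Map(); // nonce -> first transaction seen with that nonce

  for (const tx of pending) {
    // Already-used nonces and later duplicates of a nonce are rejected.
    if (tx.nonce < accountNonce || waiting.has(tx.nonce)) dropped.push(tx);
    else waiting.set(tx.nonce, tx);
  }

  // Only an unbroken run of nonces starting at accountNonce can be included.
  const included = [];
  for (let next = accountNonce; waiting.has(next); next++) {
    included.push(waiting.get(next));
    waiting.delete(next);
  }

  // Whatever is left sits behind a gap.
  return { included, dropped, stuck: [...waiting.values()] };
}

// Nonce 0 is confirmed, so the account expects 1; only nonce 2 has arrived.
console.log(selectIncludable(1, [{ id: "b", nonce: 2 }]));
// Filling the gap releases both transactions.
console.log(selectIncludable(1, [{ id: "b", nonce: 2 }, { id: "a", nonce: 1 }]));
```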

As you can see, accurately managing nonces is necessary to avoid issues. If your application doesn't handle nonces correctly, you may encounter problems. Handling nonces becomes even more challenging when dealing with concurrent transactions, as we'll explore in the next section.

Concurrency, Transaction Origination, and Nonces

Concurrency means multiple independent systems working at the same time. In Ethereum, different parts of the system can work simultaneously, but they all follow the same rules to maintain consistency.

Imagine you have multiple wallet applications generating transactions from the same address. For example, an exchange processing withdrawals from its online wallet. Ideally, you'd want multiple computers handling withdrawals to avoid slowdowns or failures. However, this creates problems because each computer needs to coordinate and assign nonces correctly.

One approach is to use a single computer to assign nonces to the computers generating transactions. However, this creates a single point of failure. If a nonce is assigned but not used, all subsequent transactions get stuck.

Another approach is to generate transactions without assigning a nonce, leaving them unsigned. These unsigned transactions can then be queued to a single node that signs them and keeps track of nonces. However, this creates a bottleneck as signing and tracking nonces can become congested under heavy load. This approach lacks concurrency in a critical part of the process.

These concurrency challenges, combined with the difficulty of tracking balances and confirmations in independent processes, often lead to solutions that avoid concurrency. For example, some implementations use a single process to handle all withdrawals or set up multiple independent wallets that require occasional rebalancing.

In summary, managing concurrency in Ethereum can be complex. Practical solutions aim to balance between concurrent processing and avoiding bottlenecks in order to maintain a stable and reliable system.

Transaction Gas

Gas is like the fuel used in Ethereum transactions. It's not the same as ether, the main cryptocurrency of Ethereum, but a separate virtual currency with its own exchange rate against ether. Gas is used to control the resources that a transaction can use, because every transaction is processed by many computers worldwide. This ensures that transactions don't overload the system or consume excessive resources.

Gas exists separately from ether to protect the system from the rapid changes in ether's value. It also helps manage the costs of different resources involved in transactions, such as computation, memory, and storage.

The gasPrice field in a transaction allows the sender to choose the price they are willing to pay for gas. This price is measured in wei per unit of gas. For example, if a transaction sets the gasPrice to 3 gwei (3 billion wei), the sender is offering to pay 3 gwei for each unit of gas the transaction consumes.

Wallets can adjust the gasPrice to influence how quickly a transaction is confirmed: a higher gasPrice generally means faster confirmation, and a lower one means slower confirmation. The fee model described here does not forbid a gasPrice of zero, so a wallet can create a fee-free transaction, and some such transactions have been included in blocks during periods of low demand. When the network is busy, however, they may never be confirmed. Note that this is the legacy fee model: since the London upgrade (EIP-1559), a transaction must offer at least the block's base fee per gas, so zero-fee transactions are no longer included on mainnet.

The gasLimit field determines the maximum amount of gas the sender is willing to buy for the transaction. For simple payments, the gas amount needed is fixed at 21,000 units. To calculate the cost in ether, multiply 21,000 by the gasPrice. For example:

> web3.eth.getGasPrice(function(err, res){console.log(res*21000)})
> 210000000000000
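The output above corresponds to a gas price of 10 gwei. As a small sketch of the same arithmetic, the snippet below uses BigInt so that wei amounts never lose precision; the function names are illustrative and the 10 gwei figure is simply the value implied by that output.

```javascript
const GWEI = 10n ** 9n;            // wei per gwei
const WEI_PER_ETHER = 10n ** 18n;  // wei per ether
const SIMPLE_TRANSFER_GAS = 21000n;

// Cost of a simple payment, in wei.
function transferCostWei(gasPriceWei) {
  return SIMPLE_TRANSFER_GAS * gasPriceWei;
}

// Format a wei amount as a decimal ether string without floating-point rounding.
function weiToEther(wei) {
  const whole = wei / WEI_PER_ETHER;
  const fraction = (wei % WEI_PER_ETHER).toString().padStart(18, "0").replace(/0+$/, "");
  return fraction ? `${whole}.${fraction}` : `${whole}`;
}

const cost = transferCostWei(10n * GWEI);
console.log(cost.toString(), "wei =", weiToEther(cost), "ether");
// 210000000000000 wei = 0.00021 ether
```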

When interacting with a contract, estimating the gas needed becomes challenging. Contracts can have different execution paths and varying gas costs based on conditions beyond your control. For instance, a contract may behave differently after a certain number of calls, resulting in different gas costs depending on prior transactions. In general, the gas a contract call will need cannot be determined with certainty in advance; it can only be estimated.

An analogy often used is to think of gasLimit as the fuel tank capacity for your car (representing the transaction). You fill the tank with an estimated amount of gas needed for the journey (computation for validating the transaction). However, unexpected factors like diversions or complex execution paths can increase fuel consumption.

In Ethereum, the gasLimit works more like a credit account at a gas station. When you send a transaction, one of the first validation steps checks that your account has enough ether to cover gasPrice * gasLimit. That maximum is set aside before execution begins, and any unused gas is refunded when the transaction finishes, so you are billed only for the gas actually consumed. You therefore need enough balance to cover the maximum amount you're willing to pay before sending the transaction.
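A rough sketch of that accounting, with all amounts in wei as BigInt values. The `settleGas` name and its return shape are hypothetical, and the check includes the transferred value alongside the maximum fee, which is an addition of mine to the gasPrice * gasLimit rule stated above. It also assumes gasUsed never exceeds gasLimit.

```javascript
// All amounts are BigInt values in wei (gasLimit and gasUsed are in gas units).
function settleGas({ balance, value, gasLimit, gasPrice, gasUsed }) {
  const maxFee = gasLimit * gasPrice;

  // Up-front check: the sender must be able to afford the worst case.
  if (balance < maxFee + value) {
    throw new Error("insufficient funds for gasLimit * gasPrice + value");
  }

  // After execution, only the gas actually consumed is charged.
  const feePaid = gasUsed * gasPrice;
  return {
    maxFee,
    feePaid,
    refunded: maxFee - feePaid,
    finalBalance: balance - value - feePaid,
  };
}

const result = settleGas({
  balance: 10n ** 18n,      // 1 ether
  value: 5n * 10n ** 17n,   // 0.5 ether
  gasLimit: 100000n,
  gasPrice: 10n ** 10n,     // 10 gwei
  gasUsed: 60000n,
});
console.log(result);
```

In this example the sender must be able to cover 0.001 ether in fees before sending, but ends up paying 0.0006 ether.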

Transaction Recipient

The recipient of a transaction is specified in the "to" field, which contains a 20-byte Ethereum address. This address can belong to an individual (EOA) or a contract.

Ethereum doesn't perform any validation on this field. Any 20-byte value is considered a valid address. Whether the address corresponds to an existing private key or a contract is not verified by Ethereum. If you send a transaction to the wrong address, the ether sent will likely be lost forever and cannot be accessed, as most addresses do not have a known private key to authorize spending.

Validating the address is typically done at the user interface level to prevent such mistakes. Burning ether, or making it unspendable, can have valid reasons in some cases, like discouraging cheating in payment channels or other smart contracts. Since the total amount of ether is finite, burning it effectively redistributes its value to all ether holders in proportion to the amount they hold.
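The kind of interface-level check mentioned above can start with a plain format test. The sketch below (the function name is my own) only confirms that the input looks like a 20-byte hex address. It does not verify a mixed-case checksum, which would need a Keccak-256 implementation, and it cannot tell whether anyone controls the address.

```javascript
// True when the input is "0x" followed by exactly 40 hex digits (20 bytes).
function looksLikeAddress(input) {
  return typeof input === "string" && /^0x[0-9a-fA-F]{40}$/.test(input);
}

console.log(looksLikeAddress("0xB0920c523d582040f2BCB1bD7FB1c7C1ECEbdB34")); // true
console.log(looksLikeAddress("0xB0920c523d582040f2BCB1bD7FB1c7C1ECEbdB"));   // false: too short
```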

Transaction Value and Data

The main "payload" of a transaction is contained in two fields: value and data. Transactions can have both value and data, only value, only data, or neither value nor data. All four combinations are valid.

A transaction with only value is a payment. A transaction with only data is an invocation. A transaction with both value and data is both a payment and an invocation. A transaction with neither value nor data—well that’s probably just a waste of gas! But it is still possible.
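The four combinations can be written down as a tiny classifier. This is only a restatement of the paragraph above in code; the `classify` name and the transaction shape (a `value` in wei and a hex `data` string) are assumptions for illustration.

```javascript
// `value` may be a number, BigInt, or numeric string in wei; `data` is a hex string.
function classify(tx) {
  const hasValue = BigInt(tx.value ?? 0) > 0n;
  const hasData = typeof tx.data === "string" && tx.data.replace(/^0x/, "").length > 0;

  if (hasValue && hasData) return "payment and invocation";
  if (hasValue) return "payment";
  if (hasData) return "invocation";
  return "neither";
}

console.log(classify({ value: 10n ** 16n, data: "0x" }));   // payment
console.log(classify({ value: 0, data: "0x2e1a7d4d" }));    // invocation
```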

Transmitting Value to EOAs and Contracts

When you send a transaction with a value in Ethereum, it can be considered a payment. The behavior of such transactions depends on whether the destination address is a contract or an Externally Owned Account (EOA).

For EOA addresses, the value you send is added to the recipient's balance. If the address is new and hasn't been seen before, it will be added to the client's internal representation of the state, and its balance will be initialized with the payment value.

If the destination address is a contract, the Ethereum Virtual Machine (EVM) will execute the contract and attempt to call the function specified in the transaction's data payload. If no data is provided, the EVM will try to execute the contract's fallback function. If a payable fallback function exists, it is executed and determines what happens next. If the contract has no fallback function, or the fallback function is not payable, the transaction will be reverted.

A contract can reject incoming payments by throwing an exception when a function is called or based on specific conditions coded in a function. If the function executes successfully without exceptions, the contract's state will be updated to reflect an increase in its ether balance.

In simpler terms, when you send ether to an individual, it increases their balance. But if you send ether to a contract, the contract's code is executed, and if the contract allows payments, its balance will increase. Contracts can also reject payments based on their code's conditions.

Transmitting a Data Payload to an EOA or Contract

When you include data in your transaction, it is usually intended for a contract address. Sending data to an externally owned account (EOA) is also valid, but its interpretation depends on the wallet you use. The Ethereum protocol ignores the data when sent to an EOA, and most wallets also ignore any data received by EOAs they control. In the future, there might be standards that allow wallets to interpret data like contracts do, enabling transactions to invoke functions within user wallets. However, it's important to note that the interpretation of data by an EOA is not governed by Ethereum's consensus rules, unlike contract executions.

Assuming your transaction is targeting a contract address, the Ethereum Virtual Machine (EVM) interprets the data as a contract invocation. In most cases, the data represents a function call, specifying the function's name and any encoded arguments.

The data payload sent to an ABI-compatible contract (which most contracts are) is a hexadecimal serialization that consists of two parts:

Function Selector: The first 4 bytes of the Keccak-256 hash of the function's prototype (the function name followed by its parameter types in parentheses). This identifies the function you want to invoke.
Function Arguments: The arguments of the function, encoded based on the rules defined in the ABI specification for various data types.
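As a sketch of how the two parts fit together, the snippet below builds the payload for a hypothetical contract function `withdraw(uint256)`. Computing a selector needs a Keccak-256 implementation, which Node.js does not ship (its built-in "sha3-256" is the standardized SHA-3 variant and gives different results), so the selector is taken here as a known constant. Only the `uint256` argument type is handled.

```javascript
// First 4 bytes of keccak256("withdraw(uint256)"), taken as a given constant.
const WITHDRAW_SELECTOR = "2e1a7d4d";

// ABI encoding of a uint256: a 32-byte, big-endian, zero-padded value.
function encodeUint256(value) {
  const n = BigInt(value);
  if (n < 0n || n >= 1n << 256n) throw new RangeError("value does not fit in uint256");
  return n.toString(16).padStart(64, "0");
}

// Data payload = selector followed by the encoded arguments.
function encodeWithdraw(amountWei) {
  return "0x" + WITHDRAW_SELECTOR + encodeUint256(amountWei);
}

// Request a withdrawal of 0.01 ether (10^16 wei).
console.log(encodeWithdraw(10n ** 16n));
```

The result is 4 + 32 bytes long: the contract reads the first 4 bytes to pick the function and the remaining 32 bytes as its single argument.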

In simpler terms, when you send data to a contract, it is treated as a request to call a specific function within that contract, and the data payload contains the necessary information to identify the function and pass its arguments.

Special Transaction: Contract Creation

When you want to create a new contract on the Ethereum blockchain, you send a special transaction whose "to" field is left empty. This is often described as sending to the zero address and displayed as 0x0, but in the serialized transaction the field is an empty value rather than twenty zero bytes. An empty "to" field is not associated with an externally owned account (EOA) or an existing contract. It serves only to signal contract creation.

The actual zero address (0x followed by forty zeros) is an ordinary address that no one is known to control, and it sometimes receives accidental payments from various addresses, which results in the loss of that ether. On the other hand, intentional ether burns, where ether is deliberately destroyed by sending it to an address that cannot spend it, can also occur. If you intend to perform an intentional ether burn, it is recommended to use a designated burn address to make your intention clear to the network.

0x000000000000000000000000000000000000dEaD

Any ether sent to the designated burn address will become unspendable and be lost forever.

To create a contract, your transaction only needs to include a data payload containing the compiled bytecode that will generate the contract. The purpose of this transaction is solely to create the contract. Optionally, you can specify an ether amount in the value field to give the new contract an initial balance. However, a contract creation transaction that carries ether but no data payload has much the same effect as sending the ether to a burn address: there is no contract code to control it, so it is lost.

Digital Signatures

In this section, we look at how digital signatures work and how they can be used to present proof of ownership of a private key without revealing that private key.

The Elliptic Curve Digital Signature Algorithm (ECDSA)

The digital signature algorithm used in Ethereum is called the Elliptic Curve Digital Signature Algorithm (ECDSA). It relies on private-public key pairs based on elliptic curves. A digital signature in Ethereum serves three purposes: proving authorization, ensuring non-repudiation, and guaranteeing data integrity.

A digital signature is a mathematical scheme that verifies the authenticity of digital messages or documents. It consists of two parts: creating the signature using a private key and verifying the signature using a public key.

Creating a Digital Signature

In Ethereum, when creating a digital signature, the transaction is used as the message, specifically the Keccak-256 hash of the RLP-encoded transaction data. The private key of the account is used for signing, resulting in the signature. The signature is composed of two values, often referred to as "r" and "s."

Sig = F-sig(F-keccak256(m),k)

where,

k is the signing private key.
m is the RLP-encoded transaction.
F-keccak256 is the Keccak-256 hash function.
F-sig is the signing algorithm.
Sig is the resulting signature.

The function F-sig produces a signature Sig that is composed of two values, commonly referred to as r and s:

Sig = ( r , s )

Verifying the Signature

To verify a signature, you need the signature itself (represented by "r" and "s"), the serialized transaction, and the public key that corresponds to the private key used to create the signature. Verification ensures that only the owner of the private key that generated the public key could have produced the signature for that specific transaction.

The signature verification algorithm takes the message (which is a hash of the transaction), the public key of the signer, and the signature (r and s values). If the algorithm determines that the signature is valid for the given message and public key, it returns "true."

ECDSA Math

The signature algorithm involves generating an ephemeral (temporary) private key to protect the sender's actual private key. This temporary key is used to calculate the values "r" and "s" in the signature.

Here's a simplified explanation of the steps involved:

Generate a random number "q" as the temporary private key.
Calculate the corresponding temporary public key "Q" using the elliptic curve generator point.
The "r" value of the signature is the x coordinate of the temporary public key "Q".
Calculate the "s" value of the signature using the formula:

s ≡ q⁻¹ (Keccak256(m) + r * k) (mod p).

where,

"q" is the temporary private key, and q⁻¹ is its multiplicative inverse modulo p.
"r" is the x coordinate of the temporary public key.
"k" is the actual private key of the sender (EOA owner).
"m" is the transaction data.
"p" is the prime order of the elliptic curve.

To verify a signature, the process is the inverse of the signature generation function. It involves using the "r" and "s" values of the signature, along with the sender's public key, to calculate a point on the elliptic curve called "Q," which represents the ephemeral public key used during signature creation. Here are the steps:

Check that all inputs are properly formatted.
Calculate the value "w" by taking the inverse of "s" modulo "p".
Calculate "u1" as the result of multiplying the Keccak256 hash of the signed transaction data ("m") by "w" modulo "p".
Calculate "u2" as the result of multiplying "r" by "w" modulo "p".
Finally, calculate the point "Q" on the elliptic curve using the formula

Q ≡ u1 * G + u2 * K (mod p).

where,

"r" and "s" are the signature values.
"K" is the sender's public key (EOA owner's public key).
"m" is the signed transaction data.
"G" is the elliptic curve generator point.
"p" is the prime order of the elliptic curve.

If the x coordinate of the calculated point "Q" matches the "r" value of the signature, then the verifier can conclude that the signature is valid.
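The signing and verification steps can be tried end to end on a toy curve. The sketch below is purely illustrative: it uses the tiny textbook curve y² = x³ + 2x + 2 over the integers modulo 17, with generator (5, 1) of prime order 19, instead of the curve Ethereum uses, and it takes the message hash as a plain integer rather than computing Keccak256(m). In the code, `ORDER` plays the role of "p" in the formulas above, while `FIELD` is the separate modulus for point coordinates. All function names are my own.

```javascript
const FIELD = 17n, A = 2n, ORDER = 19n;   // toy parameters, NOT Ethereum's curve
const G = { x: 5n, y: 1n };               // generator point

const mod = (a, m) => ((a % m) + m) % m;

// Modular inverse via Fermat's little theorem (m must be prime).
function inv(a, m) {
  let result = 1n, base = mod(a, m);
  for (let e = m - 2n; e > 0n; e >>= 1n) {
    if (e & 1n) result = (result * base) % m;
    base = (base * base) % m;
  }
  return result;
}

// Point addition; null represents the point at infinity.
function add(P, Q) {
  if (P === null) return Q;
  if (Q === null) return P;
  if (P.x === Q.x && mod(P.y + Q.y, FIELD) === 0n) return null;
  const slope = (P.x === Q.x)
    ? mod((3n * P.x * P.x + A) * inv(2n * P.y, FIELD), FIELD)   // doubling
    : mod((Q.y - P.y) * inv(Q.x - P.x, FIELD), FIELD);          // distinct points
  const x = mod(slope * slope - P.x - Q.x, FIELD);
  return { x, y: mod(slope * (P.x - x) - P.y, FIELD) };
}

// Scalar multiplication by double-and-add.
function mul(k, P) {
  let result = null;
  for (let n = mod(k, ORDER), addend = P; n > 0n; n >>= 1n) {
    if (n & 1n) result = add(result, addend);
    addend = add(addend, addend);
  }
  return result;
}

// hash: integer message hash, k: private key, q: ephemeral private key.
function sign(hash, k, q) {
  const r = mod(mul(q, G).x, ORDER);
  const s = mod(inv(q, ORDER) * (hash + r * k), ORDER);
  if (r === 0n || s === 0n) throw new Error("unusable ephemeral key, pick another q");
  return { r, s };
}

// K: signer's public key point.
function verify(hash, K, { r, s }) {
  const w = inv(s, ORDER);
  const Q = add(mul(mod(hash * w, ORDER), G), mul(mod(r * w, ORDER), K));
  return Q !== null && mod(Q.x, ORDER) === r;
}

const k = 7n, K = mul(k, G);
const sig = sign(26n, k, 10n);
console.log(sig, verify(26n, K, sig)); // { r: 7n, s: 17n } true
```

With numbers this small the scheme offers no security at all; the point is only that the recomputed "Q" has the same x coordinate as the one used when signing. A real implementation must also pick "q" unpredictably and never reuse it.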

Separating Signing and Transmission (Offline Signing)

Once a transaction is signed, it can be transmitted to the Ethereum network. Normally, the process of creating, signing, and broadcasting a transaction happens in a single step. However, it is possible to separate the signing and transmission steps for security reasons.

Separating these functions is done to protect the private keys used for signing. The computer that signs the transaction needs to have the private keys loaded in memory, which can be risky if it is connected to the internet. By performing the signing on an offline device and the transmission on an online device, known as offline signing, the private keys remain secure.

Here's an overview of the offline signing process:

Create an unsigned transaction on an online computer where the current account state, including the nonce and available funds, can be retrieved.
Transfer the unsigned transaction to an offline device that is not connected to the internet for signing. This can be done through methods like using a QR code or a USB flash drive.
Transmit the signed transaction back to an online device for broadcasting it on the Ethereum blockchain. This can be done by scanning a QR code or transferring the transaction via a USB flash drive.

By following this process, the private keys are kept offline during the signing step, reducing the risk of unauthorized access to the keys.

Transaction Propagation

The Ethereum network uses a peer-to-peer design in which each computer, called a node, is connected to a number of other nodes (its peers) rather than to every node directly. When someone creates or sends a transaction, it spreads across the network like a wave. Each node checks that the transaction is valid and then passes it on to its peers, and this continues until every node has a copy of the transaction.
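The wave-like spreading can be modeled as a breadth-first flood over a graph of peers. The sketch below is a toy simulation, not a description of any client's networking code; the five-node graph and the `flood` function are made up for illustration, and validation is skipped.

```javascript
// Returns a Map from node name to the hop count at which it first hears the transaction.
function flood(peers, origin) {
  const hops = new Map([[origin, 0]]);
  let frontier = [origin];

  while (frontier.length > 0) {
    const next = [];
    for (const node of frontier) {
      for (const peer of peers[node] ?? []) {
        // A node relays a transaction only the first time it sees it.
        if (!hops.has(peer)) {
          hops.set(peer, hops.get(node) + 1);
          next.push(peer);
        }
      }
    }
    frontier = next;
  }
  return hops;
}

const peers = {
  A: ["B", "C"],
  B: ["A", "D"],
  C: ["A", "D"],
  D: ["B", "C", "E"],
  E: ["D"],
};
console.log(Object.fromEntries(flood(peers, "A"))); // { A: 0, B: 1, C: 1, D: 2, E: 3 }
```

Node E never talks to A directly, yet it receives the transaction after three hops, which is the behavior the paragraph above describes.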

The way transactions spread is designed to be fast and efficient. It ensures that within a few seconds, the transaction is known to every node in the network around the world. The best part is that no node can tell where the transaction originated from. This makes it difficult for anyone to track or interfere with transactions unless they control a large portion of the network.

This approach helps maintain security and privacy in the Ethereum network. It ensures that transactions are widely distributed and protected from unauthorized access or manipulation.

Recording on the Blockchain

For a transaction to take effect, it has to be included in a block. Under Ethereum's original proof-of-work design, this was the job of miners, who ran powerful computers (often grouped into mining farms) to find a proof of work for each block of valid transactions. Since the Merge in September 2022, Ethereum uses proof of stake instead, and blocks are proposed and attested to by validators. Either way, valid transactions end up recorded in the Ethereum blockchain.

Once a transaction is included in a block, it becomes a permanent part of the blockchain. It can modify the state of the Ethereum network by changing account balances or invoking contracts that make internal changes. These changes are recorded in a transaction receipt, which also includes any associated events.

In simpler terms, when a transaction completes its journey from creation to inclusion in a block, it leaves a lasting impact on the Ethereum network and becomes a permanent part of the blockchain.

Multiple-Signature (Multisig) Transaction

In Ethereum, it is possible to create smart contracts that enforce custom rules for transferring ether and tokens. This allows for the implementation of features like multisignature accounts, where funds can only be spent when multiple parties sign a transaction.

To set up a multisignature account, you transfer your money to a special contract instead of a regular account. This contract is programmed with the rules you want, such as requiring two or more people to sign off on transactions. When you want to send money from the multisignature account, the required number of authorized users must sign and approve the transaction using a wallet app.

These contracts can also be designed to require multiple signatures for executing specific actions or triggering other contracts. The security of the multisignature system depends on the code written for the contract.
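The approval rule such a contract enforces can be modeled off-chain in a few lines. This is a plain JavaScript simulation of the idea, not contract code: the `MultisigSketch` class is hypothetical, "execution" is just a flag, and counting the proposer as the first approval is one possible design choice among several.

```javascript
class MultisigSketch {
  constructor(owners, required) {
    if (required < 1 || required > owners.length) throw new Error("invalid threshold");
    this.owners = new Set(owners);
    this.required = required;
    this.proposals = [];
  }

  // An owner proposes a transfer; the proposal counts as their approval.
  propose(sender, to, value) {
    this.assertOwner(sender);
    const id = this.proposals.push({ to, value, approvals: new Set([sender]), executed: false }) - 1;
    this.tryExecute(id);
    return id;
  }

  // Another owner approves; returns true once the threshold is reached.
  approve(sender, id) {
    this.assertOwner(sender);
    const proposal = this.proposals[id];
    if (!proposal || proposal.executed) throw new Error("unknown or already executed proposal");
    proposal.approvals.add(sender); // a Set ignores repeated approvals by the same owner
    return this.tryExecute(id);
  }

  tryExecute(id) {
    const proposal = this.proposals[id];
    if (proposal.approvals.size >= this.required) proposal.executed = true;
    return proposal.executed;
  }

  assertOwner(sender) {
    if (!this.owners.has(sender)) throw new Error("not an owner");
  }
}

const wallet = new MultisigSketch(["alice", "bob", "carol"], 2);
const id = wallet.propose("alice", "0xB0920c523d582040f2BCB1bD7FB1c7C1ECEbdB34", 10n ** 16n);
console.log(wallet.approve("bob", id)); // true: two of three owners have approved
```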

While this flexibility is powerful, it also introduces the risk of programming errors that could undermine the security of multisignature setups. There are proposals to create a built-in feature for multisignature transactions in Ethereum to simplify the process and make it more secure, similar to Bitcoin's system that is known for its robustness and security.

Conclusion

Transactions serve as the initial step for all actions within the Ethereum system. They act as the "triggers" that prompt the Ethereum Virtual Machine to process contracts, update balances, and make changes to the Ethereum blockchain's overall state.
