Afri Schoedon

Posted on Nov 29, 2017 • Edited on Dec 17, 2017

The Ethereum-blockchain size will not exceed 1TB anytime soon.

#ethereum #parity #blockchain #bitcoin

Before diving into this article, please read the two disclosures about my involvement (1,2) and the one on data accuracy (3) at the bottom of the article.

At least once a month, someone posts a chart on r/ethereum predicting that the blockchain size of Ethereum will soon exceed 1 TB. I want to take this chance to clear up some myths around the Ethereum-blockchain size and try to explain why this chart is technically correct, but not the full picture.

Let's have a look at this chart first. It shows the complete data directory size of an Ethereum node (red), Geth in this case, and a Bitcoin node (blue), probably Bitcoin-Core, plotted over time. While the Bitcoin graph moves slightly upwards in a seemingly linear fashion, the Ethereum graph resembles an exponentially growing curve.

On Blocks, Block-History, States, and State-History

Users accusing Ethereum of blockchain-bloat are not far off with their assumptions. But actually, it is not the chain that is bloated, but the Ethereum state. I want to examine some terminology from the Whitepaper before proceeding.

Block. A bundle of transactions which, after proper execution, update the state. Each transaction-bundling block gets a number, has some difficulty, and contains the most recent state.
State. The state is made up of all initialized Ethereum accounts. At the time of writing, there are around 12 million known accounts and contracts growing at a rate of roughly 100k new accounts per day.
Block-History. A chain of all historical blocks, starting at the genesis block up to the latest best block, also known as the blockchain.
State-History. The state of each historical block makes up the state history. I will get into the details on this later.

If this already bores you, please read on anyway.

Understanding Pruning-Modes and Sync-Modes

In early 2016, the Go-Ethereum team introduced a so-called fast synchronization mode. Since then, it has been popular to run geth --fast, especially after the spam attacks on Ethereum later the same year, which made the full synchronization mode painful. I'm writing these modes in italics because I will come back to an essential disambiguation at a later point in this article. Just keep them in mind for now.

The Parity team (formerly Ethcore) reacted to the on-chain spam by offering a warp synchronization mode at the end of 2016 to ease the chain synchronization for new users. Much the same as Geth's fast, parity --warp soon became the de-facto standard mode for users trying to synchronize the Ethereum chain. As of today, these options have been adopted as the default in both clients.

But what does it mean to fast-sync versus full-sync a Geth node? What does it actually mean to warp-sync a Parity node rather than no-warp-syncing it?

A full Geth node processes the entire blockchain and replays all transactions that ever happened. A fast Geth node downloads all transaction receipts in parallel to all blocks, and the most recent state database. It switches to a full synchronization mode once done with that. Note that this results not only in a fast sync but also in a pruned state-database, because the historical states are not available for blocks older than the best block minus 1024. That's not an issue, but before reading on, please keep in mind that Geth synchronization modes are also pruning modes.
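To make the "best block minus 1024" rule concrete, here is a minimal sketch. The function name has_state and the constant PRUNING_WINDOW are made up for illustration and are not part of Geth; only the window of 1024 blocks comes from the paragraph above.

```python
# Hypothetical helper, not a Geth API: decides whether a pruned node
# would still hold the state of a given block.
PRUNING_WINDOW = 1024  # recent blocks whose state is kept, as stated in the text


def has_state(block_number: int, best_block: int, window: int = PRUNING_WINDOW) -> bool:
    """Return True if the state of block_number is still available on a pruned node."""
    if block_number < 0 or block_number > best_block:
        return False  # the block does not exist (yet)
    # States older than best_block - window have been pruned.
    return block_number >= best_block - window


best = 4_600_000  # an arbitrary example height
print(has_state(best - 10, best))    # recent block: state is kept
print(has_state(1_000_000, best))    # ancient block: state is pruned
```

The blocks themselves are unaffected by this check; only their historical states fall out of the window.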

Looking at Parity's configuration options, this gets more complex. In addition to the previously mentioned synchronization modes, Parity also offers separate pruning modes, namely fast and archive... Right: Geth fast is a sync mode that also prunes, as we learned; Parity fast, however, is a pruning mode that is not tightly coupled to the sync mode. At this point, I have to admit, the terminology is confusing, and I might have lost you already. Let's draw something with pen and paper.

Geth's fast enables a quicker synchronization and database pruning. Geth full disables both. Parity warp, however, can be disabled without disabling the state-trie pruning! This is a significant sentence. Thus I bolded it. And I am not comparing Ethereum clients here, that's not my intention at least. I want to show you that it is possible to run a full-verifying Ethereum node with a small database. Parity just provides the proof-of-concept for this.

But why is this? Because as long as you have all historic blocks on your disk, you can compute any historical state from it by reprocessing the entire chain again. But in most use-cases, you don't need historical states at all! Therefore it is smart just to delete outdated entries from the state history and to reduce your required disk space by 95%.
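The idea can be sketched with a toy model. This is not how any Ethereum client is implemented: the "state" here is just a dictionary of balances, a "block" is a list of transfers, and all names (apply_block, state_at, the accounts) are invented for the example. It only shows that keeping the blocks is enough to rebuild any older state on demand.

```python
from typing import Dict, List, Tuple

Transfer = Tuple[str, str, int]  # (sender, recipient, amount)
Block = List[Transfer]
State = Dict[str, int]           # account -> balance


def apply_block(state: State, block: Block) -> State:
    """Execute every transfer in a block and return the resulting state."""
    new_state = dict(state)
    for sender, recipient, amount in block:
        if new_state.get(sender, 0) < amount:
            raise ValueError(f"{sender} cannot afford {amount}")
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def state_at(genesis: State, blocks: List[Block], height: int) -> State:
    """Recompute the state after the first `height` blocks by replaying from genesis."""
    state = dict(genesis)
    for block in blocks[:height]:
        state = apply_block(state, block)
    return state


genesis = {"alice": 100}
chain = [
    [("alice", "bob", 30)],
    [("bob", "carol", 10)],
    [("alice", "carol", 5)],
]

# A pruning node only needs to store the latest state ...
latest = state_at(genesis, chain, len(chain))
# ... because any older state can be rebuilt from the stored blocks.
after_block_1 = state_at(genesis, chain, 1)
print(latest)
print(after_block_1)
```

The trade-off is time versus space: rebuilding an old state means replaying the chain, which is why archive nodes that serve historical queries keep every state instead.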

So, What's the Minimum Size of a Fully Verified Node?

Some tens of GB, by just running parity --no-warp. Earlier this fall it was less than 20 GB, but the state is growing very fast. Currently, the raw historical block data containing the blocks and transactions is approximately 12-15 GB in size and the latest state around 1-2 GB.

But is this to be considered a full Ethereum node? Yes:

It runs a full blockchain synchronization starting at genesis.
It replays all transactions and executes all contracts.
It recomputes the state for each block.
It keeps all historical blocks on the disk.
It keeps the most recent states on the disk and prunes ancient states.

Something an Ethereum client never does is delete old blocks. This is a significant difference between Bitcoin and Ethereum, because pruning a Bitcoin node does not leave any choice but to remove old blocks. With this context available, it's easier to understand why users often think a pruned Ethereum node is not a full node. But now, dear reader, you know the opposite is true. :)

And on top of this, even a warp-synced Parity node downloads the whole history of blocks after the initial synchronization, allowing it to serve the network as a full node once the ancient-block synchronization has completed.

The Full Picture: 9 Parity Configurations Compared

Below is a screenshot of my nicely-colored spreadsheet trying to distinguish between node-security of different Parity operation modes.

The configurations 00 through 05 are all to be considered full nodes. Configuration 06 is a default-configuration warp-node which can be regarded as full once the ancient block download is finished. However, it does not replay all transactions; it only checks the Proof-of-Work of the historical blocks.

Configuration 07 is something users often ask for, but it should be highly discouraged in production use. This setting is comparable to a pruned Bitcoin node, as historical blocks are partially not available. This is not a full node anymore. Note how I added a separator above this paragraph. You get the idea.

Configuration 08 is a light client, but that's worth another blog article. Thanks for scrolling this far down, here is your conclusion: An Ethereum full node does not require more than 20-30 GB disk space by default. :)

Noteworthy disclosures and bottom-line comments.

(1) I work for Parity. I'm comparing different Parity configurations not only because I sincerely know and understand them, but also because Parity allows users to configure pruning mode and synchronization mode separately.

(2) I hold some Bitcoin and some Ether. I hope this does not have any influence on the technical aspects I'm outlining in this article. Also, I'm trying not to become overly political about this.

(3) I have been running Parity in 36 different configurations over six weeks to gather the numbers. This is time- and resource-consuming, and still, it bears the issue that I cannot keep all configurations running at the same time. Therefore, the numbers presented in this article have to be taken with caution. I expect the results to differ by up to plus/minus 20% from other nodes running the same configuration. But you get the idea:

| ID | Pruning / DB Config | Verification    | Available History          | ETH        | ETC        | MSC        | EXP        | Parity CLI Options                         |
|====|=====================|=================|============================|============|============|============|============|============================================|
| 00 | archive +Fat +Trace | Full/No-Warp    | All Blocks + States        | 385     GB |  90     GB |  25     GB |   5.6   GB | --pruning archive --tracing on --fat-db on |
| 01 | archive +Trace      | Full/No-Warp    | All Blocks + States        | 334     GB |  90     GB |  21     GB |   5.8   GB | --pruning archive --tracing on             |
| 02 | archive             | Full/No-Warp    | All Blocks + States        | 326     GB |  91     GB |  30     GB |   5.5   GB | --pruning archive                          |
| 03 | fast +Fat +Trace    | Full/No-Warp    | All Blocks + Recent States |  37     GB |  13     GB |   3.5   GB |   1.3   GB | --tracing on --fat-db on                   |
| 04 | fast +Trace         | Full/No-Warp    | All Blocks + Recent States |  34     GB |  13     GB |   3.5   GB |   1.2   GB | --tracing on                               |
| 05 | fast                | Full/No-Warp    | All Blocks + Recent States |  26     GB |   9.7   GB |   3.0   GB |   1.1   GB | --no-warp                                  |
| 06 | fast +Warp          | PoW-Only/Warp   | All Blocks + Recent States |  25     GB |   9.6   GB |   2.6   GB |   0.96  GB |                                            |
| 07 | fast +Warp -Ancient | No-Ancient/Warp | Recent Blocks + States     |   5.3   GB |   2.9   GB |   0.19  GB |   0.13  GB | --no-ancient-blocks                        |
| 08 | light               | Headers/Light   | No Blocks + No State       |       5 MB |       3 MB |       4 MB |       5 MB | --light                                    |
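As a quick sanity check on the table, the following sketch computes how much smaller the pruned configurations are than the archive configuration. The sizes are copied from the ETH column above; the helper name saving_percent is made up for this example, and the plus/minus 20% uncertainty mentioned earlier applies to the result as well.

```python
# Database sizes in GB, copied from the ETH column of the table above.
eth_size_gb = {
    "02 archive": 326,
    "05 fast": 26,
    "06 fast +Warp": 25,
}


def saving_percent(full: float, pruned: float) -> float:
    """Relative disk-space saving of a pruned database versus a full one, in percent."""
    return (1 - pruned / full) * 100


archive = eth_size_gb["02 archive"]
for config in ("05 fast", "06 fast +Warp"):
    saving = saving_percent(archive, eth_size_gb[config])
    print(f"{config}: about {saving:.0f}% smaller than archive")
```

With these measurements, both pruned configurations come out at roughly 92% smaller than the archive database.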

Meta-data:

Version: Parity/v1.8.0-unstable-7940bf6ec-20170921/x86_64-linux-gnu/rustc1.19.0 from source w/ musicoin support
Ubuntu: 17.04 Kernel 4.10.0-35-generic / September 2017 / Lenovo Thinkpad X270, Core i7-7600U, 1TB SSD, 16GB RAM

Thanks for scrolling to the bottom. <3

Update: Thanks for featuring me on dev.to and Twitter. Users who enjoyed reading this article might also find the following reddit discussion interesting.

Fun fact: While publishing this article, the price of Bitcoin broke 10,000 USD and Ethereum 500 USD. I think I will add current market prices to my articles in the future, just for fun.

Update: Thanks for rating this top-1 post, dear dev.to team <3 <3 <3

Update: Here is a more controversial discussion on Hackernews.

Top comments (21)

Jason C. McDonald • Nov 29 '17

I only understood about 10% of this, being totally uninformed about blockchain (my fault)...but I still have to <3 and applaud, because I know research effort when I see it! Great job.

Thomas Jay Rush • Nov 30 '17 • Edited

I run a Parity node with --tracing on and --pruning archive and have done so since June of 2016. Since the Byzantium hard fork (October 16), the chain data (in this admittedly extreme case) has grown more than 125 GB. If one wishes to do what I call a "deep, full, audit level accounting", one needs the traces. It's unclear if one needs the archive from this article. If you're running tracing and archive, the chain will blow past 1TB very soon.

Erik Jonsson Thorén • Dec 17 '17

Hi Afri, thanks for the write-up!

I have a question, you write:

The configuration 07 is something users often ask for but should be highly discouraged in production use. This setting is comparable to a pruned bitcoin node as historical blocks are partially not available.

Could you explain why this should be discouraged?

Afri Schoedon • Dec 17 '17

Because it does not hold historic blocks. And you can only verify the integrity of the chain, the transactions, the state, and balances, if you have access to all historic block data. That is available in Configurations 00-06, and partially in 08, but not in 07.

Kari Ilkkala • Dec 7 '17

Great article!

If I understood correctly, a full but pruned node would need all the blocks + the recent state of each account (and smart contract)?

Yesterday Dec 6th 2017 the number of Ethereum accounts grew to 13.4 million. Based on your article, the size of the chaindata of a pruned Ethereum node is now, depending on the mode, somewhere between 25 and 40 GB?

As more and more apps meant for global use rely on users to have cryptocurrencies and tokens, the number of Ethereum accounts can start to approach that of Internet users in general.

How big will even the pruned node grow if we have, say, 1 billion plus accounts? Proportional growth from a 35 GB node size at 13.4M accounts would give a 2.6 TB node size.

Not trying to invoke FUD, just asking?

Afri Schoedon • Dec 7 '17

Hey, thanks for reading it! Without running the numbers, I want to say: it's not impossible. I am just highlighting that we are far away from this and this is not happening soon.

There are a lot of proposals for scalability. I don't really have an overview, and I also do not feel technically qualified to discuss them currently. But regarding the state size, or let's say, state bloat, you might want to read about state-cleaning or dust-cleaning. There are proposals to just purge entries from the state which are provably non-recoverable accounts (i.e., with a balance smaller than the lowest possible transaction fee).

The good thing, we still have time, in that regard, to discuss proposals and eventually implement them.

Magento Chile • Mar 10 '18

Hello Rando,

Thank you very much for the information. I am checking your table data for Full / No-Warp --no-warp (05). I'm installing Parity on a Digital Ocean Droplet - CentOS, 2 vCPU, 2G RAM, 60G storage.

My installation of Parity on CentOS 7 went like this:

1.- Docker search:
docker search parity/parity

2.- We are looking for the latest version of Parity:
curl -sS 'registry.hub.docker.com/v2/reposit...' | jq '."results"[]["name"]' | sort

-bash: jq: command not found

We are missing a package; install it with yum:
yum install jq

curl -sS 'registry.hub.docker.com/v2/reposit...' | jq '."results"[]["name"]' | sort

Now it works! We see the latest versions:

[root@centos-s-2vcpu-2gb-lon1-01 ~]# curl -sS 'registry.hub.docker.com/v2/reposit...' | jq '."results"[]["name"]' | sort
"beta"
"beta-release"
"nightly"
"stable-release"
"v1.10.0-ci5"
"v1.10.0-ci6"
"v1.8.10"
"v1.8.11"
"v1.9.3"
"v1.9.4"

3.- We pull the Parity Docker image (latest version):
docker pull parity/parity:v1.9.4

Done:

[root@centos-s-2vcpu-2gb-lon1-01 ~]# docker pull parity/parity:v1.9.4
Trying to pull repository docker.io/parity/parity ...
v1.9.4: Pulling from docker.io/parity/parity
c954d15f947c: Pull complete
c3688624ef2b: Pull complete
848fe4263b3b: Pull complete
23b4459d3b04: Pull complete
36ab3b56c8f1: Pull complete
ecd224a1ca24: Pull complete
c6053fbd9bf9: Pull complete
52846da88991: Pull complete
Digest: sha256:a1b992c63edabd240cc5d77f914651b280e4dc5df55f7ec1c5ff07065ed4827a

4.- OK, before letting it run, we publish the ports:

docker run -ti -p 8180:8180 -p 8545:8545 -p 8546:8546 -p 30303:30303 -p 30303:30303/udp parity/parity:v1.9.4 --ui-interface all --jsonrpc-interface all

[root@centos-s-2vcpu-2gb-lon1-01 ~]# docker run -ti -p 8180:8180 -p 8545:8545 -p 8546:8546 -p 30303:30303 -p 30303:30303/udp parity/parity:v1.9.4 --ui-interface all --jsonrpc-interface all
2018-03-09 05:15:26 UTC Starting Parity/v1.9.4-beta-6f21a32-20180228/x86_64-linux-gnu/rustc1.24.0
2018-03-09 05:15:26 UTC Keys path /root/.local/share/io.parity.ethereum/keys/Foundation
2018-03-09 05:15:26 UTC DB path /root/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d
2018-03-09 05:15:26 UTC Path to dapps /root/.local/share/io.parity.ethereum/dapps
2018-03-09 05:15:26 UTC State DB configuration: fast
…

According to your table, I am in row 05, with Fast --no-warp. I've been synchronizing for about 2 days, and going by the table there should be little left to go:

[root@centos-s-2vcpu-2gb-lon1-01 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vda1 62903276 27952700 34950576 45% /
devtmpfs 920968 0 920968 0% /dev
tmpfs 941688 0 941688 0% /dev/shm
tmpfs 941688 98812 842876 11% /run
tmpfs 941688 0 941688 0% /sys/fs/cgroup
tmpfs 188340 0 188340 0% /run/user/0

If I manage to synchronize, I will let you know how much disk space is used, so that you can update or check your data.

Thank you very much!

Now the only thing I'm looking for is how to create the new account with Docker and the parity signer new-token command (it does not work so far; I think it should be something like docker run parity/parity:v1.9.4 signer new-token something well, but I still have the code)

Regards,

Boris Durán

huyhoangk50 • Dec 16 '17

I want to develop a server that deploys smart contracts through web3j. May I ask: is my server considered to be a node of the decentralized Ethereum system? And which parameters does the server require?

Magento Chile • Mar 12 '18

Hi Rando,

Here is my data for you to update or compare against the table. On a Digital Ocean Droplet with 2G RAM, 60G storage, 2 CPUs (approx. 48 hours synchronizing - IP London).
12.03.2018 Parity ethereum synchronized node (Fast --no-warp (cell 5)):

cd /root/.local/share/io.parity.ethereum/docker

[root@centos-s-2vcpu-2gb-lon1-01 docker]# du -sh chains
43G chains

[root@centos-s-2vcpu-2gb-lon1-01 docker]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vda1 62903276 47366608 15536668 76% /
devtmpfs 920968 0 920968 0% /dev
tmpfs 941688 0 941688 0% /dev/shm
tmpfs 941688 102368 839320 11% /run
tmpfs 941688 0 941688 0% /sys/fs/cgroup
tmpfs 188340 0 188340 0% /run/user/0

[root@centos-s-2vcpu-2gb-lon1-01 io.parity.ethereum]# du -sh docker
44G docker

Regards,

Boris Durán

suffic • Apr 5 '18

If by 'anytime soon' you mean, one quarter later...

Just finished syncing a full node with parity on archive (ID 02 in your table) and I can confirm the current size of parity's db folder is 954GB with 14,751 items.

mohinimraut • Aug 23 '18

When I restart the REST server, I face the following error:
Discovering types from business network definition ...
Connection fails: Error: Error trying to ping. Error: make sure the chaincode landregistry has been successfully instantiated and try again: getccdata composerchannel/landregistry responded with error: could not find chaincode with name 'landregistry'
It will be retried for the next request.
Exception: Error: Error trying to ping. Error: make sure the chaincode landregistry has been successfully instantiated and try again: getccdata composerchannel/landregistry responded with error: could not find chaincode with name 'landregistry'
Error: Error trying to ping. Error: make sure the chaincode landregistry has been successfully instantiated and try again: getccdata composerchannel/landregistry responded with error: could not find chaincode with name 'landregistry'
at _checkRuntimeVersions.then.catch (/usr/lib/node_modules/composer-rest-server/node_modules/composer-connector-hlfv1/lib/hlfconnection.js:806:34)

ch0235 • Apr 18 '18

Hello, I'd like to run the "traceReplayTransaction" API with a Parity client node. The CLI param I currently use is "--pruning=archive", but the synced data is too large, more than 1.0 TB. So I want to use the params "--tracing on --fat-db on" instead and redo the sync.

The question is whether I can still run "traceReplayTransaction" api by using the param "--tracing on --fat-db on" ?

If not, could you please tell me the best optional param? thank you!