DEV Community

josevelez.eth

Using a Merkle tree for validating off-chain data on IPFS

I've spent the bulk of this week struggling with the challenge of building a minting contract that requires validation of off-chain data. Smart contracts are still something new for me as I continue my nascent journey into web3.

So I am sure that the solution evaded me more out of being a newbie than anything else; but the fact remains that it has pushed me tremendously. It has also prompted me to dig into the very interesting concepts of oracles and Merkle trees.

The Challenge - storage and validation

It might help to give some context of my challenge.

I am working with the team at NFgenes to bring the genome to web3. It is a very interesting project that is aiming to be an early entrant into the decentralized science (DeSci) space.

One of the initial project goals is to release a list of the ~20,000 unique human genes. (Sidebar: there are actually ~40,000 unique human genes, but only about half of them are protein-coding. The genesis list will consist of these protein-coding genes.)

This list will serve two main functions:

  1. become the default source that the research community relies on for an accurate and current list of unique human genes
  2. kick off the genesis collection of NFgenes NFTs, where each NFgene will represent one of the unique human genes in the list.

We recognized early on that storing the list of genes in our minting contract was not going to work. Smart contracts, while theoretically capable of storing large amounts of data, are not suited for this purpose, and doing so could be prohibitively expensive.

Ignoring the technical limitation of likely hitting the block gas limit when attempting to deploy a contract that stores ~20,000 values, the expense of simply writing those values with the SSTORE opcode could add over $100,000 USD to the contract deployment cost (at current ETH prices).
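To make that estimate concrete, here is a rough back-of-the-envelope calculation. It assumes one storage slot per gene and the base cost of 20,000 gas for writing a new non-zero slot. The gas price and ETH price are illustrative assumptions only, not figures from our project, and both move constantly.

```python
# Rough storage-cost estimate. Gas price and ETH price are assumed values
# chosen only for illustration; plug in current numbers to redo the math.
GENES = 20_000             # approximate size of the gene list
GAS_PER_NEW_SLOT = 20_000  # base SSTORE cost for a zero -> non-zero write
GAS_PRICE_GWEI = 100       # assumed gas price
ETH_PRICE_USD = 3_000      # assumed ETH price

total_gas = GENES * GAS_PER_NEW_SLOT          # one slot per gene (assumption)
cost_eth = total_gas * GAS_PRICE_GWEI / 1e9   # 1 ETH = 1e9 gwei
cost_usd = cost_eth * ETH_PRICE_USD

print(f"gas:  {total_gas:,}")
print(f"cost: {cost_eth:.1f} ETH (~${cost_usd:,.0f})")
```

Under these assumptions the writes alone need 400,000,000 gas, or about 40 ETH. That much gas would also not fit in a single transaction, so the data would have to be written in many batches.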

IPFS was the natural solution. Once we got the storage solution squared away, we hit the next challenge - off-chain validation.

With the list of available genes to mint off-chain on IPFS, how were we going to build a smart contract that could allow minting only those genes in our list, while also preventing a bad actor from passing a bogus name or value and delegitimizing our project in the process?

We started to put our heads together and talk about different solutions.

Exploring solutions - backends, oracles and Merkle trees

We considered a custom backend that would interface with the web3 client, the list on IPFS and the contract - taking all of the validation and handling it outside of the contract. We recognized this was not ideal, as it would heavily centralize the project and require backend work that could possibly be avoided.

We started to explore the possibility of using an oracle like Chainlink so that the contract itself could take part in the validation process. This was an interesting side quest for me personally, since I knew nothing about oracles in general.

For about a day or so, I was convinced that this was the route we needed to take. It satisfied our requirements for maintaining decentralization and having the minting contract handle validation. However, it became clear that this solution brought a lot of extra complexity and costs, and it still was not fully clear how we would implement a practical solution.

Eventually, we dropped the idea of using Chainlink and instead considered building our own pseudo oracle, which was a fun little thought experiment; but once we got into the details of implementation, we realized that there had to be an easier and better way.

At some point in the depths of our despair, it dawned on us that a Merkle tree could be our solution. Aside from being fundamental to various computer science applications and blockchains in particular, Merkle trees have seen widespread use in airdrops and NFT whitelists.

After spending some time getting familiar with Merkle trees and seeing how they were being used for whitelists and airdrops, it seemed like our requirement for off-chain validation would be a natural fit. I got busy building a proof of concept implementation.
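For anyone new to the idea, here is a minimal sketch of how a list collapses into a single root hash. It is not the code from our repo: it uses SHA-256 from Python's standard library as a stand-in (Ethereum tooling typically uses keccak256), the five gene symbols are just sample values, and duplicating the last node on odd levels is only one of several conventions.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash helper. SHA-256 is a stand-in here, not the on-chain hash."""
    return hashlib.sha256(data).digest()

def merkle_root(items: list[str]) -> bytes:
    """Hash every item into a leaf, then hash pairs upward until one node remains.

    Expects a non-empty list.
    """
    level = [h(item.encode()) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd level: duplicate the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

genes = ["BRCA1", "TP53", "EGFR", "MYC", "APOE"]  # sample values only
root = merkle_root(genes)
tampered_root = merkle_root(["BRCA1", "TP53", "EGFR", "MYC", "FAKE1"])

print(root.hex())             # 32 bytes, no matter how long the list is
print(root != tampered_root)  # changing any entry changes the root
```

The point for us is that the contract only has to store that one 32-byte root, while the full list can live on IPFS.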

Implementing a Merkle Tree for off-chain validation

Before going into the details of the implementation, I want to emphasize that this is my first go at using Merkle Trees and I am still getting my feet wet with Solidity. I expect there to be refactoring opportunities and bugs.

My primary goal was just to build something that proved that this application of Merkle Trees would work for our off-chain validation needs.

All of the code for this can be found on the NFgenes Repo:
https://github.com/nfgenes/merkletree_generator

The image below briefly describes what the relationship would look like between our gene list, the smart contract, the Merkle Tree and the proof.

[Image: Merkle tree solution]

The steps would be as follows:
(1) Compile the list of NFgenes and generate a new Merkle Tree based on that list. This will provide us the root hash as well.

(2) Deploy the VPBM smart contract that would verify a proof before allowing the minting of an NFgene. The constructor will require as an argument the current root hash when deploying.

(3) We will need to select any arbitrary gene and try to mint it. We can run the generateTreeSummary script to get a JSON object list containing each gene.

(4) With the index value of the selected gene, we can now generate a Merkle Proof with the generateMerkleProof script. The script will create a file with the information we need as well as log it out to the console.

(5) Now that we have our deployed contract, a Merkle Tree from our data, and the generated Proof, we can submit our proof to the smart contract for verification.
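The steps above can be sketched end to end in a few lines. This is an illustrative Python sketch under stated assumptions, not the scripts or the contract from the repo: `build_levels`, `make_proof` and `verify` are hypothetical names, SHA-256 stands in for keccak256, pairs are sorted before hashing so the verifier needs no left/right flags, and an unpaired node is promoted to the next level unchanged. The `verify` function plays the role the contract would play on-chain.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for the on-chain hash

def hash_pair(a: bytes, b: bytes) -> bytes:
    lo, hi = sorted((a, b))  # sorted pairs: order of the two siblings does not matter
    return h(lo + hi)

def build_levels(items: list[str]) -> list[list[bytes]]:
    """Step 1: build every level of the tree; the last level holds the root."""
    levels = [[h(x.encode()) for x in items]]
    while len(levels[-1]) > 1:
        prev, nxt = levels[-1], []
        for i in range(0, len(prev), 2):
            if i + 1 < len(prev):
                nxt.append(hash_pair(prev[i], prev[i + 1]))
            else:
                nxt.append(prev[i])  # unpaired node is promoted unchanged
        levels.append(nxt)
    return levels

def make_proof(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Step 4: collect the sibling hash at each level for the chosen leaf index."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append(level[sibling])
        index //= 2
    return proof

def verify(value: str, proof: list[bytes], root: bytes) -> bool:
    """Step 5: recompute the path from the leaf and compare it with the stored root."""
    node = h(value.encode())
    for sibling in proof:
        node = hash_pair(node, sibling)
    return node == root

genes = ["BRCA1", "TP53", "EGFR", "MYC", "APOE"]  # sample values only
levels = build_levels(genes)
root = levels[-1][0]               # step 2: the value handed to the constructor
index = 2                          # step 3: pick a gene by its index
proof = make_proof(levels, index)

print(verify(genes[index], proof, root))  # True
print(verify("FAKE1", proof, root))       # False
```

A gene that is in the list verifies against the root, while a bogus name submitted with the same proof does not. The proof grows with the height of the tree, so for ~20,000 genes it would be on the order of 15 hashes rather than the whole list.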

The video below shows the process for a Rinkeby deployment using Remix.

[Video: walkthrough]

Naturally, a web3 frontend would be built around this workflow. For example, the selection of a gene to mint, the generation of a proof and the subsequent submission for verification could all be handled within a frontend using React and the ethers library.

I would be remiss if I did not give a huge shout-out to the resources below, which were VERY helpful.
