
Itheum WhitePaper

Abstract

Itheum is the world's first decentralised data brokerage. The platform transforms your personal data into a highly tradable asset class. It provides Data Creators and Data Consumers with the tools required to "bridge" highly valuable personal data from web2 into web3 and to then trade that data with a seamless UX built on top of blockchain technology and decentralised governance. We provide the end-to-end platform required for personal data to be made available in web3 for the first time in history, enabling many more wonderful and complex real-world use cases to enter the web3 ecosystem. Itheum provides the core, cross-chain web3 protocol required to enable personal data ownership, data sovereignty and fair compensation for data usage, positioning Itheum as the data platform for the Metaverse.

Editor's Note: This blog is not meant to be investment advice nor a solicitation for the acquisition of Itheum's tokens. Full disclaimers are available at the bottom of this document.




Preface

This is the "Lite WhitePaper" of the Itheum Platform. It is a work in progress and is currently in DRAFT format. Once peer review is complete, a formal whitepaper will be released.

A technical "yellow paper" is also being actively worked on and will be released prior to our launch.

Introduction

The Problem Today

Every day, billions of people give away their personal data to organisations in return for some service or product. They sign up to apps, websites and social networks, make online purchases, use digital banking for their day-to-day transactions, use wearables to monitor their health and use countless other digital services run by commercial, profit-seeking organisations that absorb their personal data and lock it away in data silos.

These organisations then use your personal data to learn more about you and people like you. They create more products and services for you to buy and get hooked on… they call this "stickiness", and their objective is for you to keep coming back and sharing more data.

Some organisations (usually the largest and most influential) can even use your data to influence your thinking and ideas or even resell your data to Data Brokers (independent commercial organisations). These Data Brokers lurk in the shadows and "deal in the personal data trade" making millions of dollars of profit by packaging and selling your data to other organisations.

At Itheum, we believe that YOUR DATA IS YOUR BUSINESS!

Itheum’s Solution

Itheum wants to change this current toxic model for personal data collection and exchange and level the playing field — where the organisation and “you” (the data creator) equally benefit from the trade of personal data.

Itheum is a complete platform that provides two fundamental products that, when used together, will flip the dynamic of personal data collection and exchange.

Product 1: Tools for structured and rich personal data collection and analytics

Organisations can use the Itheum "data collection and analytics toolkit" to seamlessly build apps and programs that collect structured and rich personal data from users and also provide visual trends and patterns on the collected data (usually using fully anonymous or semi-anonymous analytics to protect the "data creator", i.e. the user who provides the data).

With this product, Itheum provides real-world value and adoption, and in return we generate highly structured and outcome-oriented personal datasets (i.e. we normalise what user data looks like across organisational silos).

This product is essentially a fully featured personal data and analytics platform offered as a PaaS (platform as a service) to organisations. For example:

  1. A health and wellness company can use the Itheum data collection and analytics toolkit and embed it into their apps; enabling the collection of health data like blood pressure, fitness/activity, sleep quality and then visualise trends and patterns of their app user-base.

  2. A financial organisation can use the Itheum data collection and analytics toolkit to collect scheduled data via customised surveys or questionnaires, triggered by certain actions related to spending patterns — the trends and patterns can then give them more context of their customers’ spending habits.

Itheum already has multiple apps and programs using the collection and analytics toolkit — you can read about these real-world case studies here

With this product offering, we are enabling the creation of highly structured, outcome-oriented and normalised personal datasets (e.g. have a look at the data that is collected as part of the OKPulse program built on Itheum)

… and this leads us to our 2nd and more important product offering.

Product 2: Decentralised Data Exchange (DDEX) for the free, open market trade of your personal data

This product enables you (the owner and creator of personal data) to own and trade the personal data that was collected by the organisations who built on the Itheum "data collection and analytics toolkit" product. It unlocks personal data from these organisational silos and lets you earn a passive income by trading your personal data on the open market with other organisations who can derive value from the datasets.

So in those same examples given above:

  1. If you use that health and wellness company’s app, you get to own your own data and then sell the health and wellness data you collect as part of your subscription to that app using the Itheum data DEX.

  2. If you use that financial organisation's product, you get to own your own behavioural data and sell it as you choose to anyone else on the Itheum data DEX.

As your personal data was collected as part of a real-world product or service, it’s highly-structured, outcome-oriented and normalised — which means it’s highly valuable to many organisations across the world looking to build high quality datasets to power their machine learning backed business analytics engines.

This is the real value of Itheum and our key differentiator when compared to other blockchain-based data platforms or self-sovereignty solutions. Those solutions struggle to gain real momentum because their products cannot effectively be used in the real world. Itheum focuses on real-world adoption, and our two products are a balanced approach to adoption and decentralised self-sovereignty.

Using the Itheum Data DEX, you can sell your data via a peer-to-peer, direct sale method, or use our Data Coalition technology and align with a decentralised entity (backed by a DAO) which will sell your data on your behalf (while acting in your best interests) and compensate you after the successful sale of the data. Data Coalitions are explained in more detail in a later section, but they are a bit like decentralised credit unions (see the Credit Union Philosophy) — by representing many members (DAO stakers), they have the collective bargaining power to decide who can buy your data and what it can be used for (i.e. they trade on behalf of their members' interests).

The Data DEX is extremely powerful; it allows you to secure sensitive personal data in secure Data Vaults and to wrap your data as an NFT so you can earn royalties on any re-sale of your data (all of these revolutionary features are described in detail in later sections). It's also fully cross-chain compatible (all EVM-based blockchains).

In summary, Itheum converts personal data into a new asset class that can be traded in the new blockchain-based economy. It provides the complete package: data collection and analytics toolkits that increase the quantity of high-quality personal data, plus a decentralised, cross-chain personal data trading platform, together enabling the creation and exchange of high-value personal datasets.

So what are you waiting for? Read on to learn more about the Itheum platform and join the Itheum community and be part of the real-world revolution in personal data sovereignty technology.


Multi-Chain Strategy

The Itheum Data DEX will work across all EVM-compatible chains from day one. We already have Ethereum and Polygon working and will be deploying to BSC and Avalanche shortly. Currently, data advertised on a particular chain is only visible to participants who are on the same chain, i.e. a buyer can only buy data advertised for sale on the same chain.

We are working on cross-chain advertising and purchasing of data, facilitated via cross-chain oracles that will allow the transaction handshake to be done between chains, allow the data to be verified and transacted, and handle the mint/burn of chain-native tokens to balance the chain-specific token repositories (e.g. burning of Polygon tokens and minting of BSC tokens, and vice versa, as data is transacted across chains).
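To make the bookkeeping behind this rebalancing concrete, here is a minimal TypeScript sketch. It is purely illustrative: the per-chain figures are hypothetical, and in practice the mint/burn is performed by chain-native token contracts coordinated by the cross-chain oracles, not by application code like this.

```typescript
// Illustrative bookkeeping only: real mint/burn happens in chain-native
// token contracts coordinated by cross-chain oracles.
type Chain = "ethereum" | "polygon" | "bsc" | "avalanche";

// Hypothetical per-chain circulating supply of the MYDA token family.
const supply: Record<Chain, number> = {
  ethereum: 400_000_000, // MYDA
  polygon: 300_000_000,  // mMYDA
  bsc: 200_000_000,      // bMYDA
  avalanche: 100_000_000 // aMYDA
};

// When value moves from one chain's token to another as data is transacted
// across chains, tokens are burned on the source chain and minted on the
// destination chain.
function rebalance(from: Chain, to: Chain, amount: number): void {
  if (supply[from] < amount) throw new Error("insufficient supply to burn");
  supply[from] -= amount; // burn
  supply[to] += amount;   // mint
}

const before = Object.values(supply).reduce((a, b) => a + b, 0);
rebalance("polygon", "bsc", 5_000);
const after = Object.values(supply).reduce((a, b) => a + b, 0);
console.log(before === after); // true: aggregate cross-chain supply is unchanged
```

The key invariant is that the aggregate supply across all chains never changes; value simply migrates between the chain-specific token repositories.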

Demo video of the Itheum Data DEX working on Ethereum and Polygon


Cross Chain Tokens

The core Itheum Data DEX token is an ERC20 token called MYDA (short for MYDAta). The MYDA token will allow for the purchasing of data from data creators as well as from Data Coalitions. The MYDA token will also play a role in staking against a Data Coalition and for decentralised governance of the Data Coalition’s responsibilities and actions.

As the Itheum Data DEX operates cross-chain, there will be a core token deployed on each supported chain.

  • MYDA — Token deployed on Ethereum
  • mMYDA — Token deployed on Polygon/Matic
  • bMYDA — Token deployed on Binance Smart Chain (BSC)
  • aMYDA — Token deployed on Avalanche

MYDA can be moved between chains via native ERC20 bridges that already exist. For example, you can use the native Polygon <> Ethereum token bridge to convert your MYDA to mMYDA and vice versa.


Itheum Token

The “primary token” will exist on the Ethereum blockchain (Note that this is subject to change as we continue weighing the pros/cons of cross-chain adoption — as low transaction cost for trading of data is very critical for adoption of a platform like Itheum, we may move the primary token to a side-chain like Polygon, BSC or Avalanche).

The primary token will have the token symbol MYDA and the side-chain tokens (called Side Tokens) will have a prefix character in front of the token symbol to identify it (e.g. mMYDA, bMYDA).

Token Utility

The MYDA token is a pure “utility token” as without owning MYDA you will not be able to use many features on the Itheum Data DEX to facilitate the open exchange of personal data.

At this point, the following tasks will REQUIRE the MYDA token (or corresponding Side Tokens). As we design and write our smart contracts, we will also take a MYDA-first approach to aligning incentives so that MYDA's position as a utility token can't be disputed:

1) Gaining Access rights to use Data Packs / Data Streams / Data Coalition Data Pools

A Data Creator (who is the original Data Owner of the data) will allow an Access Requester to access their data via a transfer of MYDA between the parties. MYDA is essentially the key to use the data from an authorization perspective. The primary goal would be to use the transaction of MYDA (between the two parties, as logged on-chain) as:

  • Evidence of access rights being granted by the Data Owner to another party
  • To trace and have evidence of the lineage of data access
  • To trace back access rights to a Provenance (Data Creator)

If the sale is done via a Data Coalition (i.e. as a bulk pool of data), the MYDA is transferred from the Access Requester to the DC DAO and then distributed to the members of the DC as follows:

  • To the Data Creators to indicate access rights to use the data (as mentioned in the above section)

  • Based on the relative bond/stake token contribution of the board members, members and general stakers to highlight traceability of the effort spent by these parties to coordinate the bulk sale, flag and signal data quality and accept risk of remediating and mediating contentious sales. The transfer of MYDA to these parties is also used as lineage and audit to trace all parties involved in the data transfer process.

See point number (3) below for more details…

2) Stake MYDA to have relative voting rights in the Itheum Foundation (IF) DAO

The IF DAO will vote on "Proposals" on how the MYDA Treasury will be spent to further Itheum's community development and roadmap.

3) Stake/Bond MYDA to create a new Data Coalition (DC) (which is governed by a DAO)

By doing so, you receive relative voting rights in return to manage the operation of the Data Coalition DAO (DC DAO). Staked MYDA goes into the "DC Fund Pool", which is then used for arbitration and dispute resolution in contentious sales of data. The DC DAO votes on "Motions" related to the bulk sale of data via the DC.

The following parties can Stake/Bond MYDA:

  • Board Members will bond MYDA to flag their commitment to the DC and to act in the best interest of the Members. The bond is locked until the Board Member leaves the DC.

  • Data Creators or Contributors (called Members) can link their data to a DC and also stake some MYDA to flag that they have genuine interest in supplying good quality data to the DC.

  • Anyone (even those who do not want to provide data themselves) can stake MYDA into an existing DC DAO. They do this by becoming a "data quality verifier" (resulting in a Crowd-Sourced Data Curation dynamic), effectively also signalling the "genuineness/credibility" of a Data Coalition (similar to how the credibility of a node validator in a DPoS network is signalled by the community who delegate their stake to them). Everyone who stakes/bonds MYDA is incentivised relative to their role and stake and earns micropayments after each sale is finalised (see the payout sketch after this list). Learn more about the [Data Coalition DAO design here](#data-coalitions-1)
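As a rough illustration of how a bulk-sale payout might be split relative to role and stake, here is a hypothetical TypeScript sketch. The 80/20 split, role names and figures are assumptions for illustration only, not final protocol parameters.

```typescript
// Hypothetical payout split after a Data Coalition bulk sale.
// Roles and percentages are illustrative only.
interface Staker {
  address: string;
  role: "board" | "member" | "verifier";
  stake: number; // MYDA bonded or staked
}

// Assumed split: Data Creators receive a fixed share for access rights,
// the remainder is distributed pro rata to stakers by bonded amount.
function distributeSale(
  salePrice: number,
  creatorShare: number, // e.g. 0.8 = 80% to the Data Creators
  creators: string[],
  stakers: Staker[]
): Map<string, number> {
  const payouts = new Map<string, number>();
  const perCreator = (salePrice * creatorShare) / creators.length;
  creators.forEach((c) => payouts.set(c, perCreator));

  const stakerPool = salePrice * (1 - creatorShare);
  const totalStake = stakers.reduce((sum, s) => sum + s.stake, 0);
  stakers.forEach((s) =>
    payouts.set(s.address, (payouts.get(s.address) ?? 0) + (stakerPool * s.stake) / totalStake)
  );
  return payouts;
}

// Example: a 1,000 MYDA sale split between 2 creators and 3 stakers.
const payouts = distributeSale(1_000, 0.8, ["creatorA", "creatorB"], [
  { address: "board1", role: "board", stake: 600 },
  { address: "member1", role: "member", stake: 300 },
  { address: "verifier1", role: "verifier", stake: 100 },
]);
console.log(payouts); // creators get 400 each; stakers split 200 pro rata by stake
```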

Token Metrics

Please note that the following details are likely to change as we are currently planning for our token launch and as we adjust our token offering based on our planned token utility forecast.

Total MYDA Supply: 1,000,000,000

Token Generation Event (TGE) Offering:

  • Hard Cap on Launch: 6.5 million
  • Liquidity on DEX: 600k
  • Total MYDA offered (24%): 240,000,000 (seed + private = 232,500,000; Public IDO = 7,500,000)
  • Price per MYDA: 0.08
  • Initial Market Cap: TBD



Selling Data

The DEX allows for the seamless sale of personal data by “data creators”. The sale, verification and ownership accreditation of the data is handled on-chain but the actual data being sold is kept off-chain.

There are a few reasons for this:

  1. On-Chain storage of large datasets is not feasible. Blockchains are not built for the storage of data and doing this will be costly and also pollute the blockchain.

  2. On-Chain storage of whole data or segments of data can also lead to privacy and data sovereignty issues as the blockchain is a fully open and transparent tool.

As such the sale of data is facilitated via a hybrid on-chain/off-chain model.

  1. Data is first validated, hashed and encrypted. The encrypted dataset is uploaded to a centralised, secure storage location at a hidden (non-public) destination.

  2. It's important to note that the above "storage location" is centralised rather than being decentralised via IPFS (for example). Again, there are reasons for this — it mainly has to do with data sovereignty issues, where certain countries or regions require data to remain within geographical boundaries. We are working on a solution that allows for decentralised, region-based storage or validated, decentralised node-based storage. This will be handled as part of the Data Coalitions concept, where privacy, security and regulatory requirements for storage of data are handled via a delegated, authorised coalition who will manage these "under the hood"

  3. The hashed value and the identifier for the storage location are stored on the blockchain. The availability of this new data for sale is then "advertised", allowing the new dataset to be "discovered" and then purchased via on-chain facilitation. (A code sketch of this seller-side flow follows this list.)
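Below is a minimal sketch of this seller-side flow using Node's crypto module. The storage upload is stubbed out, and the shape of the on-chain record is an assumption for illustration, not the actual contract schema.

```typescript
import { createCipheriv, createHash, randomBytes } from "crypto";

// 1. Hash the validated dataset so its integrity can later be verified on-chain.
// 2. Encrypt it before uploading to the hidden (non-public) storage location.
// 3. Keep only the hash and a storage identifier for the on-chain advertisement.
interface OnChainAdvert {
  dataHash: string;          // SHA-256 of the plaintext dataset
  storageLocationId: string; // opaque pointer to the off-chain location
}

function prepareDataForSale(dataset: object, encryptionKey: Buffer): OnChainAdvert {
  const plaintext = JSON.stringify(dataset);
  const dataHash = createHash("sha256").update(plaintext).digest("hex");

  // AES-256-GCM encryption; only the ciphertext leaves the creator's device.
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", encryptionKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const authTag = cipher.getAuthTag();

  // Stub: upload { iv, ciphertext, authTag } to secure storage and get back an id.
  const storageLocationId = uploadToSecureStorage(iv, ciphertext, authTag);

  return { dataHash, storageLocationId }; // this record is what goes on-chain
}

// Placeholder for the actual storage integration (hypothetical).
function uploadToSecureStorage(iv: Buffer, ct: Buffer, tag: Buffer): string {
  const id = createHash("sha256").update(Buffer.concat([iv, ct, tag])).digest("hex");
  return `storage://${id.slice(0, 16)}`;
}

const key = randomBytes(32);
console.log(prepareDataForSale({ heartRate: [72, 75, 71] }, key));
```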

At this point in time, there are “two groups” of data you can sell on the DEX. They are as follows:

1) Selling Core Itheum Data

The core Itheum platform is a general purpose personal data collection platform where organisations can build applications (which require structured user data) using the core toolset that Itheum provides.

Examples of these structured data collection applications are the Red Heart Challenge and OKPulse programs (both referenced below).

The applications are called "programs" within the Itheum platform, and these programs can onboard their own "end users". These end users generate a lot of personal data and have an Itheum user portal (called CareView) that gives them full visibility of the data they have provided.

This demo video shows you an example of the end user portal (CareView) for OKPulse (https://www.youtube.com/watch?v=ITONnseBFV4)

The CareView portal allows the end user (data creator) to link their Ethereum, Polygon, BSC or Avalanche account to their Itheum platform account. Once they have done this they can then use the Data DEX to load the raw datasets they have generated and put them up for sale.

As the data collected via an Itheum program/application is fully structured and outcome-oriented — for example, Red Heart Challenge data is centred around the self-management of cardiovascular disease and OKPulse is centred around proactive monitoring of employee health and wellness — it is very valuable for a data buyer/consumer who wants to align analytics discovery or outcome analysis around a certain topic.

When this data is grouped together with data from multiple people who have joined the same program/application via a "Data Coalition" (see below) and augmented with personal data via the "Data Vault" (see below), the value of the data can grow exponentially as the quantity and quality of the datasets grow.

2) Selling Any Arbitrary Data

The Data DEX also allows you to sell any arbitrary data using the same on-chain facilitation process. This allows anyone with a crypto wallet to upload and sell datasets via the DEX. This reduces the barrier of entry for end-users and also provides them equal opportunity to participate in the shared data economy. At this stage we allow for the sale of the following arbitrary datasets.

2.1) Facebook Profile Data

As you may know, Facebook opened up the option for individuals to download all their data in “bulk”. This was a feature that was released by Facebook after pressure from data advocacy regulators who wanted to ensure individuals had the right to download their personal data off the Facebook platform should they ever want to delete their Facebook account or move to a new social media platform.

The Itheum Data DEX will allow for the latter scenario (moving to a new social media platform) to be handled seamlessly. For example, an individual can download all their Facebook data and advertise it for sale on the Data DEX; they can then align with a Data Coalition centred around the responsible usage of data by social-media-type organisations. A new social media platform can then view the "Facebook user datasets" managed for sale by the Data Coalition, agree to the responsible terms of use and purchase the data in bulk. The new social media platform uses the Data DEX to bootstrap and migrate Facebook users to its platform directly via an authorised, delegated owner (the Data Coalition) of the end-user data. The end user (the original Facebook user) then gets the payout from the Data Coalition and can also view the complete audit trail of data transfer between the social media platforms.

2.2) Any Other General Data

The Data DEX also allows for the on-chain sale of any other type of data. For example, you can create a dataset of all the brands and products you, your family and friends like, all the reviews and ratings of your favourite restaurants, or personal fitness and other wellness data. You can also create and sell other interesting IP-centric general datasets, like utterance-to-intent mappings, which can then be used by organisations to train NLU and speech-to-text applications. For example, a hospitality booking organisation (which provides telephony services or conversational tools like chatbots to help users make a booking) might already be spending a lot on studying end-user speech-to-text patterns and mapping them to intents relevant to their industry. They can sell the utterance-to-intent mapping data on the Data DEX to recoup some of their spending in producing this IP.


Types of Direct Sale

As described in the above section - you can use the Itheum Data DEX to sell personal data using on-chain tools that coordinate the trade between buyer and seller. On the Data DEX, there are fundamentally two types of direct sale that can occur. We mention “direct sale” here as it’s the seller (Data Creator) who decides which type they prefer and initiates the trade process within those constraints. In a later section we will describe how data can also be sold indirectly via a delegated Data Coalition, but for now let’s focus on the following two direct sale types.

  1. Sale of Data Packs
  2. Sale of Data NFTs

Let's now dive into these two types individually and understand the difference.

Sale of Data Packs

Once a Data Creator decides to sell their personal data, the default method of sale is the sale of Data Packs. A Data Pack holds a reference to the type of dataset being put up for sale and contains some metadata around it. For example, it provides a preview of the data, when it was created, the terms of use, the link to where the data can be securely downloaded from (after the sale is complete) and so on. Once the user creates a Data Pack, it gets advertised for sale "on-chain". Once this on-chain advertising process is completed, the original "data hash" and the "transaction reference" of the advertising process are also stored as metadata against the Data Pack. A MYDA price is automatically calculated based on the type of data and market demand for that type of data. In future, we will also support the Data Creator picking their own price for their data. The sale of Data Packs can be described as a direct peer-to-peer sale.

Once the on-chain advertising process is complete, buyers will see the Data Packs for sale in the "Buy Data" section of the DEX. The buyer can then pay the MYDA cost for the data and "own" a copy of the data (these copies are called Data Orders and appear in the Purchased Data section of the DEX). As part of the buying process, they agree (on-chain) to abide by the "terms of use". A record of this agreement is stored on-chain as part of the purchase and serves as an immutable audit trail for the agreement. One key point to note is that in this type of sale, the re-sale of data is NOT permitted.
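To make the Data Pack flow more concrete, here is a hypothetical TypeScript sketch of the metadata a Data Pack might carry and the Data Order a buyer receives after agreeing to the terms on-chain. All field names and values are illustrative assumptions rather than the DEX's actual schema.

```typescript
// Illustrative Data Pack metadata; actual fields are defined by the Data DEX.
interface DataPack {
  id: string;
  dataPreview: string;        // small sample shown to prospective buyers
  createdAt: string;          // ISO timestamp
  termsOfUse: string;         // text the buyer must agree to on-chain
  downloadLocationId: string; // secure location, released after purchase
  dataHash: string;           // hash stored during on-chain advertising
  advertiseTxRef?: string;    // transaction reference once advertised
  priceMyda: number;          // calculated from data type and market demand
}

// A Data Order: the buyer's copy, recording their on-chain agreement to the terms.
interface DataOrder {
  dataPackId: string;
  buyer: string;
  pricePaidMyda: number;
  agreedTermsHash: string; // hash of the terms of use agreed to at purchase
  purchaseTxRef: string;
  resaleAllowed: false;    // Data Pack sales never permit re-sale
}

// Hypothetical example instance.
const example: DataPack = {
  id: "pack-001",
  dataPreview: '{"sleepQuality": 0.82, "steps": 10450}',
  createdAt: "2021-11-01T00:00:00Z",
  termsOfUse: "Research and internal analytics only; no re-sale.",
  downloadLocationId: "storage://...",
  dataHash: "sha256:...",
  priceMyda: 120,
};
console.log(example);
```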

A Data Creator should choose to sell data as a Data Pack if they have the following requirements:

  • Sell multiple (potentially unlimited) copies of their data to buyers around the world
  • Don't want their data to be re-sold by the buyer (i.e. the buyer can only use it as per the terms of sale and for their own use/consumption). If a buyer breaks their agreement and resells the data, the owner will have the ability to detect this and mediate a conflict resolution process — but only if the sale is handled by a Data Coalition (more on this below). Otherwise the seller unfortunately won't have a direct method to track the re-sale or benefit from it (e.g. royalties).
  • Only allow for their personal data to be sold on the Itheum Data DEX (no other decentralised marketplace will display the Data Pack for sale — this is because it’s not built on any open blockchain standard like ERC 20 or ERC 721 that allows for interoperability). This can be looked at as a benefit if the Data Creator wants to limit exposure of their data (as it’s for sale in only one marketplace)

Sale of Data NFTs

A Data Creator can also choose to sell their personal data as an NFT (Non-Fungible Token). This makes a lot of sense as personal data is very unique and NOT fungible (watch this short video to understand the difference between [fungible and non-fungible](https://www.youtube.com/watch?v=OXCJxy0f4Ic)). This allows the Data Creator to have more control over their data, align with NFT features that make their data grow in value (rarity/scarcity), and increase the exposure of their data assets thanks to the interoperability and portability NFT standards have in the blockchain ecosystem.

Data NFTs are described in more detail below and use the ERC 721 open standard to coordinate and facilitate the NFT contract between parties involved in the trade.

A Data Creator should choose to sell data as a Data NFT if they have the following requirements:

  • Sell a “limited number” of copies of data: This is the same concept as the limited edition NFTs we see in the market today. Having a limited number of copies of the data will create organic growth demand for the data as the buyer will realise the rarity and scarcity of the data. For example, if a Data Creator “only mints 2 copies” of their DNA sequencing result — this has a lot of value to a buyer trying to build a dataset of similar DNA sequencing results as there are only 2 copies available for sale. If there were virtually unlimited copies available (as you would have with Data Pack based sale) — then the perceived inherent value for the data will be less.
  • Allow for the “re-sale” of data: The data creator is happy to allow for the re-sale of the NFT packaged data by a buyer. This opens up a lot of opportunity for the data to grow in value as the buyer might have a better ability to “market the data”. This also opens up opportunities for secondary markets where “verified, legitimate data brokers can exist on-chain” — a revolutionary concept and a futuristic solution to the problem that exists today where centralised data brokers are selling your personal data without your knowledge.
  • Benefit from the "re-sale" of data via Royalties: Your data is essentially your IP (much like a song by a music artist or a book by an author), and packaging your data and selling it as an NFT will allow you to earn a royalty on the re-sale of the data. For example, you can choose to nominate a 10% royalty condition, and if a buyer re-sells the data, 10% of the sale will be transferred to your account (see the sketch after this list).
  • Benefit from multiple NFT marketplaces: As Itheum Data NFTs are built on the ERC721 standard, they immediately have interoperability with all the NFT marketplaces that support this standard (OpenSea, Rarible etc). This significantly increases the audience and therefore increases your potential to sell.
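Following on from the royalty point above, the arithmetic of a royalty-bearing re-sale is simple. Here is a minimal sketch; marketplace fees and gas are omitted, and the figures are purely illustrative.

```typescript
// Illustrative royalty split on a Data NFT re-sale. royaltyPct is the
// percentage nominated by the original creator (e.g. the 10% example above).
function splitResale(salePrice: number, royaltyPct: number) {
  const royaltyToCreator = salePrice * (royaltyPct / 100);
  const proceedsToSeller = salePrice - royaltyToCreator;
  return { royaltyToCreator, proceedsToSeller };
}

// A buyer re-sells the Data NFT for 1 ETH with a 10% royalty condition:
console.log(splitResale(1.0, 10)); // { royaltyToCreator: 0.1, proceedsToSeller: 0.9 }
```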

Demo 1: minting and selling Genomics data as a Data NFT



Demo 2: minting and selling Blood tests results as a Data NFT on OpenSea


Buying Data

One-off datasets advertised for sale on the Data DEX are called "Data Packs". As described in the above section "Selling Data", there are various types of data that can be put up for sale on the DEX. On top of these one-off datasets, the Itheum Data DEX also allows for the sale of "Data Stream subscriptions" (see the section titled Data Streams below).

Data is sold via two channels:

  1. Direct between Data Creator and Buyer
  2. Via an intermediary, Authorised Data Coalition

Anyone with a crypto wallet can become a buyer of data packs or data streams under certain conditions.

  1. They need to have MYDA in order to pay for the data. The MYDA is sent directly to the data creator (if the sale is direct between data creator and buyer) or to a Data Coalition if the Coalition is "brokering the sale".

  2. Each data pack or stream will have an associated “terms of use”, the buyer agrees to abide by the nominated use. There will be dispute and conflict resolution processes in future to protect the seller from misuse. (if the sale is direct between data creator and buyer)

  3. If the purchase is via an authorised Data Coalition, then the buyer needs to adhere to the terms and conditions of use as per the Data Coalition and also put in collateral in the form of MYDA for a certain period of time (until the buyer earns a higher credibility score) – data sold via a Data Coalition has a more robust misuse remediation and dispute resolution process handled via decentralised governance.


Enabling Personal Data "Proofs" within Smart Contracts

In the decentralised DApps ecosystem (DeFi, DAOs or any other application use cases enabled via DApps), Smart Contracts enable agreements between parties to execute based on "indisputable truths". For example, in a DeFi exchange, a trade transaction between two parties can happen backed by the on-chain state data that confirms the transaction can indeed proceed (e.g. party A has the tokens to transfer and party B is entitled to receive the tokens based on some pre-agreed condition). Smart Contracts therefore enable trustless interactions between multiple counter-parties. Traditionally, transactions such as these in finance require a trusted intermediary, such as a bank, to coordinate.

But Smart Contracts do have a key limitation in that the data they have access to on-chain to facilitate such trust-less behaviour is very limited. We only have data around transaction history and other such on-chain information (e.g. voting outcomes etc). For more complex trust-less applications to migrate to the blockchain, we need "richer, real-world data" to flow into Smart Contracts. ChainLink provides this real-world data via their Decentralised Oracle Networks and enable the technology of "hybrid smart contracts". These hybrid smart contracts use the real-world oracle data to make decisions and carry out transactions between parties. ChainLink enables smart contracts to tap into real-world data like exchange price feeds, weather or sensor data and allows for the contracts to mediate transactions between multiple parties based on outcomes resulting from smart contract code execution backed by real-world data.

But if blockchain technology is to branch out and handle many more mainstream use-cases, we will need a wider variety of real-world data to flow into the smart contract world. One type of data in particular that is not available today (even via Chainlink oracle networks) but is crucial for richer blockchain use-cases is Personal Data.

As described in the above section titled "Types of Direct Sale", the Itheum platform enables Data Creators to place their personal data for direct peer-to-peer sale on-chain. One of the underlying qualities of this type of trade is that a "proof of the personal data" is stored on-chain. This proof is then used (by a Buyer) to verify that the data has not been tampered with before the trade actually happens.

This feature of the Itheum Data DEX can also be used as a "personal data proof" by smart contracts to execute specific rules, enabling on-chain transactions that are backed by personal data.

This revolutionary concept is best explained with an example:

Home Loan Application - Real Word Scenario
Jack wants to buy a house. He heads over to his bank and requests a home loan. The bank needs to carry out a detailed assessment to make sure Jack is actually eligible for the loan (i.e. can he make repayments? What is the risk he will default? etc.). The bank does a credit check on Jack and finds that his credit history is good. The bank's home loan broker then does an extended personal due-diligence assessment. Jack is asked to provide his financial history, income history and other details about his spending habits, family, dependents and work history. This information is collected using a detailed loan application form the bank has prepared as a standard document. Once Jack fills in the form, he needs to attest that it holds true information (via a form of statutory declaration that is legally binding). The bank assesses the details on the form and makes a decision to give Jack the loan. Jack uses the loan and buys a house.

Now let's imagine that we want to port this entire scenario to the blockchain and remove as many intermediaries as we can (the bank, the credit history provider, the home loan broker etc.)

Home Loan Application - DeFi Scenario
Jack wants to buy a house. He visits a DeFi Lending DApp that is a DAO (modelled after a real-world credit union but fully decentralised) that allows for borrowing based on vote-based approval and on-chain evidence of collateral (e.g. other assets that Jack owns in the same DeFi DApp or other DApps). Jack requests a loan and the DApp begins its automated due-diligence process. As part of this process, the DApp invokes another DeFi DApp's Smart Contracts that allow for deep credit history checks (possibly via deep index lookups or via The Graph for more advanced lookups). The credit checks come back positive for Jack. The DeFi Lending DApp then refers Jack to an application form built and run using Itheum's data collection and analytics toolkit. It includes all the regular questions asked by banks during loan applications that need to be asked of an applicant directly (e.g. what is your employment history? Your spending habits? Your family details and other financial responsibilities? Basically all the personal information that is needed to make an informed decision about borrowing risk but cannot be obtained using blockchain lookups). Jack completes the form and his responses are stored inside the Itheum Data DEX as a "Data Pack". Jack then advertises this Data Pack on-chain, and the proof of his responses is published on-chain and sent to the DeFi Lending DApp as a "personal data proof" and attestation of his responses to the form. The DeFi Lending DApp now has all the information it needs to take his application to the final DAO voting panel. The members of the DAO have all the attested information and proof to make a decision on the home loan (collateral confirmation, positive credit history, application form due-diligence responses and on-chain proof). The DeFi Lending DApp is happy with the application, approves the loan, and Jack receives the money.

Itheum provides the complete platform for these kinds of personal data proofs that can be used for on-chain decision making. Think of Itheum as the next layer of data inflow into the blockchain world. Core blockchains provide transaction data, ChainLink provides real-world event data and Itheum provides personal data proofs direct from the end-user. When used in unison; we can replicate many real world scenarios using smart contracts and remove redundant intermediaries.
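A minimal sketch of how the "personal data proof" check in the scenario above could work, assuming the proof published on-chain is a SHA-256 hash of the applicant's form responses. The function name and the lookup are hypothetical, and a production version would also need a canonical serialisation of the responses (plain JSON.stringify is order-sensitive).

```typescript
import { createHash } from "crypto";

// Recompute the hash of the form responses supplied off-chain and compare it
// with the "personal data proof" published on-chain when the Data Pack was advertised.
function verifyPersonalDataProof(formResponses: object, onChainProof: string): boolean {
  const recomputed = createHash("sha256")
    .update(JSON.stringify(formResponses))
    .digest("hex");
  return recomputed === onChainProof;
}

// Hypothetical usage inside the DeFi Lending DApp's due-diligence step:
const responses = { employmentHistory: "8 years", dependents: 2, monthlySpend: 3200 };
const proofFromChain = createHash("sha256").update(JSON.stringify(responses)).digest("hex");
console.log(verifyPersonalDataProof(responses, proofFromChain)); // true: responses untampered
```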

Watch more real-world use cases and code demos


Our 5-stage Product Development Process

To ensure we continually innovate and deliver tangible web3/blockchain features to market, we will use a simple 5-stage product development process. All our features will continually be categorised as per these stages to ensure progress is transparent to our community.

  1. Research: Labs - Ideas we are running through R&D.
  2. Detailed Design & Prototyping - Ideas that pass our labs stage and we are doing detailed solution architecture with release candidates for the testnets.
  3. Available in Testnet - Iterative builds being released to testnet, until product update is tested and signed off by our community and is ready for deployment to mainnet.
  4. Security Audits - Final release that goes through internal and external security audits.
  5. Available in Mainnet - Released to mainnet.

Data Coalitions

Independently selling personal data is inefficient and time consuming. Continuing to curate and monitor the “terms and conditions” for each sale as well as to keep track of what data will be used for and by whom will quickly become overwhelming.

Your individual data (both the longitudinal data from your structured programs and highly personal & sensitive data from your vault) — is also not very valuable “when viewed in isolation” — but when your data is “grouped” into clusters of similar people, it grows significantly in value as the volume and quality increases (e.g. your health data is worth > $1,500 if sold as part of a larger dataset). The grouped data then becomes useful for deep analysis or to train machine learning models for example. We believe that this is the future of how data will be sourced on the blockchain to train AI and for deep analytics.

Data Coalitions are DAOs where the "Creators" of the Data Coalition bond MYDA to form and run it. The creators are called Board Members and they have an incentive to run the Coalition in the best interests of its "Members". Board Members have "stake in the game" with their bond and therefore will need to always act in the best interest of the Members. Board Members will also earn a share of each trade, so it's in their best interest to keep their Coalition as robust as possible in order to attract new Members (and therefore more data).

Itheum envisions a future where the most successful Data Coalitions will be run by enterprises and SMEs (subject matter experts like legal and regulation experts, commercial data warehouses, academic/research institutes, government departments, etc.) and will strike the perfect balance between commercialisation of data and accountability to end users (Data Providers). Board Members vote to agree on the terms and conditions and the governance (privacy and security) of the data trade; the parameters they agree to are made visible to anyone who wants to join the Data Coalition. Users (called Members) can then align with the Coalition that they feel acts in their best interests.

You then delegate the ownership of your datasets, data streams and Data NFTs. The Coalition will group your data into clusters of similar data and sell the data in bulk to a larger pool of buyers. In return you can earn a steady return on your data or choose to lock up your returns for the longer-term growth of the Coalition's network.

Data Coalitions also allow for "staking" of MYDA, where anyone can stake their MYDA with a Coalition (you don't HAVE to provide your data) to flag their support for the Data Coalition and to signal that the data within the Coalition is good (Crowd-Sourced Data Curation). This allows Itheum to be used by people who want to participate in the personal data economy but who don't necessarily want to provide their data at that point in time. All parties involved in the Data Coalition (Board Members who bond MYDA, Members who share data and Members who stake MYDA) are incentivised relative to their role and stake and earn micropayments after each sale is finalised.

Itheum's Data Coalitions are modelled on the Credit Union Philosophy

Decentralised Board Members

As introduced above, Data Coalitions are formed and run by a virtual board — they have additional governance responsibility and can mediate/provide conflict resolution, negotiate terms of sale of data with real-world entities and other Coalitions, etc. Board Members have to bond MYDA into the Data Coalition to ensure they have a "stake in the game", after which they can stand for election and be voted in by other Board Members. To prevent hard centralisation, Board Members will serve a fixed term (if required by the Members — it's not mandatory), after which they will need to rotate out and be replaced by a new board. Board Members earn a share of the sale of data (paid out in MYDA) that is housed within the Data Coalition. They can also lose MYDA if they don't represent their Members' best interests or conduct an incorrect sale of data (one that breaks the terms of the sale contract) and need to revoke it, pay back the buyers and compensate sellers for the damage. Although not mandatory, Members will be able to participate in ongoing periodic votes to express satisfaction with the Board's performance. If satisfaction rates are low across multiple voting points, this will trigger a board rotation clause.

Other Notable Properties of Data Coalitions

  • Data Coalitions enable "collective bargaining power" for end-users and will be a viable solution to the problem of centralised enterprise data silos that don't provide any value to the Data Creator.
  • DAO based governance and modified proof-of-authority based decision making will be involved in ongoing operations of the Data Coalition.
  • They will also be delegated custodians of “Data Vaults” and can autonomously trade high value data from the Vaults by attaching it to the other datasets within the Data Coalition.
  • They will also be able to link with the "Trusted Computation Framework" and facilitate the privacy preserving compute-to-data technology handshake, where 3rd parties will be able to run algorithms on the datasets housed within the Coalition without having the identity and privacy of the original Data Creators leaked.
  • They can efficiently facilitate "micropayments" to all their members in return for data. For example, a Data Coalition can have 1,000 "members" who contribute data for bulk sale. After each sale, the 1,000 members will be sent a share of the earnings via micropayments. Traditional banking payment systems are unable to handle these kinds of micropayments due to the overhead of fees and charges, but crypto will be able to facilitate this well.
  • Data Coalitions in future will also trade with other Coalitions and be connected to autonomous machines via a machine to machine type interface. E.g. wearables or EVs who join Coalitions directly and participate in the data economy.
  • They can allow (if voted by members) for "anonymous cohort analysis" of data trends via tooling provided by our "data collection and analytics toolkit" feature. For example, there may be a Data Coalition set up for the collection of "fitness and demographic data" — where you, as a Data Creator, can align with it and sell your wearable data from Fitbit or Garmin as well as append specific demographic data from your Data Vault (e.g. gender, age, ethnicity) to enhance its value. Anonymous cohort analysis can then be enabled to visualise the type of data under the control of this Data Coalition. This adds more appeal to buyers, who can preview data in more detail before committing to buying at a premium price (a small sketch of this kind of cohort aggregation follows this list).
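As referenced in the cohort-analysis point above, here is a small TypeScript sketch of the kind of aggregation involved: raw member records are reduced to per-cohort counts and averages, so only aggregate statistics are exposed to prospective buyers. The record shape is an assumption for illustration.

```typescript
// Illustrative anonymous cohort analysis: raw member records are reduced to
// aggregate counts per cohort, with no identities or raw values exposed.
interface MemberRecord {
  gender: "female" | "male" | "other";
  age: number;
  avgDailySteps: number;
}

function cohortSummary(records: MemberRecord[]) {
  const cohorts = new Map<string, { count: number; meanSteps: number }>();
  for (const r of records) {
    const band = `${Math.floor(r.age / 10) * 10}s`; // e.g. "30s", "40s"
    const key = `${r.gender}/${band}`;
    const c = cohorts.get(key) ?? { count: 0, meanSteps: 0 };
    c.meanSteps = (c.meanSteps * c.count + r.avgDailySteps) / (c.count + 1);
    c.count += 1;
    cohorts.set(key, c);
  }
  return cohorts; // only aggregate statistics leave the Coalition
}

console.log(cohortSummary([
  { gender: "female", age: 34, avgDailySteps: 9100 },
  { gender: "female", age: 37, avgDailySteps: 8700 },
  { gender: "male", age: 52, avgDailySteps: 6400 },
]));
// Map { 'female/30s' => { count: 2, meanSteps: 8900 }, 'male/50s' => { count: 1, meanSteps: 6400 } }
```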


Anonymous cohort analysis via our data collection and analytics toolkit integration

  • By default the Itheum Data DEX supports any data uploaded in valid JSON format. But there may be some specific data sub-standards that will be more appealing to certain types of niche buyers. For example, buyers who are interested in health and genomics data for automated ingestion into their systems will prefer a more globally interoperable standard like FHIR (Fast Healthcare Interoperability Resources). Data Coalitions will be able to mandate this as a minimum requirement for their members and ensure that the data being contributed is in the FHIR JSON standard.
  • Anyone can start a Data Coalition, but it will take some effort to progress it into an "operational mode" and attract new data to come under its control. For example, if you start a new Data Coalition you will need to bring in Board Members with credibility, who will need to bond MYDA for their term duration. Once you have met the minimum requirements for Board Members, the Data Coalition enters "operational mode" and can begin accepting data and MYDA stakes from regular users (Members). But being in "operational mode" is not sufficient to attract the best quality data; all details about Data Coalition Board Members are made public, so it's important to have some commercial experience in data-related matters to give you credibility. Any "slashes or disputes" arising from your Data Coalition's trade activity are also made public. This is very similar to how the Delegated Proof of Stake validator selection process works, where you stake your tokens after doing some due diligence on a validator's reputation and past performance. So for a Data Coalition to be successful, it will need to be in "operational mode" and have credible Board Members whilst maintaining ongoing trade operational credibility.

This feature is currently in the "Detailed Design & Prototyping" stage.


Data NFTs

Data is an asset in itself, and personal data is a "unique asset" as no two personal datasets are the same. Highly personal or sensitive datasets can essentially function as an NFT, allowing for uniqueness and limited availability.

For example, you might want to share your DNA or partial sequencing results under "research" terms and conditions, but you may want to limit how many buyers can purchase and use it (controlling distribution) — Data NFTs allow you to do this.

To make the data more aesthetic to trade, we convert valuable datasets (usually JSON or data in another interoperable format) into a unique visual representation of that data (a deterministic, generative abstract image). To do this, we will use an algorithm for image generation based on the unique signature of the data.
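One way such a deterministic image could be produced is sketched below: seed a simple pseudo-random generator from the dataset's SHA-256 hash and emit an abstract SVG. This is an illustrative sketch only; the actual image-generation algorithm is not specified here.

```typescript
import { createHash } from "crypto";

// Deterministic "art wrapper": the same dataset always yields the same image,
// because the pseudo-random sequence is seeded from the data's SHA-256 hash.
function dataToSvg(dataset: string): string {
  const hash = createHash("sha256").update(dataset).digest();
  let seed = hash.readUInt32BE(0);
  const next = () => {
    // Small linear congruential generator, seeded from the data hash.
    seed = (seed * 1664525 + 1013904223) >>> 0;
    return seed / 0xffffffff;
  };

  let shapes = "";
  for (let i = 0; i < 12; i++) {
    const cx = Math.floor(next() * 300);
    const cy = Math.floor(next() * 300);
    const r = 10 + Math.floor(next() * 40);
    const hue = Math.floor(next() * 360);
    shapes += `<circle cx="${cx}" cy="${cy}" r="${r}" fill="hsl(${hue},70%,60%)"/>`;
  }
  return `<svg xmlns="http://www.w3.org/2000/svg" width="300" height="300">${shapes}</svg>`;
}

console.log(dataToSvg('{"dnaSegment":"GATTACA"}').slice(0, 120));
```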

Packaging and trading your personal data as NFTs have the following benefits (when compared to regular one-off trade using the DEX):

  1. Limit the distribution of your highly sensitive or protected datasets to a smaller amount (similar to limited edition NFTs we see in the market today)

  2. You can choose to earn royalties if your data is resold. You can limit the distribution of your data but should a buyer resell your data NFT, you have the option to earn a % as royalty. This is a game changer and can prove to be a steady secondary income stream for you. This is especially true if your data is curated into a data coalition which has a high buyer demand.

  3. Once data is packaged and tokenised as an NFT, it can be traded on any NFT marketplace (e.g. Opensea - we already support this interface in our testnet). This significantly increases the selling power of your data as the audience for your data increases. Read about it here on our blogpost Selling your Itheum Data NFT on OpenSea

  4. Your unique data will be minted with an aesthetic "generative art wrapper" created using the unique signature of the data. E.g. your DNA data can be represented as a piece of unique art (similar to Autoglyphs or other generative art) which is sought after as it has rarity, creativity and actual utility.

What real-world trade characteristics do NFT wrapped data assets provide?

Let's look at some key trade characteristics we can get by wrapping our data as an NFT and then opening it for trade.

When working with NFTs in general, the main actors to consider are:

  • Issuing Entity: The issuing entity behind the NFT (in Itheum's case, the Original Owner or a Data Coalition can be the Issuing Entity)
  • Original Owner: The original Data Creator who chose to sell their data as a Data NFT
  • NFT Holder: The present holder of the Data NFT
  • NFT Purchaser: Someone with the intent of acquiring the Data NFT from the present NFT Holder


In this example, let's assume you want to sell your Genomics dataset as a Data NFT.

  1. You use the Itheum Data DEX to upload your data file and mint a new Data NFT. You are the Original Owner / Issuing Entity. The NFT is minted with a "proof of ownership" along with other metadata that enables access to the original data file if ownership is proven. All this metadata will be stored on-chain via the standard NFT metadata file schema (a sketch of such metadata follows this list).

  2. You then head over to a public NFT marketplace (e.g. OpenSea, where your Data NFT will already be available for sale under your wallet). You place it for sale for 0.05 ETH.

  3. Buyer 1 comes along and purchases the Data NFT as they see value in owning a genomics/DNA based dataset. Transfer of ownership now moves from you to Buyer 1. They are now the NFT Holder

  4. But Buyer 1 does not intend to use the data for any utility; they are essentially a pure "data trader" and intend to resell the Data NFT at a higher price. They increase the sale price to 1 ETH.

  5. Buyer 2 comes along and wants to purchase ownership of the Data NFT as they intend to use it for their research into topics relating to genomics. They are the NFT Purchaser. As their data research requires absolute truthfulness in the data quality (e.g. they may intend to use it to train an ML model for disease diagnosis and need to guarantee that the training data has not been tampered with), they choose to purchase this data via decentralised, blockchain-based trade rather than buying it from centralised data brokers or sellers.
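As referenced in step 1 above, here is a sketch of what a Data NFT's metadata might look like, loosely following the common ERC-721 metadata JSON conventions (name/description/image/attributes). The proof and access fields, URLs and values are illustrative assumptions, not the Data DEX's actual schema.

```typescript
// Illustrative Data NFT metadata, loosely following ERC-721 metadata JSON
// conventions. The proof and access fields are assumptions for illustration.
const dataNftMetadata = {
  name: "Genomics Dataset #1",
  description: "Partial DNA sequencing results, tradeable under research terms of use",
  image: "https://example.org/generative-art/abc123.svg", // deterministic art wrapper
  attributes: [
    { trait_type: "Data Type", value: "Genomics" },
    { trait_type: "Copies Minted", value: 2 },
    { trait_type: "Royalty Percent", value: 10 },
  ],
  // Proof-of-ownership and access details used by the Data DEX (hypothetical):
  dataHash: "sha256:...",                                  // hash of the raw dataset
  dataAccessUrl: "https://example.org/access/abc123",      // released once ownership is proven
  termsOfUse: "research-only",
};

console.log(JSON.stringify(dataNftMetadata, null, 2));
```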

Let's look at the list of benefits each actor received by trading data wrapped as a NFT

Original Owner / Issuing Entity:

  1. Limit supply of the dataset to just a single item (they can always mint more if they see demand grow)
  2. During initial sale on the open NFT Marketplace (e.g. OpenSea), assign a royalty % that they get paid during all future re-sales
  3. Benefit from all the open NFT Marketplace features to control the sale (duration / minimum period, highest bid etc)

NFT Holder:

  1. Ability to speculate on future price of non-fungible data and earn profits for re-sale
  2. Build Data NFT collections based on market/seasonal demand for certain types of datasets

NFT Purchaser

  1. This actor gets the most value as they intend to use the contents of the Data NFT (raw dataset) for actual utility.
  2. They can view and prove the provenance of the NFT using on-chain lookup and clearly identify the Original Owner
  3. They can view and prove the lineage of the NFT using on-chain lookup (Original Owner -> Buyer 1 -> Buyer 2)
  4. They can view and prove the veracity (truthfulness/accuracy) of the NFT using on-chain lookup of the meta-data
  5. They can request a formal transfer of ownership to own the IP if needed and if allowed by the original terms of sale. Although this feature is not an inherent quality of NFTs, it will be mediated via Data Coalitions and our "Decentralised Middleware" service

Data NFTs are currently in the "Available in Testnet" stage.


Data Streams

You can let buyers subscribe to "personal data streams" — unlike the one-off datasets that can also be purchased on the Data DEX, data streams will continue to feed data once a "subscription" is purchased.

Streams are a more powerful way for buyers to subscribe to longitudinal datasets that grow over time. For example, health and wellness data like activity, sleep quality, blood pressure or financial activity like spend habits etc.

When paired with context rich data from your “data vault” — streams become a valuable and steady source of passive income for you in exchange for your personal data.

This feature is currently in the "Detailed Design & Prototyping" stage.


Data Vault

You can store highly sensitive personal data in your data vault. For example; details about your gender, race, sexual preference, prior health conditions, financial history etc.

This sensitive data is encrypted using your own encryption keys (no one else can unlock and view it) and stored in IPFS (no one else can destroy it)

You can then choose to append data from your vault to the regular data you sell on the data DEX. As this gives the “dataset” more context, it becomes more valuable to the buyer — and you will earn more MYDA.

As the data is encrypted using the user's private key, we need to enable a frictionless UX during trade between buyer and seller, where keys need to change hands with minimal manual involvement between parties. For this purpose, "symmetric key pools" (decentralised middleware) are used to enforce secure authorization between seller and buyer in real time. Symmetric key pools operate using a modified proof-of-authority mechanism to enforce the highest security with balanced decentralisation.
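As a rough sketch of the kind of key handover the decentralised middleware would automate, here is an envelope-encryption example using Node's crypto module: the dataset stays encrypted under a symmetric data key, and only that key is re-wrapped for the buyer. RSA key wrapping is used here purely for illustration; the actual symmetric key pool and proof-of-authority mechanism differ, and all authorisation checks are omitted.

```typescript
import {
  createCipheriv, createDecipheriv, randomBytes,
  publicEncrypt, privateDecrypt, generateKeyPairSync,
} from "crypto";

// The dataset is encrypted once with a random symmetric key; to hand a copy
// to a buyer, only the symmetric key is re-encrypted for the buyer's keys.
function encryptDataset(plaintext: string) {
  const dataKey = randomBytes(32);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", dataKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { dataKey, iv, ciphertext, authTag: cipher.getAuthTag() };
}

// Simulated buyer key pair (in practice, tied to the buyer's wallet identity).
const buyer = generateKeyPairSync("rsa", { modulusLength: 2048 });

const { dataKey, iv, ciphertext, authTag } = encryptDataset('{"bloodPressure":"120/80"}');

// Seller-side: wrap the data key for the buyer; no manual key exchange needed.
const wrappedKey = publicEncrypt(buyer.publicKey, dataKey);

// Buyer-side: unwrap the key and decrypt the purchased copy.
const recoveredKey = privateDecrypt(buyer.privateKey, wrappedKey);
const decipher = createDecipheriv("aes-256-gcm", recoveredKey, iv);
decipher.setAuthTag(authTag);
const recovered = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
console.log(recovered); // '{"bloodPressure":"120/80"}'
```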

Other Notable Properties of Data Vaults

  • Their design will enable true data sovereignty via a "proof of ownership" based design architecture. Detailed technical design of data vaults will be released in our "Yellow Paper" but at a high level - all datasets that include sensitive data will be encrypted with keys that belong solely to the data creator/owner. If a copy of the data is given to a new buyer, decentralised middleware will be used to mediate the handover of the copy to the new owner with encryption handled behind the scenes (to ensure UX is seamless). But in the case of a Data NFT, where the actual ownership of a data asset can move from one party to another - the "proof of ownership" will also be transferred. This process will also be handled by the "decentralised middleware" service.
  • Data Vaults will enable a user to "opt out" of the system in the event they do not want to share their data anymore (e.g. a requirement under GDPR). This is achieved by the above-mentioned "proof of ownership" protocol, where the unique decryption key can be "burned", which effectively makes all decentralised copies of the data (e.g. in IPFS or elsewhere) become untethered from the Data Creator. The data without its decryption key is then effectively just a blob of scrambled text without any identity or utility attached to it. There are of course challenges to this that we need to solve, e.g. what happens if you sell your data and then change your mind after the sale? Do we allow for a recall of a data sale? If so, how can we ensure a user can completely opt out?

This feature is currently in the "Detailed Design & Prototyping" stage.


Trusted Computation Framework

As personal datasets under the control of Data Coalitions grow over time, certain end use-cases may require access to highly sensitive, identifiable data — often these use-cases will provide the most "payout" for data usage (as such, they are considered high-value use-cases). In such situations, a trusted computation framework can be used to ensure computation is handled off-chain with tamper-proof integrity. The Data Coalition will coordinate these computation jobs on-chain (with possible coordination assistance from [Chainlink's Attested Oracles](https://blog.chain.link/driving-demand-for-enterprise-smart-contracts-using-the-trusted-computation-framework-and-attested-oracles-via-chainlink/)).

Personal data traded on the on-chain DEX is never stored on-chain — only hash values are stored, to ensure the integrity of traded datasets. But in certain advanced use-cases where Data Coalitions manage the data of multiple users, there will be encrypted personal metadata stored on-chain. There will be cases where this data cannot be put on-chain even when encrypted, due to privacy regulations, especially if the blockchain network is spread across multiple geographies. Off-chain execution is, in some cases, the only option for processing this data. The Trusted Computation Framework can be used to localise the computation of the data to ensure the data storage and processing complies with all data sovereignty regulations.

The trusted computation framework is tethered to the “regional decentralisation hub” and is our "Compute-to-Data” solution for highly sensitive data processing requirements within high regulatory environments.

This feature is currently in the "Research:Labs" stage.


Regional Decentralisation Hubs

Highly sensitive data like medical data from hospitals, personal health records, financial transactions or credit history are protected by regional or local sovereignty laws. To unlock the trade of this data we cannot use fully decentralised global storage or compute. For example, we may want to limit trade, storage and compute on data to the EU region only, so that it complies with laws like GDPR, yet still prevent a central hoarding of these resources.

Regional Decentralisation Hubs are a novel idea we are exploring around regional decentralisation that balances legal sovereignty laws with personal data sovereignty.

This feature is currently in the "Research:Labs" stage.


Decentralised Governance

The Itheum platform aims to be a public-goods platform that's fully decentralised. Public-goods in the sense that all the technological developments made as part of Itheum's vision will always be available in the public domain and not privatised in any way. It will take some time to fully get there, especially in the area of our web2 data on-ramp/bridge technology (e.g. the Data Collection and Analytics Toolkit), but it's worth noting that our web3 technology stacks will be fully available in the public domain from day one.

With our technology deliverables aligned to be made available as a fully public-goods platform, the next important aspect of decentralisation is to have our technology development roadmap DAO-controlled, with MYDA token holders able to collectively and fairly decide on the technology strategy and the direction of roadmap delivery. It's worth noting that "fully DAO-controlled" platforms are complex to set up and require some platform operating maturity before implementation, but the ultimate intention of Itheum is full decentralisation. This will happen progressively over time to ensure that the Itheum platform remains in the hands of the public while also being a robust operating technology solution that will be around for the next 100 years. Until the platform roadmap is progressively transformed into a fully DAO-controlled element, the Core Team will manage the prioritisation of the roadmap items, with pathways detailed on how the platform will transition to decentralisation. This is described below in the "Foundation DAO" section.

Once the Itheum platform's operations and roadmap strategy are DAO controlled, the intention is to have the native token (MYDA) be the governance token. New, advanced yet seamless DAO schemes will be built around the MYDA token to increase its utility and to ensure the best user experience for our platform users and token holders. People who own MYDA will be able to participate in DAO votes and collectively make decisions on the future of Itheum.

Itheum has 2 high-level forms of DAO schemes that will be implemented in our platform:

  1. Foundation DAO: This is the platform governance DAO that will eventually be responsible for the future direction of Itheum's technology strategy and roadmap.

  2. Data Coalitions DAO: Unlike the single Foundation DAO, there can be multiple Data Coalition DAOs. Each time a new Data Coalition is set up by the public and the structure reaches "operating mode", it can be considered an independent DAO. The overarching parameters of the Data Coalition DAOs will be controlled by the Foundation DAO. So effectively, the Foundation DAO is considered to be a DAO of DAOs.

Platform Governance - Foundation DAO

As detailed above, the Foundation DAO will be responsible for the core platform's governance activities. Until the Itheum platform reaches the operational maturity required to fully decentralise the Foundation DAO, the Core Team will provide governance in a way that's fully open to public visibility and accountability.

Operational Maturity

In this section we detail what we mean by operational maturity, as this dictates when Itheum will progressively transition to a fully DAO controlled Foundation DAO. For an ambitious platform like Itheum to gain widespread mainstream adoption and deliver robust technology solutions amid high competition from other commercial and public organisations, we need fast, iterative delivery of roadmap items and objective decision making to ensure we get Itheum to Mainnet in a state that makes it the number 1 data platform for web3.

It's a well-known observation that putting a fully decentralised DAO in place too early can slow down decision making and delay competitive timelines, so we need to be cognisant and pragmatic about how early we embrace full decentralisation; failure to do so would put the entire Itheum platform delivery at risk and affect the whole Itheum community and token holders. Once Itheum has been deployed to Mainnet, the day-to-day operations of the platform are in a stable and controlled state, and we are ready to move into iterative continuous improvement, we will begin rolling out the Foundation DAO schemes and opening up public voting for further roadmap upgrades. Such an approach will ensure the long-term success of Itheum and is in the best interests of the entire community.

Roadmap methodology

Itheum's roadmap has always been public and is available for everyone to view. It can be accessed here: https://itheum.com/roadmap

As seen on the roadmap board, all items to be considered for inclusion in the roadmap are added there and voted on by the Foundation DAO to transition them to the appropriate lanes for delivery. We follow the Kanban methodology to manage the transition of items from inception to delivery. An idea is registered in the IdeaBox and can then move to Research and Development once a Foundation DAO vote agrees that the core team can spend the time and effort to commence R&D activity on the task. Once R&D is complete and the team is ready to begin design and estimation, the task moves to the Estimating lane. The Foundation DAO can then decide when to schedule the work. Once it's ready to start the development phase it moves to the "Sprint Candidate" lane, and development and testing begin. The final lane is called Shipped, and an item moves there once it's deployed to production (and/or Mainnet).

DAO Technology Schemes

The Foundation DAO feature is still under development, but we detail our current design goals below. Please note that these are subject to change as we iterate on the design; we will keep this section of the whitepaper updated and also inform our community via our channels of any changes.

  • The MYDA token also plays the role of a Governance Token. There won't be any other, dedicated governance token in Itheum. This ensures MYDA has more utility in the Itheum ecosystem and that there also won't be a proliferation of bespoke governance and other tokens in Itheum - this boosts the user experience of the platform as it keeps things simple.
  • Voting is by Quorum + Direct Democracy (a minimal sketch of this scheme follows this list)
  • MYDA holders can stake their tokens into the Governance contract and in turn they will be able to vote on proposals
  • The weight of each user’s vote is proportionate to the amount of tokens they have staked
  • Users can exit their stake anytime, but their vote will be withdrawn if they exit during an active voting round
  • Proposals can be decisions on roadmap updates or changes to core platform parameters
  • Core Platform Parameters can be tasks like Update Quorum %, Approve new Data Coalition applications, Manage core Data Coalition parameters (min:max members / min fees to join), Setting Harberger Tax rate on Coalitions managing Data NFTs, Manage Key-pool parameters for Personal Data Vault nodes (max / min / rotation / bonds), Manage node parameters for Trusted Computation framework (max / min / rotation / bonds) etc.
  • Roadmap updates are basically proposals to move the development roadmap forward (see the Roadmap methodology section above)
  • Potential issues with this scheme that we will need to design around are Governance Locks if quorum is not reached and Whale dominance
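
As a rough illustration of the "Quorum + Direct Democracy" scheme described above, here is a minimal, hypothetical off-chain sketch of how stake-weighted votes could be tallied. The names, the quorum handling and the simple-majority threshold are assumptions for illustration only, not the final contract design.

```typescript
// Hypothetical sketch: vote weight is proportional to staked MYDA, and a
// proposal only counts if enough of the total staked supply participated.

interface Vote {
  voter: string;
  stakedMyda: number; // weight = amount staked in the governance contract
  inFavour: boolean;
}

function tallyProposal(
  votes: Vote[],
  totalStakedMyda: number,
  quorumPct: number // e.g. 40 means 40% of all staked MYDA must vote
): "passed" | "rejected" | "quorum-not-reached" {
  const participating = votes.reduce((sum, v) => sum + v.stakedMyda, 0);
  if (participating < (quorumPct / 100) * totalStakedMyda) {
    return "quorum-not-reached"; // the "governance lock" risk noted above
  }
  const inFavour = votes
    .filter((v) => v.inFavour)
    .reduce((sum, v) => sum + v.stakedMyda, 0);
  return inFavour * 2 > participating ? "passed" : "rejected";
}
```

A voter who exits their stake during an active round would simply have their entry removed before tallying, mirroring the vote-withdrawal rule above; a "quorum-not-reached" result is the governance-lock scenario flagged as a design risk.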

Data Coalition DAOs

Details about the Data Coalition DAOs are explained in the above section titled Data Coalitions, but we summarise the DAO scheme below to lay out the various features of this component. Please note that this is subject to change as we iterate on our design; we will keep this section of the whitepaper updated and also inform our community via our channels of any changes.

  • These DAOs are generated each time a new Data Coalition is formed and approved by the Foundation DAO to operate.
  • These DAOs are programmatically built via a factory contract that generates the base Data Coalition DAO according to the core parameters (these parameters can be altered later by the Board Members of the DAO).
  • Anyone can form a Data Coalition DAO by bonding MYDA. The creator is called the Chairman but they don't have any special rights. They have the same rights as Board Members.
  • Once the Chairman creates the DAO, it goes into a phase where new Board Members need to be recruited. Board Members also need to bond MYDA and be voted in by the other Board Members. This is akin to permissioned entry, where existing Board Members need to recommend you via a Motion.
  • Only Board Members can vote on Motions, which follows a representative democracy scheme. A Motion can be anything, ranging from adding more Board Members to changing the parameters of the DAO (min:max members / min fees to join, terms of sale, sale price etc.) to agreeing on which purchase request to approve (i.e. who to sell data to).
  • Voting is by Simple Majority (no quorum needed), this allows for fast decisions to be made on potential new data sales.
  • All funds raised via bonds and stakes go into a DC Fund Pool which is then used for arbitration and dispute resolution in contentious sales of data.
  • The DC Fund Pool will be controlled by a Multi-Sig Wallet that will require a minimum set of Board Member signatures to process transactions.
  • Once the minimum Board members have joined the Data Coalition, it will enter into "operational mode" where it can start accepting Members.
  • Board Members will provide some public profile information for transparency on who they are; this gives future Members the information needed to make an informed decision on whether they should join the Coalition. This is very similar to how the Delegated Proof of Stake validator selection process works, where you stake your tokens after doing some due diligence on a validator's reputation and past performance.
  • Anyone can link their data to a Data Coalition and join as a Member. They can also choose to Stake MYDA along with their data to provide more guarantee that they are aligned to the long term mission of the Data Coalition.
  • Everyone who joins the Data Coalition (Board Members and Members who contribute data) - start with a low reputation score that builds up over time with each successful data trade.
  • Data Coalitions also allow for "pure staking" of MYDA, where anyone can stake their MYDA into a Coalition (you don't HAVE to provide your data) to flag their support for the Data Coalition and to signal that the data within the Coalition is good (Crowd Sourced Data Curation). They are also considered to be a Member.
  • Members who staked MYDA and Board Members get a majority share of each sale. A minority share is available for pure stakers and data providers who did not "stake" and/or who have a low reputation (a minimal sketch of one possible split follows this list).
  • Members can exit anytime, but Board Members need a Motion passed to leave or be replaced. Board Members also need to wait until the bond period ends to exit.
  • Although not mandatory, Members will be able to participate in ongoing periodic votes to express satisfaction with the Board's performance. If satisfaction rates are low across multiple voting points, this will trigger a board rotation clause.
  • Although not mandatory, Members will be able to expect the Board to have a fixed term, after which it will need to rotate out and be replaced with a new board.
  • A Data Coalition can only be shut down if it's not operational, has no outstanding disputes to resolve, and all Board Members and Members have been compensated for any past sales. The final decision to terminate operation will be made by the Foundation DAO.
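
The bullet above on sale shares can be illustrated with a minimal, hypothetical sketch of how a single sale's proceeds might be split inside a Data Coalition. The 70/30 majority/minority split and the reputation-weighted distribution within each group are illustrative assumptions, not parameters defined by the platform.

```typescript
// Hypothetical sketch of splitting one data sale's MYDA proceeds.

interface Participant {
  address: string;
  role: "board" | "data-staker" | "pure-staker" | "unstaked-provider";
  reputation: number; // builds up with each successful data trade
}

function splitSaleProceeds(saleMyda: number, members: Participant[]) {
  // Board Members and Members who staked MYDA with their data share the
  // majority; pure stakers and unstaked/low-reputation providers share
  // the minority.
  const majorityGroup = members.filter(
    (m) => m.role === "board" || m.role === "data-staker"
  );
  const minorityGroup = members.filter(
    (m) => m.role === "pure-staker" || m.role === "unstaked-provider"
  );

  const payouts = new Map<string, number>();
  const distribute = (pool: number, group: Participant[]) => {
    // Weighting within each group by reputation is an assumption.
    const totalRep = group.reduce((sum, m) => sum + m.reputation, 0) || 1;
    for (const m of group) {
      payouts.set(m.address, pool * (m.reputation / totalRep));
    }
  };

  distribute(saleMyda * 0.7, majorityGroup); // majority share (assumed 70%)
  distribute(saleMyda * 0.3, minorityGroup); // minority share (assumed 30%)
  return payouts;
}
```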

Fraud Detection — “Gaming” the system

As detailed in the above sections, the Itheum Data DEX allows for the sale of personal datasets. The reward for each sale will be paid out in the platform's native MYDA utility tokens. As the demand for MYDA tokens grows, we anticipate there will be malicious individuals or parties that try to "game the system" with the intention of obtaining MYDA or disrupting the market activity of legitimate data trade.

Types of Attacks

The following are some attack scenarios we anticipate:

1) Selling Fake Data

A malicious user or a botnet could potentially spin up hundreds of addresses and upload fake data files. The intention here would be to spray dummy datasets and mask them as legitimate datasets. For example, if a Data Coalition aligned with the "sale of health data for commercial use-cases" has a high return in terms of sales, the malicious party can upload fake data and tag it as legitimate blood pressure readings. The malicious party can then align to the above Data Coalition, and the data is piped into the Data Coalition's data pool. This kind of attack will diminish the data quality of the overall Data Coalition, as buyers will rate the pool data quality as low and/or request refunds. The malicious party's intention would be to pass the data off as legitimate and earn some MYDA tokens before the act is discovered and the addresses are blacklisted.

This attack can also happen via direct peer-to-peer sales, where the malicious user uploads fake data, writes an appealing and legitimate-looking data preview headline and hopes that a buyer will be tricked into making the MYDA transfer before discovering the data is fake.

2) Selling “Doctored Data”

This is similar to the above attack, but instead of uploading and selling fully fake data, the Data Creator/Seller doctors or manufactures data to make it look accurate. For example, as blood pressure data has a standard "mask" (e.g. 123/89), they can generate random data that looks plausible at a glance (134/32, 123/90). They can use scripting to generate bulk quantities of doctored data and automate its upload and sale as described in the above section (i.e. via a Data Coalition or direct peer-to-peer sale).
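
As an illustration of how such doctored readings could be caught, here is a minimal, hypothetical plausibility filter for blood pressure data. A value like 134/32 matches the format but has an implausibly wide pulse pressure. The numeric ranges are illustrative assumptions, not clinical thresholds used by the platform.

```typescript
// Hypothetical sketch: flag readings that match the "mask" but not physiology.

interface BpReading {
  systolic: number;
  diastolic: number;
}

function looksPlausible(r: BpReading): boolean {
  const pulsePressure = r.systolic - r.diastolic;
  return (
    r.systolic > r.diastolic &&
    r.systolic >= 70 && r.systolic <= 250 &&
    r.diastolic >= 40 && r.diastolic <= 150 &&
    pulsePressure >= 20 && pulsePressure <= 100
  );
}

// Flag datasets where too many readings fail the plausibility check.
function suspiciousDataset(readings: BpReading[], maxFailRate = 0.05): boolean {
  const failures = readings.filter((r) => !looksPlausible(r)).length;
  return failures / readings.length > maxFailRate;
}
```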

3) DDOS/Spam via Fake Orders

This is very similar to the attack described in the "Selling Fake Data" section, but the intention here is to disrupt the natural marketplace activity by spamming it with fake data orders that make it impossible for legitimate orders to be viewed and purchased. The attacker(s) can spin up multiple identities and advertise small bits of data for sale until they pollute the marketplace or the Data Coalition's data pools with fake activity.

4) Uploading Irrelevant/Inappropriate Data

In this type of attack, a malicious user attempts to damage the reputation of the Itheum Data DEX by uploading irrelevant or inappropriate data. They may masquerade as a legitimate user and then undermine the credibility of the platform by uploading bad data that will be seen by other legitimate users of the platform. This may also raise regulatory concerns if the data is especially inappropriate. Over time this type of attack will "spam" the natural, organic network activity and degrade the reputation of the Itheum Data DEX.

5) Sybil Attack

This type of attack is something all distributed systems and blockchains are prone to. It involves malicious parties using "multiple identities" to flood a chain with the intention of taking control of governance or core functionalities (like producing blocks in a blockchain or voting in their fake proposals). The Itheum Data DEX is built on EVM-compatible chains (Ethereum, Polygon etc.), so the inherent risk of Sybil attacks on the core chains will impact Itheum's data trade activity. The Itheum Data DEX is also prone to Sybil attacks of a different type, where a malicious user masquerading as a Data Creator/Seller can spin up multiple identities and flood the marketplace with fake orders or attempt to take control of the governance and voting activities that are core to the Data Coalitions.

Methods for Prevention

The following methods are (or will be) implemented to mitigate the above attack vectors.

  • The selling of personal data (either via direct peer-to-peer sales or via a Data Coalition) involves an "advertising process" (described in a section above) where the integrity hash of the raw data is advertised on-chain to facilitate the trade process. As this is a write transaction, it requires gas to complete. If a malicious user attempts to spam or dump fake datasets for sale on the marketplace, the gas cost makes this impractical (this is especially true if the chain is Ethereum; for low gas fee chains like Polygon the risk remains, but at lower levels). This applies to direct peer-to-peer sales of Data Sets as well as to sales of Data NFTs.
  • Currently a Data Preview is attached to each Data Set and Data NFT sale offer. These "Data Previews" are currently manually entered, which allows a malicious user to enter a fake preview and upload fake data.

    (Image: Data Previews can be "gamed" as they are manually entered)

We are working on a new metadata field called "data snapshot", which will focus on a specific, random part of the uploaded file or stream and attach it to the dataset order. To prevent the "data snapshot" from exposing any sensitive data, the seller will be provided with a few "snapshots" upon upload of the data file and will be able to pick the one they prefer (similar to how you can pick a generated thumbnail for a video uploaded to YouTube).
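
A minimal, hypothetical sketch of how such "data snapshot" candidates could be generated from an uploaded dataset is shown below; the function name, parameters and the random-slice approach are assumptions for illustration.

```typescript
// Hypothetical sketch: take a few random slices of the uploaded dataset and
// let the seller pick one to attach to the listing (like choosing a video
// thumbnail), instead of relying on a manually written, gameable preview.

function generateSnapshotCandidates<T>(
  records: T[],
  candidates = 3,
  sliceSize = 5
): T[][] {
  const options: T[][] = [];
  for (let i = 0; i < candidates; i++) {
    const start = Math.floor(
      Math.random() * Math.max(1, records.length - sliceSize)
    );
    options.push(records.slice(start, start + sliceSize));
  }
  return options;
}

// The seller reviews the candidates and the chosen slice is attached to the
// order metadata (e.g. as a "dataSnapshot" field), so buyers see a real
// sample of the data.
```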

  • Data Creators/Sellers can align to a Data Coalition to leverage "power selling" of datasets. To align to a Data Coalition and then pipe their data to the Coalition for sale, the Data Creator/Seller will need to stake some MYDA against the Coalition DAO. The more they stake, the higher the data quality score attached to the originating Data Creator. This adds a "skin in the game" incentive for the Data Creator/Seller to behave in ways that don't end up with them being penalised and having their MYDA revoked.
  • Staking/Voting against a Data Asset to boost credibility: We are looking at methods to attach community staking and/or voting to data assets that are put up for sale. This may be voting that happens as part of the Data Coalition (i.e. if a specific user's data is gaining more demand, the DAO upvotes the user's credibility). As this core credibility grows, the value of the user's data (in terms of MYDA cost) also increases. Under this method, when a new Seller (i.e. a new'ish chain address) tries to sell data, their credibility score will be extremely low and therefore the return will be lower; this provides the organic incentive needed for legitimate behaviour, as the best way to earn more MYDA is to upload legitimate data, align to strong Data Coalitions and gain organic credibility over time.

This approach will organically generate a sort of confidence score for each dataset or data NFT being sold on the marketplace. This is very similar to the OpenSea Confidence Score dialog alert you see when you attempt to buy NFTs.
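
As a rough illustration, a confidence score of this kind could blend the seller's staked MYDA, accumulated credibility and address age. The weights and scaling below are illustrative assumptions only.

```typescript
// Hypothetical sketch of an organic "confidence score" shown to buyers
// before they purchase a dataset or Data NFT.

interface SellerSignal {
  stakedMyda: number;      // "skin in the game" staked against a Coalition
  credibility: number;     // 0..1, built via upvotes and past successful sales
  addressAgeDays: number;  // new'ish addresses score lower
}

function confidenceScore(s: SellerSignal): number {
  const stakeSignal = Math.min(s.stakedMyda / 1_000, 1); // caps at 1 (assumed scale)
  const ageSignal = Math.min(s.addressAgeDays / 365, 1); // caps at 1 after a year
  const score = 0.4 * s.credibility + 0.4 * stakeSignal + 0.2 * ageSignal;
  return Math.round(score * 100); // 0..100, surfaced on the sale offer
}
```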

(Image: OpenSea Confidence Score notification for purchases)

  • Multi-Account Detection and Blacklisting: We currently use [Moralis](https://moralis.io/) for user account management. Moralis allows multiple addresses to be linked behind the scenes to a single user. This gives us an audit trail of users trying to launch potential address-based Sybil attacks and allows us to blacklist transactions in real time.
  • Verified Identity and/or KYC of Data Seller: One method to prevent most of the attacks described above would be to use a Verified Identity or KYC platform to ensure that the Data Seller has gone through a verification process before they can sell their data. To prevent a high barrier to entry and boost adoption with a better UX, we may allow an "unverified" sale of data, which will allow for the sale of data but with limited functionality in the DEX and/or much lower MYDA earnings (this is to incentivise the user to verify, similar to CEXs like Binance that allow trading of smaller amounts without verification). We are also investigating the [BrightID](https://www.brightid.org/) project for fully decentralised proof of uniqueness, and other mainstream KYC platforms like CIVIC, as potential solutions.
  • To increase the legitimacy of Data DEX user accounts and to prevent the attacks described above, we are also considering using a mobile phone number as a 2nd factor on the user identity record. This solution path makes sense should we not want to proceed with a full KYC tool while still ensuring each user has a 2nd factor they must prove they own before using the Data DEX. As obtaining a mobile number has compliance and an audit trail attached to it (you need to provide your driver's licence or otherwise prove your identity and address, for example), this makes attacking the system very impractical, as each user will generally only have access to one mobile number (there are exceptions, of course, but it's not feasible for a single user to own hundreds of mobile numbers).
  • To prevent botnet-driven dumping attacks we will implement "invisible CAPTCHA". These are modern CAPTCHA technologies that don't have the bad UX of traditional CAPTCHAs (i.e. clicking a button or selecting patterns from photos) but can weed out fraudulent automated users or bots with very high accuracy.

We are sure there will be many more methods malicious users will use to try to game and disrupt the Itheum Data DEX and marketplace ecosystem. As part of our core governance model and token distribution we will reserve a portion for the community to investigate new attack vectors and to incentivise this type of white-hat hacking and remediation behaviour, continuing to boost the security of our Data DEX.


Key Terms of Reference

  • Data Creator: The type of user who generates the raw data (i.e. the data is an extension to them and would not exist if they did not generate it). This is usually the user who uses an app built using Itheum’s data collection and analytics toolkit.
  • Data Seller: A Data Creator who uses the DEX to sell their data either as a direct Data Pack sale or Data NFT.
  • Data Owner: The type of user who owns a piece of data that's available in the decentralised data DEX/marketplace. The Owner does not have to be the Data Creator; it can be someone who purchased a Data Pack or a Data NFT and can prove ownership of that asset on the blockchain. So for each Data Pack, for example, there will be 1 Data Creator and potentially multiple, unlimited (n∞) Data Owners (the users who bought the original Data Pack). And for each Data NFT, for example, there will be 1 Data Creator and a single (n1) or multiple, limited (nx) Data Owners, as Data NFTs have a limited supply of 1 to x.
  • Buyer (Access Requester): The type of user who logs into the decentralised data DEX and buys a Data Pack or Data NFT. They then become a Data Owner. They are also referred to as an Access Requester, as they are technically not "buying data to fully own it"; instead they are only "buying access rights to use a copy of the data".
  • Data Pack: The simplest dataset that can be sold on the Data DEX. It's a file in JSON format that can be put on sale by the Data Creator and is then advertised for sale in the marketplace. The sale happens directly between Creator and Buyer with no intermediaries (a hypothetical shape is sketched after this list).
  • Data Order: When a buyer buys a Data Pack, a Data Order is created. This holds some metadata about the transaction; this information is kept off-chain (to conserve on-chain resources and keep costs low) but can be verified as accurate on-chain.
  • Data NFT: The alternate way to sell data is by wrapping it in an NFT and then trading it as a regular NFT product in the NFT marketplace.
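
To make the terms above concrete, here is a hypothetical sketch of what a Data Pack listing and its corresponding Data Order might look like. All field names are illustrative assumptions; as noted above, only the integrity hash (not the data itself) would be recorded on-chain.

```typescript
// Hypothetical shapes only; not the platform's actual schema.

interface DataPack {
  dataPackId: string;
  creatorAddress: string;   // the Data Creator / Seller
  dataPreview: string;      // human-readable preview (or a future dataSnapshot)
  dataHash: string;         // integrity hash advertised on-chain
  priceInMyda: number;
  dataFileUrl: string;      // off-chain, access-controlled location of the JSON file
}

interface DataOrder {
  orderId: string;
  dataPackId: string;
  buyerAddress: string;     // becomes a Data Owner / Access Requester
  pricePaidInMyda: number;
  purchasedAt: string;      // ISO timestamp; kept off-chain, verifiable on-chain
}
```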

Disclaimers

This Whitepaper and any other documents published in association with it including the related token sale terms and conditions (the Documents) relate to a potential token (Token) offering to persons (contributors) in respect of the intended development and use of the network by various participants. The Documents do not constitute an offer of securities or a promotion, invitation or solicitation for investment purposes. The Documents are not intended to be a financial services offering document or a prospectus. The token offering involves and relates to the development and use of experimental software and technologies that may not come to fruition or achieve the objectives specified in this White Paper. The purchase of Tokens represents a high risk to any contributor. Tokens do not represent equity, shares, units, royalties or rights to capital, profit or income in the network or software or in the Token issuing entity or any other entity or intellectual property associated with the network or any other public or private enterprise, corporation, foundation or other entity in any jurisdiction. The Token is not therefore intended to represent a security or similar legal interest.
The purchase of Tokens involves significant risks and prior to purchasing them, you should carefully assess and take into account the potential risks including those described in the Documents and on our website.
Although there may be speculation on the value of the Tokens, we disclaim any liability for the use of Tokens in this manner. A market in the Tokens may not emerge and there is no guarantee of liquidity in the trading of the Tokens nor that any markets will accept them for trading.
This Whitepaper describes a future project and contains forward-looking statements that are based on our beliefs and assumptions at the present time. The project envisaged in this Whitepaper is under development and being constantly updated and accordingly, if and when the project is completed, it may differ significantly from the project set out in this whitepaper. No representation or warranty is given as to the achievement or reasonableness of any plans, future projections or prospects and nothing in the Documents is or should be relied upon as a promise or representation as to the future.
