Z. Grgurić

Posted on Sep 30, 2022 • Updated on May 5, 2023

XRPL transaction indexing and complex data searching with XWAnalyzer

#xrpl #php #dynamodb #aws

XRPLWin is working on PHP Open-Source XRPLWinAnalyzer (XWA).

This project is still under active development, AWS DynamoDB described below is deprecated and we moved to Google BigQuery as database storage.

Intro

The XRP Ledger (XRPL) is a decentralized, public blockchain led by a global developer community.

XWA stores per-account transaction history and makes it available in searchable manner.

Search engine is core functionality of XWA and base for analyzing and aggregating Ledger data.

Here will be described methods of storing XRPL transactions for high availability in searchable manner.

Aggregating live data could be troublesome, since there are many moving parts, but we are in "luck" with Ledger system, since data history is unmovable (not changing dynamically). We will use this to our advantage to cache responses and minimize database hits.

Datastores overview

Primary persistent datastore is AWS DynamoDB - Amazon DynamoDB is a fully managed proprietary NoSQL database service that supports key–value and document data structures.

Secondary non-persistent DB is PostgreSQL installed on local server used for temporary aggregations, mapping and indexing.

Additional PostgreSQL database is used to store user preferences, searches, tags etc.

Caching overview

There are several caching techniques used to prevent over-querying DynamoDB.

Redis
Disk store for resultsets
HTTP caching like Varnish

Deep dive

DynamoDB for transaction storing

Each XRPL transaction have many available fields, for each transaction type only certain fields are copied to DynamoDB which are used for displaying and searching over those transactions including transaction HASH for pulling additional data from live XRPL nodes. This method will keep DynamoDB store as small as possible.

Let's see Payment type example.

With plethora of fields and metadata only following is stored for Payment type:

Partition key: PK
Sort key: SK

Date - ripple epoch
- Key: t
- Value: ripple epoch integer (smaller than unix timestamp)
Fee - fee in drops
- Key: fe
- Value: amount in drops (smaller than XRP amount)
Counterparty address
- Key: r
- Value: rAddress...
Transaction HASH
- Key: h
- Value: TXHASH
Amount (only amount for XRP or Amount+Issuer+Currency)
- Key: a
- Value: string amount - can be Exponential Notation
- Additional keys for token: i (optional) and c (optional) where i is rAddress and c is Currency ISO or HEX
Direction (signify in or out)
- Key: in (optional)
- Value: IN - true (1 bit), OUT - flag not stored (0 bytes)
SourceTag
- Key: st (optional)
- Value: string
DestinationTag
- Key: dt (optional)
- Value: string

Real world representation of one sample payment transaction would be like this

[
  {
    "a": 50,
    "r": "rGhATPPU4tCfYF5mwrYoMsrDp23eKdsULU",
    "c": "EUR",
    "t": 478121360,
    "SK": 11908872.003,
    "h": "825029C9F6100DBEBB87C5701B8648830E5119AB84B1E82021D129B8BE953321",
    "i": "rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq",
    "PK": "rhotcWYdfn6qxhVMbPKGDF3XCKqwXar5J4-1",
    "fe": 15000
  }
]

Most of field keys are shared (named equally)
For example "a" amount can represent transferred amount in Payment or Limit amount in TrustSet. This key name sharing will remove complexity when building definitive filters.

Partition key

For table partition key (PK) combination of rAddress and numeric transaction type is used (example rhotcWYdfn6qxhVMbPKGDF3XCKqwXar5J4-1)

Sort key

Sort key (Range key) is combination of LedgerIndex and TransactionIndex (example 11908872.003) which gives us numeric sort key to order all transactions for specific rAddress through time. Transaction index is always padded with zeros to keep consistant transaction sorting, max value is 999, this will keep all transactions unique within range of one resultset with specific PK + LedgerIndex.

Primary key

rAddress-TYPE + LedgerIndex.TransactionIndex

rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq-1 + 11908872.003

Querying DynamoDB

This will describe technique how querying is achieved to get specific transaction results with or without advanced conditions.

Limitations

There are several limitations that are in place.

History over time limitation
Paginated results

When requesting transactions of single rAccount from and to date fields are required and can not span more than 31 days. If single query returns more than 10k results, output is paginated to several pages in increments of 10k results. This is to keep HTTP responses not too large.

XWA Search engine

Data fetching in few words:

HTTP cache
Disk cache
Conditions mapper reduces generates SK ranges via counting on DynamoDB which needs to be queried and builds Scanplan
Scanplan lays out list of queries which are executed against DynamoDB

Execution Workflow

Schema above demonstrate query process to extract transactions via three filters, date range, counterparty rAddress2 and destination tag 1234. This process is repeated for each requested transaction TYPE-s, results are merged and sorted post-fetch before caching.

When GET request to search API is requested first thing that is hit is reverse proxy (like Varnish), if request is not cached it is passed to backend PHP script.

Depending on sent filter parameters, definitive and non-definitive filter identifier (hashes) are generated and used to lookup next level of cache - disk cache file. If cache of result collection for that non-definitive identifier is found on disk, collection is unserialized and definitive filter is applied ad-hoc before returning JSON output.

If no disk cache is found, per-day loop is constructed between date ranges and map data for each non-definitive filter is looked up in local DB. If not found in local DB, DynamoDB COUNT query is executed for day range (SK) for specific PK. Returned counts are stored in local DB (maps table) and also cached to Redis. This ensures less hits to DynamoDB. After all count statistics are pulled PHP script calculates intersected ledger days where results are > 0. After that Scanplan is generated as basis for execution of SCAN and QUERY against DynamoDB.

Scanplan has hard-coded limit of 1000 items per one DynamoDB QUERY, if more than 1000 items are returned in group of ledger days, Scanplan may split one to multiple Queries. This will avoid DynamoDB paginating of results due to its 1MB limit of returned data.

In short, for combination of conditions Scanplan will extract only specific ledger days which have one or more transactions contained within them. Definitive filter will loop over those transactions and reject any that are not exact match.

Example of non-definitive and definitive filtering:

Counterparty rvCdEf1234567 will yield 100 transactions where "r" starts with "vC", and definitive filtering will filter out 5 transactions with exact match. This is done this way so we have finite caching combinations.

Synchronizing rAccount history

There is command created within Laravel that synces full transaction history of certain rAccount, each transactions is stored in DynamoDB in reduced format (to save space).

This command is executed in one (or few) background queue jobs. Advantage of this is predictive DynamoDB writes where write Throughput can be controlled on demand and capacity planning is possible.

One small Linux server can handle 60 Writes per second which is enough to Sync full history of a exchange account in few hours (and this needs to be done only once).

Small personal account of 2-3k transactions will sync in 2-10 minutes, this was our experience when benchmarking.

Synchronizing Ledger Days

There is command created within Laravel that synces ledger days, CRON job will execute this every day at beginning of day (UTC) to ensure previous and today ledger indexes edges to be available to rAddress syncer and search.

Cache flushing

Maps table - eject maping older than 30 days
Disk flushing - delete older than 30 days or begining of month
Redis cache - 30 days (in review)
HTTP cache - 2 months (in review)

Redis and Varnish needs to be set to auto evict older items when RAM is full.

Aggregation

With powerful search engine data aggregation is easy feat.

Lets explore scenario where there is need to show spending graph of specific currency eg USD by GateHub. To display this data it is enough to query search engine over specific time and simply add amounts per day or month. This will include token filter, from and to filter and, dir=out.

Wish to know how much Fee did you spent on Payment transactions?
Use conditions: Type=Payment and Dir=Out

Account value over time graph? Query Payments where token=XRP and see what goes in and what goes out over time range.

Other use cases

Track your account on exchange by using Destination and Source tag filtering.

Explore other account XRP flow to and from exchanges by querying known exchange XRP accounts.

Thank you for reading! 👋

Visit us: Twitter | Website | GitHub

DEV Community

XRPL transaction indexing and complex data searching with XWAnalyzer

Intro

Datastores overview

Caching overview

Deep dive

DynamoDB for transaction storing

Querying DynamoDB

Synchronizing rAccount history

Synchronizing Ledger Days

Cache flushing

Aggregation

Other use cases

Latest comments (0)

Read next

Infraestructura como código con AWS ACK

🏗️ Building Trustworthy LLMs: A Guide to Fairness, Safety, and Evaluation with AWS

What’s New in Laravel 11?

Laravel Reverb: make a chatbox component for your site!