Denis Magda

Originally published at gridgain.com

Why My In-Memory Cluster Underperforms: Negating Network Impact

Memory access is so much faster than disk I/O that many of us expect to gain striking performance advantages merely by deploying a distributed in-memory cluster and starting to read data from it. However, we might overlook the fact that a network interconnects the cluster nodes with our applications, and that network can greatly influence performance and latency.

That said, using the proper data access patterns provided by distributed in-memory systems can negate the effect of network latency. In this article, we use Apache Ignite to compare the performance of an application that solves the same task with several methods.

Running a Simple Experiment

Let's assume that our Apache Ignite in-memory cluster stores thousands of records, and the application needs to calculate the highest temperature and the longest distance across the data set. We'll compare three APIs to show how performance changes as network utilization is minimized: individual key-value calls, bulk key-value reads, and co-located computations.

A personal laptop is suitable for this experiment. Thus, I deployed a two-node Ignite cluster on my machine (an Early 2015 MacBook Pro with a 2.7 GHz Dual-Core Intel Core i5 and 8 GB of 1867 MHz DDR3) and ran this example over 200,000 records preloaded to the cluster nodes. The two cluster nodes and the app interacted via the loopback network interface and competed for shared RAM and CPU resources. If we ran the same test in a truly distributed environment, the difference between the compared APIs would be even more pronounced. I encourage you to experiment with other deployment configurations by following the instructions provided in the example.
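For context, preloading the data set might look roughly like the sketch below. It is a minimal illustration rather than the example's actual code: the "records" cache name, the SampleRecord constructor, and the generated values are all assumptions.

import java.util.concurrent.ThreadLocalRandom;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class Preloader {
    public static void main(String[] args) {
        // Join the cluster as a client node that stores no data itself.
        Ignition.setClientMode(true);

        try (Ignite ignite = Ignition.start()) {
            // A data streamer batches updates under the hood, which is much
            // faster than 200,000 individual cache.put(...) calls.
            // Assumes the "records" cache already exists on the servers.
            try (IgniteDataStreamer<Integer, SampleRecord> streamer =
                     ignite.dataStreamer("records")) {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();

                for (int i = 0; i < 200_000; i++)
                    // Hypothetical constructor: (distance, temperature).
                    streamer.addData(i, new SampleRecord(rnd.nextDouble(1_000), rnd.nextDouble(100)));
            }
        }
    }
}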

Stressing Out the Network With Thousands of Key-Value Calls

We start our experiment with key-value APIs that pull each record one by one from the two nodes. Ignite provides the standard cache.get(key) API method for that (check the calcByPullingDataFromCluster method for the full implementation):

double longestDistance = 0;
double highestTemp = 0;

for (int i = 0; i < RECORDS_CNT; i++) {
    // Reading each record from the cluster one by one.
    SampleRecord record = recordsCache.get(i);

    if (record.getDistance() > longestDistance)
        longestDistance = record.getDistance();

    if (record.getTemperature() > highestTemp)
        highestTemp = record.getTemperature();

    // Running other custom logic...
}

What we do here can be seen as a brute-force approach, since the application reads all 200,000 records and makes at least as many network roundtrips. Unsurprisingly, it took ~35 seconds for the app to finish the calculation on my laptop. If this method of data access is selected for similar calculations, then we might gain nothing by keeping data in RAM instead of on disk, because so many records are transferred over the network.

Speeding Up by Reducing the Number of Network Roundtrips

The first obvious optimization for our experiment is to reduce the number of network roundtrips between the Ignite cluster and the application. Ignite provides the cache.getAll(keys) version of the key-value APIs, which queries data in bulk. The following code snippet shows how to use the API for our task (the full implementation can be found in the calcByPullingDataInBulkFromCluster method):

// Preparing the full set of primary keys to read.
Set<Integer> keys = new HashSet<>();

for (int i = 0; i < RECORDS_CNT; i++)
    keys.add(i);

// Reading all the data in bulk using a dozen or so network calls.
Collection<SampleRecord> records = recordsCache.getAll(keys).values();

// Iterating through the data.
Iterator<SampleRecord> recordsIterator = records.iterator();

while (recordsIterator.hasNext()) {
    // Calculating the highest temperature and longest distance.
}

With this approach, my laptop completes the task in ~5 seconds, which is 5x faster than the individual key-value calls used previously. The application still reads all 200,000 records from the server nodes, but it does so in a handful of network roundtrips. Ignite's cache.getAll(keys) performs this optimization for us: when we pass the primary keys, Ignite first maps the keys to the server nodes that store the records, and then connects to those nodes to read the data in bulk.
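To see that key-to-node mapping yourself, you can query Ignite's affinity function directly. The sketch below only illustrates what getAll relies on internally; the "records" cache name is an assumption.

import java.util.Collection;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class KeyMappingDemo {
    // Prints how a set of primary keys is distributed across the server nodes.
    static void printKeyDistribution(Ignite ignite, Collection<Integer> keys) {
        Affinity<Integer> affinity = ignite.affinity("records");

        Map<ClusterNode, Collection<Integer>> mapping = affinity.mapKeysToNodes(keys);

        mapping.forEach((node, nodeKeys) ->
            System.out.println("Node " + node.id() + " owns " + nodeKeys.size() + " of the keys"));
    }
}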

Removing Network Impact With Co-Located Computations

Finally, let's see what happens if we stop pulling 200,000 records from the server nodes to the application. With Ignite compute APIs, we can wrap our calculation into an Ignite compute task that is executed on the server nodes over their local data sets. The application only receives results from the servers and no longer pulls the actual data; thus, there is almost no network utilization (the full implementation can be found in the calcByComputeTask method):

// Step #1: The application broadcasts the compute task to the cluster.
Collection<Object[]> resultsFromServers = compute.broadcast(new IgniteCallable<Object[]>() {

    @Override public Object[] call() throws Exception {
        // Step #2: Each cluster node executes this method
        // remotely to calculate the highest temperature
        // and longest distance for its local data set and
        // returns them to the application.
    }
});

// Step #3: The application selects the highest temperature
// and longest distance by comparing the results from the server nodes.
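For completeness, here is one way the remote call() body could be fleshed out. It is a sketch under stated assumptions rather than the example's actual code: it presumes the cache is named "records", that an Ignite instance is injected into the callable via @IgniteInstanceResource, and that each node scans its primary partitions with localEntries.

// Requires org.apache.ignite.resources.IgniteInstanceResource,
// org.apache.ignite.cache.CachePeekMode, and javax.cache.Cache.

@IgniteInstanceResource
private Ignite ignite; // Injected on each server node before call() runs.

@Override public Object[] call() throws Exception {
    IgniteCache<Integer, SampleRecord> cache = ignite.cache("records");

    double longestDistance = 0;
    double highestTemp = 0;

    // Iterate only over the entries this node owns as primary copies.
    // This is a purely local scan; no data crosses the network.
    for (Cache.Entry<Integer, SampleRecord> entry : cache.localEntries(CachePeekMode.PRIMARY)) {
        SampleRecord record = entry.getValue();

        if (record.getDistance() > longestDistance)
            longestDistance = record.getDistance();

        if (record.getTemperature() > highestTemp)
            highestTemp = record.getTemperature();
    }

    return new Object[] {highestTemp, longestDistance};
}

The Step #3 reduction on the application side can then be as simple as comparing the two-element arrays returned by the server nodes:

double highestTemp = Double.NEGATIVE_INFINITY;
double longestDistance = Double.NEGATIVE_INFINITY;

// Each server node returned {highestTemp, longestDistance} for its local data.
for (Object[] result : resultsFromServers) {
    highestTemp = Math.max(highestTemp, (Double) result[0]);
    longestDistance = Math.max(longestDistance, (Double) result[1]);
}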

This co-located, compute-based approach completes in ~1 second, which is 5x faster than the cache.getAll(keys) solution and 35x faster than issuing individual key-value requests. Moreover, if we load X times more data into the cluster, the compute-based approach will keep scaling linearly, while cache.getAll(keys) will slow down as the volume of data transferred over the network grows.

Should We Stick to Co-Located Compute?

The goal of this experiment was to show that, with distributed in-memory systems, the performance of our applications can vary greatly depending on how we access distributed data sets, because the network glues the cluster and the applications together.

This experiment should not discourage you from using key-value APIs or push you to stick to co-located computations only. If an application needs to read an individual record and return it to the end user, or to join two data sets, then go ahead with key-value calls or SQL queries.

If the logic is more complicated or data-intensive, then consider compute APIs to eliminate or reduce the network traffic. In short, know the APIs an in-memory technology provides and choose those that will be most performant for a particular task.
