Mihai Bojin

Posted on Sep 9, 2021 • Originally published at mihaibojin.com on Sep 6, 2021

System Design goal: efficient property reads from multiple sources

#systemdesign #java #threadsafe #datastructures

🔔 This article was originally posted on my site, MihaiBojin.com. Since it contains a few diagrams, it cannot be correctly displayed on DevTo. I highly recommend reading it on my site! 🔔

A problem of ownership

I set out to build the props library to standardize retrieving (key,value) pairs from multiple sources while also allowing a simple mechanism to establish key precedence when more than one source is aware of the same key.

Let's consider the following scenario, where multiple source_s define _property X:

Source (Layer)	Value
Prop X is defined by source 1	1
Prop X is defined by source 3	2
Prop X is defined by source 4	3
Effective value	3

Sources can be property files, system properties, or they can be coming from a key-value store or database.

Since each (key,value) pair can change independently on each source, we need to establish a key's precedence, basically answering the "Who provides the value for this key?" question.

We do so by introducing the concept of layers.

Each added layer can retrieve data from a source. As we add more layers, each takes precedence over the previous layer, for any keys it defines.

We call this concept ownership.

Sources vs. layers

Let's think for a moment about sources and layers.

A naïve approach would be to model a 1:1 relationship between Sources and Layers, basically creating a single entity to represent both, then passing a list of these entities to a registry.

We see that both sources point to the same file on disk. However, we have two objects that independently read the same data and create a potential for contention and consistency problems (e.g., SourceLayer 1 reads data, the file is updated on disk, SourceLayer2 reads different data).

Consistency is not always possible and cannot always be a property of a system, but in this case, we can ensure that a source has the same view of its data in a running JVM process.

Rather than have source_s and _layer_s be the same (1:1), let's instead have a single _source representing a dataset and multiple layers (1:N) which receive updates when the dataset changes. An added benefit of this design is that our system only needs to do one read operation per dataset instead of one per layer. Since I/O is generally more "expensive" than in-memory operations, this is a win!

Here are the components of our system:

Source: loads the data from its origin (e.g., filesystem), reloads it using user-defined logic (e.g., an interval), and notifies registered layers of any reload events
Layer: holds copies of the data for each registry
Registry: queries data from each of its layers

Retrieving the effective value

But what if a layer adds a new key or removes an existing one!?

Between two reads of X, the owner could change from Layer 3 to Layer 2.

A simple algorithm to solve this problem is always to query all layers, e.g.:

ask Layer 1-3: Do you define X?
Layers 2 and 3 respond yes
Layer 3 has precedence and owns the key
X is unset from Layer 3
we repeat the first three steps and establish Layer 2 as the owner of X

This choice is not ideal. First, we'd have to repeat this on every read , which would result in O(K) time-complexity (where K is the number of layers in the registry) for every single read property. This doesn't sound performant!

A side objective here is to deal with a high frequency of ownership changes; for example, a source dataset may repeatedly set and unset one or more keys in a short interval.

Type casting values

Before we talk about ownership, let's quickly segue into data types. The values will always be serialized as strings as we're reading them from property files, system properties, and the environment. (The library will support loading values from other data stores, e.g., databases, but the problem of casting strings from property files remains)

Here's an example of a standard Java property file:

string.prop=some value
int.prop=42
float.prop=1.23

Even though we have integer and float values, the relevant Java APIs will still return a Map<String,String>, leaving the job of casting these values to the appropriate types to the developer.

In practice, while developers "come and go" in projects, this results in many ways to, not-so-optimally, load strings and repeatedly convert them to the desired type.

When I set out to build the props library, I wanted to give developers "one" way of loading the values corresponding to each key without worrying about types, simple validations, or caching. My goal is to implement a no-hassle, efficient library for dealing with application properties!

How do we establish key ownership?

Ok, back to ownership!

The registry needs to keep track of a large enough number of keys (let's say in the 10,000s range), and retrieve values from the appropriate layer.

Developers will be able to rely on a registry to retrieve values cast to their desired type.

Let's examine the internals of a get(key, type) call.

A client queries the registry (1), which in turn checks all its layers (2), awaiting responses (3), then determining the owner and returning the effective value to the client (4).

In practical applications, we can assume a registry will have a few layers (K).

If our application handles N properties (keys), then reading all the properties at least once would be an O(N*K) operation. Given that ultimately a key is owned by a single layer, the best we can hope for is O(N) for N keys, or O(1), constant time, per key.

This time-complexity is not only more efficient but also totally achievable!

Internally, the registry needs to keep track of which layers own each key.

Since layers can be updated concurrently, we need a thread-safe data structure.

Since we need to "map" (pun intended) keys to layers, we need to use a Map (duh!)

The Java SDK provides a very efficient, thread-safe map: ConcurrentHashMap.

Let's see how this could look in code:

public class Registry {
    ConcurrentHashMap<String, Layer> owners;

    // retrieves the value for the specified key
    public <T> T get(String key, Class<T> clz) {
        // finds the owner
        Layer owner = owners.get(key);
        // retrieves the value
        String effectiveValue = owner.get(key);
        // casts it
        return clz.cast(effectiveValue);
    }
}

public interface Layer{
    // retrieves the value for the specified key, from this layer alone
    String get(String key);
}

The last piece of the puzzle is updating ownership. Below, we examine the new key case.

public class Source{
    List<Layer> layers;

    // reloads data periodically, at the specified interval
    void reload(Duration interval) {
        // the trigger will be periodically scheduled at an interval
        var trigger = () -> {
            // data is reloaded from source
            Map<String, String> data = ...;

            // and then sent to all registered layers for this source
            for (Layer layer : layers) {
                layer.onReload(data);
            }

            // we may also want a sync stage here
            // to make the new dataset available to all sources at the same time
            // but it's beyond the scope, for the moment
        }
    }
}

public class Layer{
    // processes the reloaded data
    public void onReload(Map<String, String> data) {
        // for each key in the dataset
        for (var entry : data.entrySet()) {
            // updates the value
            updateValue(entry.getKey(), entry.getValue());
            // and notifies the registry that a new owner might be available
            registry().bindKey(entry.getKey(), this);
        }
    }

    // decides the priority of this layer, in the current registry
    int priority();

    // reference to the registry that owns this layer
    abstract Registry registry();

    // stores the (key,value) pair
    abstract void updateValue(String key, String value);
}

public class Registry {
    ConcurrentHashMap<String, Layer> owners;

    // registers ownership of a layer over a key
    void bindKey(String key, Layer layer) {
        // finds the current owner
        Layer owner = owners.get(key);

        // determines if ownership change is required
        if (owner.priority() < layer.priority()) {
            // change the current owner
            owners.put(key, layer);

            // from this point, all "get"s of "key" will be routed to "layer"
        }
    }
}

The scenario above details how the system will handle new keys appearing in a layer.

A full implementation also needs to consider deleted keys and updated keys. For those cases, we'd want to avoid calling registry().bindKey(...) since it would result in a no-op anyway.

Overall, I am willing to bet that ownership changes will be less frequent than value changes in real-world scenarios. So, the upfront cost of adding and deleting keys, represented by registering and unregistering keys from a layer, will be worth it to achieve constant time (O(1)) reads per key.

Conclusion

At this point, I am genuinely excited to implement this code!

I'm sure I will learn more as I go along and may change my mind on some of these choices, but this is the plan for now!

Until next time...

If you liked this article and want to read more like it, please subscribe to my newsletter; I send one out every few weeks!

DEV Community