🔔 This article was originally posted on my site, MihaiBojin.com. Since it contains a few diagrams, it cannot be correctly displayed on DevTo. I highly recommend reading it on my site! 🔔
A problem of ownership
I set out to build the props library to standardize retrieving (key,value)
pairs from multiple sources while also allowing a simple mechanism to establish key precedence when more than one source is aware of the same key.
Let's consider the following scenario, where multiple source_s define _property X:
Source (Layer) | Value |
---|---|
Prop X is defined by source 1 | 1 |
Prop X is defined by source 3 | 2 |
Prop X is defined by source 4 | 3 |
Effective value | 3 |
Sources can be property files, system properties, or they can be coming from a key-value store or database.
Since each (key,value) pair can change independently on each source, we need to establish a key's precedence, basically answering the "Who provides the value for this key?" question.
We do so by introducing the concept of layers.
Each added layer can retrieve data from a source. As we add more layers, each takes precedence over the previous layer, for any keys it defines.
We call this concept ownership.
Sources vs. layers
Let's think for a moment about sources and layers.
A naïve approach would be to model a 1:1 relationship between Sources and Layers, basically creating a single entity to represent both, then passing a list of these entities to a registry.
We see that both sources point to the same file on disk. However, we have two objects that independently read the same data and create a potential for contention and consistency problems (e.g., SourceLayer 1 reads data, the file is updated on disk, SourceLayer2 reads different data).
Consistency is not always possible and cannot always be a property of a system, but in this case, we can ensure that a source has the same view of its data in a running JVM process.
Rather than have source_s and _layer_s be the same (1:1), let's instead have a single _source representing a dataset and multiple layers (1:N) which receive updates when the dataset changes. An added benefit of this design is that our system only needs to do one read operation per dataset instead of one per layer. Since I/O is generally more "expensive" than in-memory operations, this is a win!
Here are the components of our system:
-
Source
: loads the data from its origin (e.g., filesystem), reloads it using user-defined logic (e.g., an interval), and notifies registered layers of any reload events -
Layer
: holds copies of the data for each registry -
Registry
: queries data from each of its layers
Retrieving the effective value
But what if a layer adds a new key or removes an existing one!?
Between two reads of X, the owner could change from Layer 3 to Layer 2.
A simple algorithm to solve this problem is always to query all layers, e.g.:
- ask Layer 1-3: Do you define X?
- Layers 2 and 3 respond yes
- Layer 3 has precedence and owns the key
- X is unset from Layer 3
- we repeat the first three steps and establish Layer 2 as the owner of X
This choice is not ideal. First, we'd have to repeat this on every read , which would result in O(K)
time-complexity (where K is the number of layers in the registry) for every single read property. This doesn't sound performant!
A side objective here is to deal with a high frequency of ownership changes; for example, a source dataset may repeatedly set and unset one or more keys in a short interval.
Type casting values
Before we talk about ownership, let's quickly segue into data types. The values will always be serialized as strings as we're reading them from property files, system properties, and the environment. (The library will support loading values from other data stores, e.g., databases, but the problem of casting strings from property files remains)
Here's an example of a standard Java property file:
string.prop=some value
int.prop=42
float.prop=1.23
Even though we have integer and float values, the relevant Java APIs will still return a Map<String,String>
, leaving the job of casting these values to the appropriate types to the developer.
In practice, while developers "come and go" in projects, this results in many ways to, not-so-optimally, load strings and repeatedly convert them to the desired type.
When I set out to build the props
library, I wanted to give developers "one" way of loading the values corresponding to each key without worrying about types, simple validations, or caching. My goal is to implement a no-hassle, efficient library for dealing with application properties!
How do we establish key ownership?
Ok, back to ownership!
The registry needs to keep track of a large enough number of keys (let's say in the 10,000s range), and retrieve values from the appropriate layer.
Developers will be able to rely on a registry to retrieve values cast to their desired type.
Let's examine the internals of a get(key, type)
call.
A client queries the registry (1), which in turn checks all its layers (2), awaiting responses (3), then determining the owner and returning the effective value to the client (4).
In practical applications, we can assume a registry will have a few layers (K
).
If our application handles N
properties (keys), then reading all the properties at least once would be an O(N*K)
operation. Given that ultimately a key is owned by a single layer, the best we can hope for is O(N)
for N keys, or O(1)
, constant time, per key.
This time-complexity is not only more efficient but also totally achievable!
Internally, the registry needs to keep track of which layers own each key.
Since layers can be updated concurrently, we need a thread-safe data structure.
Since we need to "map" (pun intended) keys to layers, we need to use a Map
(duh!)
The Java SDK provides a very efficient, thread-safe map: ConcurrentHashMap.
Let's see how this could look in code:
public class Registry {
ConcurrentHashMap<String, Layer> owners;
// retrieves the value for the specified key
public <T> T get(String key, Class<T> clz) {
// finds the owner
Layer owner = owners.get(key);
// retrieves the value
String effectiveValue = owner.get(key);
// casts it
return clz.cast(effectiveValue);
}
}
public interface Layer{
// retrieves the value for the specified key, from this layer alone
String get(String key);
}
The last piece of the puzzle is updating ownership. Below, we examine the new key case.
public class Source{
List<Layer> layers;
// reloads data periodically, at the specified interval
void reload(Duration interval) {
// the trigger will be periodically scheduled at an interval
var trigger = () -> {
// data is reloaded from source
Map<String, String> data = ...;
// and then sent to all registered layers for this source
for (Layer layer : layers) {
layer.onReload(data);
}
// we may also want a sync stage here
// to make the new dataset available to all sources at the same time
// but it's beyond the scope, for the moment
}
}
}
public class Layer{
// processes the reloaded data
public void onReload(Map<String, String> data) {
// for each key in the dataset
for (var entry : data.entrySet()) {
// updates the value
updateValue(entry.getKey(), entry.getValue());
// and notifies the registry that a new owner might be available
registry().bindKey(entry.getKey(), this);
}
}
// decides the priority of this layer, in the current registry
int priority();
// reference to the registry that owns this layer
abstract Registry registry();
// stores the (key,value) pair
abstract void updateValue(String key, String value);
}
public class Registry {
ConcurrentHashMap<String, Layer> owners;
// registers ownership of a layer over a key
void bindKey(String key, Layer layer) {
// finds the current owner
Layer owner = owners.get(key);
// determines if ownership change is required
if (owner.priority() < layer.priority()) {
// change the current owner
owners.put(key, layer);
// from this point, all "get"s of "key" will be routed to "layer"
}
}
}
The scenario above details how the system will handle new keys appearing in a layer.
A full implementation also needs to consider deleted keys and updated keys. For those cases, we'd want to avoid calling registry().bindKey(...)
since it would result in a no-op anyway.
Overall, I am willing to bet that ownership changes will be less frequent than value changes in real-world scenarios. So, the upfront cost of adding and deleting keys, represented by registering and unregistering keys from a layer, will be worth it to achieve constant time (O(1)
) reads per key.
Conclusion
At this point, I am genuinely excited to implement this code!
I'm sure I will learn more as I go along and may change my mind on some of these choices, but this is the plan for now!
Until next time...
If you liked this article and want to read more like it, please subscribe to my newsletter; I send one out every few weeks!
Top comments (0)