Data Dependency Graph

#data #dependency #graph #ddg

While working with backend code, you might notice that most entry points have a similar internal structure. Below, I try to formalize this structure and then turn it into a convenient tool for code analysis and transformation.

Note that this post is only a preliminary consideration; the topic requires a more in-depth look. Any feedback is greatly appreciated.

Hidden Data Dependency Patterns

The vast majority of back-end request-processing methods handle an incoming request by:

Retrieving some pieces of data from other internal or external components or services.
Preparing a response.

Instead of retrieving data, a method may store data, send an event, and so on. The response object might not even use any information from the retrieved pieces of data. Still, the high-level structure remains the same: the response data depends on some other data.

If we ignore the transformation part and look only at how the response object depends on other pieces of data, we can see that there are only three types of dependencies:

the result depends on the presence of all pieces of data (AND pattern)
the result depends on the presence of one piece of data out of several possible options (OR pattern)
a combination of the above

So, let's try to formalize: a Data Dependency Graph (DDG) consists of one or more nodes, where each node is either an AND node or an OR node. Each node has one or more dependencies. Each dependency is either plain data or a DDG node.
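The definition above can be sketched as a small data structure. This is only an illustration under my own assumptions: the names DdgDependency, PlainData, DdgNode and isSatisfied are hypothetical and do not come from any existing library, and plain data is identified simply by its name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model: a dependency is either plain data or a DDG node.
interface DdgDependency {
    // True if this dependency can be provided from the given set of available plain data.
    boolean isSatisfied(Set<String> availableData);
}

// A plain piece of data, identified by name (an assumption made for this sketch).
final class PlainData implements DdgDependency {
    private final String name;

    PlainData(String name) {
        this.name = name;
    }

    @Override
    public boolean isSatisfied(Set<String> availableData) {
        return availableData.contains(name);
    }

    @Override
    public String toString() {
        return name;
    }
}

// A DDG node: AND needs all of its dependencies, OR needs at least one of them.
final class DdgNode implements DdgDependency {
    enum Kind { AND, OR }

    private final Kind kind;
    private final List<DdgDependency> dependencies;

    DdgNode(Kind kind, DdgDependency... dependencies) {
        if (dependencies.length == 0) {
            throw new IllegalArgumentException("A node needs at least one dependency");
        }
        this.kind = kind;
        this.dependencies = Arrays.asList(dependencies);
    }

    @Override
    public boolean isSatisfied(Set<String> availableData) {
        if (kind == Kind.AND) {
            return dependencies.stream().allMatch(d -> d.isSatisfied(availableData));
        }
        return dependencies.stream().anyMatch(d -> d.isSatisfied(availableData));
    }

    @Override
    public String toString() {
        // Renders the node as KIND(dep1, dep2, ...).
        return dependencies.stream()
                .map(Object::toString)
                .collect(Collectors.joining(", ", kind + "(", ")"));
    }
}

final class DdgModelDemo {
    public static void main(String[] args) {
        // A node that needs Dependency1 plus either Dependency2 or Dependency3.
        DdgDependency node = new DdgNode(DdgNode.Kind.AND,
                new PlainData("Dependency1"),
                new DdgNode(DdgNode.Kind.OR,
                        new PlainData("Dependency2"),
                        new PlainData("Dependency3")));

        Set<String> available = new HashSet<>(Arrays.asList("Dependency1", "Dependency3"));
        System.out.println(node + " satisfied: " + node.isSatisfied(available));
    }
}
```

Nesting a DdgNode inside another DdgNode is what gives the "combination" case; the sketch makes no attempt to detect cycles.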

Let's take a look at a few simplified examples:

    public ResponseObject handler1(final Parameters requestParameters) {
        final Dependency1 dependency1 = dependencyService1.retrieveDependency(requestParameters);
        final Dependency2 dependency2 = dependencyService2.retrieveDependency(requestParameters);

        return buildResponse(dependency1, dependency2);
    }

Here, the result object is built from two objects, one retrieved from dependencyService1 and the other from dependencyService2. Both objects must be available to build a response. Such code represents the typical case of an AND node.
For convenience, such a case can be described as
ResponseObject = AND(Dependency1, Dependency2)

    public ResponseObject handler2(final Parameters requestParameters) {
        Dependency1 dependency = dependencyService1.retrieveDependency(requestParameters);

        if (dependency == null) {
            dependency = dependencyService2.retrieveDependency(requestParameters);
        }

        return buildResponse(dependency);
    }

This code shows a typical fallback scenario - dependencyService1 is called to retrieve data, but if data is not available, then dependencyService2 is called to provide a replacement. This case can be described as
ResponseObject = OR(Dependency1, Dependency2)

Combination of the above:

    public ResponseObject handler3(final Parameters requestParameters) {
        final Dependency1 dependency1 = dependencyService1.retrieveDependency(requestParameters);
        Dependency2 dependency2 = dependencyService2.retrieveDependency(requestParameters);

        if (dependency2 == null) {
            dependency2 = dependencyService3.retrieveDependency(requestParameters);
        }

        return buildResponse(dependency1, dependency2);
    }

This code combines the result from dependencyService1 with the result from either dependencyService2 or dependencyService3. The formula for this case is ResponseObject = AND(Dependency1, OR(Dependency2, Dependency3)).

Of course, there might be more than two dependencies.

A case with a single dependency can be described as either an AND or an OR pattern:
ResponseObject = AND(Dependency1) = OR(Dependency1).

A crucial point: the dependencies of a node are independent of each other. That is, result R requires dependencies A and B, but A does not depend on B and B does not depend on A. As long as this requirement is satisfied, the order in which dependencies are retrieved is irrelevant, i.e. the AND and OR patterns are commutative.
This is the key property of every DDG node. My math skills don't allow me to prove it, but intuitively I feel that once the property above is satisfied, a DDG should not have cycles.

Graphical Representation of Data Dependency Graph (DDG)

This is an open issue for now. While the formula-like representation used above is convenient for small graphs, it is not convenient for drawing on a whiteboard or for representing complex graphs. Ideas are welcome.

Why Is the Data Dependency Graph Important?

A DDG has one essential property: it describes the structure of the application, and this structure remains intact regardless of the implementation. The DDG itself depends only on how data is organized internally and on the application's data inputs and outputs.

This property makes the DDG a very convenient tool for:

refactoring
transformation from synchronous to asynchronous processing
reworking/rewriting legacy applications and architecture analysis

DDG and Refactoring

Note that in most cases one DDG node (AND/OR) should correspond to a single method or function if we want to maintain the Single Responsibility principle. So, when we see more than one DDG level in the code of a method, this might be a code smell and a potential place for refactoring.
As long as a function or method holds only one DDG level, the order in which its dependencies are retrieved is irrelevant. This allows free reordering of calls or switching between synch/async processing without affecting application logic.
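As an illustration, here is how handler3 from the examples above might be split so that each method holds exactly one DDG level. This is a minimal sketch, not a definitive refactoring: plain Strings stand in for Parameters, the dependency types and ResponseObject, java.util.function.Function stands in for the services, and the class name OneLevelPerMethod and the helper retrieveDependency2 are hypothetical.

```java
import java.util.function.Function;

// Hypothetical sketch: Strings replace the real request, dependency and response types,
// and Function<String, String> replaces the dependency services.
final class OneLevelPerMethod {
    private final Function<String, String> dependencyService1;
    private final Function<String, String> dependencyService2;
    private final Function<String, String> dependencyService3;

    OneLevelPerMethod(Function<String, String> dependencyService1,
                      Function<String, String> dependencyService2,
                      Function<String, String> dependencyService3) {
        this.dependencyService1 = dependencyService1;
        this.dependencyService2 = dependencyService2;
        this.dependencyService3 = dependencyService3;
    }

    // AND level only: the response needs both dependencies.
    String handler3(String requestParameters) {
        String dependency1 = dependencyService1.apply(requestParameters);
        String dependency2 = retrieveDependency2(requestParameters);
        return buildResponse(dependency1, dependency2);
    }

    // OR level only: take the value from dependencyService2, or fall back to dependencyService3.
    private String retrieveDependency2(String requestParameters) {
        String dependency2 = dependencyService2.apply(requestParameters);
        return dependency2 != null ? dependency2 : dependencyService3.apply(requestParameters);
    }

    // Stand-in for the real response construction.
    private static String buildResponse(String dependency1, String dependency2) {
        return dependency1 + "+" + dependency2;
    }

    public static void main(String[] args) {
        OneLevelPerMethod handlers = new OneLevelPerMethod(
                p -> "d1(" + p + ")",
                p -> null,              // dependencyService2 has no data
                p -> "d3(" + p + ")");  // so dependencyService3 provides the replacement
        System.out.println(handlers.handler3("req"));
    }
}
```

After the split, handler3 reads as AND(Dependency1, Dependency2), and the fallback logic lives in its own method that reads as OR(Dependency2, Dependency3).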

DDG and Synch/Async Processing

Traditional synchronous applications often hide the fact that the order in which dependencies are retrieved is irrelevant. Because all processing is done sequentially, the code creates the false impression that the dependencies depend on each other. A DDG shows the real dependencies and makes the transformation to asynchronous mode straightforward. It should also be noted that a DDG looks very close to Promise-based asynchronous processing code.
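To make the similarity concrete, here is a rough sketch of the AND(Dependency1, OR(Dependency2, Dependency3)) case written with Java's CompletableFuture, which plays the role of a Promise here. The class AsyncDdgSketch, the methods or and handler3Async, and the use of Strings and Suppliers in place of real types and services are all assumptions made for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch: Suppliers of futures stand in for asynchronous dependency services.
final class AsyncDdgSketch {
    // OR level: use the first result, or start the fallback only if the first result is null.
    static <T> CompletableFuture<T> or(CompletableFuture<T> first,
                                       Supplier<CompletableFuture<T>> fallback) {
        return first.thenCompose(value ->
                value != null ? CompletableFuture.completedFuture(value) : fallback.get());
    }

    // AND level: both dependencies are requested without waiting for each other
    // and combined once both are available.
    static CompletableFuture<String> handler3Async(Supplier<CompletableFuture<String>> service1,
                                                   Supplier<CompletableFuture<String>> service2,
                                                   Supplier<CompletableFuture<String>> service3) {
        CompletableFuture<String> dependency1 = service1.get();
        CompletableFuture<String> dependency2 = or(service2.get(), service3);
        return dependency1.thenCombine(dependency2, (d1, d2) -> d1 + "+" + d2);
    }

    public static void main(String[] args) {
        CompletableFuture<String> response = handler3Async(
                () -> CompletableFuture.supplyAsync(() -> "d1"),
                () -> CompletableFuture.supplyAsync(() -> null),  // no data from service2
                () -> CompletableFuture.supplyAsync(() -> "d3"));
        System.out.println(response.join());
    }
}
```

The shape of handler3Async mirrors the formula: thenCombine corresponds to the AND node and the or helper corresponds to the OR node. Swapping the two lines that obtain dependency1 and dependency2 does not change the result, which is the order-independence described above.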

DDG and Architecture

Data dependencies are present in any application. By creating a DDG for an existing application, or for the architecture of a future one, it is possible to get a deeper understanding of how the application works and to identify possible (or existing) issues. Since a DDG does not depend on the implementation, it greatly simplifies tasks such as rewriting legacy applications. For a new architecture, a DDG may serve as a skeleton for further implementation.