Bogdan Nedelcu

Posted on Apr 14, 2020

The Graph in GraphQL

#graphql #javascript #node #architecture

GraphQL is today a ubiquitous choice for building APIs. The technology, open-sourced by Facebook, allows clients to fetch what only they need and aggregates requests under a unique query interface. With GraphQL, we can build faster applications, consume fewer data, and leverage fantastic developer tooling. I have been fascinated by GraphQL ever since its release. However, a question kept coming back to me: How does it leverage the power of graphs? In the following lines, we will start by taking a look at graphs, trees, and recursive properties. With that knowledge, let's dive deep into the original specifications and the javascript implementation of the server runtime. We will break apart the inner workings of GraphQL into its most simple and smaller parts and then put them back together. In the process, we will uncover how the data-structures are used to create the technology that changed the web as we know it.

What is a Graph?

A long time before GraphQL there where graphs, but what are they exactly? Graphs are data structures that resemble the natural way we build mental models and relate concepts. In graphs, relationships between represented entities are as relevant as the entities themselves.\
We build graphs with abstract objects called nodes or vertices. The connection between two nodes is called an edge. We can then explore a graph recursively following the edges in a specific order.

A-Cyclical Directed Graphs

There are different types of graphs, depending on how the nodes and the edges are arranged. We will focus for now on a-cyclical directed graphs because these are the ones we find in GraphQL. A directed edge has a start and an end and can only be traversed following that direction. Adding direction to the edges changes the meaning of the relationship between nodes and introduces a hierarchy. Let's say, for example, we want to represent money loans with a graph. Every edge would represent money borrowed, and the direction would represent the money flow from the lender to the party taking the loan.

From a Graph to a Tree

Graphs can transform into a different data structure depending on the constraints applied to them. A graph cycle or a circuit is a set of edges where the last edge is also the first edge. When a graph has no cycles is called an a-cyclical graph. A directional graph that is also a-cyclical is known as a tree.

The tree structure has several advantages because of its recursive nature. The elementary unit of a tree is a root node and one or many children nodes. If we model our data as a graph and impose the necessary constraints on it, we can leverage tree properties to process it. While one can transverse a tree as a whole is usually easier to work at a local level, node by node. Read and write operations can be extended to the full length of a tree by executing functions on the root node and then recursively on the subsequent children.

Modelling with Graph(QL)

As we all know in GraphQL, we represent our business domain by using a schema. The schema itself is a graph composed of types representing different entities. Types are extracted from a problem space using domain-driven techniques. They can have different fields, and every field points again to another type. In the picture above you can see that lastname, firstname and email point to the scalar type String. Scalar types do not have any subfields and they represent the leaves of the query tree. A path through the schema will always resolve in a collection of scalars structured like a tree. Most GraphQL implementations allow developers to add their own scalars with custom validation and serialization functions. The relationships between a type and its fields are unidirectional edges and are the building block of the schema. That makes the GraphQL Schema an acyclic directed graph. As we mentioned before this kind of graph can be read like a tree, visiting each tree once, in a process called tree traversal. A GraphQL query is a path in the graph, going from the root type to its subtypes until we reach scalar types with no subfields. As a result, a query is a projection of a certain subset of the GraphQL schema to a tree. On the backend side, every field of a type maps to a resolver function that returns its value when queried. The query result is created by merging the result of running resolver functions for every field extracted from the schema. GraphQL, however, does not stop here. Tree properties and recursive functions are used not only to model data but mainly to validate and execute queries on that schema.

Schema parsing

The GraphQl server parses the schema document at execution time. Types are extracted and stored as plain Javascript Objects with references to their fields, and to the resolver functions in a dictionary called typeMap. When a field must be resolved the execution algorithm will look for it in the dictionary and use both the resolver function and the references to its subtypes to build its value.



// Simplified structure of the type map
let typeMap = {
  rootType: {
    fields: { // array with the fields of the root ype
      user: {
        type: {
          fields: {
            lastname: {...},
            settings: {...},
          }
        },
        resolve: () => ({})  // points to a resolve function for the type
      },
      settings: {
        type: {
          fields: {
            membership: {...},
          }
        },
        resolve: () => ({})  // points to a resolve function for the type
      }
    }
  },
};

As every type contains a reference to its resolver function, one can resolve the whole schema by repeating three steps:

Retrieve a type from the typeMap dictionary
Run its resolver function
Repeat the same on the fields of this type

To sum up: the GraphQL schema document is parsed on the server. During the parsing process, the types extracted and stored together with references to its resolver functions in a dictionary called typeMap. Because of its tree-like structure, the dictionary can be read and wrote using recursive functions following different transversals.

Query Parsing

The GraphQL server parses every query from a string to an Abstract Syntax Tree(AST). An AST is a tree representation of the syntax of source code from a particular language. Every node in the tree represents a statement in the query, including its type, arguments, and position.

The AST is a common abstraction for compilers and is used to validate syntax correctness in a process called semantic analysis. Again, because of its tree-like structure, the AST can be processed and interpreted by recursive functions. This process is behind the query validation feature that GraphQL editors usually offer.

Query Execution

Once a query operation has been converted to an AST and its structure validated, we can use the tree properties to execute the query. The core of the execution algorithm is a recursive function that runs on every node of the query tree following a depth-first-search order.

The traversal ensures that fields are executed and resolved in a stable and consistent order. Following the first order traversal the field execution function will be called on each field in the following sequence:
The executeField function contains the magic behind the field value resolution and is well described in the GraphQL specifications. The function arguments are the name of the type is being run on, the definition of that type from the typeMap dictionary and the resolver function. First, the algorithm executes the resolver function and stores the return. Next, it completes the field value depending on its type. If the field type is a scalar, its value is simply "coerced" using a serialization function and directly returned. If the field type is an Object the completeValue process is started. The function collectFields assembles all the subfields on the respective object type that have not been resolved by the resolver function and returns a fieldGroup, an array ordered respecting the depth‐first‐search style. Then executeField runs recursively on each one of the subfields collected, in parallel. Finally, the algorithm merges and coerces the values returned by the first execution of the resolver function and the completeValue return and builds the final result according to the order in the query AST tree.

The resolution algorithm described above is a simplification of the GraphQL specifications. Proper error handling and response building make the actual implementation more tricky. Parsing queries into trees simplifies the resolution algorithm by leveraging recursiveness and ensures the consistency of field execution for queries on schemas of any shape and size.

Summing up

Graphs are the core reason why GraphQL is such a great choice to build and consume APIs. On the one hand, graphs allow developers to model the data in a natural way using directional relationships and hierarchies. The GraphQL Schema is a direct representation of a problem space based on natural language.

On the other hand, GraphQL leverages the recursive properties of AST trees to validate and execute queries. The depth first-order transversal of query trees enables stable and predictable parallel data fetching. The recursive nature of queries enabled fast development of tools like GraphiQL and the Apollo Client that leverage it for client-side query validation, caching and cache invalidation.

Final thoughts

To build exceptional software we need a fundamental understanding of the tools we use. It is usually simple pieces put together in harmony that make up sophisticated technology. The core abstraction in GraphQL is the graph. A linear algebra concept used to represent information in a non-linear and hierarchical way, or simply put: how we think about it every other day.

Even more fascinating is the fact that at the core of any technology we find the incredible ways humans solve problems naturally.

Originally published on bogdanned.com.

Top comments (10)

Manoj Dinesh • Apr 17 '20

This is an awesome article. Thoroughly enjoyed reading it. Recommended to everyone starting the GraphQL journey

Bogdan Nedelcu • Apr 18 '20

Thank you a lot Manoj! I am really happy you enjoyed it!

Manoj Dinesh • May 3 '20

I added your article in the reference list in my youtube video
youtu.be/5tgz-vNMy1Y

Hyeseong Kim • May 16 '20

Awesome. Most introduction articles focus on "QL". This is a great article explaining an important part of GraphQL!

Daniel Norman • Apr 16 '20

Greatly enjoyed reading it Bogdan! Thanks for sharing your thoughts

Bogdan Nedelcu • Apr 18 '20

I am really happy to hear that Daniel! Cheers, Bogdan

Melih Yıldız • Apr 28 '22 • Edited

I found your article while trying to find a way to represent DAGs using graphql. Sadly, you are wrong and fooled the Google's algorithm.
A directed acyclic graph, or a DAG, is not a tree. Tree is a special type of graph where every node only has one "parent", as in only 1 ingoing edge, and multiple children, as in multiple outgoing edges. Whereas in a DAG you can find 2 edges going into 1 node. Since the edges are directed this still does not create a cycle. Please check the attached image. While it's still a DAG, it's not a tree. Even when you reverse the edges.
dev-to-uploads.s3.amazonaws.com/up...