GraphQL is today a ubiquitous choice for building APIs. The technology, open-sourced by Facebook, allows clients to fetch what only they need and aggregates requests under a unique query interface. With GraphQL, we can build faster applications, consume fewer data, and leverage fantastic developer tooling. I have been fascinated by GraphQL ever since its release. However, a question kept coming back to me: How does it leverage the power of graphs? In the following lines, we will start by taking a look at graphs, trees, and recursive properties. With that knowledge, let's dive deep into the original specifications and the javascript implementation of the server runtime. We will break apart the inner workings of GraphQL into its most simple and smaller parts and then put them back together. In the process, we will uncover how the data-structures are used to create the technology that changed the web as we know it.
What is a Graph?
A long time before GraphQL there where graphs, but what are they exactly? Graphs are data structures that resemble the natural way we build mental models and relate concepts. In graphs, relationships between represented entities are as relevant as the entities themselves.\
We build graphs with abstract objects called node
s or vertices. The connection between two nodes is called an edge
. We can then explore a graph
recursively following the edges
in a specific order.
A-Cyclical Directed Graphs
There are different types of graphs, depending on how the nodes and the edges are arranged. We will focus for now on a-cyclical directed graphs because these are the ones we find in GraphQL. A directed edge has a start and an end and can only be traversed following that direction. Adding direction to the edges
changes the meaning of the relationship between nodes and introduces a hierarchy. Let's say, for example, we want to represent money loans with a graph. Every edge would represent money borrowed, and the direction would represent the money flow from the lender to the party taking the loan.
From a Graph to a Tree
Graphs can transform into a different data structure depending on the constraints applied to them. A graph cycle or a circuit is a set of edges where the last edge is also the first edge. When a graph has no cycles is called an a-cyclical graph. A directional graph that is also a-cyclical is known as a tree
.
The tree structure has several advantages because of its recursive nature. The elementary unit of a tree
is a root
node and one or many children
nodes. If we model our data as a graph
and impose the necessary constraints on it, we can leverage tree
properties to process it. While one can transverse a tree
as a whole is usually easier to work at a local level, node by node. Read and write operations can be extended to the full length of a tree
by executing functions on the root
node and then recursively on the subsequent children
.
Modelling with Graph(QL)
As we all know in GraphQL
, we represent our business domain by using a schema
. The schema itself is a graph
composed of type
s representing different entities. Types are extracted from a problem space using domain-driven techniques. They can have different fields, and every field
points again to another type. In the picture above you can see that lastname
, firstname
and email
point to the scalar
type String
. Scalar
types do not have any subfields and they represent the leaves of the query
tree. A path through the schema will always resolve in a collection of scalars structured like a tree
. Most GraphQL implementations allow developers to add their own scalars
with custom validation and serialization functions. The relationships between a type
and its fields are unidirectional edges and are the building block of the schema. That makes the GraphQL Schema an acyclic directed graph
. As we mentioned before this kind of graph can be read like a tree, visiting each tree once, in a process called tree traversal. A GraphQL query
is a path in the graph, going from the root type to its subtypes until we reach scalar types with no subfields. As a result, a query
is a projection of a certain subset of the GraphQL schema to a tree. On the backend side, every field of a type maps to a resolver
function that returns its value when queried. The query
result is created by merging the result of running resolver
functions for every field extracted from the schema. GraphQL, however, does not stop here. Tree
properties and recursive functions are used not only to model data but mainly to validate and execute queries on that schema.
Schema parsing
The GraphQl server parses the schema document at execution time. Types are extracted and stored as plain Javascript Objects
with references to their fields, and to the resolver functions in a dictionary called typeMap
. When a field must be resolved the execution algorithm will look for it in the dictionary and use both the resolver
function and the references to its subtypes to build its value.
// Simplified structure of the type map
let typeMap = {
rootType: {
fields: { // array with the fields of the root ype
user: {
type: {
fields: {
lastname: {...},
settings: {...},
}
},
resolve: () => ({}) // points to a resolve function for the type
},
settings: {
type: {
fields: {
membership: {...},
}
},
resolve: () => ({}) // points to a resolve function for the type
}
}
},
};
As every type
contains a reference to its resolver
function, one can resolve the whole schema by repeating three steps:
- Retrieve a
type
from thetypeMap
dictionary - Run its
resolver
function - Repeat the same on the
field
s of thistype
To sum up: the GraphQL schema document is parsed on the server. During the parsing process, the types extracted and stored together with references to its resolver
functions in a dictionary called typeMap
. Because of its tree-like structure, the dictionary can be read and wrote using recursive functions following different transversals.
Query Parsing
The GraphQL server parses every query from a string
to an Abstract Syntax Tree(AST). An AST is a tree representation of the syntax of source code from a particular language. Every node in the tree represents a statement in the query
, including its type, arguments, and position.
The AST
is a common abstraction for compilers and is used to validate syntax correctness in a process called semantic analysis. Again, because of its tree-like structure, the AST
can be processed and interpreted by recursive functions. This process is behind the query
validation feature that GraphQL editors usually offer.
Query Execution
Once a query
operation has been converted to an AST
and its structure validated, we can use the tree
properties to execute the query
. The core of the execution algorithm is a recursive function that runs on every node of the query tree following a depth-first-search order.
The traversal ensures that fields are executed and resolved in a stable and consistent order. Following the first order traversal the field execution function will be called on each field in the following sequence:
The executeField
function contains the magic behind the field value resolution and is well described in the GraphQL specifications. The function arguments are the name
of the type
is being run on, the definition of that type from the typeMap
dictionary and the resolver
function. First, the algorithm executes the resolver
function and stores the return. Next, it completes the field value depending on its type
. If the field type is a scalar
, its value is simply "coerced" using a serialization function and directly returned. If the field type is an Object
the completeValue
process is started. The function collectFields
assembles all the subfields on the respective object type that have not been resolved by the resolver
function and returns a fieldGroup
, an array
ordered respecting the depth‐first‐search style. Then executeField
runs recursively on each one of the subfields collected, in parallel. Finally, the algorithm merges and coerces the values returned by the first execution of the resolver
function and the completeValue
return and builds the final result according to the order in the query AST
tree.
The resolution algorithm described above is a simplification of the GraphQL specifications. Proper error
handling and response building make the actual implementation more tricky. Parsing queries into trees simplifies the resolution algorithm by leveraging recursiveness and ensures the consistency of field execution for queries on schemas of any shape and size.
Summing up
Graphs are the core reason why GraphQL is such a great choice to build and consume APIs. On the one hand, graphs allow developers to model the data in a natural way using directional relationships and hierarchies. The GraphQL Schema is a direct representation of a problem space based on natural language.
On the other hand, GraphQL leverages the recursive properties of AST trees to validate and execute queries. The depth first-order transversal of query trees enables stable and predictable parallel data fetching. The recursive nature of queries enabled fast development of tools like GraphiQL and the Apollo Client that leverage it for client-side query validation, caching and cache invalidation.
Final thoughts
To build exceptional software we need a fundamental understanding of the tools we use. It is usually simple pieces put together in harmony that make up sophisticated technology. The core abstraction in GraphQL is the graph. A linear algebra concept used to represent information in a non-linear and hierarchical way, or simply put: how we think about it every other day.
Even more fascinating is the fact that at the core of any technology we find the incredible ways humans solve problems naturally.
Originally published on bogdanned.com.
Top comments (10)
This is an awesome article. Thoroughly enjoyed reading it. Recommended to everyone starting the GraphQL journey
Thank you a lot Manoj! I am really happy you enjoyed it!
I added your article in the reference list in my youtube video
youtu.be/5tgz-vNMy1Y
Awesome. Most introduction articles focus on "QL". This is a great article explaining an important part of GraphQL!
Greatly enjoyed reading it Bogdan! Thanks for sharing your thoughts
I am really happy to hear that Daniel! Cheers, Bogdan
I found your article while trying to find a way to represent DAGs using graphql. Sadly, you are wrong and fooled the Google's algorithm.
A directed acyclic graph, or a DAG, is not a tree. Tree is a special type of graph where every node only has one "parent", as in only 1 ingoing edge, and multiple children, as in multiple outgoing edges. Whereas in a DAG you can find 2 edges going into 1 node. Since the edges are directed this still does not create a cycle. Please check the attached image. While it's still a DAG, it's not a tree. Even when you reverse the edges.
dev-to-uploads.s3.amazonaws.com/up...
Awesome!
Nice one, thanks!
excellent article, thanks alot!