mari tang

Posted on Apr 21, 2019 • Edited on Apr 29, 2019

OGQL post-mortem

#systemdesign #graphql #webdev #backend

So, now that we've made our initial release of OGQL, this seems like a good moment to stop and take stock of where we are, how we got here, and where we'll go next.

We'll start with how we got here.

How we got here

We contacted various software engineers to ask them about their pain points with GraphQL. A few notable points came up:

1. GraphQL is a graph-based query language, but not a graph database, and it lacks some of the querying functionality built into graph databases.
2. Difficulty reading nested JSON.
3. Difficulty debugging queries through GraphQL's layer of abstraction.

We decided to address points 2 and 3. Here's what our process was like.

First, we started with resolvers. Resolvers are functions attached to the fields of GraphQL types (including the "root" types, such as Query) that return the data a client requests. GraphQL keeps invoking resolvers down the tree of requested fields until every branch ends in a scalar (non-nested) value, and those values fill in the shape of the response.

If we're looking for potential bottlenecks and debugging issues in GraphQL, this is where we'd start.
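To make that resolver model concrete, here is a minimal sketch in TypeScript. It is not graphql-js's executor: `Selection`, `Resolver`, and `resolveSelections` are hypothetical names, and real GraphQL also handles arguments, errors, type checking, and concurrent sibling fields. It only shows the core loop: call the resolver for each requested field, await it if it returned a promise, and keep descending until every leaf is a scalar.

```typescript
// Hypothetical, simplified model of GraphQL field resolution (not graphql-js).
type Scalar = string | number | boolean | null;
type Result = Scalar | Result[] | { [key: string]: Result };

// A resolver receives its parent value and returns a value or a promise of one.
type Resolver = (parent: any) => unknown;

// One requested field: its name, its resolver, and any nested sub-fields.
interface Selection {
  field: string;
  resolve: Resolver;
  children?: Selection[];
}

// Resolve each selection against `parent`, descending into sub-selections until
// every leaf is a scalar. Fields are resolved one at a time here for clarity.
async function resolveSelections(parent: unknown, selections: Selection[]): Promise<{ [key: string]: Result }> {
  const out: { [key: string]: Result } = {};
  for (const sel of selections) {
    const value = await sel.resolve(parent); // `await` handles plain values and promises alike
    const children = sel.children;
    if (children && Array.isArray(value)) {
      // A list field: resolve the sub-selections once per item.
      const items: Result[] = [];
      for (const item of value) items.push(await resolveSelections(item, children));
      out[sel.field] = items;
    } else if (children && value !== null && typeof value === "object") {
      out[sel.field] = await resolveSelections(value, children);
    } else {
      out[sel.field] = value as Scalar; // a leaf: stop here
    }
  }
  return out;
}

// Example: an async root resolver (standing in for a database call) plus trivial field resolvers.
const query: Selection[] = [
  {
    field: "user",
    resolve: async () => ({ name: "Ada", friends: [{ name: "Grace" }] }),
    children: [
      { field: "name", resolve: (user) => user.name },
      {
        field: "friends",
        resolve: (user) => user.friends,
        children: [{ field: "name", resolve: (friend) => friend.name }],
      },
    ],
  },
];

resolveSelections(null, query).then((result) => console.log(JSON.stringify(result)));
// {"user":{"name":"Ada","friends":[{"name":"Grace"}]}}
```

Every resolver call, including the trivial `user.name` ones, passes through this loop, which is why resolvers are the natural place to hook in timing and debugging.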

I began by using a decorator to add tracking to all of the root resolvers, so that we could, for example, count how often each one was called. This worked somewhat, but it wasn't enough to actually understand the behavior of our queries.
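A rough sketch of that kind of tracking decorator, written as a higher-order function because TypeScript's `@decorator` syntax only applies to classes and their members. `withCallCount` and `callCounts` are hypothetical names; this illustrates the general technique rather than our exact code.

```typescript
// Hypothetical call-count tracker: one counter per resolver name.
const callCounts = new Map<string, number>();

// Wrap a resolver so every invocation bumps its counter, then delegate to the
// original with the same arguments and return its result unchanged.
function withCallCount<A extends unknown[], R>(
  name: string,
  resolver: (...args: A) => R,
): (...args: A) => R {
  return (...args: A): R => {
    callCounts.set(name, (callCounts.get(name) ?? 0) + 1);
    return resolver(...args);
  };
}

// Example: wrap each root resolver once, at startup.
const rootResolvers = {
  user: (id: number) => ({ id, name: "Ada" }),
  posts: () => [{ title: "Hello" }],
};

const trackedRoot = {
  user: withCallCount("user", rootResolvers.user),
  posts: withCallCount("posts", rootResolvers.posts),
};

trackedRoot.user(1);
trackedRoot.user(2);
trackedRoot.posts();
console.log(callCounts.get("user"), callCounts.get("posts")); // 2 1
```

The wrapper only sees that a call happened and what it returned; it knows nothing about what the resolver does internally, which is exactly the limitation described here.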

The problem arises once we hit nested resolvers and asynchronous database requests. Timing nested resolvers works fine as long as they're synchronous, since we can reliably measure their duration by checking a Date object before and after the call. However, most GraphQL operations need to fetch data from another API or a database, and those resolvers return a promise right away, so the before/after timestamps only measure how long it took to create the promise, not how long the data took to arrive.
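The difference between naive and async-aware timing can be shown in a few lines. This sketch uses a fake clock so the numbers are deterministic; `slowResolver`, `timeSync`, and `timeAsync` are hypothetical, and the 50 ms "database latency" is simulated by advancing the clock when the promise resolves.

```typescript
// Fake clock so the example is deterministic; real code would use Date.now().
let fakeTime = 0;
const now = (): number => fakeTime;

// Hypothetical async resolver: it returns a promise right away, and the
// simulated database call "takes" 50 ms before the promise resolves.
function slowResolver(): Promise<string> {
  return Promise.resolve().then(() => {
    fakeTime += 50; // simulated network/database latency
    return "row";
  });
}

// Naive before/after timing. For an async resolver this only measures
// how long it took to create the promise.
function timeSync(fn: () => unknown): { result: unknown; ms: number } {
  const start = now();
  const result = fn();
  return { result, ms: now() - start };
}

// Async-aware timing: wait for the returned value (or promise) to settle first.
async function timeAsync(fn: () => unknown): Promise<number> {
  const start = now();
  await fn();
  return now() - start;
}

async function main(): Promise<{ naive: number; awaited: number }> {
  const naive = timeSync(slowResolver);
  await naive.result; // let the first call finish before measuring the second
  const awaited = await timeAsync(slowResolver);
  console.log({ naive: naive.ms, awaited }); // { naive: 0, awaited: 50 }
  return { naive: naive.ms, awaited };
}

const demo = main();
```

The naive measurement reports 0 because the clock is read before the simulated database work has even started.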

The question, then, is how to distinguish between different cases: trivial resolvers, API calls, and a variety of database calls.

It seems that GraphQL itself resolves the promises returned by database calls as it accumulates its JSON response, so we dug into the GraphQL JS source code to understand how this happens and to inject our own debugging/tracking code into its structure. We spent several days researching this topic and attempting to design solutions, none of which ultimately worked.

We were on a time crunch, and our morale was extremely low, with high levels of burnout. In the end, we scoped down our project quite a bit. This leads into...

What did we do?

So, we had a tightly coupled tracking system that captured some data: the total runtime, and the resolvers that ran, along with their nesting depth.

In our desperation, we ended up working backward from the result to figure out how we got there. It works, but it isn't recommended: you lose the ability to get granular information about individual resolvers, and it only works if all of your resolvers return data correctly.

In this compromised form, you can trace resolvers by the key names on the nested object returned by GraphQL. You can't tell which resolvers are trivial, how many database queries or fetch requests were made, or which resolvers take the longest to evaluate.
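Working backward from the result amounts to walking the response's `data` object and treating each key path as evidence that a resolver ran at that depth. A minimal sketch, assuming a plain JSON response; `traceFields` and `FieldTrace` are hypothetical names.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface FieldTrace {
  path: string;  // e.g. "user.friends.name"
  depth: number; // 1 for top-level fields
}

// Record every distinct key path in a response, with its nesting depth.
function traceFields(data: Json, prefix = "", depth = 0, seen = new Set<string>()): FieldTrace[] {
  const traces: FieldTrace[] = [];
  if (Array.isArray(data)) {
    // Items of a list share the same fields, so walk each one under the same path.
    for (const item of data) traces.push(...traceFields(item, prefix, depth, seen));
  } else if (data !== null && typeof data === "object") {
    for (const [key, value] of Object.entries(data)) {
      const path = prefix ? `${prefix}.${key}` : key;
      if (!seen.has(path)) {
        seen.add(path);
        traces.push({ path, depth: depth + 1 });
      }
      traces.push(...traceFields(value, path, depth + 1, seen));
    }
  }
  return traces;
}

const data: Json = { user: { name: "Ada", friends: [{ name: "Grace" }, { name: "Alan" }] } };
console.log(traceFields(data).map((t) => `${t.path} (depth ${t.depth})`).join("\n"));
// user (depth 1)
// user.name (depth 2)
// user.friends (depth 2)
// user.friends.name (depth 3)
```

Note what the trace can't tell you: whether `user.name` came from a trivial resolver or from a database call. The response records only what came back, not how it was produced.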

However, the benefit is that this approach works with any GraphQL endpoint, including ones where you can't inject code into the backend. We can still visualize the total amount of data returned, a breakdown of how each query contributed to the final result, and the total runtime from execution to the data being returned.
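Because everything is measured from the client side, the runtime and size numbers only need an HTTP request. Here is a sketch, assuming the common GraphQL-over-HTTP convention of POSTing `{ "query": ... }` as JSON; `runQuery`, `FetchLike`, and the endpoint URL are hypothetical, and the fetch function is injected so it can be stubbed.

```typescript
// Minimal fetch-shaped type so the sketch works with the real `fetch` or a test stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ text(): Promise<string> }>;

interface QueryMetrics {
  runtimeMs: number; // from sending the request until the full body has arrived
  bytes: number;     // UTF-8 size of the raw JSON response
  data: unknown;     // the response's `data` field
}

// Runs one query against any GraphQL endpoint; nothing on the server side changes.
async function runQuery(
  endpoint: string,
  query: string,
  fetchFn: FetchLike,
  now: () => number = Date.now,
): Promise<QueryMetrics> {
  const start = now();
  const res = await fetchFn(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  const raw = await res.text();
  const runtimeMs = now() - start;
  return {
    runtimeMs,
    bytes: new TextEncoder().encode(raw).length,
    data: JSON.parse(raw).data,
  };
}

// Usage (in a browser or Node 18+, where `fetch` is global):
// runQuery("https://example.com/graphql", "{ __typename }", fetch).then(console.log);
```

This is the whole trade-off in miniature: the endpoint can be anyone's, but the only numbers available are totals for the request as a whole.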

Initially, we designed our project as an endpoint on our server, intending to move it into an Electron app later. However, Electron carries a lot of overhead and would add a lot of cruft to a project that doesn't really need to live outside a browser. Ultimately, we looked at GraphiQL and decided to implement OGQL in a similar way.

When GraphiQL is used as a GraphQL extension, it renders a GUI at '/graphql' whenever a browser accesses that route. GraphQL extensions are fairly limited in what they can serve: GraphiQL manages to serve a React frontend only by sending out a string that gets rendered as HTML. That HTML includes tags that load all of its dependencies from CDNs: a Promise polyfill, React, a minified stylesheet, and a minified bundle of the compiled GraphiQL itself.

Following this pattern, OGQL serves HTML at '/orpheus' by default, and that HTML loads a bundle compiled from our frontend. Since we can't bundle OGQL with every GraphQL installation, we packaged it as a Node module and published it to npm, so that any developer can easily install and use it.
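The "serve the whole frontend as one HTML string" pattern can be sketched without any framework. `renderPage` and `handle` are hypothetical names, and every URL below is a placeholder rather than a real asset location; in practice the handler would be plugged into whatever server already hosts the GraphQL endpoint.

```typescript
interface PageAssets {
  title: string;
  stylesheets: string[];
  scripts: string[]; // loaded in order: dependencies first, app bundle last
}

// Build one HTML document whose tags pull dependencies from CDNs, then load the app bundle.
function renderPage({ title, stylesheets, scripts }: PageAssets): string {
  const links = stylesheets.map((href) => `    <link rel="stylesheet" href="${href}" />`);
  const tags = scripts.map((src) => `    <script src="${src}"></script>`);
  return [
    "<!DOCTYPE html>",
    "<html>",
    "  <head>",
    `    <title>${title}</title>`,
    ...links,
    "  </head>",
    "  <body>",
    '    <div id="root"></div>',
    ...tags,
    "  </body>",
    "</html>",
  ].join("\n");
}

const html = renderPage({
  title: "OGQL",
  stylesheets: ["https://cdn.example.com/app.min.css"],
  scripts: [
    "https://cdn.example.com/react.production.min.js",
    "https://cdn.example.com/react-dom.production.min.js",
    "/orpheus/bundle.js",
  ],
});

// Framework-agnostic handler: answer GET <mountPath> with the page, otherwise defer.
function handle(method: string, path: string, mountPath = "/orpheus") {
  if (method !== "GET" || path !== mountPath) return null; // let other routes (e.g. /graphql) respond
  return { status: 200, headers: { "Content-Type": "text/html" }, body: html };
}
```

Keeping the mount path as a parameter mirrors the "by default" in the text: '/orpheus' is just the default value.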

Unlike GraphiQL, we have Webpack compile all of our styles and resources into a single bundle. This is inefficient: we should pull shared dependencies from the same CDNs as GraphiQL, so that we can leverage browser caching and minimize our bundle size.
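With Webpack, one common way to do this is the `externals` option, which tells Webpack not to bundle a module and to read it from a global at runtime instead. The fragment below assumes React and ReactDOM are loaded from CDN script tags that define the `React` and `ReactDOM` globals; the entry path and file names are hypothetical, and loaders and other settings are omitted.

```typescript
// webpack.config.ts (sketch; paths are hypothetical, loaders omitted)
const config = {
  mode: "production",
  entry: "./src/index.tsx",
  output: { filename: "bundle.js" },
  // Don't bundle these: resolve `import React from "react"` to the global
  // `React` (and likewise `ReactDOM`) defined by the CDN <script> tags.
  externals: {
    react: "React",
    "react-dom": "ReactDOM",
  },
};

export default config;
```

The catch is ordering: the CDN scripts must load before the bundle, or the globals won't exist when the bundle runs.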

Right now, our OGQL bundle is sitting at around 5 megabytes. This is much larger than it needs to be.

We initially built our frontend with React, Redux, Ace Editor, and React Vis. Our state probably isn't big enough to justify Redux (but it's a fairly lightweight library, so cutting it isn't urgent, and it may become more useful as we scale and add functionality). As mentioned earlier, we can pull React from a CDN, and we'd probably want to import just the Sunburst and Treemap components from React Vis, since those are the only parts of the library we actually use to visualize our data. Ace Editor is a fun bit of technology, but it's also fairly heavy, and we aren't leveraging its advanced features, so we'd probably want to replace it with a lighter alternative (or actually start using features like syntax highlighting).

I know this bleeds into the next section, which will mostly cover features we could think about adding.

What's next?

So, as it exists at the moment, OGQL is mostly a fancy JSON visualizer. I'm glad I got to do a little tree traversal and building to parse the response into a tree that React Vis can read, and I feel that I understand GraphQL much better than I did going in, but I haven't actually done much to extend it.
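That parsing step might look roughly like the sketch below: a recursive conversion from the response JSON into `{ name, children, size }` nodes of the kind sunburst and treemap charts consume. The node shape is an assumption (check React Vis's docs for the exact fields it reads), and sizing each leaf by the length of its JSON text is a hypothetical choice for approximating "amount of data".

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface TreeNode {
  name: string;
  children?: TreeNode[];
  size?: number; // leaf weight: length of the leaf's JSON text (an assumption)
}

// Convert a JSON value into a name/children/size hierarchy.
function toTree(name: string, value: Json): TreeNode {
  if (Array.isArray(value)) {
    // Name list items by index so siblings stay distinguishable.
    return { name, children: value.map((item, i) => toTree(`${name}[${i}]`, item)) };
  }
  if (value !== null && typeof value === "object") {
    return { name, children: Object.entries(value).map(([key, child]) => toTree(key, child)) };
  }
  return { name, size: JSON.stringify(value).length }; // scalars become leaves
}

// Total weight of a subtree, e.g. to label a chart segment.
function totalSize(node: TreeNode): number {
  return (node.size ?? 0) + (node.children ?? []).reduce((sum, child) => sum + totalSize(child), 0);
}

const tree = toTree("data", { user: { name: "Ada", friends: [{ name: "Grace" }] } });
console.log(totalSize(tree)); // 12  ("\"Ada\"" is 5 characters, "\"Grace\"" is 7)
```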

We implemented some history, but it's not persistent, and it doesn't cache our visualization data. It just saves the last few queries that you've run.

There are many ways we could extend our history functionality. One of the clearer use cases would be features for comparing selected queries: differences in runtime, differences in total data size, and differences/overlaps in the resolvers involved. A system for comparing many queries, along with the ability to export them, would help our use cases scale as well.
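A minimal sketch of a bounded history with a pairwise comparison and JSON export. `QueryHistory`, `HistoryEntry`, and `compareEntries` are hypothetical names, and the numbers in the example are made up purely for illustration.

```typescript
// Hypothetical record of one query run; field paths could come from walking the response.
interface HistoryEntry {
  query: string;
  runtimeMs: number;
  bytes: number;
  fields: string[];
}

// Keeps the most recent `limit` entries and can export them as JSON.
class QueryHistory {
  private entries: HistoryEntry[] = [];
  private readonly limit: number;

  constructor(limit = 10) {
    this.limit = limit;
  }

  add(entry: HistoryEntry): void {
    this.entries.push(entry);
    if (this.entries.length > this.limit) this.entries.shift(); // drop the oldest
  }

  get(index: number): HistoryEntry | undefined {
    return this.entries[index];
  }

  exportJson(): string {
    return JSON.stringify(this.entries, null, 2);
  }
}

interface Comparison {
  runtimeDeltaMs: number; // b minus a
  bytesDelta: number;     // b minus a
  shared: string[];       // fields both queries returned
  onlyA: string[];
  onlyB: string[];
}

function compareEntries(a: HistoryEntry, b: HistoryEntry): Comparison {
  const inA = new Set(a.fields);
  const inB = new Set(b.fields);
  return {
    runtimeDeltaMs: b.runtimeMs - a.runtimeMs,
    bytesDelta: b.bytes - a.bytes,
    shared: a.fields.filter((f) => inB.has(f)),
    onlyA: a.fields.filter((f) => !inB.has(f)),
    onlyB: b.fields.filter((f) => !inA.has(f)),
  };
}

// Example with made-up numbers.
const history = new QueryHistory(5);
history.add({ query: "{ user { name } }", runtimeMs: 120, bytes: 300, fields: ["user", "user.name"] });
history.add({ query: "{ user { email } }", runtimeMs: 80, bytes: 500, fields: ["user", "user.email"] });
console.log(compareEntries(history.get(0)!, history.get(1)!));
```

Comparing many queries at once could reuse the same function against a chosen baseline entry.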

I think it's definitely worth taking a second look at building functionality to track database requests. We'd probably want to check whether a resolver's return value is a promise, then wait for it to resolve in some way. I don't understand how GraphQL does this. (Promise.all? But then we'd have to make sure that we push all of our promises to an array, and that the data flows properly to whatever functionality needs it after the promises resolve. If anyone has insight into how GraphQL handles this, please say something.) We could probably identify types by looking at how the GraphQLObjectType constructor works. If we can parse through a type's name and fields, we might even find each field's resolve method and the database/API calls within it.
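In graphql-js, object types are created with `new GraphQLObjectType({ name, fields })`, where `fields` is an object (or a function returning one) whose entries may carry a `resolve` function. The sketch below works on that config shape structurally rather than importing graphql-js, so the `TypeConfig`, `FieldConfig`, and `instrumentType` names are hypothetical stand-ins. It wraps every field that has a resolver and leaves the others alone, since those fall back to GraphQL's default resolver.

```typescript
// Structural stand-ins for the config passed to `new GraphQLObjectType({ name, fields })`.
type ResolveFn = (parent: unknown, args: unknown, context: unknown, info: unknown) => unknown;

interface FieldConfig {
  type: unknown;
  resolve?: ResolveFn;
}

type FieldMap = Record<string, FieldConfig>;

interface TypeConfig {
  name: string;
  fields: FieldMap | (() => FieldMap); // graphql-js also accepts a thunk here
}

// Hypothetical hook invoked after every instrumented resolver call.
type OnResolve = (typeName: string, fieldName: string, result: unknown) => void;

// Return a copy of `config` whose resolvers report to `onResolve`.
function instrumentType(config: TypeConfig, onResolve: OnResolve): TypeConfig {
  const wrapFields = (): FieldMap => {
    const original = typeof config.fields === "function" ? config.fields() : config.fields;
    const wrapped: FieldMap = {};
    for (const [fieldName, field] of Object.entries(original)) {
      const resolve = field.resolve;
      wrapped[fieldName] = resolve
        ? {
            ...field,
            resolve: (parent, args, context, info) => {
              const result = resolve(parent, args, context, info);
              onResolve(config.name, fieldName, result); // result may still be a promise
              return result;
            },
          }
        : field; // no resolver: GraphQL's default reads parent[fieldName]
    }
    return wrapped;
  };
  // Keep `fields` lazy so circular type references still work.
  return { ...config, fields: wrapFields };
}
```

On a type that has already been constructed, graphql-js exposes the same information through `getFields()`, whose entries carry the field's `resolve`; the wrapping idea would be the same.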

I wouldn't expect to be able to distinguish between API calls and database calls, but we could probably get around that by recognizing them simply as asynchronous tasks.

Whenever we hit an asynchronous task, we'd want to mark that the resolver is doing async work, and to time and track it with async-aware methods, since those resolvers are the ones most worth looking at. Munging data might get gnarly, but the #1 bottleneck for web-based applications is going to be network requests, so that's what we'll prioritize.
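A sketch of such a wrapper, with hypothetical names (`timed`, `ResolverTiming`). If a resolver returns a thenable, it attaches its own handlers to that one promise, so the timing stops when the promise settles and the call is flagged as async. This per-promise approach is one way around the Promise.all bookkeeping worried about above, since nothing has to be collected into an array; it is not a claim about how GraphQL itself does it.

```typescript
interface ResolverTiming {
  name: string;
  async: boolean; // true if the resolver returned a promise (API or database call)
  ms: number;
  ok: boolean;    // false if it threw or the promise rejected
}

// Anything with a callable `then` is treated as a promise.
function isThenable(value: unknown): value is PromiseLike<unknown> {
  return (
    value !== null &&
    (typeof value === "object" || typeof value === "function") &&
    typeof (value as { then?: unknown }).then === "function"
  );
}

function timed<A extends unknown[]>(
  name: string,
  resolver: (...args: A) => unknown,
  record: (timing: ResolverTiming) => void,
  now: () => number = Date.now,
): (...args: A) => unknown {
  return (...args: A) => {
    const start = now();
    let result: unknown;
    try {
      result = resolver(...args);
    } catch (err) {
      record({ name, async: false, ms: now() - start, ok: false });
      throw err;
    }
    if (!isThenable(result)) {
      record({ name, async: false, ms: now() - start, ok: true }); // trivial/synchronous resolver
      return result;
    }
    // Async resolver: stop the clock only when this promise settles.
    return Promise.resolve(result).then(
      (value) => {
        record({ name, async: true, ms: now() - start, ok: true });
        return value;
      },
      (err) => {
        record({ name, async: true, ms: now() - start, ok: false });
        throw err;
      },
    );
  };
}

// Example usage with real time: wrap a resolver and collect its timings.
const timings: ResolverTiming[] = [];
const userResolver = timed("Query.user", async (id: number) => ({ id, name: "Ada" }), (t) => timings.push(t));
void userResolver(1);
```

Since API calls and database calls both surface as promises, this treats them the same way, which matches the expectation that we can't tell them apart.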

Changes in Process

We should have reassessed our standups and maintained better communication. At least on my end, I felt (largely self-imposed) pressure to complete tasks, but lacked a sense of which tasks to prioritize and what ends they were moving towards. It sapped a lot of my motivation.

Organizing our standups around user empathy would have helped to keep everyone cohesively moving towards a better product. Instead, I ended up finding myself trying to invent and claim tasks so that I could feel productive, while finding many of them fruitless or too vague to effectively pursue. I wanted to do my best to support my team, and ended up de-prioritizing the actual product in the process. Often, I also felt "stuck" with the tasks that we set out for ourselves, and it would have saved a lot of time if I had felt authorized to reassess a task that didn't seem to work, and to consider other options in its place.

At some point, it's vital to recognize that a process is not working, or not working as well as we need it to. This is a painful thing to do, which is why I avoided doing it over the course of this project.

Domain knowledge is extremely vital as well. It's nigh-impossible to solve a problem that you do not understand. I was learning fairly rapidly as I worked, but I never stopped to take stock and recontextualize my knowledge, or my overall understanding of the task at hand. We managed to do some amount of course-correction, but, again, lacked a clear destination to correct towards. Avoiding failure is not a great way to create a success.

Conclusions

Generally, our development bottlenecks were not actually technical expertise, time, or programming ability, but product management and morale. Scoping down a project is generally a good idea, but there needs to be space and direction to grow once we've achieved MVP.

User empathy could have greatly helped us to create that direction. We were fixated on technical tasks and definitely didn't plan for what our actual use cases were. Granted, this was our first real exposure to GraphQL, but it was a damned shame that we struggled so much, only to produce a system that doesn't quite fit the use case we'd initially envisioned. By the time we had anything to show for our work, we were thoroughly lost, and then we ran into the deadline.

As far as where we should go from here, here's the summary:

Unquestionably, we should cut down our bundle size: 5 MB is not acceptable for such a simple frontend. We can also do better work, even with just the data already available to us. We should add better history tools, analytics that scale gracefully to compare multiple requests, and tools to export these logs. With these tools, a developer could experiment with queries, compare results, and save anything worth saving. GraphQL's introspection tools are also an extremely valuable part of the system, and would be great to implement as part of our app.
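Introspection is part of the GraphQL specification: a query against `__schema` returns the schema's types, their `kind`, and (for object and interface types) their `fields`. The query below uses only those spec-defined fields; `summarizeSchema` is a hypothetical helper that turns the result into a type-to-fields map an app could display.

```typescript
// A minimal introspection query using spec-defined fields.
const INTROSPECTION_QUERY = `
  query {
    __schema {
      queryType { name }
      types { name kind fields { name } }
    }
  }
`;

interface IntrospectedType {
  name: string;
  kind: string;
  fields: { name: string }[] | null; // null for scalars, enums, inputs, unions
}

interface IntrospectionResult {
  data: { __schema: { queryType: { name: string }; types: IntrospectedType[] } };
}

// Map each user-defined object type to its field names, skipping the
// built-in introspection types (whose names start with "__").
function summarizeSchema(result: IntrospectionResult): Record<string, string[]> {
  const summary: Record<string, string[]> = {};
  for (const t of result.data.__schema.types) {
    if (t.kind !== "OBJECT" || t.name.startsWith("__")) continue;
    summary[t.name] = (t.fields ?? []).map((f) => f.name);
  }
  return summary;
}
```

The query string can be sent with the same POST request used for any other query, so this works against endpoints we don't control, as long as introspection hasn't been disabled on the server.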

However, given the current state of our app, I would also suggest homing in on one of two models for developing it further:

1. Integration into the backend: If we're going to require that users install our software as a Node module, we should provide functionality that can only be achieved by importing code into their backend. The fact that it functions as a GraphQL extension is nice, but it doesn't justify a download on its own. If we go this route, we should dig into the actual functionality of resolvers. The ability to track API/DB requests would be the main draw of this approach.
2. Total separation into an independent frontend: If we're not providing extra functionality through an npm installation, we should separate our frontend into its own website, where it could exist as a PWA (no installation required!). This would be a lightweight, flexible, rapid querying tool that could help a developer plan requests to an unfamiliar endpoint. (If, say, you're using a public GraphQL API and don't have access to its backend, you could use this tool to query it and understand the shape of its responses.)

Ultimately, I'm not entirely satisfied with the app as it exists now. It's easy to use, well designed, and gives a rapid understanding of a query's results, but it continues to be weighed down by vestigial code and by decisions made to imitate GraphiQL rather than to choose the optimal solution for our own use case. OGQL's most viable use case at the moment is providing visualizations for users who aren't especially familiar with the ins and outs of GraphQL.

(I hadn't thought of this before, but aiming our tool at a more casual audience would be yet another way to go. We could even integrate GraphQL's introspection features and let users build a query through a properly graphical click or click-and-drag interface, then return the data, with the formatted query string available to export.)
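The last step of such a builder, turning the user's selections into a query string, is small. In this sketch `SelectionTree` is a hypothetical shape a click-to-select UI might produce (`true` marks a selected leaf field, a nested object marks a field with sub-selections); arguments, variables, and fragments are deliberately left out.

```typescript
interface SelectionTree {
  [field: string]: true | SelectionTree;
}

// Print one level of selections, indenting two spaces per level.
function printSelection(tree: SelectionTree, indent = 1): string {
  const pad = "  ".repeat(indent);
  return Object.entries(tree)
    .map(([field, sub]) =>
      sub === true ? `${pad}${field}` : `${pad}${field} {\n${printSelection(sub, indent + 1)}\n${pad}}`,
    )
    .join("\n");
}

// Wrap the printed selections in a named query operation.
function buildQuery(tree: SelectionTree, operationName = "BuiltQuery"): string {
  return `query ${operationName} {\n${printSelection(tree)}\n}`;
}

console.log(buildQuery({ user: { name: true, friends: { name: true } } }));
// query BuiltQuery {
//   user {
//     name
//     friends {
//       name
//     }
//   }
// }
```

Paired with an introspection summary of available fields, the UI would only need to offer valid field names at each level and hand the resulting tree to `buildQuery`.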

However, there's plenty of time and space to learn and grow, and I'd certainly like to continue building our project into the proper tool that it's capable of becoming. Thanks for reading along!

DEV Community