A year of large scale GraphQL - the biggest takeaways

Peter Nycander ・5 min read

GraphQL experiences from using it at scale (2 Part Series)

1) A year of large scale GraphQL - the biggest takeaways 2) HOWTO: Adopting GraphQL in an existing project

GraphQL has been around for quite a while now, and it has been a hot topic as a possible candidate for the next generation of data fetching.

I have been working with large scale GraphQL for over a year now, mainly for the Nordic subscription video on demand (SVOD) service C More. I had never touched GraphQL before this, so I thought my experience during this time could be valuable for folks early in their GraphQL journey.

What is GraphQL

GraphQL is a query language, in which you ask the server explicitly for what you need. You can think of it as sending a string with all the keys of a JSON object, which the server should populate for you. This is what a query can look like:

query {
  series(id: 3446) {
    title
    year
    suggestedEpisode { title episodeNumber }
  }
}

Which would return:

{
  "data": {
    "series": {
      "title": "Game of Thrones",
      "year": 2019,
      "suggestedEpisode": {
        "title": "Winterfell",
        "episodeNumber": 1
      }
    }
  }
}

On C More we have completed the move to GraphQL, so all the different clients (TV clients, mobile apps and web) are using GraphQL for all their data fetching. I have been part of implementing the GraphQL server and the web implementation.

The pleasant surprises/good parts

There are a lot of upsides to using GraphQL, and ranting about all of them would require a different format. However, there are a few things that surprised me which I want to bring up.

Caching and optimistic UI

I have been using React Apollo on the client side, and I think it has just the right amount of magic to make UI development a breeze.

Say you want to implement optimistic UI (assume the server call will succeed, and update the UI early). It is certainly possible with a lot of different technologies. But how would you update something like "Added to my list" across a) the panel showing all items in "My List", b) the item you just clicked on, and c) any other occurrence of that item? How do you roll back those changes if the request fails? It is not easy, to say the least.

This comes pretty much out-of-the-box with React Apollo. The docs do a great job explaining what optimistic UI is and how you implement it. The optimistic response and the actual server value will update the data in all places, thanks to the cache normalization.
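To make the role of cache normalization more concrete, here is a minimal TypeScript sketch of the idea. It is not Apollo's implementation or API: `NormalizedCache`, `Item`, `inMyList`, and `addToMyList` are hypothetical names, and the "server call" is a plain synchronous function so the example stays short.

```typescript
// Hypothetical item shape; `inMyList` is an assumed field name.
interface Item {
  id: string;
  title: string;
  inMyList: boolean;
}

// Toy normalized cache: each item is stored once, keyed by id.
// Views keep ids instead of copies, so one write updates every occurrence.
class NormalizedCache {
  private items: Record<string, Item> = {};

  write(item: Item): void {
    this.items[item.id] = item;
  }

  read(id: string): Item | undefined {
    return this.items[id];
  }

  // Apply an optimistic patch and return a function that undoes it.
  applyOptimistic(id: string, patch: Partial<Item>): () => void {
    const previous = this.items[id];
    if (!previous) return () => {};
    this.items[id] = { ...previous, ...patch };
    return () => {
      this.items[id] = previous;
    };
  }
}

// Update the cache first, then keep the server's answer or roll back.
// `sendToServer` is synchronous only to keep the sketch short;
// returning null stands in for a failed request.
function addToMyList(
  cache: NormalizedCache,
  id: string,
  sendToServer: (id: string) => Item | null
): void {
  const rollback = cache.applyOptimistic(id, { inMyList: true });
  const confirmed = sendToServer(id);
  if (confirmed) {
    cache.write(confirmed);
  } else {
    rollback();
  }
}
```

Because the "My List" panel, the clicked item, and any other occurrence all read the same entry by id, a single write (or rollback) is enough to keep them consistent.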

Keeping the client code clean and stupid

With the rise of microservices, more work is being pushed to the client side. It involves things like having multiple network round trips to fetch all data, and having to duplicate complexity between different clients. Multiple round trips are solved automatically by using GraphQL. Avoiding massaging backend data to fit the UI can be solved by introducing new GraphQL fields, which might not make sense from a backend perspective, but do make sense from a UI perspective.
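As a rough illustration of a UI-oriented field, the sketch below resolves something like `suggestedEpisode` by combining two backends on the server, so no client has to repeat that logic. Everything here is an assumption made for the example: the `Backends` interface, its method names, and the "next unwatched episode" rule are hypothetical, and real backends would be network calls rather than synchronous functions.

```typescript
interface Episode {
  id: string;
  title: string;
  episodeNumber: number;
}

// Stand-ins for two separate backend services (hypothetical).
interface Backends {
  episodesForSeries(seriesId: string): Episode[];
  lastWatchedEpisodeNumber(userId: string, seriesId: string): number | null;
}

// Resolver logic for a UI-oriented field: one field for the client,
// two backend lookups hidden behind it.
function suggestedEpisode(
  series: { id: string },
  userId: string,
  backends: Backends
): Episode | null {
  const episodes = backends
    .episodesForSeries(series.id)
    .slice()
    .sort((a, b) => a.episodeNumber - b.episodeNumber);
  if (episodes.length === 0) return null;

  const last = backends.lastWatchedEpisodeNumber(userId, series.id);
  // Nothing watched yet: suggest the first episode.
  if (last === null) return episodes[0];

  // Otherwise suggest the next episode, or the final one if all are watched.
  const remaining = episodes.filter((e) => e.episodeNumber > last);
  return remaining.length > 0 ? remaining[0] : episodes[episodes.length - 1];
}
```

The field makes little sense for either backend on its own, but every client gets the same answer from one query instead of reimplementing the rule.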

Works great on serverless

As long as you do not use GraphQL subscriptions, running your GraphQL server as a serverless function works great. Since you only use a single endpoint, you will run the entire server as a single function. This gives you all of the benefits from serverless, with little to none of the downsides.

The mistakes/hard parts

GraphQL is not trivial, and implementing it won't be all good. Just as with the good parts, I could write tens of blog posts about the mistakes you can make with GraphQL, but I'm just going to mention the biggest ones.

Server side caching is tough

C More is an SVOD service not unlike Netflix, with some personalized data (progress, recommendations, etc), and some public data (series info, episode descriptions, etc). A GraphQL query might include series details, and which episode you are on.

When designing a REST API, it is often clear how "cacheable" each endpoint is. The endpoint for series details will be very cacheable, and which episode you are on is not.

Since GraphQL is, well, a graph, you probably want to connect these two endpoints to make it possible for users to query which episode they are on for any series. This makes it harder for us to set cache policies – we would not want to recommend the wrong episode due to accidental CDN caching.

There are ways around this; for example, Apollo Server has cache directives. In reality we found that almost any query contains some private data. It could be recommendations, progress, upsell data, "my list"-status, etc. Having to juggle the cache-control header status for the possibility of a few CDN cache hits just wasn't worth it for us.
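The sketch below shows why a single private field is enough to make a whole response uncacheable. It is a simplified model, assuming the response takes the most restrictive policy of any field it contains. It is not Apollo Server's code, and the field names, max-age values, and helpers (`fieldHints`, `combineHints`, `cacheControlHeader`) are hypothetical.

```typescript
type Scope = "PUBLIC" | "PRIVATE";

interface CacheHint {
  maxAge: number; // seconds
  scope: Scope;
}

const UNCACHEABLE: CacheHint = { maxAge: 0, scope: "PRIVATE" };

// Hypothetical per-field hints; fields without a hint count as uncacheable.
const fieldHints: Record<string, CacheHint | undefined> = {
  "Series.title": { maxAge: 3600, scope: "PUBLIC" },
  "Series.year": { maxAge: 3600, scope: "PUBLIC" },
  "Series.suggestedEpisode": { maxAge: 0, scope: "PRIVATE" },
};

// The response policy is the most restrictive of all queried fields:
// the lowest maxAge wins, and one private field makes everything private.
function combineHints(fields: string[]): CacheHint {
  if (fields.length === 0) return UNCACHEABLE;
  let maxAge = Infinity;
  let scope: Scope = "PUBLIC";
  for (const field of fields) {
    const hint = fieldHints[field] || UNCACHEABLE;
    maxAge = Math.min(maxAge, hint.maxAge);
    if (hint.scope === "PRIVATE") scope = "PRIVATE";
  }
  return { maxAge, scope };
}

// Translate the combined hint into a Cache-Control header value.
function cacheControlHeader(hint: CacheHint): string {
  if (hint.maxAge <= 0) return "no-store";
  const visibility = hint.scope === "PRIVATE" ? "private" : "public";
  return `${visibility}, max-age=${hint.maxAge}`;
}
```

A query for only `title` and `year` ends up publicly cacheable for an hour, but adding `suggestedEpisode` to the same query drops the whole response to `no-store`.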

Not questioning backend formats

Not all backend services are designed with the UI in mind, and when migrating from using the backend service directly to proxying it through GraphQL it is easy to just copy the data format the backend service gives us.

For example, fetching our episodes/movies/series from our search engine returns an array of objects with a type field, which can take values such as movie or episode. In GraphQL, it makes more sense to actually use GraphQL types to represent that. Sadly, that was not how we implemented it the first time around. We were so used to the old format that we didn't question it.
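One way to picture the difference is a small mapping layer that turns the backend's `type` string into distinct types. The sketch below is only an assumption about what such a layer could look like: the `SearchHit` shape, the field names, and the `SearchResult` union are hypothetical and not C More's actual schema.

```typescript
// Schema side: a union of real types instead of one object with a `type` string.
const typeDefs = `
  type Movie   { id: ID!  title: String! }
  type Series  { id: ID!  title: String! }
  type Episode { id: ID!  title: String!  episodeNumber: Int! }
  union SearchResult = Movie | Series | Episode
`;

// Assumed shape of a raw hit from the search backend.
interface SearchHit {
  type: "movie" | "episode" | "series";
  id: string;
  title: string;
  episodeNumber?: number; // only present on episodes
}

interface Movie { __typename: "Movie"; id: string; title: string }
interface Series { __typename: "Series"; id: string; title: string }
interface Episode { __typename: "Episode"; id: string; title: string; episodeNumber: number }
type SearchResult = Movie | Series | Episode;

// Convert the backend format once, at the edge of the GraphQL layer.
function toSearchResult(hit: SearchHit): SearchResult {
  switch (hit.type) {
    case "movie":
      return { __typename: "Movie", id: hit.id, title: hit.title };
    case "series":
      return { __typename: "Series", id: hit.id, title: hit.title };
    case "episode":
      if (hit.episodeNumber === undefined) {
        throw new Error(`Episode ${hit.id} has no episodeNumber`);
      }
      return {
        __typename: "Episode",
        id: hit.id,
        title: hit.title,
        episodeNumber: hit.episodeNumber,
      };
  }
}
```

With distinct types, clients can select episode-only fields inside an `... on Episode` fragment, and the schema documents which fields exist on which kind of result.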

Wrong return type of mutations

GraphQL mutations are how you edit data in GraphQL (PUT/POST/DELETE in most REST APIs). What do you send as a response? A status code? A message string? Certainly possible, but that makes it impossible for something like React Apollo to update its cache automatically.

By just responding with the correct data type, the clients can ask for whatever they expect to change, and all UI will magically update to their correct state. No state merging code required – it keeps the client code simple.
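As a small sketch of this, the mutation below returns the changed object rather than a status string. The schema, the `addToMyList` mutation, the `inMyList` field, and the in-memory `items` store are all hypothetical and only stand in for a real backend.

```typescript
// Schema side: the mutation returns the changed object, not a status string.
const typeDefs = `
  type Item {
    id: ID!
    title: String!
    inMyList: Boolean!
  }

  type Mutation {
    addToMyList(id: ID!): Item!
  }
`;

interface Item {
  id: string;
  title: string;
  inMyList: boolean;
}

// In-memory stand-in for whatever backend stores "My List".
const items: Record<string, Item> = {
  "3446": { id: "3446", title: "Game of Thrones", inMyList: false },
};

const resolvers = {
  Mutation: {
    // Return the updated item so clients can select the fields they expect to change.
    addToMyList(_parent: unknown, args: { id: string }): Item {
      const item = items[args.id];
      if (!item) throw new Error(`Unknown item ${args.id}`);
      const updated: Item = { ...item, inMyList: true };
      items[args.id] = updated;
      return updated;
    },
  },
};

// A client can then ask for exactly what it expects to have changed.
const mutation = `
  mutation {
    addToMyList(id: 3446) {
      id
      inMyList
    }
  }
`;
```

Since the response carries the item's `id` together with the changed field, a normalizing client cache can match it to the entry it already holds without any hand-written merge code.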

Schema stitching

Schema stitching is a way to split out your GraphQL implementation and schema across different servers. We tried it, and suffered.

One of the largest pain points that GraphQL solved for us is gathering the microservices into a cohesive graph (it is right there in the name). Splitting the implementation across different servers increases the complexity in how you create the "edges" in your graph, and also the complexity of the entire GraphQL setup. We found that the reduced complexity of each "sub-graph" does not make up for the total increase of complexity. I feel that the fear of "monolithic" GraphQL implementations is promoting a misbegotten concept.

As it stands right now, I think the GraphQL layer should be wide and flat. When you are writing a lot of "business logic" in the GraphQL layer, it probably makes more sense to create a REST-based microservice out of it.

Creating a good GraphQL design is hard. It is hard to find decent sources of information and best practices; everyone is still figuring this stuff out. However, I think anyone looking into implementing it should do so, as it has great potential to improve your services and developer experience. Just make sure to take your time when designing the schema. Getting it right the first time around will save you a lot of headache.




I made a hybrid API and used both REST and GraphQL at the same time. I make fetch queries and mutations through REST and then use only GraphQL Subscriptions from GraphQL. I could use pure WebSockets too, but I liked making typeDefs, and Apollo's client side libs were easy to use.


Interesting! How do you like it? Do you in hindsight wish you did anything different?


So far the decisions I made feel quite good, actually.

7 months later, have there been any changes to this approach? Maybe you have some advice?


With AppSync serverless gql subscriptions are simple and seamless!


It sure looks like cool tech, but I don't really understand how you implement non-trivial resolvers by reading the docs 🤔. I'll have to play with it sometime!


You do it with a Lambda function.

Neat. Do you have experience with such use cases? Would love to read something more in-depth.

For example, off the top of my head, when I'm looking at the lambda resolvers I am thinking about the dataloader situation. We make heavy use of dataloaders, which are placed in our graphql context. Those mutable objects can't really be transferred to a lambda function, right? How would you go about solving that with AppSync? Would that be another layer of lambdas with batching logic, could something like that even work 😕?

Edit: found medium.com/@dadc/aws-appsync-the-u... , good read! I wonder if any new updates have changed the scene...

All those things from the Medium article are true, but most of the limits are reasonable IMO :) I've never used dataloaders, but it's an interesting topic to investigate. Our team uses AppSync heavily, and it's possible to create very non-trivial resolvers using VTL only. We are using it heavily with Elasticsearch and DynamoDB.

We are even using AppSync with Redshift (via Lambda), and there we use subscriptions to serve clients with long-running queries. This was the point of my comment: you surely CAN go 100% serverless with GQL subscriptions and Lambdas :)

Yeah, there are definitely still limitations. Not sure how much it has evolved since the update on the article, as we just got started with it a couple of months ago. It has so far been sufficient for our needs, but yeah, data loader sounds like a much trickier problem, I've yet to try to solve that.

Of course, every serverless service has its limitations, but if used wisely it can cover most modern app scenarios IMO, and then if your case is complex and needs fine-tuning, then maybe serverless is not for you and you should use something else :)

Hey, that is really cool! Thank you for sharing. If the need arises for subscriptions at scale I know where to look :).


Thanks for the post Peter, if you want, you can use (SOFA)[medium.com/the-guild/sofa-the-best...] in order to expose your graphql-backend as a REST API.

You can also check out the newly released Apollo Federation for graphql Schema stitching!


Maybe! I like the approach a lot more than stitching for sure. I am a little afraid to try it though due to my history with stitching.


I don't think you should be. Federation is meant to replace it, as stitching has become deprecated now.

I will follow the discussion for sure, and see how people like it! Might just join in a little late on this one if it turns out to be great.


Good one. Been there 10 years ago, same architecture, different jargon: domain query language, stitched query,