GraphQL is currently one of the most frequently mentioned technologies when it comes to innovation in the API economy. Adopters enjoy the ease of use and tooling like for example GraphiQL, the browser-based user interface to try out any GraphQL API. The whole experience of GraphQL is exactly what frontend-developers need to build amazing interactive web applications.
However, with the rise of adoption, I'm starting to get more and more concerned about the way people understand GraphQL and use it. In this post, I'd like to share my unpopular opinion on what GraphQL really is meant to be and why you should be concerned if you're using it the popular-but-risky way.
Let's take a step back and discuss APIs and API styles in general before answering the main question of why you're probably using GraphQL the wrong way.
APIs offer a way to hide the complexity of the implementation behind a user-friendly interface. For example, a shopping basket can have methods to add and delete items or to move forward to the checkout. As a user of this shopping cart API, you don't have to think about how the data gets stored or what exactly happens when you add or remove an item.
Over the last few decades various styles of APIs have emerged, all with different implementations, depending on the use cases.
If you'd like to choose the right API style for a problem, you also have to consider how the API gets published and used. Do you know all your users and use cases? Are these users part of your own organization? Are they partners? The answers will most probably influence your choice of the API style and implementation, doesn't it?
The last sentence is where I think we get it wrong a lot of the time. I see people all over the place choose the API style and implementation long before the important questions were answered.
The current most popular example of this behaviour is GraphQL. Are you building a modern single page application with React? Awesome, use GraphQL! Facebook, Airbnb, Paypal, Netflix, they all do it so it must be a good fit.
Why don't we see more discussions around choosing the right technology for a given problem? I assume it's a lack of education, but I'm not sure on this one. If you have any relevant degree, you might respond to this with your experience of education on APIs.
Always keep in mind, if you use Facebook-scale tooling without having a Facebook-scale organization and Facebook-scale problems, you might realize painfully that you're using a sledgehammer to crack a nut. It's the same reason why chaos monkey makes sense for Netflix while it doesn't for your two docker containers running on a 5$ machine on digital ocean.
I assume this was one of the main drivers at GitHub to choose GraphQL as an implementation of the query-based API style for their new API. Their API is publicly available. They have big numbers of API consumers, all with different requirements. They can't build resource-based APIs that satisfy all of their users. In this particular use-case, GraphQL might actually be a good choice. Instead of trying to solve each problem, they rather offer a generic GraphQL API.
What are the trade-offs that GitHub is willing to accept when publicly exposing a GraphQL API? They have a whole team behind their GraphQL API, making sure you, the user, does not accidentally or intentionally break their systems. You can watch videos of them talking at conferences about the complex systems they built to secure their API and keep it stable. They've built tooling for GraphQL specific analytics to get better insights into API usage.
I assume that many developers with a focus outside of security have little experience on what it takes to secure a REST API exposed on the internet. Most of us have little experience implementing authentication, authorization, rate limiting etc. . However, I think securing a RESTful API is rather simple, compared to a GraphQL API. Any HTTP-based API framework lets you define your routes and attach standardized middlewares to solve the problems listed above. A single HTTP call always corresponds to a single call on the controller of an API. With GraphQL on the other hand, a single Query might result in thousands of calls on the controllers (resolvers) of the API. There is no simple way to solve this problem.
Depending on the language you use, various libraries are trying to help you with the issue. How trustful are these libraries? Do you fully understand how they work? Are there edge cases we're not yet fully aware of?
Are you a single developer working on a side project? Do you benefit as much as you're expecting from using GraphQL? Are you using many different clients with different data needs? Do you really need a query-based API? What's your strategy to combat the problems listed above?
You might be thinking that your GraphQL API is not really exposed. It's used on your website, but you don't show the playground anywhere. If you're using a GraphQL client in the frontend that directly talks to your GraphQL API, this API is exposed, even if not visually exposed with a GraphQL playground.
Do you allow any client to invoke the introspection Query? Are you leaking sensitive information through the introspection Query? Are you planning a new feature on the UI which will be made public in a few weeks or months? Is this feature already visible to your competition if they look at your schema? What if someone scrapes your schema every day to track changes and try attacks whenever you update your schema?
Are you aware of schema traversal attacks? A user might be allowed to see his own account balance, but how about his/her friends? Is it possible to traverse the schema in a way you didn't anticipate which leaks data? How do you test for this kind of behaviour and ensure it's not possible for your own schema?
Is there a reason why companies like Shopify participate in bug bounty programs? They seem to be aware of the complexity of securing a GraphQL API. They invite security experts to help them make their publicly available GraphQL API more secure. Do you realize that your GraphQL API is as vulnerable as Shopify's?
How to make a system 100% secure to any kind of remote attack? If you want to be 100% safe, you should consider unplugging the network cable. However, this comes with some inconvenient drawbacks. You probably don't want to store your GraphQL query on a USB dongle, walk to the remote computer and execute it manually, then copy the response back on the dongle and walk back to your own computer.
What's in between an unplugged network cable and exposing GraphQL? How about reducing the complexity to the level of a REST or RPC-based API while keeping the advantages of a query-based API?
If we primarily use GraphQL on the server to define JSON-RPC APIs, we get the best of both worlds. The flexibility of GraphQL combined with the security and predictable performance of an RPC-based API.
The GraphQL spec allows us to define multiple Operations (Queries, Mutations, Subscriptions) in a single GraphQL document. In addition to this, the validation rules of the spec require all Operations in a GraphQL document to be named. There's just one exception which allows a single anonymous Query. But in case the number of operations in a document are above 1 we're already forced to name our Operations. Another important requirement is that all Operation names must be unique. That is, there shall be no two Operations with the same name.
The design of the GraphQL specification alongside with the validation rules builds a perfect foundation for what we're trying to achieve here.
If we want to define a new JSON-RPC API, all we have to do is create a new file containing a set of GraphQL Operations. Each Operation has a unique name. This name becomes the function name of the JSON-RPC. The Operation variables become the input of the RPC call.
Next, we can "deploy" all Operations on our API backend and prepare the RPC Endpoints. Finally, based on the Operations and the known RPC-Endpoints we're able to generate a client that knows about the schema as well as all RPC endpoints.
input and outputs are typesafe
the attack surface is reduced
you know all the Operations a client is using
the generated client is very small, compared to a thick GraphQL client, which leads to smaller JS bundle size
less bandwidth usage because we're not sending Operations but just making RPC calls
Query Parsing, Normalization & Validation happens at compile-time, not at runtime, making it more secure and performant
no exposed GraphQL endpoint and therefore no exposed introspection either
graph traversal attacks are impossible as the graph is not exposed anymore
you know in advance when a change to the schema or one of the Operations would break a client and can mitigate this
JSON-RPC turns any GraphQL Query into a GET request and therefore makes them easily cacheable at the transport layer
because Operations are stored on the backend and never exposed to the client, you're able to put authorization logic into the Operations
you can no longer use your favourite GraphQL client
you have to run a code generator whenever you update the Schema or any of the Operations
you need to compile and store all Operations on the API backend
this approach does only work when the API user is allowed to prepare and store Operations on the API backend
Using GraphQL as a framework for building JSON-RPC APIs is not a solution to every problem. There are situations where it's not feasible or simply technically impossible. However, a lot of GraphQL users can benefit from this approach, as it increases security and performance at marginal costs.
For most of you, the answer to this question will probably be "no". At WunderGraph, our goal is to make APIs easy to use, secure and performant. We've already implemented the approach outlined above. If you don't want to re-invent the wheel, we'd be more than happy to work with you together. We focus on the plumbing so you can solve problems of your own business domain and not waste your time.