Nader Dabit

Posted on Mar 15, 2021

Turning the Cloud Inside Out

#graphql #cloud #serverless #webdev

GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data. Over the past few years, GraphQL has seen a steady rise in adoption by companies both large and small who want to take advantage of its features that they view as an improvement over traditional REST APIs.

In this post, I will walk through how to leverage GraphQL to build a typed API interface into AWS, and the benefits of doing so.

GraphQL Overview

With GraphQL, you model your business domain as a graph by defining a schema; within your schema, you define different types of nodes and how they connect/relate to one another. On the client, this creates a pattern similar to Object-Oriented Programming: types that reference other types. On the server, since GraphQL only defines the interface, you have the freedom to use it with any backend. GraphQL Docs

Although there are many benefits that GraphQL brings to the table, my favorite piece is the graph itself.

The graph describes the data layer of your API as well as the available operations for interacting with it, giving anyone who views it a comprehensive understanding of what is happening without a lot of digging.

I also very much enjoy having a typed interface that plays really well with my TypeScript or Dart front ends, providing a more cohesive full stack data layer than I was typically used to in the past.

GraphQL Benefits

In addition to type checking and validation, GraphQL offers a few other benefits that traditional REST APIs do not.

Query Efficiency

For any data returned to the caller, GraphQL prevents over- and under-fetching by allowing the client to ask for only the data it needs, without any additional back-end code to support it.

This typically improves latency and is ideal for mobile applications or applications that take into consideration payload size for performance reasons.
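To make the idea concrete, here is a toy model of field selection in TypeScript. It is only a sketch and not a GraphQL executor: the `Product` shape, the sample record, and the `select` helper are all made up for illustration. The point is that the caller names the fields and gets back nothing more.

```typescript
// A toy model of field selection: the "server" holds a full record,
// and the caller names only the fields it wants back.
// This is NOT how a GraphQL server is implemented; it only shows the idea.

type Product = {
  id: string;
  name: string;
  description: string;
  price: number;
  inventory: number;
};

// Hypothetical record as it might sit in a database.
const product: Product = {
  id: "p1",
  name: "Coffee mug",
  description: "A long marketing description the list view never shows.",
  price: 12,
  inventory: 40,
};

// Return a copy of `source` that contains only the requested fields.
function select(source: Record<string, unknown>, fields: string[]): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const field of fields) {
    if (field in source) {
      result[field] = source[field];
    }
  }
  return result;
}

// The equivalent GraphQL query would look something like:
//   { getProduct(id: "p1") { id name price } }
const listViewData = select(product, ["id", "name", "price"]);
console.log(listViewData);
```

In a real GraphQL API the selection set in the query plays the role of the `fields` array, and the server does the trimming for you.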

Real time baked in

Subscriptions are part of the specification, and they extend GraphQL's mutation implementation in that they still only allow the data the client needs to be returned from events as they happen.

Fewer API Calls

With GraphQL you can execute multiple operations with a single API call. For example, let's say you have an onLoad query and you need to fetch data for a signed-in user as well as products for an e-commerce app. These are two separate queries, meaning you can access them individually.

With GraphQL you can also decide to send them together into a single call enabling fewer network requests to your back end.
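As a rough sketch of what that single call could look like, the snippet below builds one request body that asks for both pieces of data at once. The field names `getUser` and `listProducts` and the `buildRequestBody` helper are assumptions for illustration; your own schema decides what is actually available.

```typescript
// One GraphQL document that asks for two top-level fields at once.
// `getUser` and `listProducts` are hypothetical field names.
const onLoadQuery = `
  query OnLoad($userId: ID!) {
    getUser(id: $userId) { id name }
    listProducts { id name price }
  }
`;

// GraphQL over HTTP is conventionally a POST with a JSON body
// that holds the query text and its variables.
function buildRequestBody(query: string, variables: Record<string, unknown>): string {
  return JSON.stringify({ query, variables });
}

// A single body, and therefore a single network request, for both queries.
const body = buildRequestBody(onLoadQuery, { userId: "u1" });
console.log(body);
```

Sending `body` to the GraphQL endpoint replaces what would otherwise be two separate round trips.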

API Documentation

GraphQL APIs are self-documenting. Schema introspection is built into GraphQL, enabling developers to ask a GraphQL schema for information about what queries it supports.
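Here is a small sketch of introspection from the client side. The `__type` field and the `name`, `kind`, and `fields` selections come from the GraphQL specification. The `describeType` helper and the hand-written sample response are my own illustration, not output captured from a real API.

```typescript
// An introspection query asking the schema to describe one of its types.
const introspectionQuery = `
  query DescribeType($typeName: String!) {
    __type(name: $typeName) {
      name
      fields { name type { name kind } }
    }
  }
`;

// The slice of the introspection response this sketch reads.
type IntrospectedType = {
  name: string;
  fields: { name: string; type: { name: string | null; kind: string } }[];
};

// Turn the response into readable lines. Wrapper types such as LIST and
// NON_NULL have no name, so fall back to their kind.
function describeType(t: IntrospectedType): string[] {
  return t.fields.map((f) => `${t.name}.${f.name}: ${f.type.name ?? f.type.kind}`);
}

// Hand-written sample of what a response for a `Query` type might contain.
const sampleQueryType: IntrospectedType = {
  name: "Query",
  fields: [
    { name: "getNoteById", type: { name: "Note", kind: "OBJECT" } },
    { name: "listNotes", type: { name: null, kind: "LIST" } },
  ],
};

console.log(describeType(sampleQueryType));
```

Tools like GraphiQL lean on this same mechanism to generate their documentation panels.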

Consistency

Though all of these points are great and important, consistency is the one I'd like to drive home in this writing as it relates heavily to the thoughts and points I'm about to make.

As a developer, once I'm familiar with GraphQL, I can dive into any GraphQL schema and immediately understand what is going on. I can become productive by seeing the graph itself and reading the different types as they relate to the application. I can view the queries and mutations to understand how I can interact with the data sources, and I can even see the subscriptions to understand what real-time updates the client should expect.

I can learn one thing and then become productive with any GraphQL API not only in my company, but also in any company or with any GraphQL implementation that exists in the developer ecosystem.

Why is this important? For me it's all about efficiency. I have a talk titled "Programming in Laziness" that goes into this in depth, but the general idea for me is to learn things that offer the most reward for the least risk and time invested.

In my career I've specialized in learning things that enable me to do the most with the least. With React, I can learn one thing and build dozens of types of applications. With JavaScript, I can build front end, back end, machine learning, and pretty much anything else.

I bucket GraphQL into this same space. By learning GraphQL, I'm able to dive into any application that uses it, and the benefits come with it without having to spend any more time learning something else.

Anatomy of a GraphQL API

A GraphQL API is made of 3 main parts - the schema, the resolvers, and the data sources.

The schema defines your data and the mutations, queries, and subscriptions to interact with your data.

The resolvers can be thought of as functions that define the business logic that map the GraphQL operations in your schema to your data sources.

The data sources are the databases, microservices, serverless functions, or http endpoints for storing or retrieving data.
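The three parts can be sketched in a few lines of TypeScript. This is a simplified model under my own assumptions: the `Product` type, the field names, and the in-memory `productTable` are invented for illustration, and a real server would use a GraphQL library to connect the schema text to the resolver functions.

```typescript
type Product = { id: string; name: string; price: number };

// 1. Schema: the types and operations, written as GraphQL SDL text.
const schema = `
  type Product {
    id: ID!
    name: String!
    price: Float!
  }
  type Query {
    getProduct(id: ID!): Product
    listProducts: [Product]
  }
  type Mutation {
    createProduct(id: ID!, name: String!, price: Float!): Product
  }
`;

// 3. Data source: an in-memory object standing in for a database table.
const productTable: Record<string, Product> = {};

// 2. Resolvers: one function per operation, mapping it onto the data source.
const resolvers = {
  Query: {
    getProduct: (args: { id: string }): Product | null =>
      args.id in productTable ? productTable[args.id] : null,
    listProducts: (): Product[] =>
      Object.keys(productTable).map((id) => productTable[id]),
  },
  Mutation: {
    createProduct: (args: Product): Product => {
      productTable[args.id] = args;
      return args;
    },
  },
};

resolvers.Mutation.createProduct({ id: "p1", name: "Coffee mug", price: 12 });
console.log(schema.trim());
console.log(resolvers.Query.listProducts());
```

Swapping the in-memory object for a database, a microservice, or a serverless function changes the resolver bodies but not the schema that clients see.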

GraphQL Adoption

GraphQL is mainstream. Here are a few companies using GraphQL:

AWS
Netflix
Twitter
Shopify
Lyft
GitHub
Pinterest
Quora
Peloton
Automattic
Target
Twitch
Facebook
Starbucks
NBC
PayPal
Uber
GoDaddy
Reddit

Developer sentiment is important to me in gauging what will continue to be "a thing" and what will eventually die out. At the end of the day, if developers enjoy doing or using a certain thing, it's much more likely to not only stick around but get even better.

Let's have a look at GraphQL sentiment from the latest State of JS survey.

Link to survey results

Satisfaction, interest, usage, and awareness ratio rankings.

Positive/Negative Split

These survey results do not tell the entire story, but what they do tell me is that developers enjoy using GraphQL, and that tells me a lot. I care about what developers care about because I care about understanding where the industry is headed so that I am headed in the same direction (this is a personal preference).

What I can say for sure is that GraphQL is in a good place and will continue to see more and more adoption in the future.

GraphQL as an API Gateway

What am I getting at in this post? Well now that I've sold you on the present and future of GraphQL, and before I talk about how the cloud fits into any of this, I want to lay a foundation by talking about a few ways in which developers are using GraphQL as an API Gateway.

Microservices

In the most basic form, GraphQL is already being leveraged widely as an API gateway, replacing traditional API gateways.

GraphQL is a great fit for micro-services and other complex systems as it hides the complexity and unifies everything into a single data graph.

By simply replacing your legacy API gateway with GraphQL, you immediately inherit everything that GraphQL has to offer while still providing all of the things you'd expect out of this layer. This is a perfect use case and one that is well understood by the community and industry at this point.

That's cool, but let's look at some more interesting use cases.

Third Party APIs

Sean Grove of OneGraph is building some of the coolest shit in the industry right now.

With OneGraph, you can query data from literally dozens of third party APIs in a single graph. APIs like Twitter, Dev.to, Google, GitHub, Airtable, and many others.

As an app developer, I no longer have to dive into each individual set of documentation to understand how everything works.

This is huge for me. Reading through and understanding different documentation for different APIs is a huge time suck, but it doesn't have to be this way. If only there were a single standardized way for us to build APIs and enforce consistency throughout the industry.

Turns out there is. People are already taking advantage of this idea, and they are building interesting and important things.

OneGraph also enables authentication and outputs real & usable client-side code so that I can implement the features I want in a ridiculously short amount of time compared to legacy REST APIs.

Blockchain / Ethereum

In the blockchain world, decentralized apps (dApps) are becoming more and more prevalent. As opposed to centralized apps that run on a single server, decentralized apps run on a network of computers peer-to-peer. These apps are built on blockchains like Ethereum.

There are thousands of dApps that have been built, most in industries or categories like finance, gaming, or digital art.

As it stands, you can’t really build great applications directly on top of blockchains. The problem is that you need to have data indexed and organized for efficient retrieval.

Traditionally, that’s the work that databases do in the centralized tech stack, but that indexing layer was missing in the decentralized web (web3) stack.

This is where The Graph fits in.

The Graph is an indexing protocol for querying networks like Ethereum and IPFS. Anyone can build and publish open APIs, called subgraphs, making data easily accessible.

Before The Graph, teams had to develop and operate proprietary indexing servers. This required significant engineering and hardware resources and broke the important security properties required for decentralization.

The Graph solves that problem by offering this consistent indexing protocol, and it does so using GraphQL.

Most large companies building dApps are now using The Graph, including Compound and Uniswap.

If you understand GraphQL, you can query data from thousands of subgraphs which are essentially performant Ethereum APIs.

Cloud

Now that we've talked about a few other use cases, let's have a look at the cloud.

When I first looked at the AWS dashboard, I was overwhelmed. The tradeoff to offering so many powerful services is that it's a lot to take in for a new developer.

From the perspective of an app developer, I had no clue what any of this stuff meant or how I could leverage any of it to build the things I wanted to build.

As an app developer, what I wanted to know was:

How do I implement authentication
How do I securely store and fetch data
How do I store things like images and videos

From there, I am 80% of the way done for almost any app that I am building. I am interested in databases and APIs and how everything fits together.

As a developer, I want to be able to save and fetch data.

As a GraphQL developer, I want to define a data type, a query, and a mutation and make it all work together.

I may need various data sources for my app, like NoSQL for my main database, Elasticsearch for complex geospatial queries, PostgreSQL for complex relational data, and even a service like Rekognition for image and video analysis with machine learning.

GraphQL helps solve this elegantly in the AWS ecosystem.

Cloud APIs

When building with cloud APIs on AWS, you have to take a few things into consideration.

Execution environment
Permissions
The AWS SDK

Execution environment

You can choose one of two types of execution environments: serverful or serverless.

Serverful

When you control your own server, you also have control over all aspects of the server environment. This means that you are in charge of reliability, maintenance, scalability, and everything else that goes along with traditional server management. This comes with benefits as well as tradeoffs.

In GraphQL, this means you are responsible for implementing subscriptions, caching, security & authentication, and making it all resilient and scalable.

Serverless

The tradeoffs for serverless are that you are limited to the API surface provided to you by the service, meaning that you have less control. You also are limited in long running tasks, typically to around 15 minutes at most.

In exchange you do not need to worry as much about scalability or server management and you will typically have less code to maintain. You will also be charged for usage vs provisioned infrastructure - trading a capital expense for a variable expense.

There are two main approaches to serverless GraphQL on AWS, but the one I'll be focusing on is AWS AppSync as it provides a lot of other things out of the box like built-in security, authentication, authorization, subscriptions at scale, and caching.

AppSync also provides and manages your GraphQL endpoint.

Permissions

Once you've chosen your execution environment, how do we talk to the different services and databases that we need in our app? This is done using something called Identity and Access Management, or IAM. IAM was confusing to me until I understood what was going on.

At the end of the day, I just wanted my Lambda function to be able to talk to my database. IAM enables the permissions between services to talk to each other. With a couple of lines of IAM permissions configured, I was able to do whatever I wanted.

By default, all requests are implicitly denied. If I want to talk to DynamoDB from my Lambda function, the operation will be rejected unless I set the proper permissions.

To make this work, I need to say "hey, I want my function to be able to create, read, and delete items from this database". I can also say things like, "hey, I want my Lambda function to be able to perform any action at all in this particular database". Once these permissions are enabled, you will be able to send requests to and from the database. (I also recommend watching the video at the end of this tutorial to see this being implemented end to end).

With the proper permissions set in your API, server, or serverless function, you are allowed to talk to any database or service that you would like to.
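To show what "a couple of lines of IAM permissions" might look like, here is a sketch of a policy document written as a TypeScript object, along with a toy check that models the implicit deny. The `Version`, `Effect`, `Action`, and `Resource` keys and the `dynamodb:*Item` action names are real IAM vocabulary. The table ARN is a placeholder, and `isAllowed` is my own heavily simplified stand-in for IAM's evaluation logic, which also handles wildcards, conditions, and multiple policies.

```typescript
type Statement = { Effect: "Allow" | "Deny"; Action: string[]; Resource: string };
type PolicyDocument = { Version: "2012-10-17"; Statement: Statement[] };

// Placeholder ARN: the account ID and table name are made up.
const tableArn = "arn:aws:dynamodb:us-east-1:123456789012:table/NotesTable";

// "I want my function to be able to create, read, and delete items
// from this database."
const lambdaPolicy: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: ["dynamodb:PutItem", "dynamodb:GetItem", "dynamodb:DeleteItem"],
      Resource: tableArn,
    },
  ],
};

// Simplified model of evaluation: an explicit Deny wins, an explicit Allow
// grants access, and anything not mentioned is implicitly denied.
function isAllowed(policy: PolicyDocument, action: string, resource: string): boolean {
  let allowed = false;
  for (const statement of policy.Statement) {
    const matches = statement.Resource === resource && statement.Action.indexOf(action) !== -1;
    if (!matches) continue;
    if (statement.Effect === "Deny") return false;
    allowed = true;
  }
  return allowed;
}

console.log(isAllowed(lambdaPolicy, "dynamodb:GetItem", tableArn)); // granted by the Allow statement
console.log(isAllowed(lambdaPolicy, "dynamodb:Scan", tableArn)); // never mentioned, so implicitly denied
```

Tools like CDK generate this kind of statement for you when you call a grant method on a table.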

The next question for me was this: "Ok, now that I know that I am allowed to talk to the database, how can I actually talk to the database?".

Direct interactions in AppSync

There are two main answers to this. In AppSync, you can talk directly to DynamoDB, Elasticsearch, or Amazon Aurora.

What is also really powerful is that you can map your GraphQL operations directly into a Lambda function and from there interact with almost any AWS service using the AWS SDK.

AWS SDK

There are Node.js, Go, Python, C++, PHP, Ruby, Java, and .NET versions of the SDK.

All I need to do is import the AWS SDK and I can start talking to any service that I have permission to talk to:

// scan a DynamoDB table
const { DynamoDBClient, ScanCommand } = require("@aws-sdk/client-dynamodb");

const dbclient = new DynamoDBClient({});

// await needs to run inside an async function in a CommonJS module
async function scanProducts() {
  const command = new ScanCommand({ TableName: "PRODUCT_TABLE" });
  const data = await dbclient.send(command);
  return data.Items;
}

I can also talk to services that have permissions enabled, like Rekognition:

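// process image for labels in Rekognition
const { RekognitionClient, DetectLabelsCommand } = require("@aws-sdk/client-rekognition");

const client = new RekognitionClient({});

async function detectLabels() {
  // the image is referenced by its S3 bucket and object key
  const command = new DetectLabelsCommand({
    Image: {
      S3Object: { Bucket: "mybucketname", Name: "myImageName" }
    }
  });
  const data = await client.send(command);
  return data.Labels;
}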

Putting it all together

Putting all of these things together was, to me, the hardest part about all of this. Each of these things made sense individually, but tying them all together was tough.

At the end of the day you typically just want to do this:

Define your GraphQL schema
Create or enable your data sources / databases / services
Map your GraphQL operations into these data sources

To make this easier, there has been an advancement in tooling over the past few years to improve how these pieces are all integrated together.

A few options that do this well are AWS CDK, the Serverless Framework, and AWS Amplify. All of these options make things like creating the proper permissions between services and execution environments much easier, and sometimes do all of it for you under the hood.

With Amplify, the CLI will enable you to set up a GraphQL API and database and wire up the proper permissions in a few steps by providing an annotated GraphQL schema.

CDK and the Serverless Framework allow you to write concise infrastructure as code, enabling you to not only set up the services and databases but also enable the proper permissions.

For instance, with CDK, we can create an API and database with AppSync and TypeScript in about 70 lines of code.

import * as cdk from '@aws-cdk/core';
import * as appsync from '@aws-cdk/aws-appsync';
import * as ddb from '@aws-cdk/aws-dynamodb';
import * as lambda from '@aws-cdk/aws-lambda';

export class AppsyncCdkAppStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const api = new appsync.GraphqlApi(this, 'Api', {
      name: 'cdk-notes-appsync-api',
      schema: appsync.Schema.fromAsset('graphql/schema.graphql'),
      authorizationConfig: {
        defaultAuthorization: {
          authorizationType: appsync.AuthorizationType.API_KEY,
          apiKeyConfig: {
            expires: cdk.Expiration.after(cdk.Duration.days(365))
          }
        },
      }
    });

    const notesLambda = new lambda.Function(this, 'AppSyncNotesHandler', {
      runtime: lambda.Runtime.NODEJS_12_X,
      handler: 'main.handler',
      code: lambda.Code.fromAsset('lambda-fns'),
      memorySize: 1024
    });

    // set the new Lambda function as a data source for the AppSync API
    const lambdaDs = api.addLambdaDataSource('lambdaDatasource', notesLambda);

    // create resolvers to match GraphQL operations in schema
    lambdaDs.createResolver({
      typeName: "Query",
      fieldName: "getNoteById"
    });

    lambdaDs.createResolver({
      typeName: "Query",
      fieldName: "listNotes"
    });

    lambdaDs.createResolver({
      typeName: "Mutation",
      fieldName: "createNote"
    });

    lambdaDs.createResolver({
      typeName: "Mutation",
      fieldName: "deleteNote"
    });

    lambdaDs.createResolver({
      typeName: "Mutation",
      fieldName: "updateNote"
    });

    // create DynamoDB table
    const notesTable = new ddb.Table(this, 'CDKNotesTable', {
      billingMode: ddb.BillingMode.PAY_PER_REQUEST,
      partitionKey: {
        name: 'id',
        type: ddb.AttributeType.STRING,
      },
    });

    // enable the Lambda function to access the DynamoDB table (using IAM)
    notesTable.grantFullAccess(notesLambda)

    notesLambda.addEnvironment('NOTES_TABLE', notesTable.tableName);

  }
}

We then define our GraphQL schema:

type Note {
  id: ID!
  name: String!
  completed: Boolean!
}

input NoteInput {
  id: ID!
  name: String!
  completed: Boolean!
}

input UpdateNoteInput {
  id: ID!
  name: String
  completed: Boolean
}

type Query {
  getNoteById(noteId: String!): Note
  listNotes: [Note]
}

type Mutation {
  createNote(note: NoteInput!): Note
  updateNote(note: UpdateNoteInput!): Note
  deleteNote(noteId: String!): String
}

type Subscription {
  onCreateNote: Note
    @aws_subscribe(mutations: ["createNote"])
  onDeleteNote: String
    @aws_subscribe(mutations: ["deleteNote"])
  onUpdateNote: Note
    @aws_subscribe(mutations: ["updateNote"])
}

This sets up:

A new AWS AppSync GraphQL API
A Lambda function
Resolvers to map GraphQL operations into the Lambda function
A DynamoDB Table
An environment variable in the Lambda function referencing the DynamoDB table name
Permissions for the Lambda function to interact with the DynamoDB table
Subscriptions for real-time updates for create, update, and delete operations

The only thing left to do here is write the business logic for the Lambda functions. See end to end code example here or the workshops here and here.
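To give a feel for that business logic, here is a minimal sketch of how the function could route each GraphQL operation. It assumes the resolvers are direct Lambda resolvers, so the function receives the AppSync context with `info.fieldName` and `arguments`. The `route` function and the `AppSyncEvent` type are my own names, and a plain in-memory object stands in for the DynamoDB table so that the sketch runs without AWS.

```typescript
type Note = { id: string; name: string; completed: boolean };

// Only the parts of the AppSync event that this sketch reads.
type AppSyncEvent = {
  info: { fieldName: string };
  arguments: {
    noteId?: string;
    note?: { id: string; name?: string; completed?: boolean };
  };
};

// Route a GraphQL operation to its logic. `store` stands in for the
// DynamoDB table whose name arrives in the NOTES_TABLE environment variable.
function route(event: AppSyncEvent, store: Record<string, Note>): Note | Note[] | string | null {
  const { noteId, note } = event.arguments;
  switch (event.info.fieldName) {
    case "getNoteById": {
      const found = noteId !== undefined ? store[noteId] : undefined;
      return found ?? null;
    }
    case "listNotes":
      return Object.keys(store).map((id) => store[id]);
    case "createNote": {
      if (!note || note.name === undefined || note.completed === undefined) return null;
      const created: Note = { id: note.id, name: note.name, completed: note.completed };
      store[created.id] = created;
      return created;
    }
    case "updateNote": {
      const existing = note ? store[note.id] : undefined;
      if (!note || !existing) return null;
      // Only overwrite the fields that were actually sent.
      const updated: Note = {
        id: existing.id,
        name: note.name ?? existing.name,
        completed: note.completed ?? existing.completed,
      };
      store[updated.id] = updated;
      return updated;
    }
    case "deleteNote": {
      if (noteId === undefined) return null;
      delete store[noteId];
      return noteId; // the schema declares deleteNote as returning a String
    }
    default:
      throw new Error(`Unknown field: ${event.info.fieldName}`);
  }
}

// In the Lambda entry file this would be wired up roughly as:
//   export const handler = async (event: AppSyncEvent) => route(event, store);
const demoStore: Record<string, Note> = {};
console.log(
  route(
    { info: { fieldName: "createNote" }, arguments: { note: { id: "n1", name: "Write post", completed: false } } },
    demoStore
  )
);
```

In the real function each case would call DynamoDB through the AWS SDK instead of touching an object, but the routing on `fieldName` stays the same.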

Inside out

Using this approach, you can build graphs that expose the power of the cloud using an infinite combination of cloud data sources.

Most exciting to me is that you can make any part of the cloud real-time and type safe, with discoverable API documentation.

Conclusion

I recommend trying to build out an API using the below video or one of the above tutorials and seeing for yourself how all of this fits together.

To drive the idea home, I also would like to plug an article by Slobodan Stojanović titled The Power of Serverless GraphQL with AWS AppSync where he lays out his own reasons why this approach works really well for cloud APIs. I'd also suggest checking out Yan Cui's article comparing AppSync to a traditional API gateway.