Headless SEO: WordPress, GraphQL, schema.org and typescript

Jonathan Schneider (shnyder) ・9 min read

Let's be lazy:

Here's a repo with a basic setup that maps from wpgraphql to json-linked-data in the schema.org vocabulary through a type-safe function. If that sounds complicated, here's the long explanation:


Intro to schema.org

To be more easily discoverable by search engines, websites can add structured data to their content. This is how Google finds a site's own ratings:
showing google search results with some rating stars

But it is also used to make the web more interactive. Emails can include data about possible Actions, which Gmail turned into this "View Pull Request" button here, so you don't have to open the email, read it and click a link:
gViewPR_border

The examples above use a common, open vocabulary, schema.org, so other email services could understand the structured data in GitHub's email as well. More importantly, we can add data structured in json-ld and other formats to our own web pages, and for example make it easier for people to find an Event we're hosting close to them:
gEvents_border

I don't know how many readers are hosting events, but I guess it would be easier to let Event hosts fill in the data dynamically, for example with WordPress.

Isn't there a plugin for it?

The short answer is yes, many. But this article is for typescript developers who work with GraphQL, so we'll only use WordPress as a headless Content Management System (CMS). It doesn't include a single line of PHP; instead we'll generate typescript from a GraphQL schema. The approach also works without WordPress, but WordPress makes it easier to get started.


From WordPress to GraphQL endpoint

The WPGraphQL Plugin is quickly installed from the admin panel of your WordPress installation.
Wordpress Admin page: Adding a new plugin
You don't have to leave the Admin Panel to play around with queries either, since you get a GraphQL IDE right inside WordPress:
GraphqlIDE
That query retrieves all the posts in a given category, a short excerpt and a featured image - ideal to create a preview for a blog post.


From GraphQL endpoint to GraphQL schema

Now this is security-relevant: a graphql API can tell a client about its structure through a so-called introspection query. If you don't already have access to the graphql schema, you can generate it from the result of such a query. An attacker could do the same and look for vulnerabilities in your API, which is why it's better to keep introspection switched off in production. WPGraphQL has it switched off by default, so you need to go into the settings to switch it on:
GraphqlPublicIntrospection

While it's switched on, we can use standard functionality from the graphql package and node.js to generate the schema and save it locally.

import fs from "fs";
import fetch from "node-fetch";
import {
    getIntrospectionQuery,
    printSchema,
    buildClientSchema,
} from "graphql";

/**
 * runs an introspection query on an endpoint and retrieves its result
 * thanks to this gist:
 * https://gist.github.com/craigbeck/b90915d49fda19d5b2b17ead14dcd6da
 */
async function main() {
    const introspectionQuery = getIntrospectionQuery();
    const response = await fetch("https://blog.example.com/graphql", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
        },
        body: JSON.stringify({ query: introspectionQuery }),
    });
    const { data } = await response.json();
    const schema = buildClientSchema(data);
    const outputFile = "./wpgraphql-schema.gql";
    fs.writeFileSync(outputFile, printSchema(schema));
}

main();

Replace blog.example.com with your domain or localhost, wherever your graphql endpoint can be found. To be able to run that script, you need to set "type": "module" in your package.json (because it uses ES module syntax).
And once you have the wpgraphql-schema.gql file, please, switch off introspection.
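
If introspection is still switched off, the endpoint typically answers with an errors array instead of schema data, and building the schema then fails with a less helpful message. Below is a minimal sketch of a guard that could run on the parsed response first. The function name extractIntrospectionData and the response typing are assumptions made for illustration; they are not part of the graphql package:

```typescript
// Shape of a GraphQL-over-HTTP response body, reduced to what we check here.
interface GraphQLResponseBody {
    data?: { __schema?: unknown } | null;
    errors?: { message: string }[];
}

/**
 * Hypothetical helper: returns the introspection data or throws a readable error.
 */
export function extractIntrospectionData(
    body: GraphQLResponseBody
): { __schema: unknown } {
    if (body.errors && body.errors.length > 0) {
        const messages = body.errors.map((e) => e.message).join("; ");
        throw new Error(`Introspection failed: ${messages}`);
    }
    if (!body.data || body.data.__schema === undefined) {
        throw new Error(
            "Response contains no __schema. Is public introspection enabled?"
        );
    }
    return { __schema: body.data.__schema };
}
```

In the script above, the line that destructures data from response.json() could pass the parsed body through this guard before handing it to buildClientSchema.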


From GraphQL schema to typescript

With the wpgraphql-schema.gql file we can already create typescript types. In order for that to work we'll add graphql-codegen (GraphQL Code Generator) to our project:

yarn add @graphql-codegen/cli @graphql-codegen/typescript @graphql-codegen/typescript-operations -D

To run the generator, it needs a little bit of configuration, which goes into a codegen.yml file:

overwrite: true
schema: "./wpgraphql-schema.gql"
documents: "src/**/*.graphql"
generates:
  src/generated/wp-graphql.ts:
    plugins:
      - "typescript"
      - "typescript-operations"

The first line tells it to overwrite existing .ts-files, the second one points to the graphql schema we just got from the endpoint, and the third is the glob-pattern or file where our queries, mutations, fragments and subscriptions can be found. If we don't have any "documents", we can still run the generator with:

yarn graphql-codegen --config codegen.yml

and get types for our graphql schema. Generating all the types of the graphql schema is configured with the - "typescript" entry under plugins. Generating types for queries etc. is configured with the - "typescript-operations" entry, which adds types for all "documents" it finds to our .ts-file. You could also generate separate .ts-files, for example to split server-features from client-features like queries and fragments; this is up to you.

We could already use the wpgraphql Post-type to create a mapper-function to schema.org/BlogPosting. However, that type is pretty large and we might not always query all fields for a given blog post.
PostTypeOverview
As you can see, we don't need to write documentation on the types from wpgraphql, it gets the documentation from the graphql schema.

To simplify our queries, we can create a graphql fragment like so:

fragment PostPreview on Post {
  title
  excerpt
  featuredImage {
    node {
      sourceUrl
      altText
      description(format: RAW)
      srcSet(size: THUMBNAIL)
    }
  }
  slug
}

and now any time we want to query these exact same fields we can just reference it in a query:

query wpPostPreviewByCategory {
  posts(where: { categoryName: "my-side-projects" }) {
    edges {
      node {
        ...PostPreview
      }
    }
  }
}

That query gets us the posts from a particular category, but only the fields from the fragment for each post. This is great for reusability in graphql-queries, and graphql-codegen generates the type PostPreviewFragment for us, so we can create a mapper-function that consistently works across queries.
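
To make the query result more concrete, here is a rough sketch of how it could be flattened into a list of fragments. The types are hand-written, simplified stand-ins for what graphql-codegen generates (the real generated file is larger and may differ in detail), and collectPostPreviews is a hypothetical helper name:

```typescript
// Simplified stand-ins for generated types; nullable fields mirror GraphQL's nullability.
type Maybe<T> = T | null;

interface PostPreviewFragment {
    title?: Maybe<string>;
    excerpt?: Maybe<string>;
    slug?: Maybe<string>;
    featuredImage?: Maybe<{
        node?: Maybe<{
            sourceUrl?: Maybe<string>;
            altText?: Maybe<string>;
            description?: Maybe<string>;
            srcSet?: Maybe<string>;
        }>;
    }>;
}

interface WpPostPreviewByCategoryQuery {
    posts?: Maybe<{
        edges?: Maybe<Maybe<{ node?: Maybe<PostPreviewFragment> }>[]>;
    }>;
}

/** Hypothetical helper: pulls every non-empty post node out of the query result. */
export function collectPostPreviews(
    result: WpPostPreviewByCategoryQuery
): PostPreviewFragment[] {
    const edges = result.posts?.edges ?? [];
    const previews: PostPreviewFragment[] = [];
    for (const edge of edges) {
        // Skip edges that are null or that carry no node.
        if (edge?.node) previews.push(edge.node);
    }
    return previews;
}
```

Because almost every field in the stand-in types can be null, anything that consumes these fragments has to deal with missing values.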

We can now begin with the mapper-function:

import { PostPreviewFragment } from "./../generated/wp-graphql";
//other imports

export function mapWpPostPreviewToSchemaBlogPost(
    input: PostPreviewFragment,
    wpBaseURL?: string
): BlogPosting {
    // ...curious about wpBaseURL and BlogPosting? We'll use that to build our json-ld
}


From typescript to schema.org

Here's a good example of what a BlogPosting can look like with json-ld and other formats that mix with HTML. The benefit of adding a script-tag with json-ld is that we can generate our website, the unstructured data, in a different way than our publicly available structured data. This gives you more flexibility, especially since it's allowed to add several script-tags with type="application/ld+json" to a web page.
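
As a small illustration of that separation, the structured data can be serialized into its own script-tag independently of how the visible HTML is rendered. The following is a sketch, not part of any library: toJsonLdScriptTag is a hypothetical name, and escaping "<" is an extra precaution assumed here so that markup inside a field (a post excerpt, for example) cannot end the script-tag early:

```typescript
/**
 * Hypothetical helper: wraps a json-ld object in a script-tag string.
 * Adds the schema.org "@context" unless the object already has one.
 */
export function toJsonLdScriptTag(thing: Record<string, unknown>): string {
    // Properties of `thing` win over the default, so an existing "@context" is kept.
    const withContext = { "@context": "https://schema.org", ...thing };
    // "\u003c" is the JSON escape for "<"; JSON parsers read it back as "<".
    const json = JSON.stringify(withContext).replace(/</g, "\\u003c");
    return `<script type="application/ld+json">${json}</script>`;
}
```

Calling it once per object gives you several independent script-tags that can be placed anywhere in the page.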

Of course, for the human readers of our website we would want something visual. For our blog post we might display a link, an image, the post's title and a short preview. The machines reading our website don't know which is what. To help them read it properly as a BlogPosting, the equivalent json-ld would look like this:

<script type="application/ld+json">
{
  "@context":"https://schema.org",
  "@type": "BlogPosting",
  "@id": "https://shnyder.com/business-model-canvas-for-metaexplorer",
  "name": "Business Model Canvas for MetaExplorer",
  "abstract": "<p>Before starting out with my plans to turn metaexplorer into a business, I wrote a business plan &#8211; and because 2020 showed us what it thinks of plans&#8230;</p>\n",
  "image": {
    "@type": "ImageObject",
    "name": "Metaexplorer business model canvas, 2020-12-28",
    "contentUrl": "https://shnyder.com/wp-content/uploads/2020/12/IMG_20201228_234552-scaled.jpg",
    "thumbnailUrl": "https://shnyder.com/wp-content/uploads/2020/12/IMG_20201228_234552-150x150.jpg"
  }
}
</script>

As you can see in the example's "@id"-field and image urls, it includes a domain name - my blog's domain. Relative URLs are also possible in json-ld, so on shnyder.com I could use this:
"@id": "/business-model-canvas-for-metaexplorer"
Here on dev.to I would want to include the full domain name. To let our mapping-function include my domain, we add the optional parameter wpBaseURL.
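
One way to handle both cases is a small helper that returns an absolute URL when a base is given and a relative one otherwise. This is a sketch under assumptions: buildPostId is a hypothetical name, and trimming trailing slashes from the base is a choice made here, not something WordPress or json-ld requires:

```typescript
/**
 * Hypothetical helper: builds the "@id" for a post from its slug.
 * Without a base URL the id stays relative to the current site.
 */
export function buildPostId(slug: string, wpBaseURL?: string): string {
    if (!wpBaseURL) return `/${slug}`;
    // Remove trailing slashes so we never produce "https://example.com//slug".
    const base = wpBaseURL.replace(/\/+$/, "");
    return `${base}/${slug}`;
}
```

With the slug from the example above, buildPostId("business-model-canvas-for-metaexplorer", "https://shnyder.com") yields the full "@id", while leaving out the second argument yields the relative form.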

So how do we get the json-ld from the graphql types? The library schema-dts helps us do exactly that, and in a typesafe way. That means we get intellisense on the actual schema.org vocabulary (ontology, even).
schema-dts-intellisense

You can install it with:

yarn add schema-dts

A naive approach would be to map only the "happy path" from graphql to json-ld, where we always have the data that we want and need. That approach would look like this:

import { PostPreviewFragment } from "./../generated/wp-graphql";
import { BlogPosting } from "schema-dts";

/**
 * mapper-function to create schema.org/BlogPosting(s) for previews from fragments
 * @param input a fragment of a WordPress blog post
 * @param wpBaseURL the base domain of your wordpress installation. Used to add the slug
 */
export function mapWpPostPreviewToSchemaBlogPost(
    input: PostPreviewFragment,
    wpBaseURL?: string
): BlogPosting {
    let featuredImgNode = input.featuredImage.node;
    let thumbnailUrl = featuredImgNode.srcSet
        .split(" ")
        .find((val) => val.startsWith("http"));
    let output: BlogPosting = {
        "@type": "BlogPosting",
        ...(wpBaseURL ? { "@id": `${wpBaseURL}/${input.slug}` } : {}),
        name: input.title,
        abstract: input.excerpt,
        image: {
            "@type": "ImageObject",
            description: featuredImgNode.description,
            name: featuredImgNode.altText,
            contentUrl: featuredImgNode.sourceUrl,
            thumbnailUrl
        }
    };
    return output;
}

And this does compile in a standard typescript configuration. The thing is that we can't always be sure to get a preview image, and other fields might be empty as well. Handling empty fields will prevent errors in the future. To let the compiler point out these cases in the mapper function, we set compilerOptions.strict to true in our tsconfig.json. If we try to compile again, we'll receive lots of errors such as:

- error TS2533: Object is possibly 'null' or 'undefined'.
and
Type 'null' is not assignable to type 'string | PronounceableTextLeaf | readonly PronounceableText[] | undefined'.
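
To see what the first error is about, consider the thumbnail lookup in isolation. With strict enabled, srcSet has to be treated as possibly missing before split can be called on it. A minimal sketch, where pickThumbnailUrl is a hypothetical helper and the parameter type is a simplified assumption about the generated field type:

```typescript
/**
 * Hypothetical helper: returns the first URL found in a srcset string,
 * or undefined when there is no srcset or no URL in it.
 */
export function pickThumbnailUrl(srcSet?: string | null): string | undefined {
    // Optional chaining short-circuits to undefined when srcSet is null or undefined.
    return srcSet?.split(" ").find((val) => val.startsWith("http"));
}
```

The return type string | undefined makes the missing case visible to every caller instead of hiding it until runtime.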

Since we don't want something like "title": undefined in our json-ld, we have to leave such fields out of the output-object altogether. The first thought to fix this might be to add lots of if-statements. But with object spread syntax (part of the language since ES2018) and typescript, we can create the output in a way that looks very much like a nested json-ld-object. And such a nested object is what we want to output. The main difference is that this nested object includes conditions for safe mapping as well:

/**
 * mapper-function to create schema.org/BlogPosting(s) for previews from fragments
 * @param input a fragment of a WordPress blog post
 * @param wpBaseURL the base domain of your wordpress installation. Used to add the slug
 */
export function mapWpPostPreviewToSchemaBlogPost(
    input?: PostPreviewFragment,
    wpBaseURL?: string
): BlogPosting | null {
    if (!input) return null;
    let featuredImgNode = input && input.featuredImage && input.featuredImage.node;
    let thumbnailUrl = featuredImgNode && featuredImgNode.srcSet && featuredImgNode
        .srcSet.split(" ")
        .find((val) => val.startsWith("http"));
    let output: BlogPosting = {
        "@type": "BlogPosting",
        ...(wpBaseURL && { "@id": `${wpBaseURL}/${input.slug}` }),
        ...(input.title && { name: input.title }),
        ...(input && input.excerpt && { abstract: input.excerpt }),
        ...(featuredImgNode && {
            image: {
                "@type": "ImageObject",
                ...(featuredImgNode.description && {
                    description: featuredImgNode.description,
                }),
                ...(featuredImgNode.altText && {
                    name: featuredImgNode.altText,
                }),
                ...(featuredImgNode.sourceUrl && {
                    contentUrl: featuredImgNode.sourceUrl,
                }),
                ...(thumbnailUrl && { thumbnailUrl }),
            },
        }),
    };
    return output;
}

23 lines without null-checks, compared to 32 lines with strict type checks. Plus, this notation scales into nested structures. Not much extra effort, but the reliability has improved greatly. Let's look at the code in detail. First, the output is an object typed as BlogPosting, and it has to include "@type": "BlogPosting". Typescript types are lost during compilation, whereas "@type" gets baked into the json:

let output: BlogPosting = { "@type": "BlogPosting" };

The three dots that appear across the function are the spread syntax. It spreads the result of the expression in brackets () into the surrounding object.

...(input.title && { name: input.title }),

Inside the brackets there's a logical AND (&&), which returns the second operand if the first one is truthy, otherwise the first one. That means if one of our input fields like input.title is undefined, null or an empty string, this is what the spread operator gets to see:

let mytestvar = {...undefined};
console.log(mytestvar);
mytestvar = {...null};
console.log(mytestvar);
mytestvar = {...""};
console.log(mytestvar);
mytestvar = {...false};
console.log(mytestvar);
mytestvar = {...true};
console.log(mytestvar);
//always an object with no fields

This pattern is helpful for guarding our string-based values, but it's not a magic syntax for API conversion: it treats every falsy value as missing, so it needs more care for fields that aren't strings.

However, if we do have an input.title, the second expression { name: input.title } is an object, and its key(s) and value(s) are added to the containing object.
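
The same pattern can be wrapped in a small helper if you prefer a name over the inline &&. The sketch below is an illustration under assumptions, not part of the repo: optionalField is a hypothetical function that only drops null, undefined and empty strings, so that values like 0 or false survive, which the plain && version would silently discard:

```typescript
/**
 * Hypothetical helper: returns an object with one field, or an empty object
 * when the value is null, undefined or an empty string.
 */
export function optionalField<V>(
    key: string,
    value: V | null | undefined
): Record<string, V> {
    if (value === null || value === undefined) return {};
    if (typeof value === "string" && value.length === 0) return {};
    return { [key]: value };
}

// Usage: the result of each call is spread into the surrounding object.
const example = {
    "@type": "BlogPosting",
    ...optionalField("name", "Hello"),
    ...optionalField("abstract", null),
    ...optionalField("wordCount", 0),
};
// example has the keys "@type", "name" and "wordCount"; "abstract" is absent
```

The trade-off is that the return type is a loose Record, so you lose some of the field-name checking that the inline version gets from the BlogPosting type.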

Conclusion

Now you have a pipeline to build structured data from your graphql API, independent of which frontend technology you choose. Structured data can have a good effect on SEO, and some of it is displayed as special widgets in Google search results. With schema-dts you not only avoid typos while you build structured data, your IDE will also help you find the right fields for your @types.

If you're thinking about where to put those conversions in your architecture, maybe ideas like the middleman engine can even help improve the handling of data in your organization as a whole.
