Sacha Greif

Posted on Jul 28, 2023

How the Devographics Surveys Are Run, 2023 Edition

As we're getting ready to launch even more surveys in 2023, I thought this would be a good time to write my usual overview of how the State of JavaScript, State of CSS, etc. are run.

Maybe you'd like to help out with our work, or maybe you're just curious about how our whole setup works. If that's the case, read on!

Previously: How Devographics Surveys Are Run, 2022 Edition

Launching a New Survey

Let's imagine we want to launch a brand new survey about developers' morning habits. Do they like tea or coffee? What are their feelings on French toast? Welcome to the State of Breakfast Developer Survey 2023!

Step 1: The Survey Config

First, we'll need to define some key information that will then be reused by all our different apps. This includes the survey id, name, and so on.

We do this once for the survey itself, and then once per survey "edition" (the 2023 edition, the 2022 edition, etc.).

For example, here are the YAML config files for the State of CSS survey and its editions.

Crucially, these config files also include the survey outline, which is what contains the actual questions of the survey.
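To make the survey/edition split more concrete, here is a minimal TypeScript sketch of what such a config might look like once loaded. The type names and fields (`SurveyConfig`, `EditionConfig`, `sections`, `questions`, `options`) are assumptions for illustration and not the actual Devographics schema.

```typescript
// Hypothetical shapes; the real YAML schema may differ.
interface QuestionConfig {
  id: string; // canonical ID, e.g. "favorite_beverage"
  options?: string[]; // canonical option IDs, if the question has any
}

interface SectionConfig {
  id: string;
  questions: QuestionConfig[];
}

interface EditionConfig {
  id: string; // e.g. "breakfast2023"
  year: number;
  sections: SectionConfig[]; // the survey outline
}

interface SurveyConfig {
  id: string;
  name: string;
  editions: EditionConfig[];
}

const stateOfBreakfast: SurveyConfig = {
  id: "state_of_breakfast",
  name: "State of Breakfast",
  editions: [
    {
      id: "breakfast2023",
      year: 2023,
      sections: [
        {
          id: "beverages",
          questions: [
            { id: "favorite_beverage", options: ["tea", "coffee", "coca_cola"] },
          ],
        },
      ],
    },
  ],
};

// List every question ID found in an edition's outline.
function listQuestionIds(edition: EditionConfig): string[] {
  const ids: string[] = [];
  edition.sections.forEach((section) => {
    section.questions.forEach((question) => ids.push(question.id));
  });
  return ids;
}
```

Note that the outline holds only IDs, never display text, which is what makes the localization step described next possible.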

How You Could Help

Come up with the initial draft of the outline
Collect feedback from the community

Step 2: Locales & Entities

Because everything we do has to support multiple languages, we never store text strings in our YAML outlines – instead we store canonical IDs, which we then localize in the user's language.

This means that if we have a favorite_beverage question, we'll need to add the { key: "breakfast.favorite_beverage.question", t: "What is your favorite beverage?"} object to our en-US locale.
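As a rough sketch of how such a key could be resolved at render time, the TypeScript below looks a key up in the user's locale and falls back to en-US. The fallback order, the `fr-FR` locale ID, and the `translateKey` helper are assumptions made for this example, not the project's actual code.

```typescript
interface TranslationString {
  key: string;
  t: string;
}

// Hypothetical in-memory locales; real ones would be loaded from locale files.
type LocaleMap = Record<string, TranslationString[]>;

const locales: LocaleMap = {
  "en-US": [
    { key: "breakfast.favorite_beverage.question", t: "What is your favorite beverage?" },
  ],
  "fr-FR": [], // nothing translated yet
};

// Try the requested locale first, then en-US, and finally return the key
// itself so that missing strings remain visible instead of rendering blank.
function translateKey(all: LocaleMap, localeId: string, key: string): string {
  const candidates = [all[localeId] || [], all["en-US"] || []];
  for (let i = 0; i < candidates.length; i++) {
    const match = candidates[i].filter((s) => s.key === key)[0];
    if (match) return match.t;
  }
  return key;
}
```

With this fallback, a partially translated locale still produces a complete questionnaire, just with some English strings mixed in.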

We also need to add any new entities, which are basically any "thing" that appears in the survey. For example, if "Coca-Cola" is one of the selectable options for that beverage question, we might need to add an entity such as:

- id: coca_cola
  name: Coca-Cola
  homepageUrl: https://us.coca-cola.com/

This will then let us link that item to the Coca-Cola homepage whenever it appears in the results, and it also helps with normalization (but more on that later).
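A minimal sketch of that lookup, assuming entities are available as a plain array: the `EntityDef` type and `resolveEntity` helper are hypothetical names, and falling back to the raw ID for unknown entities is an assumption of this example.

```typescript
interface EntityDef {
  id: string;
  name: string;
  homepageUrl?: string;
}

const entities: EntityDef[] = [
  { id: "coca_cola", name: "Coca-Cola", homepageUrl: "https://us.coca-cola.com/" },
];

// Resolve an option ID to a display label and an optional link.
// Unknown IDs fall back to the raw ID with no link.
function resolveEntity(all: EntityDef[], id: string): { label: string; url: string | undefined } {
  const entity = all.filter((e) => e.id === id)[0];
  if (!entity) return { label: id, url: undefined };
  return { label: entity.name, url: entity.homepageUrl };
}
```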

We have close to 2000 entities for things such as CSS features, JavaScript frameworks, websites, and people… (nearly) all of which were entered manually by myself!

How You Could Help

Enter missing translation strings
Translate the content into your own language
Add missing entities
Add more metadata (resource links, code examples, etc.) to existing entities

Step 3: The Survey Form

Once we have our survey outline, we're ready to have people take the survey.

This happens through the surveyform app, which is a Next.js app that lives in our big pnpm monorepo.

You can think of this app as the equivalent of Google Forms or TypeForm, except it's completely custom-made, which makes it possible to add cool features such as support for multiple languages, autocomplete for some questions, and our “Reading List” feature that lets you save items you want to learn more about for later.

surveyform stores its data in our private MongoDB database. At this point the data looks pretty rough. For example, if we have a freeform text field called "Other Beverages", it will simply hold the user's raw answer, such as:

{
  "breakfast2023__favorite_beverage__freeform": "I like hot cocoa and orange juice"
}

How You Could Help

Test the survey
Help us improve the UX of the survey app
Improve accessibility
View open issues

Step 4: Normalization

Once the survey is closed, the next step is to “normalize” the responses into a more usable format. This is when we eliminate any private information that shouldn't end up in the public dataset (although we don't collect any at the moment), remove empty responses, and generate computed fields such as a response's completion rate.
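The two simpler parts of this step, dropping empty responses and computing a completion rate, could be sketched as follows. The `RawAnswers` shape and the definition of completion rate (answered questions divided by all outline questions) are assumptions for illustration.

```typescript
// Hypothetical raw response: question ID -> answer (missing or blank = unanswered).
type RawAnswers = Record<string, string | string[] | undefined>;

function isAnswered(value: string | string[] | undefined): boolean {
  if (value === undefined) return false;
  if (typeof value === "string") return value.trim().length > 0;
  return value.length > 0;
}

// Percentage (0-100, rounded) of outline questions that received an answer.
function completionRate(answers: RawAnswers, questionIds: string[]): number {
  if (questionIds.length === 0) return 0;
  const answered = questionIds.filter((id) => isAnswered(answers[id])).length;
  return Math.round((answered / questionIds.length) * 100);
}

// Keep only responses with at least one answered question.
function removeEmptyResponses(all: RawAnswers[], questionIds: string[]): RawAnswers[] {
  return all.filter((answers) => questionIds.some((id) => isAnswered(answers[id])));
}
```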

But another very important function of this step is to make freeform textfields usable.

Going back to our previous "I like hot cocoa and orange juice" example, we need a way to tell our system how to extract the "hot cocoa" and "orange juice" tokens so that they can be tabulated into the final dataset.

We can do exactly that by defining the following two entities:

- id: hot_chocolate
  patterns:
    - cocoa
- id: orange_juice
  patterns:
    - OJ

(Note that we can use RegExp patterns to match multiple text strings to the same canonical token.)

Once it has undergone the normalization process, the previous field will now look something like this:

{
  favorite_beverage: {
    freeform: {
      normalized: ["hot_chocolate", "orange_juice"],
      raw: "I like hot cocoa and orange juice"
    }
  }
}
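The pattern-matching idea can be sketched in a few lines of TypeScript. This is only an illustration of the technique, not the actual normalization code: the `normalizeFreeform` helper, the case-insensitive word-boundary matching, and the extra `orange juice` pattern (added so the example sentence matches) are all assumptions.

```typescript
interface EntityPatterns {
  id: string;
  patterns: string[]; // RegExp sources
}

const beverageEntities: EntityPatterns[] = [
  { id: "hot_chocolate", patterns: ["cocoa"] },
  // "orange juice" is added for this sketch; the snippet above lists only "OJ".
  { id: "orange_juice", patterns: ["OJ", "orange juice"] },
];

// Return the raw answer together with the IDs of all entities whose
// patterns match it. Matching is case-insensitive and anchored on word
// boundaries so that "OJ" does not match inside a word like "mojito".
function normalizeFreeform(
  raw: string,
  defs: EntityPatterns[]
): { raw: string; normalized: string[] } {
  const normalized: string[] = [];
  defs.forEach((def) => {
    const matched = def.patterns.some((p) =>
      new RegExp("\\b(?:" + p + ")\\b", "i").test(raw)
    );
    if (matched && normalized.indexOf(def.id) === -1) normalized.push(def.id);
  });
  return { raw: raw, normalized: normalized };
}

const normalizedExample = normalizeFreeform("I like hot cocoa and orange juice", beverageEntities);
// normalizedExample.normalized is ["hot_chocolate", "orange_juice"]
```

Answers that match no pattern come back with an empty `normalized` array, which is exactly the kind of un-matched token that new entities are meant to catch.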

To make this process smoother, I developed a normalization dashboard as part of our surveyadmin app, which is our own custom admin back-end.

How You Could Help

Add missing entities to catch any un-matched tokens
View open issues

Step 5: API

After the normalization is done, we now have a clean dataset!

In a typical data processing workflow you would then feed that dataset into Python, R, or any number of specialized tools.

But instead, we feed the dataset into our api Node.js app, where we use MongoDB aggregations to generate the data for each chart we need.

We do this for two reasons:

Keeping everything in TypeScript makes it easier to manage a big codebase by sharing code and avoiding too much context switching.
By running everything through an API, we can keep the entire data processing pipeline dynamic, which unlocks some very cool possibilities.
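To give an idea of what a chart-level MongoDB aggregation might look like, here is a sketch that builds a pipeline counting how often each normalized token appears. The stage operators (`$match`, `$unwind`, `$group`, `$sort`) are standard MongoDB, but the `editionId` field name, the dotted field path, and the `buildCountPipeline` helper are assumptions for this example.

```typescript
// A pipeline stage is a plain object, so no MongoDB driver is needed to build one.
type PipelineStage = Record<string, unknown>;

// Count how many times each normalized token appears for one field.
// `fieldPath` is a hypothetical dotted path such as
// "favorite_beverage.freeform.normalized".
function buildCountPipeline(editionId: string, fieldPath: string): PipelineStage[] {
  return [
    // Only look at responses from the requested edition.
    { $match: { editionId: editionId } },
    // Emit one document per token in the normalized array.
    { $unwind: "$" + fieldPath },
    // Group by token and count occurrences.
    { $group: { _id: "$" + fieldPath, count: { $sum: 1 } } },
    // Most popular tokens first.
    { $sort: { count: -1 } },
  ];
}
```

The resulting array is the kind of value that could be passed to the MongoDB Node.js driver's `collection.aggregate()`.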

The api app generates a GraphQL API that serves not only all our surveys' data, but also all locale and entities data.

And to streamline things, that API is generated from the same YAML config files that are used to generate the survey questionnaire. This means that as soon as you add a new question to the survey, it will also pop up in the API ready to be queried for its data.
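The general idea of deriving a schema from the outline can be illustrated with a small generator that emits GraphQL type definitions as a string. Everything here is hypothetical: the `OutlineSection` shape, the naming convention, and the `QuestionData` result type are stand-ins rather than the real API schema.

```typescript
interface OutlineQuestion {
  id: string;
}

interface OutlineSection {
  id: string;
  questions: OutlineQuestion[];
}

// favorite_beverage -> FavoriteBeverage
function toPascalCase(id: string): string {
  return id
    .split("_")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

// Emit one GraphQL object type per section, with one field per question.
// `QuestionData` is a hypothetical type standing in for a question's results.
function generateTypeDefs(sections: OutlineSection[]): string {
  return sections
    .map((section) => {
      const fields = section.questions
        .map((q) => "  " + q.id + ": QuestionData")
        .join("\n");
      return "type " + toPascalCase(section.id) + "Section {\n" + fields + "\n}";
    })
    .join("\n\n");
}
```

Because the type definitions are derived from the outline, adding a question to the config is enough to make a matching field appear in the generated schema.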

How You Could Help

Clean up TypeScript errors
Add support for more advanced computations (percentiles, etc.)
View open issues

Step 6: Results

With our API up and running, we can now query it to obtain the data for any chart we need.

This means we can now build our results site (such as https://2022.stateofcss.com/), which is a static Gatsby site (although we will probably transition this to a Next.js app in the near future to simplify our architecture and keep things consistent).

Each survey edition has its own YAML sitemap config, meaning we are not tied down to the same structure as the survey questionnaire.

And we use this sitemap to build out the entire results site, from generating the GraphQL queries used to fetch the data for each chart, to defining which chart to use for visualizing said data.
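As a sketch of how a sitemap could drive both the query and the visualization, the example below pairs each block with a generated GraphQL query and a chart name. The sitemap shape, the query fields (`edition`, `buckets`, `count`, `percentage`), and the `HorizontalBarChart` name used in the usage note are all made up for illustration.

```typescript
// Hypothetical sitemap shapes; the real YAML sitemap may be structured differently.
interface SitemapBlock {
  id: string; // question ID to chart
  chart: string; // name of the chart component to render
}

interface SitemapPage {
  id: string;
  blocks: SitemapBlock[];
}

// Build a GraphQL query string for one block (field names are illustrative).
function buildBlockQuery(editionId: string, block: SitemapBlock): string {
  return [
    "query " + block.id + "Query {",
    '  edition(id: "' + editionId + '") {',
    "    " + block.id + " {",
    "      buckets { id count percentage }",
    "    }",
    "  }",
    "}",
  ].join("\n");
}

// Pair each block on a page with its generated query and chosen chart.
function planPage(editionId: string, page: SitemapPage): { chart: string; query: string }[] {
  return page.blocks.map((block) => ({
    chart: block.chart,
    query: buildBlockQuery(editionId, block),
  }));
}
```

For instance, a page with a single `favorite_beverage` block set to `HorizontalBarChart` would produce one query and one chart assignment, regardless of where that question sat in the questionnaire.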

A neat trick at this point is that when run locally in development mode, the Gatsby build process will cache a copy of each chart's data as a JSON file, which we then commit to that original surveys config repo.

We do this for two reasons: first, we can then use this static JSON file as a cached version of the data, and dramatically speed up the build both locally and remotely. Second, this lets us build the site even if the API is down for whatever reason.
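The cache-first behavior can be sketched as follows. This is a simplified, synchronous illustration with an in-memory store; the `CacheStore` interface and `loadChartData` helper are hypothetical, a real build would read and write JSON files and fetch asynchronously, and writing the cache on every miss (rather than only in local development mode) is a simplification.

```typescript
type ChartData = Record<string, unknown>;

interface CacheStore {
  read(blockId: string): string | undefined; // JSON text, if a cached copy exists
  write(blockId: string, json: string): void;
}

// Prefer the cached JSON; only call the API on a cache miss, then store
// the result so that the next build can skip the request entirely.
function loadChartData(
  blockId: string,
  store: CacheStore,
  fetchFromApi: (id: string) => ChartData
): ChartData {
  const cached = store.read(blockId);
  if (cached !== undefined) return JSON.parse(cached) as ChartData;
  const fresh = fetchFromApi(blockId);
  store.write(blockId, JSON.stringify(fresh, null, 2));
  return fresh;
}

// In-memory store for demonstration purposes.
function createMemoryStore(): CacheStore {
  const files: Record<string, string> = {};
  return {
    read: (id) => files[id],
    write: (id, json) => {
      files[id] = json;
    },
  };
}
```

Once a block's data has been cached, a second call never reaches `fetchFromApi`, which is why the build still succeeds when the API is unavailable.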

How You Could Help

Convert codebase to TypeScript
Migrate to Next.js
Add new data visualizations
Improve accessibility
View open issues

And More…

Believe it or not, I've skipped over quite a few things, such as our caching strategy or how we generate social media images for each chart. And I haven't even mentioned non-technical aspects such as outreach, survey design, marketing, and more.

But this will probably be enough for now. If you have any questions, feel free to come say hello on Discord! I'm always happy to chat!
