As we're getting ready to launch even more surveys in 2023, I thought this would be a good time to write my usual overview of how the State of JavaScript, State of CSS, etc. are run.
Maybe you'd like to help out with our work, or maybe you're just curious about how our whole setup works. If that's the case, read on!
Previously: How Devographics Surveys Are Run, 2022 Edition
Launching a New Survey
Let's imagine we want to launch a brand new survey about developers' morning habits. Do they like tea or coffee? What are their feelings on French toast? Welcome to the State of Breakfast Developer Survey 2023!
Step 1: The Survey Config
First, we'll need to define some key information that will then be reused by all our different apps. This includes the survey id, name, and so on.
We do this once for the survey itself, and then once per survey "edition" (the 2023 edition, the 2022 edition, etc.).
For example, here are the YAML config files for the State of CSS survey and its editions.
Crucially, these config files also include the survey outline, which is what contains the actual questions of the survey.
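To make the survey/edition split more concrete, here is a minimal TypeScript sketch of how such a config could be modeled once the YAML is loaded. Every field name below (surveyName, template, outline, and so on) is an assumption made up for the breakfast example, not the actual Devographics schema.

```typescript
// Hypothetical shapes: the real YAML config fields may differ.
interface OutlineQuestion {
  id: string; // canonical ID, localized elsewhere
  template: "single" | "multiple" | "freeform"; // assumed question kinds
  options?: string[]; // option IDs, never display text
}

interface OutlineSection {
  id: string;
  questions: OutlineQuestion[];
}

// Defined once per survey.
interface SurveyConfig {
  id: string;
  surveyName: string;
}

// Defined once per edition (2022, 2023, ...).
interface EditionConfig {
  surveyId: string; // points back to SurveyConfig.id
  id: string;
  year: number;
  outline: OutlineSection[];
}

const breakfastSurvey: SurveyConfig = { id: "breakfast", surveyName: "State of Breakfast" };

const breakfast2023: EditionConfig = {
  surveyId: breakfastSurvey.id,
  id: "breakfast2023",
  year: 2023,
  outline: [
    {
      id: "beverages",
      questions: [
        { id: "favorite_beverage", template: "single", options: ["tea", "coffee", "coca_cola"] },
        { id: "other_beverages", template: "freeform" },
      ],
    },
  ],
};

// List every question ID in an edition, in outline order.
function listQuestionIds(edition: EditionConfig): string[] {
  const ids: string[] = [];
  for (const section of edition.outline) {
    for (const question of section.questions) {
      ids.push(question.id);
    }
  }
  return ids;
}
```

Keeping the outline inside the edition config means any app that loads the config can discover the questions without extra wiring.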
How You Could Help
- Come up with the initial draft of the outline
- Collect feedback from the community
Step 2: Locales & Entities
Because everything we do has to support multiple languages, we never store text strings in our YAML outlines – instead we store canonical IDs, which we then localize in the user's language.
This means that if we have a favorite_beverage question, we'll need to add the { key: "breakfast.favorite_beverage.question", t: "What is your favorite beverage?" } object to our en-US locale.
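As a rough illustration of how a canonical ID becomes text on screen, here is a small lookup sketch. Only the { key, t } shape comes from the example above; the translate helper, the catalog structure, and the fallback to en-US are assumptions for the sake of the example.

```typescript
// The { key, t } shape is from the article; everything else is hypothetical.
interface TranslationString {
  key: string;
  t: string;
}

type LocaleCatalog = Record<string, TranslationString[]>;

const localeCatalog: LocaleCatalog = {
  "en-US": [{ key: "breakfast.favorite_beverage.question", t: "What is your favorite beverage?" }],
  "fr-FR": [], // not translated yet
};

// Return the string for `key` in `localeId`, falling back to en-US,
// then to the key itself so missing strings stay visible.
function translate(catalog: LocaleCatalog, localeId: string, key: string): string {
  const candidates = [localeId, "en-US"];
  for (const id of candidates) {
    const strings = catalog[id] || [];
    for (const entry of strings) {
      if (entry.key === key) return entry.t;
    }
  }
  return key;
}
```

Returning the raw key when nothing matches is one simple way to make untranslated strings easy to spot.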
We also need to add any new entities, which are basically any "thing" that appears in the survey. For example, if "Coca-Cola" is one of the selectable options for that beverage question, we might need to add an entity such as:
- id: coca_cola
  name: Coca-Cola
  homepageUrl: https://us.coca-cola.com/
This will then let us link that item to the Coca-Cola homepage whenever it appears in the results, and also helps with normalization (but more on that later).
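Here is a minimal sketch of how an option ID might be resolved to a display name and link through its entity. The id, name, and homepageUrl fields mirror the entity above; the indexEntities and describeOption helpers are hypothetical.

```typescript
interface SurveyEntity {
  id: string;
  name: string;
  homepageUrl?: string;
}

const entityList: SurveyEntity[] = [
  { id: "coca_cola", name: "Coca-Cola", homepageUrl: "https://us.coca-cola.com/" },
];

// Build an id -> entity lookup table once, so each option resolves quickly.
function indexEntities(entities: SurveyEntity[]): Record<string, SurveyEntity> {
  const index: Record<string, SurveyEntity> = {};
  for (const entity of entities) {
    index[entity.id] = entity;
  }
  return index;
}

// Resolve an option ID to a label and an optional link.
// Options without an entity fall back to showing their raw ID.
function describeOption(
  index: Record<string, SurveyEntity>,
  optionId: string
): { label: string; url: string | null } {
  if (!Object.prototype.hasOwnProperty.call(index, optionId)) {
    return { label: optionId, url: null };
  }
  const entity = index[optionId];
  return { label: entity.name, url: entity.homepageUrl ?? null };
}
```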
We have close to 2000 entities for things such as CSS features, JavaScript frameworks, websites, and people… (nearly) all of which were entered manually by myself!
How You Could Help
- Enter missing translation strings
- Translate the content into your own language
- Add missing entities
- Add more metadata (resource links, code examples, etc.) to existing entities
Step 3: The Survey Form
Once we have our survey outline, we're ready to have people take the survey.
This happens through the surveyform app, which is a Next.js app that lives in our big pnpm monorepo.
You can think of this app as the equivalent of Google Forms or Typeform, except it's completely custom-made, which makes it possible to add cool features such as support for multiple languages, autocomplete for some questions, and our “Reading List” feature that lets you save items you want to learn more about for later.
surveyform stores its data in our private MongoDB database. At this point the data looks pretty rough. For example, if we have a freeform textfield called "Other Beverages" it will simply hold the user's raw answer, such as:
{
  "breakfast2023__favorite_beverage__freeform": "I like hot cocoa and orange juice"
}
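The key in that example looks like an edition ID, a question ID, and a suffix joined by double underscores. Assuming that convention holds (it is inferred from this one example only), a helper to build such keys could be sketched like this; buildFieldKey is a made-up name.

```typescript
// Assumed convention, inferred only from the example key above:
// <editionId>__<questionId>__<suffix>
function buildFieldKey(editionId: string, questionId: string, suffix: string): string {
  return [editionId, questionId, suffix].join("__");
}

// A raw response is modeled here as a flat bag of answers keyed by field key.
type RawSurveyResponse = Record<string, string | number | string[]>;

const rawBreakfastResponse: RawSurveyResponse = {};
rawBreakfastResponse[buildFieldKey("breakfast2023", "favorite_beverage", "freeform")] =
  "I like hot cocoa and orange juice";
```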
How You Could Help
- Test the survey
- Help us improve the UX of the survey app
- Improve accessibility
- View open issues
Step 4: Normalization
Once the survey is closed, the next step is to “normalize” the responses into a more usable format. This is when we eliminate any private information that shouldn't end up in the public dataset (although we don't collect any at the moment), remove empty responses, and generate computed fields such as a response's completion rate.
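As an example of a computed field, here is one way a completion rate could be calculated and used to drop empty responses. This is a sketch under my own assumptions: the definition of "answered" and the rounding to a whole percentage are illustrative, not the project's actual rules.

```typescript
type RawAnswers = Record<string, unknown>;

// An answer counts if it is a non-blank string, a non-empty array,
// or any other value that is not null or undefined.
function isAnswered(value: unknown): boolean {
  if (value === null || value === undefined) return false;
  if (typeof value === "string") return value.trim() !== "";
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// Percentage (0 to 100) of the given questions that received an answer.
function completionRate(answers: RawAnswers, questionKeys: string[]): number {
  if (questionKeys.length === 0) return 0;
  const answered = questionKeys.filter((key) => isAnswered(answers[key])).length;
  return Math.round((answered / questionKeys.length) * 100);
}

// Keep only responses where at least one question was answered.
function dropEmptyResponses(responses: RawAnswers[], questionKeys: string[]): RawAnswers[] {
  return responses.filter((response) => completionRate(response, questionKeys) > 0);
}
```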
But another very important function of this step is to make freeform textfields usable.
Going back to our previous I like hot cocoa and orange juice example, we need a way to tell our system how to extract the "hot cocoa" and "orange juice" tokens so that they can be tabulated into the final dataset.
We can do exactly that by defining the following two entities:
- id: hot_chocolate
  patterns:
    - cocoa
- id: orange_juice
  patterns:
    - OJ
(Note that we can use RegExp patterns to match multiple text strings to the same canonical token.)
Once it has undergone the normalization process, the previous field will now look something like this:
{
  favorite_beverage: {
    freeform: {
      normalized: ["hot_chocolate", "orange_juice"],
      raw: "I like hot cocoa and orange juice"
    }
  }
}
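The matching step can be sketched in a few lines of TypeScript. This is not the actual normalization code: normalizeFreeform is a hypothetical helper, each pattern is assumed to be a case-insensitive RegExp source, and I've added an orange\s+juice pattern for illustration so that the full phrase matches as well as "OJ".

```typescript
interface PatternEntity {
  id: string;
  patterns: string[]; // each pattern is treated as a case-insensitive RegExp source
}

const beverageEntities: PatternEntity[] = [
  { id: "hot_chocolate", patterns: ["cocoa"] },
  // "orange\\s+juice" is an added assumption so the spelled-out phrase matches too.
  { id: "orange_juice", patterns: ["\\bOJ\\b", "orange\\s+juice"] },
];

interface NormalizedField {
  raw: string;
  normalized: string[];
}

// Collect the ID of every entity with at least one pattern matching the raw text.
// Tokens come out in entity-list order, not in the order they appear in the text.
function normalizeFreeform(raw: string, entities: PatternEntity[]): NormalizedField {
  const normalized: string[] = [];
  for (const entity of entities) {
    const matches = entity.patterns.some((source) => new RegExp(source, "i").test(raw));
    if (matches) normalized.push(entity.id);
  }
  return { raw, normalized };
}

const normalizedBeverages = normalizeFreeform("I like hot cocoa and orange juice", beverageEntities);
```

Any answer that matches no pattern ends up with an empty normalized array, which is the kind of un-matched token that new entities are meant to catch.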
To make this process smoother, I developed a normalization dashboard as part of our surveyadmin app, which is our own custom admin back-end.
How You Could Help
- Add missing entities to catch any un-matched tokens
- View open issues
Step 5: API
After the normalization is done, we now have a clean dataset!
In a typical data processing workflow you would then feed that dataset into Python, R, or any number of specialized tools.
But instead, we feed the dataset into our api Node.js app, where we use MongoDB aggregations to generate the data for each chart we need.
We do this for two reasons:
- Keeping everything in TypeScript makes it easier to manage a big codebase by sharing code and avoiding too much context switching.
- By running everything through an API, we can keep the entire data processing pipeline dynamic, which unlocks some very cool possibilities.
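To give a feel for what such an aggregation might look like, here is a sketch that counts normalized beverage tokens. $match, $unwind, $group, and $sort are standard MongoDB stages, but the field paths and the editionId filter are assumptions based on the breakfast example. The pipeline is written as plain data, and the function below it is an in-memory equivalent so the logic can be checked without a database.

```typescript
// Hypothetical pipeline for the breakfast example, written as plain data.
const beveragePipeline = [
  { $match: { editionId: "breakfast2023" } },
  { $unwind: "$favorite_beverage.freeform.normalized" },
  { $group: { _id: "$favorite_beverage.freeform.normalized", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

interface NormalizedResponse {
  editionId: string;
  favorite_beverage?: { freeform?: { normalized: string[] } };
}

interface TokenBucket {
  id: string;
  count: number;
}

// In-memory equivalent of the pipeline above: filter by edition,
// count each normalized token, then sort by descending count.
function tallyBeverages(responses: NormalizedResponse[], editionId: string): TokenBucket[] {
  const counts: Record<string, number> = {};
  for (const response of responses) {
    if (response.editionId !== editionId) continue;
    const tokens = response.favorite_beverage?.freeform?.normalized ?? [];
    for (const token of tokens) {
      counts[token] = (counts[token] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .map((id) => ({ id, count: counts[id] }))
    .sort((a, b) => b.count - a.count);
}
```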
The api app generates a GraphQL API that serves not only all our surveys' data, but also all locale and entity data.
And to streamline things, that API is generated from the same YAML config files that are used to generate the survey questionnaire. This means that as soon as you add a new question to the survey, it will also pop up in the API ready to be queried for its data.
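The idea of deriving the API from the outline can be illustrated with a tiny generator that turns a section into a GraphQL type definition. This is only a sketch: the type naming scheme and the QuestionData type are placeholders I've made up, not the real generated schema.

```typescript
interface ApiQuestion {
  id: string;
}

interface ApiSection {
  id: string;
  questions: ApiQuestion[];
}

// Turn a snake_case ID into a PascalCase GraphQL type name.
function toTypeName(id: string): string {
  return id
    .split("_")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

// Emit one GraphQL object type per section, with one field per question.
// "QuestionData" is a placeholder for whatever type holds a question's results.
function buildSectionTypeDefs(section: ApiSection): string {
  const fields = section.questions.map((q) => `  ${q.id}: QuestionData`).join("\n");
  return `type ${toTypeName(section.id)}Section {\n${fields}\n}`;
}

const beveragesTypeDefs = buildSectionTypeDefs({
  id: "beverages",
  questions: [{ id: "favorite_beverage" }, { id: "other_beverages" }],
});
```

With this kind of setup, adding a question to the outline adds a field to the schema on the next run, with no hand-written resolver boilerplate per question.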
How You Could Help
- Clean up TypeScript errors
- Add support for more advanced computations (percentiles, etc.)
- View open issues
Step 6: Results
With our API up and running, we can now query it to obtain the data for any chart we need.
This means we can now build our results site (such as https://2022.stateofcss.com/), which is a static Gatsby site (although we will probably transition this to a Next.js app in the near future to simplify our architecture and keep things consistent).
Each survey edition has its own YAML sitemap config, meaning we are not tied down to the same structure as the survey questionnaire.
And we use this sitemap to build out the entire results site, from generating the GraphQL queries used to fetch the data for each chart, to defining which chart to use for visualizing said data.
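As a sketch of that sitemap-driven approach, a single sitemap block could carry both the question to query and the chart used to display it. The block fields, the chart names, and the query shape below are all hypothetical; the real API's query structure may look quite different.

```typescript
interface SitemapBlock {
  questionId: string;
  chartType: "horizontalBar" | "verticalBar" | "donut"; // made-up chart names
}

// Build the GraphQL query for one block (hypothetical query shape).
function buildBlockQuery(surveyId: string, editionId: string, block: SitemapBlock): string {
  return [
    "query {",
    `  ${surveyId} {`,
    `    ${editionId} {`,
    `      ${block.questionId} {`,
    "        buckets { id count }",
    "      }",
    "    }",
    "  }",
    "}",
  ].join("\n");
}

const beverageBlock: SitemapBlock = { questionId: "favorite_beverage", chartType: "horizontalBar" };
const beverageQuery = buildBlockQuery("breakfast", "breakfast2023", beverageBlock);
```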
A neat trick at this point is that when run locally in development mode, the Gatsby build process will cache a copy of each chart's data as a JSON file, which we then commit to that original surveys config repo.
We do this for two reasons: first, we can then use this static JSON file as a cached version of the data, and dramatically speed up the build both locally and remotely. Second, this lets us build the site even if the API is down for whatever reason.
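The cache-first behavior can be sketched as follows. To keep the example self-contained, a plain object stands in for the JSON files on disk and the API call is passed in as a function; loadChartData and both type names are hypothetical.

```typescript
type ChartData = { buckets: { id: string; count: number }[] };
type ChartDataStore = Record<string, ChartData>; // stands in for JSON files on disk

// Use the cached copy when there is one; otherwise hit the API and cache the result.
function loadChartData(
  chartId: string,
  store: ChartDataStore,
  fetchFromApi: (chartId: string) => ChartData
): { data: ChartData; source: "cache" | "api" } {
  if (Object.prototype.hasOwnProperty.call(store, chartId)) {
    return { data: store[chartId], source: "cache" };
  }
  const data = fetchFromApi(chartId);
  store[chartId] = data; // written back so the next build can skip the API
  return { data, source: "api" };
}
```

Because the cache is checked first, a build with a fully populated store never calls the API at all, which is what makes building possible while the API is down.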
How You Could Help
- Convert codebase to TypeScript
- Migrate to Next.js
- Add new data visualizations
- Improve accessibility
- View open issues
And More…
Believe it or not, I've skipped over quite a few things, such as our caching strategy or how we generate social media images for each chart. And I haven't even mentioned non-technical aspects such as outreach, survey design, marketing, and more.
But this will probably be enough for now. If you have any questions, feel free to come say hello on Discord! I'm always happy to chat!