Yoana Popova for Datopian

Create a dataset from scratch and publish it with DataHub Cloud

In our previous article, we talked about DataHub Cloud: your stupidly simple and fast tool for turning your data stories or datasets on GitHub into a published, shareable site. It converts raw data and Markdown files into beautifully presented, interactive sites.

Today, we're going to tell you how to publish a dataset (multiple data files or a single data file) with DataHub Cloud.

As an example, we're going to use a dataset with an analysis of the top 1000 global universities:
https://www.kaggle.com/datasets/zahrayazdani81/univercitiesranking?resource=download

What You'll Need

  • A GitHub account and basic knowledge of the GitHub UI (especially editing and adding files).
  • A DataHub Cloud account.

Step 1: Create a GitHub repository with the data files and README.md file

Every DataHub Cloud site is built from a GitHub repository. This is where you put all your dataset files and any related Markdown content that you want to publish. To keep things simple, in this tutorial we're only going to use a single README.md file. It will serve as the landing page for our site.

Tip
Any README.md or index.md file, either in the root of the repository or in a subfolder, will be treated by DataHub Cloud as a "landing" page (for the whole site or for that folder, respectively).
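To make the rule in this tip concrete, here is a minimal Python sketch that scans a local copy of a repository and lists the files that would count as landing pages. It only illustrates the rule as stated above and is not DataHub Cloud's own code. The function name `find_landing_pages` is hypothetical, and the sketch assumes that file names are matched exactly (`README.md`, `index.md`). The tip does not say which file wins when a folder contains both, so both are listed.

```python
from pathlib import Path

# File names that the tip above describes as landing pages.
# Assumption: names are matched exactly, including case.
LANDING_NAMES = {"README.md", "index.md"}


def find_landing_pages(root):
    """Map each folder (relative to root) to the landing-page files it contains.

    The repository root itself is reported as ".".
    """
    root = Path(root)
    pages = {}
    for path in sorted(root.rglob("*.md")):
        relative = path.relative_to(root)
        # Skip anything inside Git's own metadata folder.
        if ".git" in relative.parts:
            continue
        if path.name in LANDING_NAMES:
            folder = relative.parent.as_posix()
            pages.setdefault(folder, []).append(path.name)
    return pages
```

For a repository with a README.md at the top level and an index.md inside a `data` folder, this returns one entry for `"."` (the site's landing page) and one for `"data"` (that folder's landing page).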

Go to your GitHub account and create a new repository. Note that you can check the "Add a README file" checkbox. This will make GitHub automatically add an empty README.md file to your repository.

Hint
If you're new to GitHub, here are simple instructions on creating a repository: https://docs.github.com/en/repositories/creating-and-managing-repositories/quickstart-for-repositories
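Since the README.md will be the landing page, it helps to put a title, a short description, and a glimpse of the data in it. The following is a minimal Python sketch that drafts such a README.md locally from a CSV file, so you can paste the result into the GitHub editor. It uses plain Markdown only and assumes nothing about DataHub Cloud-specific syntax. The function name `build_readme` and the column names in the usage note are hypothetical, not taken from the Kaggle dataset.

```python
import csv
from pathlib import Path


def build_readme(title, description, csv_path, preview_rows=5):
    """Return Markdown text with a title, a description, and a table
    previewing the first few rows of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Take at most `preview_rows` data rows after the header.
        rows = [row for _, row in zip(range(preview_rows), reader)]

    def table_row(cells):
        # Escape pipes so cell contents cannot break the Markdown table.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [
        f"# {title}",
        "",
        description,
        "",
        f"## Preview of {Path(csv_path).name}",
        "",
        table_row(header),
        table_row(["---"] * len(header)),
    ]
    lines.extend(table_row(row) for row in rows)
    return "\n".join(lines) + "\n"
```

For example, `build_readme("Top universities", "Rankings of global universities.", "universities.csv")` would produce a heading, the description, and a table with the first five rows of a local file named `universities.csv` (a placeholder file name).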

Now, let's continue this tutorial in our docs: https://datahub.io/docs/Create+a+dataset+from+scratch+and+publish+it+with+Datahub+Cloud

You can check out what a published dataset page looks like here: https://datahub.io/core/co2-ppm
You can check out what a published data story page looks like here:
https://datahub.io/@cheredia19/us-cities-population
