msc2020

Posted on • Originally published at dev.to

How to export a CSV with my post data from DEV using its API

In this post we walk through a quick step-by-step guide to collecting data about "my" posts on DEV (dev.to) using its beta API. We use Python 3.9+ with the Requests, json and Pandas libraries to query DEV API endpoints, load the response into a DataFrame and export it as a CSV file. The CSV contains data on the posts published so far by the user msc2020. You can collect your own data the same way.



DEV API, versions v0 and v1

DEV needs no introduction, but it's worth mentioning that it's built on Forem, an "open source tool for building communities" 👊🏼. When visiting the Forem community homepage we noticed the many similarities between the two.

DEV currently has an API (beta version 0.9.7) documented at https://developers.forem.com/api. Two versions are available, v0 and v1. The main difference is that some v0 endpoints can be accessed without an access token (API_KEY), whereas v1 requires a token on all its endpoints. According to the documentation, endpoints that do not require token authentication use CORS (Cross-Origin Resource Sharing) to control access.
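As a rough sketch of what this difference means in practice, the helper below builds request headers for either version. The function name build_headers and the environment variable DEV_API_KEY are hypothetical. The api-key header name is the one used in the curl example in this post, while the v1 Accept value reflects our reading of the Forem documentation and should be checked there before relying on it.

```python
import os
from typing import Optional

# Accept value that, as far as we understand the Forem docs, selects API v1.
V1_ACCEPT = "application/vnd.forem.api-v1+json"


def build_headers(version: str = "v0", api_key: Optional[str] = None) -> dict:
    """Return HTTP headers for a DEV API call (hypothetical helper)."""
    headers = {}
    if api_key:
        # Token-based authentication, optional on some v0 endpoints.
        headers["api-key"] = api_key
    if version == "v1":
        if not api_key:
            raise ValueError("v1 endpoints require an API key")
        headers["Accept"] = V1_ACCEPT
    return headers


# DEV_API_KEY is an assumed environment variable name, not part of the API.
print(build_headers("v0"))
print(build_headers("v1", api_key=os.environ.get("DEV_API_KEY") or "dummy-key"))
```

The resulting dictionary can be passed as the headers argument of a Requests call.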

Some API endpoints

The table below shows some DEV API endpoints accompanied by information that may be useful.

| API version | Endpoint | HTTP method | Requires API_KEY | Description | Example |
|---|---|---|---|---|---|
| v0 | /articles | GET | No | Returns all published posts (articles, questions, announcements, etc.), 30 per page | curl https://dev.to/api/articles |
| v0 | /articles | POST | Yes | Creates an article | curl -X POST -H "Content-Type: application/json" -H "api-key: API_KEY" -d '{"article":{"title":"Title","body_markdown":"Body","published":false,"tags":["discuss","javascript"]}}' https://dev.to/api/articles |
| v0 | /comments | GET | No | Returns all comments on an article or a podcast, 30 per page | curl "https://dev.to/api/comments?a_id=270180" |

Lists of API endpoints in version v0 and v1 can be found, respectively, at: https://developers.forem.com/api/v0 and https://developers.forem.com/api/v1.

🙈 Attention: Although some v0 endpoints can be used without an API_KEY, the API website recommends using this authentication on all of them.
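Since /articles returns 30 posts per page, a user with more posts needs several requests. The sketch below shows one possible way to walk through the pages using the endpoint's page and per_page parameters. The function names fetch_page and collect_all are hypothetical, and the stopping rule (stop at the first page shorter than per_page) is our assumption rather than something prescribed by the API.

```python
from typing import Callable, List


def fetch_page(username: str, page: int, per_page: int = 30) -> List[dict]:
    """Fetch one page of a user's posts from the v0 /articles endpoint."""
    # Third-party import kept local so collect_all can be tried offline.
    import requests

    response = requests.get(
        "https://dev.to/api/articles",
        params={"username": username, "page": page, "per_page": per_page},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


def collect_all(get_page: Callable[[int], List[dict]], per_page: int = 30) -> List[dict]:
    """Request pages 1, 2, ... until a page comes back shorter than per_page."""
    articles: List[dict] = []
    page = 1
    while True:
        batch = get_page(page)
        articles.extend(batch)
        if len(batch) < per_page:
            return articles
        page += 1
```

A call such as collect_all(lambda page: fetch_page('msc2020', page)) would then return every post in a single list.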

Getting data from the API

The code below uses the DEV API to fetch data on my posts (username = msc2020):

import requests # install with: pip install requests

url = 'https://dev.to/api/articles'
querystring = {'username': 'msc2020'}

# send the GET request; params becomes the query string ?username=msc2020
response = requests.get(url, params=querystring, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status

print(response.text)

'''
output:

[{
  "type_of":"article","id":1850779,"title":"Raspagem de dados de um site de notícias em pt-BR","description": ...
  ...
}]
'''

The GET call above returns a Response object from the Requests library. To parse the contents of response.text (type str) into a list of dictionaries (type dict), we use:

import json # python standard library

res_json = json.loads(response.text)
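To see what this parsing step produces without calling the API, the sketch below runs json.loads on a shortened, hand-written sample shaped like the output above (only three fields are kept, so it is an illustration rather than a real response). Requests also offers response.json(), which performs the same decoding in one call.

```python
import json

# Hand-written sample with the same shape as the API output, trimmed to three fields.
sample_text = '[{"type_of": "article", "id": 1850779, "title": "Raspagem de dados de um site de notícias em pt-BR"}]'

res_json = json.loads(sample_text)  # str -> list of dict
first_post = res_json[0]

print(type(res_json).__name__, type(first_post).__name__)  # list dict
print(first_post["id"], first_post["title"])
```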

To make the collected data easier to analyze with Python libraries, we will convert this JSON into a CSV.

🗒️ Note: In the script above we passed the username parameter in the GET call. For the other parameters accepted by the /articles endpoint, see the API documentation.

Exporting collected data to CSV

After collecting the JSON data from the API, we use Pandas' to_csv to export it to CSV format.

Adding this step gives the complete export_posts.py script:

# export_posts.py

import requests # pip install requests
import pandas as pd # pip install pandas
import json # standard library

# define username
USER_NAME = 'msc2020'

# run the request
url = 'https://dev.to/api/articles'
querystring = {'username': USER_NAME}
response = requests.get(url, params=querystring, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
# print(response.text)

# convert the response body into a list of dicts
res_json = json.loads(response.text)

# convert the list of dicts to a Pandas DataFrame
df_posts = pd.DataFrame(res_json)

# export post data to CSV (index=False leaves out the row index column)
df_posts.to_csv('dataset_articles_published_msc2020.csv', index=False)

# displays the first 3 rows of the dataset
print(df_posts.head(3))

'''output:
. 1) content of the first three lines:
>>>
type_of       id  ...                                               tags                                               user
0  article  1850779  ...         tutorial, braziliandevs, python, beginners  {'name': 'msc2020', 'username': 'msc2020', 'tw...
1  article  1842575  ...  deeplearning, machinelearning, python, brazili...  {'name': 'msc2020', 'username': 'msc2020', 'tw...
2  article  1835701  ...                    python, tutorial, braziliandevs  {'name': 'msc2020', 'username': 'msc2020', 'tw...

[3 rows x 25 columns]

. 2) a CSV in local directory: `dataset_articles_published_msc2020.csv`
'''

Screenshot of the df_posts.head(3) output in a Jupyter notebook
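The user column in the output above holds a whole dictionary per row, which ends up as a single text cell in the CSV. One possible way to split it into separate columns is pandas.json_normalize, sketched here on hand-written sample records (the titles and ids are made up for illustration, and this step is not part of export_posts.py).

```python
import pandas as pd

# Hand-written records shaped like the API output; values are illustrative.
sample_posts = [
    {"id": 1, "title": "First post", "user": {"name": "msc2020", "username": "msc2020"}},
    {"id": 2, "title": "Second post", "user": {"name": "msc2020", "username": "msc2020"}},
]

# Keys of the nested dict become columns such as user_name and user_username.
df_flat = pd.json_normalize(sample_posts, sep="_")
print(df_flat.columns.tolist())

# With no path argument, to_csv returns the CSV content as a string.
csv_text = df_flat.to_csv(index=False)
print(csv_text)
```

With real data, res_json from the script would take the place of sample_posts.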

Tests using another username

Currently, the /articles endpoint of the DEV API also returns data on other users' posts. For example, setting USER_NAME = 'anuragrana' and changing the output file name to dataset_articles_published_user.csv in export_posts.py should give output like the following:

   type_of       id  ...                                               user flare_tag
0  article  1855307  ...  {'name': 'Anurag Rana', 'username': 'anuragran...       NaN
1  article  1276096  ...  {'name': 'Anurag Rana', 'username': 'anuragran...       NaN
2  article   262178  ...  {'name': 'Anurag Rana', 'username': 'anuragran...       NaN

[3 rows x 26 columns]

Screenshot of the df_posts.head(3) output from export_posts.py in a Jupyter notebook
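The extra flare_tag column and its NaN values are consistent with how pd.DataFrame handles a list of dictionaries whose keys differ: every key seen becomes a column, and rows lacking that key get NaN. We assume here that only some posts carry a flare_tag field. The sample records below are hand-written to show the effect and are not real API data.

```python
import pandas as pd

# Hand-written records: only the second one has a flare_tag key.
sample_posts = [
    {"type_of": "article", "id": 101},
    {"type_of": "article", "id": 102, "flare_tag": {"name": "discuss"}},
]

df_sample = pd.DataFrame(sample_posts)
print(df_sample.shape)                         # (2, 3)
print(df_sample["flare_tag"].isna().tolist())  # [True, False]
```

This would also explain why the column count can differ from one user to another (25 versus 26 in the outputs above).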

Conclusion

The CSV produced in this post can support data analysis with Python libraries. With a few adaptations to the code, we can obtain data from other DEV API endpoints. There are many possible uses for the collected data.

☕ 🧘‍♂️ 💻 ☯️ 🪬
