In this post we show a quick step-by-step guide to collecting data from "my" publications on DEV (dev.to) using its beta API. We use Python 3.9+ with the Requests and Pandas libraries (plus the standard-library json module) to make requests to DEV API endpoints, convert the responses into a DataFrame, and export them as a .CSV file. This CSV will contain data from the posts published by the user msc2020 so far. You can collect your own data in the same way.
Contents
DEV API, versions v0 and v1 [^]
DEV needs no introduction, but it is worth mentioning that it is built on Forem, an "open source tool for building communities". When visiting the Forem community homepage, we notice the many similarities between the two.
DEV currently has an API (beta version 0.9.7) with documentation at https://developers.forem.com/api. There are some differences between the two available versions. The main one is that some v0 endpoints can be accessed without an access token (API_TOKEN), while v1 requires a token on all of its endpoints. According to the documentation, endpoints that do not require token authentication use CORS (Cross-Origin Resource Sharing) to control access.
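For reference, a token-authenticated call in Python can look like the sketch below. It assumes a DEV API key generated under Settings → Extensions on dev.to and stored in the `API_TOKEN` environment variable; the `api-key` header name follows the curl examples in the table in the next section, and `/articles/me` (which returns the authenticated user's published articles) is used here only as an example of an endpoint that requires a token.

```python
# A minimal sketch of an authenticated request (assumptions: the key is
# stored in the API_TOKEN environment variable; /articles/me is used only
# as an example of an endpoint that requires authentication).
import os
import requests

API_TOKEN = os.environ['API_TOKEN']  # key generated at dev.to -> Settings -> Extensions

response = requests.get('https://dev.to/api/articles/me',
                        headers={'api-key': API_TOKEN})
print(response.status_code)
```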
Some API endpoints [^]
The table below shows some DEV API endpoints accompanied by information that may be useful.
API version | Endpoint | HTTP method | Uses API_KEY | Description | Example |
---|---|---|---|---|---|
v0 | /articles | GET | No | Returns all published posts (articles, questions, announcements, etc.), 30 per page | `curl https://dev.to/api/articles` |
v0 | /articles | POST | Yes | Creates an article | `curl -X POST -H "Content-Type: application/json" -H "api-key: API_KEY" -d '{"article": {"title":"Title","body_markdown":"Body","published":false,"tags":["discuss", "javascript"]}}' https://dev.to/api/articles` |
v0 | /comments | GET | No | Returns all comments from an article or from a podcast episode, 30 per page | `curl https://dev.to/api/comments?a_id=270180` |
The full lists of API endpoints for versions v0 and v1 can be found, respectively, at https://developers.forem.com/api/v0 and https://developers.forem.com/api/v1.
⚠️ Attention: Although some v0 endpoints can be used without an API_TOKEN, the API website recommends using this authentication for all of them. A Python version of the /comments example from the table above is sketched below.
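To illustrate, the GET /comments curl example from the table translates to Python roughly as follows (the article id 270180 is the one used in the documentation example; any public article id works):

```python
# Python version of the GET /comments example above.
# a_id identifies the article whose comments we want (270180 comes from
# the documentation example; substitute any public article id).
import requests

response = requests.get('https://dev.to/api/comments', params={'a_id': 270180})
comments = response.json()
print(len(comments), 'top-level comments returned')
```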
Getting data from the API [^]
The code below captures, via the DEV API, data about my posts (`username = msc2020`):
import requests # install with: pip install requests
url = 'https://dev.to/api/articles'
querystring = {'username': 'msc2020'}
headers = requests.utils.default_headers()
response = requests.request('GET', url, headers=headers, params=querystring)
print(response.text)
'''
output:
[{
"type_of":"article","id":1850779,"title":"Raspagem de dados de um site de notícias em pt-BR","description": ...
...
}]
'''
The output returned by the `GET` call above is an object (`response`) from the Requests library. To convert/parse the contents of `response.text` (type `str`) into a list of dictionaries (type `dict`), we use:
import json # python standard library
res_json = json.loads(response.text)
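As a side note, Requests also offers a `.json()` shortcut on the response object that performs the same parsing:

```python
# Equivalent shortcut: let Requests parse the JSON body directly.
res_json = response.json()
```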
To make it easier to analyze the collected data with Python libraries, we will convert this JSON into a CSV.
🗒️ Note: In the script above we passed the `username` parameter in the `GET` call. To see the other parameters available for the `/articles` endpoint, visit this link of the API documentation.
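Since `/articles` returns at most 30 posts per page, accounts with more posts need pagination. A minimal sketch, assuming the `page` and `per_page` query parameters described in the endpoint's documentation:

```python
# A minimal pagination sketch (assumes the /articles endpoint accepts the
# page and per_page query parameters described in the API documentation).
import requests

url = 'https://dev.to/api/articles'
all_posts = []
page = 1
while True:
    resp = requests.get(url, params={'username': 'msc2020',
                                     'page': page, 'per_page': 30})
    batch = resp.json()
    if not batch:  # an empty page means there are no more posts
        break
    all_posts.extend(batch)
    page += 1

print(len(all_posts), 'posts collected')
```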
Exporting collected data to CSV [^]
After collecting the JSON data via the API, we use Pandas' `to_csv` to export the data to CSV format. Including this step, we obtain the complete `export_posts.py` code:
# export_posts.py
import requests # pip install requests
import pandas as pd # pip install pandas
import json # standard library
# define username
USER_NAME = 'msc2020'
# run the request
url = 'https://dev.to/api/articles'
querystring = {'username': USER_NAME}
headers = requests.utils.default_headers()
response = requests.request('GET', url, headers=headers, params=querystring)
# print(response.text)
# converts request response into a list of dict
res_json = json.loads(response.text)
# convert JSON to Pandas DataFrame
df_posts = pd.DataFrame(res_json)
# export post data to CSV
df_posts.to_csv('dataset_articles_published_msc2020.csv', index=False)
# displays the first 3 rows of the dataset
print(df_posts.head(3))
'''output:
. 1) content of the first three lines:
>>>
type_of id ... tags user
0 article 1850779 ... tutorial, braziliandevs, python, beginners {'name': 'msc2020', 'username': 'msc2020', 'tw...
1 article 1842575 ... deeplearning, machinelearning, python, brazili... {'name': 'msc2020', 'username': 'msc2020', 'tw...
2 article 1835701 ... python, tutorial, braziliandevs {'name': 'msc2020', 'username': 'msc2020', 'tw...
[3 rows x 25 columns]
. 2) a CSV in local directory: `dataset_articles_published_msc2020.csv`
'''
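Because the DataFrame has 25 columns, it can also be convenient to export only the fields of interest. A short self-contained sketch (the column names `url` and `published_at` are assumptions about the API response; `id`, `title` and `tags` appear in the sample output above):

```python
# Optional: export only a subset of columns (the column names below are
# illustrative; inspect df_posts.columns for the complete list).
import requests
import pandas as pd

res = requests.get('https://dev.to/api/articles', params={'username': 'msc2020'})
df_posts = pd.DataFrame(res.json())

cols = ['id', 'title', 'url', 'published_at', 'tags']
df_posts[cols].to_csv('dataset_articles_small_msc2020.csv', index=False)
```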
Tests using another username [^]
It is also possible to obtain data about other users' posts using the `/articles` endpoint of the DEV API. For example, using `USER_NAME = 'anuragrana'` and changing the output file name to `dataset_articles_published_user.csv` in the full `export_posts.py` code, the expected return is the following (a version of the script parameterized by username is sketched after the sample output):
type_of id ... user flare_tag
0 article 1855307 ... {'name': 'Anurag Rana', 'username': 'anuragran... NaN
1 article 1276096 ... {'name': 'Anurag Rana', 'username': 'anuragran... NaN
2 article 262178 ... {'name': 'Anurag Rana', 'username': 'anuragran... NaN
[3 rows x 26 columns]
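To avoid editing the script by hand for each user, the collection and export steps can be wrapped in a small function. A sketch (the function name and the output file-naming pattern are illustrative choices, not part of the original script):

```python
# export_posts.py refactored into a reusable function (a sketch; the
# function name and output file pattern are illustrative choices).
import requests
import pandas as pd


def export_posts(username: str) -> pd.DataFrame:
    """Fetch a user's published posts from the DEV API and export them to CSV."""
    response = requests.get('https://dev.to/api/articles',
                            params={'username': username})
    df_posts = pd.DataFrame(response.json())
    df_posts.to_csv(f'dataset_articles_published_{username}.csv', index=False)
    return df_posts


if __name__ == '__main__':
    print(export_posts('anuragrana').head(3))
```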
Conclusion [^]
The `CSV` obtained in this post can help with data analysis using Python libraries. With a few adaptations to the code we created, we can obtain data from other endpoints of the DEV API. There are many possibilities for using the collected data.
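As a quick illustration of such an analysis, the snippet below reloads the exported CSV and counts the most frequent tags (in the CSV the `tags` field shown in the sample output above is a comma-separated string):

```python
# A quick look at the exported data: reload the CSV and count tags.
# In the CSV the tags column is a comma-separated string (as in the
# sample output shown earlier in this post).
import pandas as pd

df = pd.read_csv('dataset_articles_published_msc2020.csv')

tag_counts = (df['tags']
              .dropna()
              .str.split(', ')
              .explode()
              .value_counts())
print(tag_counts.head(10))
```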