This post was originally published on my blog, jacklyons.me
Just recently I was asked to scrape a WordPress blog for a client so I could audit all of their posts. Naturally, my first thought was to just export all the posts. However, after a quick Google search I stumbled upon the WordPress REST API. The API lets you make requests directly to any WordPress site that has it enabled and retrieve a list of blog posts as JSON.
Give it a try right now. Punch this into your browser and you should get a list of my 10 most recent blog posts:
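Requests like this go to the standard posts route, /wp-json/wp/v2/posts. Here's a small sketch of how that URL is put together, assuming a placeholder site at https://example.com (not a real target) and a hypothetical helper called buildPostsUrl. The per_page and page query parameters are the ones the API uses for paging, and per_page defaults to 10 when you leave it out.

```javascript
// Build the posts endpoint URL for a WordPress site.
// Assumption: WordPress is installed at the root of the domain, so the
// route is always <site>/wp-json/wp/v2/posts.
function buildPostsUrl(siteUrl, { perPage = 10, page = 1 } = {}) {
  const url = new URL("/wp-json/wp/v2/posts", siteUrl);
  url.searchParams.set("per_page", String(perPage)); // posts per request
  url.searchParams.set("page", String(page));        // which page of results
  return url.toString();
}

// "https://example.com" is a placeholder domain.
console.log(buildPostsUrl("https://example.com"));
// prints "https://example.com/wp-json/wp/v2/posts?per_page=10&page=1"
```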
It's that easy! Inside each post object there is a huge amount of data. You can extract things like post date, post status, and much more. The API documentation states that you can only retrieve a maximum of 100 posts per request. In this post I'll show you how to create a function that requests page after page and returns all your posts from a single call! This can be helpful when the site you're scraping has hundreds or thousands of posts.
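To give a feel for the data inside each post object, here's a rough sketch that boils a post down to a handful of fields useful for an audit. The summarizePost helper and the sample post are made up for illustration. The field names (id, date, status, title.rendered, link) come from the API's post schema.

```javascript
// Reduce one post object from the posts endpoint to a few audit-friendly fields.
function summarizePost(post) {
  return {
    id: post.id,
    date: post.date,            // publish date, in the site's timezone
    status: post.status,        // e.g. "publish"
    title: post.title.rendered, // title as rendered HTML
    link: post.link,            // public URL of the post
  };
}

// Hand-written sample shaped like an API response, not real data.
const samplePost = {
  id: 1,
  date: "2018-01-15T09:30:00",
  status: "publish",
  title: { rendered: "Hello world" },
  link: "https://example.com/hello-world/",
  content: { rendered: "<p>Welcome!</p>" },
};

console.log(summarizePost(samplePost));
```

Once you have the full list of posts, the same helper can be applied to all of them with posts.map(summarizePost).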
Below is a super simple HTML snippet I created that you can copy and paste into a basic HTML file. Note that I'm using some modern browser and ES2017 features, so you'll need a recent version of Chrome or Firefox. Also, it may take a little while if you are scraping a site with a few hundred or a few thousand posts.
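Here's a minimal sketch of what such a function could look like, written as plain JavaScript that can sit inside a script tag. It assumes a placeholder site URL, a hypothetical function name (getAllPosts), and an optional fetchFn argument that I added so the function can be tried out without a network. It relies on the X-WP-TotalPages response header, which WordPress sends with collection responses, to know when to stop.

```javascript
// Fetch every post from a WordPress site, 100 posts per request.
// fetchFn defaults to the browser's fetch; pass a stand-in to test offline.
async function getAllPosts(siteUrl, fetchFn = fetch) {
  const perPage = 100; // the maximum the API allows per request
  const posts = [];
  let page = 1;
  let totalPages = 1;

  do {
    const url = `${siteUrl}/wp-json/wp/v2/posts?per_page=${perPage}&page=${page}`;
    const response = await fetchFn(url);
    if (!response.ok) {
      throw new Error(`Request for page ${page} failed with status ${response.status}`);
    }
    // WordPress reports the number of pages in the X-WP-TotalPages header.
    // If the header is missing or unreadable, only the first page is fetched.
    totalPages = Number(response.headers.get("X-WP-TotalPages")) || 1;
    posts.push(...(await response.json()));
    page += 1;
  } while (page <= totalPages);

  return posts;
}
```

Calling getAllPosts("https://example.com") returns a promise that resolves to one array containing every post. The pages are requested one after another rather than in parallel, which is slower but gentler on the site being scraped.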
If you have any questions, comments, or suggestions for improvement, please just leave a comment :)