DEV Community

Proxying web feeds with Dancer2

Dave Cross
Geek, Fintech, SEO, Lefty. Feminist, Atheist. Skeptic. Rationalist. Secularist. Humanist. Republican (UK Meaning!). Londoner. Music lover. Writer. Genealogist.
Updated on ・5 min read

I registered the domain dave.org.uk in March 1999 and I've had a web site set up there since very soon after that. Over those twenty-two years, it's had a number of different looks and has been powered by various technologies. For a lot of the time, it was raw HTML; I think I then moved it to Movable Type for a while. But for the last few years, it's been running on WordPress.

I've got a lot of time for WordPress. It's a nice content management system, but I've started to think that it's rather overcomplicated for a site that has mostly turned into a collection of widgets and web feeds that tell people what I've been doing on various other sites.

Regular readers will know I've dabbled a bit with GitHub Pages and a few weeks ago, I started to wonder if I'd be better off running this site using those. I set up a repo and started to investigate.

The big limitation with GitHub Pages is that it will only serve static pages. GitHub Actions gives us ways to mitigate that slightly, but that's not what I wanted here. For most of the pages on my site, I wanted to display data from other sites using some kind of JavaScript widget.

There are basically two scenarios to deal with. On the Reading page, for example, I'm displaying my recent reading history from my Goodreads account. On the other hand, the Writing page just displays the contents of a web feed. In the WordPress world, I just found plugins that did what I wanted; now I'd need to dig a bit deeper and find (or, perhaps, write) widgets to do this.

I found a couple of widgets that did what I wanted (one for Goodreads and another for Instagram) but none of the web feed widgets I could find produced output that I was happy with. So I wrote my own. Actually, I wrote two - one for RSS feeds and another for Atom feeds (I should really spend the time to combine them into one). And when I was testing my solutions I kept falling over CORS errors.

Cross-origin resource sharing (or CORS) is a browser mechanism that controls whether a page served from one domain is able to reuse resources from a different domain. And the default setting is that it probably can't.

If you think about the RSS widget on my Writing page, for example, it reads the RSS feed for my dev.to posts, parses the data and then formats it into HTML which it then inserts into the page. It reads the RSS feed by making an HTTP request to the dev.to site. But CORS, by default, says that if you're not making a request from the domain that hosts the feed (i.e. dev.to) then you can't get that resource by making a JavaScript request. You'll get a CORS error. The way to fix it is for the person hosting the feed you're interested in to add a header to the response saying that reuse is OK. The header looks like this:

Access-Control-Allow-Origin: http://www.example.com

Or this:

Access-Control-Allow-Origin: *

The first allows reuse by pages served from http://www.example.com; the second allows reuse by anyone (and is probably a bad idea).
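
To make the difference between those two headers concrete, here is a minimal sketch of how a server might decide which one to send. It is written in Python purely for illustration; the allow-list, the function name and the `allow_any` flag are all hypothetical and are not taken from any of the sites mentioned in this post.

```python
from typing import Dict, Optional

# Hypothetical allow-list of origins that may reuse the resource.
# These values are placeholders, not anyone's real configuration.
ALLOWED_ORIGINS = {"http://www.example.com"}


def cors_header_for(request_origin: Optional[str], allow_any: bool = False) -> Dict[str, str]:
    """Return the CORS response headers (if any) for a request's Origin header."""
    if allow_any:
        # The wildcard form: any page on any site may read the response.
        return {"Access-Control-Allow-Origin": "*"}
    if request_origin in ALLOWED_ORIGINS:
        # Echo the specific origin back. "Vary: Origin" tells caches that
        # the response differs depending on who asked.
        return {"Access-Control-Allow-Origin": request_origin, "Vary": "Origin"}
    # No header at all: the browser will refuse to hand the response to the script.
    return {}
```

A request from a page on `http://www.example.com` gets the first form of the header back, and a request from anywhere else gets nothing, which is what produces the CORS error in the browser.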

So I was getting these CORS errors while testing my RSS and Atom widgets. And they can only be fixed by the people who own the resources (i.e. the web feeds). Now, on the writing page, I was lucky. I own most of the sites where I blog - Perl Hacks and Davblog are both WordPress sites that I run. And I found a WordPress plugin that allowed me to add the required headers. I also blog on dev.to (as you'll see from this post!) and, luckily, dev.to already include the correct headers for my widgets to work.

But not every page was as simple. I use Trakt.tv to track the films and TV shows that I'm watching. They'll give me an Atom feed of what I've been watching recently, but it comes without the CORS header so I can't use it in my widget.

And then I had an idea.

CORS only blocks requests that come from browsers. Requests that come from back-end programs work just as they always have. So I could set up some kind of proxy system: I request a feed from a server that I control, that server requests the data from another (non-CORS-friendly) site, and then it passes the data back to the original requester with the CORS header added.
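
The shape of that proxy can be sketched in a few lines. This is not the Dancer2 code that runs on my server; it is a standard-library Python stand-in for the same idea, and the feed name, upstream URL and allowed origin are all placeholders invented for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple
from urllib.request import urlopen

# Hypothetical feed table: short name -> upstream feed URL (placeholders).
FEEDS: Dict[str, str] = {
    "watching": "https://upstream.example/watching.atom",
}

# Hypothetical origin of the static site that hosts the widgets.
ALLOWED_ORIGIN = "https://www.example.com"


def fetch_upstream(url: str) -> Tuple[bytes, str]:
    """Fetch a feed from the back end, where CORS is not enforced."""
    with urlopen(url, timeout=10) as response:
        content_type = response.headers.get("Content-Type", "application/xml")
        return response.read(), content_type


def proxy_feed(
    name: str, fetch: Callable[[str], Tuple[bytes, str]] = fetch_upstream
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a feed, with the CORS header added."""
    url = FEEDS.get(name)
    if url is None:
        return 404, {"Content-Type": "text/plain"}, b"Unknown feed\n"
    try:
        body, content_type = fetch(url)
    except OSError:
        # urllib's URLError/HTTPError and socket timeouts are all OSErrors.
        return 502, {"Content-Type": "text/plain"}, b"Upstream fetch failed\n"
    headers = {
        "Content-Type": content_type,
        "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    }
    return 200, headers, body


class FeedProxyHandler(BaseHTTPRequestHandler):
    """Serve /<feed-name> by proxying the matching upstream feed."""

    def do_GET(self) -> None:
        status, headers, body = proxy_feed(self.path.strip("/"))
        self.send_response(status)
        for key, value in headers.items():
            self.send_header(key, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the proxy on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), FeedProxyHandler).serve_forever()
```

Calling `serve()` and then requesting `/watching` would return the upstream feed body unchanged, plus the `Access-Control-Allow-Origin` header that the upstream site leaves out. Keeping the fetch step as a parameter of `proxy_feed` makes the header logic easy to check without touching the network.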

Half an hour dabbling with Dancer2 and a bit of DNS and nginx configuration and feeds.dave.org.uk was working. Currently, it only runs two feeds - the Film and TV one I mentioned above and another which tells you what I've been listening to (through the magic of Last.fm and their scrobbling service). Last.fm used to provide a web feed of tunes I'd been listening to, but they turned it off a few years ago and now I build a web feed from JSON I get back from their API (the code to do it is online).
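
Building a feed from JSON is mostly a matter of mapping fields onto Atom elements. The sketch below shows the general idea in Python; it is not my actual code, and the JSON field names (`artist`, `name`, `url`, `played_at`) are assumptions made up for the example rather than the real shape of the Last.fm API response.

```python
import json
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _atom(parent, tag, text=None, **attrs):
    """Append a child element in the Atom namespace and return it."""
    child = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
    child.text = text
    return child


def tracks_to_atom(tracks_json: str, title: str, feed_id: str, author: str) -> str:
    """Build a minimal Atom feed from a JSON array of track objects.

    Each track is assumed (hypothetically) to look like:
    {"artist": ..., "name": ..., "url": ..., "played_at": "<UTC ISO 8601 time>"}
    """
    tracks = json.loads(tracks_json)
    # Serialise Atom as the default namespace rather than with an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _atom(feed, "title", title)
    _atom(feed, "id", feed_id)
    _atom(_atom(feed, "author"), "name", author)
    # Feed-level <updated> is the newest play time. Comparing the strings
    # works only because every timestamp is assumed to share one UTC format.
    newest = max((t["played_at"] for t in tracks), default="1970-01-01T00:00:00Z")
    _atom(feed, "updated", newest)
    for track in tracks:
        entry = _atom(feed, "entry")
        _atom(entry, "title", f'{track["artist"]} - {track["name"]}')
        # Illustrative id scheme: track URL plus play time, so repeat plays differ.
        _atom(entry, "id", track["url"] + "#" + track["played_at"])
        _atom(entry, "link", href=track["url"])
        _atom(entry, "updated", track["played_at"])
    return ET.tostring(feed, encoding="unicode")
```

The resulting string can then be served like any other feed, including through the proxy with the CORS header attached.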

If you go to the front page, you'll get a list of the available feeds. Clicking on one of those links will give you the required feed with the correct CORS header added.

The feed proxy code needs a lot of cleaning up, but it does the job. It's on GitHub if you think it might be interesting to you.

And the new version of my site is coming together fast. I haven't switched the domain over yet, but it can't be that far away.
