DEV Community

Eakam


Telescope - filtering feed URLs - progress

I have been working on filtering feed URLs for Telescope (issue 3688). I started by looking at the main blog hosts mentioned in the issue: dev.to, medium.com, blogspot.com, and wordpress.com. For most of them, I was able to find existing blog URLs to test with and check which feed URLs the feed discovery service returned.

For wordpress.com, I had to create an account and experiment with the UI to learn how posts are created and how to visit the site. Once you add a post and publish it, you can click Visit Site to be redirected to the site URL. I used this URL to get the list of feed URLs for wordpress.com.

To find out which feed URLs actually contained the blog's posts, I viewed their contents in the browser. When a response was unreadable, I downloaded it to a file by opening the URL in a new tab in Firefox, opened the file in VS Code, and ran an XML formatter on it to make it readable. Then I confirmed that the response contained the posts for the blog.
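A quicker way to do the same check is to count the post elements in the response instead of reading the whole file. The sketch below is only an illustration and not part of Telescope: the name `countFeedPosts` is hypothetical, and it uses a regular expression rather than a real XML parser, so tag-like text inside CDATA blocks could skew the count. It assumes RSS feeds wrap each post in `<item>` and Atom feeds wrap each post in `<entry>`.

```typescript
// Hypothetical helper (not from Telescope): roughly count the posts in a feed.
// RSS 2.0 puts each post in an <item> element; Atom puts each post in an <entry>.
// A regex is used instead of an XML parser, so treat the result as a quick check.
function countFeedPosts(xml: string): number {
  // Match opening tags only: "<item" or "<entry" followed by ">" or whitespace,
  // so closing tags (</item>) and longer names (<items>) are not counted.
  const matches = xml.match(/<(item|entry)[\s>]/g);
  return matches ? matches.length : 0;
}

const sampleRss =
  "<rss><channel><title>Blog</title>" +
  "<item><title>First</title></item>" +
  "<item><title>Second</title></item>" +
  "</channel></rss>";

console.log(countFeedPosts(sampleRss)); // 2
```

A count of zero suggests the URL is not a posts feed. In Node.js 18 or later, the feed text could be fetched first with `await (await fetch(url)).text()` and then passed to the helper.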

Once I had collected a list of valid feed URLs for various hosts, I noticed three patterns:

  • https://.../feed/userName (dev.to and medium.com)
  • https://blogName.blogspot.com/feeds/posts/default
  • https://blogName.wordpress.com/feed
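The three patterns above can be expressed as regular expressions. This is a hypothetical sketch, not Telescope's code: the labels in `FeedPattern` and the name `classifyFeedUrl` are made up for illustration, and the host in the first pattern is left generic to mirror the `...` in the list.

```typescript
// Hypothetical labels for the three feed URL shapes listed above.
type FeedPattern = "user-feed" | "blogspot" | "wordpress";

// Each regex mirrors one pattern from the list.
const FEED_PATTERNS: Array<[FeedPattern, RegExp]> = [
  // https://.../feed/userName (seen on dev.to and medium.com)
  ["user-feed", /^https:\/\/[^/]+\/feed\/[^/]+\/?$/],
  // https://blogName.blogspot.com/feeds/posts/default (optionally with a query string)
  ["blogspot", /^https:\/\/[^/.]+\.blogspot\.com\/feeds\/posts\/default\/?(\?.*)?$/],
  // https://blogName.wordpress.com/feed
  ["wordpress", /^https:\/\/[^/.]+\.wordpress\.com\/feed\/?$/],
];

// Return the label of the first matching pattern, or null if none match.
function classifyFeedUrl(url: string): FeedPattern | null {
  for (const [name, pattern] of FEED_PATTERNS) {
    if (pattern.test(url)) return name;
  }
  return null;
}

console.log(classifyFeedUrl("https://dev.to/feed/someuser")); // user-feed
console.log(classifyFeedUrl("https://myblog.wordpress.com/comments/feed")); // null
```

Note that the blogspot and wordpress expressions only recognize the default host names, so a blog served from a custom domain would come back as `null`. That is one drawback of relying on a list of known-good patterns.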

I also found that these blog hosts let you set up a custom domain. Initially, I planned to use a whitelist that only allowed valid feed URLs. However, with custom domains, a whitelist could lead to false positives or false negatives, for example rejecting a valid feed hosted on a custom domain. So I decided to use a blacklist instead. Only a couple of unwanted feed URLs were returned, such as the WordPress comments feed: https://blogName.wordpress.com/comments/feed.
My plan was to add a function that filters out any feed URL matching a pattern in the blacklist. For example, a feed URL ending in /comments/feed should not be returned.
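The blacklist approach described above can be sketched as follows. This is an illustration under assumptions, not the function that was added to Telescope: `FEED_URL_BLACKLIST` and `filterFeedUrls` are hypothetical names, and the list contains only the comments-feed pattern mentioned in the post.

```typescript
// Hypothetical blacklist: any feed URL matching one of these is dropped.
// Only the WordPress comments feed from the post is included here.
// No "g" flag: RegExp.test() with "g" keeps state in lastIndex between calls.
const FEED_URL_BLACKLIST: RegExp[] = [
  /\/comments\/feed\/?$/, // e.g. https://blogName.wordpress.com/comments/feed
];

// Return only the feed URLs that match none of the blacklisted patterns.
function filterFeedUrls(feedUrls: string[]): string[] {
  return feedUrls.filter(
    (url) => !FEED_URL_BLACKLIST.some((pattern) => pattern.test(url))
  );
}

const kept = filterFeedUrls([
  "https://blogName.wordpress.com/feed",
  "https://blogName.wordpress.com/comments/feed",
]);
// kept is ["https://blogName.wordpress.com/feed"]
```

Because anything not explicitly rejected passes through, feeds on custom domains keep working. The tradeoff is that each new kind of unwanted feed needs its own pattern added to the list.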

So I added a function that filters the feed URLs before they are returned. Next, I need to test the sign-up process with various blog hosts to confirm that the correct feed URLs are returned and that posts can be pulled successfully. I also need to write tests for the new function.
