DEV Community 👩‍💻👨‍💻

Eakam


Telescope - filtering feed URLs - progress

I have been working on filtering feed URLs for Telescope (issue 3688). I started by looking at the main blog hosts mentioned in the issue: dev.to, medium.com, blogspot.com, and wordpress.com. For most of them, I was able to find blog URLs to test with and to see which feed URLs the feed discovery service returned.

For wordpress.com, I had to create an account and play around with the UI to find out how posts are created and how to visit the site. Basically, once you add a post and publish it, you can click Visit Site to get redirected to the site URL. I used this URL to get the list of feed URLs for wordpress.com.

To find out which feed URLs could be used for viewing posts, I simply viewed their contents in the browser. If the content was unreadable, I opened the URL in a new tab in Firefox, saved the response to a file, and opened the file in VS Code. Then, I used an XML formatter to make the contents more readable and confirmed that the response contained the posts for the blog.

Once I had collected a list of valid feed URLs for the various hosts, I noticed that there were three patterns:

  • https://.../feed/userName (dev.to and medium.com)
  • https://blogName.blogspot.com/feeds/posts/default
  • https://blogName.wordpress.com/feed

I also found that these blog hosts let you set up a custom domain. Initially, my plan was to use some sort of whitelist that only allowed valid feed URLs. However, with custom domains, this could cause false positives or false negatives. So, I decided to use a blacklist instead. Only a couple of unwanted feed URLs were returned, such as the WordPress comments feed: https://blogName.wordpress.com/comments/feed.
I would simply add a function to filter out any feed URL that matches a pattern in the blacklist. For example, a feed URL that ends with /comments/feed should not be returned.
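
To make the idea concrete, here is a minimal sketch of what such a filter could look like. It is not the actual Telescope code: the names BLOCKED_FEED_SUFFIXES, isBlockedFeedUrl, and filterFeedUrls are hypothetical, and the only blacklist entry is the /comments/feed suffix mentioned above.

```typescript
// Hypothetical blacklist of path suffixes for feeds that do not contain posts.
// Only "/comments/feed" comes from the post; other entries would be added here.
const BLOCKED_FEED_SUFFIXES: readonly string[] = ["/comments/feed"];

// Returns true when the feed URL ends with one of the blacklisted suffixes.
function isBlockedFeedUrl(feedUrl: string): boolean {
  // Drop any query string or fragment, then trailing slashes, so that
  // ".../comments/feed/" and ".../comments/feed?x=1" are caught as well.
  const withoutQuery = feedUrl.split(/[?#]/)[0] ?? "";
  const normalized = withoutQuery.replace(/\/+$/, "").toLowerCase();
  return BLOCKED_FEED_SUFFIXES.some((suffix) => normalized.endsWith(suffix));
}

// Keeps only the feed URLs that are not blacklisted, preserving their order.
function filterFeedUrls(feedUrls: string[]): string[] {
  return feedUrls.filter((url) => !isBlockedFeedUrl(url));
}
```

With this sketch, passing in https://blogName.wordpress.com/feed and https://blogName.wordpress.com/comments/feed would return only the first URL. Matching on the end of the path rather than on the host name means the filter should behave the same way for blogs on a custom domain, which was the reason for preferring a blacklist in the first place.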

Thus, I added a function to filter the feed URLs before returning them. Next, I need to test the sign-up process with various blog hosts to confirm that the feed URLs are returned correctly and that posts can be pulled successfully. I also need to write some tests for the new function.
