
Omar Benmegdoul


Scraping Kijiji: the Idea

I want to create a website to help find apartments in Montreal. Before I do or research anything, I just want to write out what I want it to do and how I think I can do it.

Data

Apartments

The data will come from Kijiji. I've written a Kijiji scraper in Python before, and now that I know more about JS I think I can do it faster. With any luck there's an API out there that I can use instead.

The scraper will periodically check Kijiji for new rental listings in Montreal. It will also scrape all the data for these listings so the end user can filter as narrowly as the data allows (which is more than Kijiji's user interface does).
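Here is a rough sketch of the "only pick up new listings" part of the periodic check. It assumes each listing exposes some stable unique ID, which I haven't verified. The `Listing` fields and the `fetch_page` callable are hypothetical stand-ins for whatever the scraper (or an API) ends up returning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    listing_id: str  # hypothetical: whatever unique ID the source exposes
    title: str
    url: str


def select_new(fetched, seen_ids):
    """Return the listings whose ID is not in seen_ids, and record them as seen."""
    fresh = []
    for listing in fetched:
        if listing.listing_id not in seen_ids:
            seen_ids.add(listing.listing_id)
            fresh.append(listing)
    return fresh


def poll_once(fetch_page, seen_ids):
    """Run one polling pass.

    fetch_page is a hypothetical callable that returns the listings
    currently visible on the first page(s) of results.
    """
    return select_new(fetch_page(), seen_ids)
```

Calling `poll_once` on a timer with the same `seen_ids` set means the full details only need to be scraped for listings that were not there on the previous pass.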

Time distance

Instead of specifying a radius around a point and showing results around it, I want to let users filter results by time distance: how many minutes of walking or public transit it takes to get from the apartment to some postal code.

I assume Google Maps has an API that lets you find this out. But unless it's free I probably can't call it for every single listing. I'll probably have to keep a table of time distances between every pair of 3-character postal code prefixes (like H2H) and use it as an approximation. That's less than ideal for walking, since you can easily walk 20 minutes in a straight line within H2H, for instance. I would have to look into more granular options...
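To make the lookup-table idea concrete, here is a minimal sketch. The minute values are made-up placeholders, not real travel times, and the function names are my own. It also assumes travel time is roughly the same in both directions, which won't always hold for public transit.

```python
def prefix(postal_code):
    """Return the first three characters of a postal code, e.g. 'H2H'."""
    cleaned = postal_code.replace(" ", "").upper()
    if len(cleaned) < 3:
        raise ValueError(f"postal code too short: {postal_code!r}")
    return cleaned[:3]


# Placeholder values for illustration only, NOT real travel times.
TRANSIT_MINUTES = {
    ("H2H", "H2X"): 15,
    ("H2H", "H3A"): 25,
}


def time_distance(table, origin, destination):
    """Look up the approximate minutes between two postal codes.

    Returns None when the pair is not in the table.
    """
    a, b = prefix(origin), prefix(destination)
    if (a, b) in table:
        return table[(a, b)]
    # Assumption: travel time is roughly symmetric.
    return table.get((b, a))


def within(table, origin, destination, max_minutes):
    """True if the pair is known and no further apart than max_minutes."""
    minutes = time_distance(table, origin, destination)
    return minutes is not None and minutes <= max_minutes
```

The weakness mentioned above shows up directly: every apartment in H2H gets the same number, no matter where in H2H it actually is.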

User Interface

There are a few things I don't like about Kijiji's interface:

  • Some listing properties can't be used for filtering (date available if I remember correctly), or the filters are not great
  • It is littered with duplicate listings and office rentals. A lot of apartment swaps, too, which may not be relevant.
  • It takes many clicks to get the information you care about. You have to click on listings to read the description, then you have to click on the pictures to see what the place looks like, and you have to click to see a map showing the location of the apartment.

My solution is to:

  • simply allow filtering using every listing property, with options (e.g. exclude listings if heat/electricity/water would put them above your stated budget)
  • include toggles to filter out duplicates, office rentals, and apartment swaps. They can be identified easily.
  • Make listing info visible in the search results. Most descriptions are not long enough to warrant a clickthrough. A table with all properties can be shown. A reasonably large preview image should be shown, with a series of thumbnails that change the src of the preview image on mouseenter so all of them can be seen without clicking through. A Google map of the apartment's location should be shown on mouseover of the address.
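The filtering side could look something like the sketch below. Everything in it is an assumption on my part: the `Rental` fields, the monthly utility estimates (placeholder numbers), and especially the duplicate check, which naively treats two listings with the same normalized address and rent as the same apartment.

```python
from dataclasses import dataclass


@dataclass
class Rental:
    title: str
    address: str
    rent: int
    heat_included: bool = True
    electricity_included: bool = True
    water_included: bool = True
    category: str = "apartment"  # hypothetical: "apartment", "office" or "swap"


# Placeholder monthly estimates in dollars, not real figures.
UTILITY_ESTIMATES = {"heat": 60, "electricity": 40, "water": 20}


def effective_cost(rental, estimates=UTILITY_ESTIMATES):
    """Rent plus an estimate for every utility that is not included."""
    cost = rental.rent
    if not rental.heat_included:
        cost += estimates["heat"]
    if not rental.electricity_included:
        cost += estimates["electricity"]
    if not rental.water_included:
        cost += estimates["water"]
    return cost


def duplicate_key(rental):
    """Naive duplicate signature: normalized address plus rent."""
    return (" ".join(rental.address.lower().split()), rental.rent)


def filter_rentals(rentals, budget, hide_duplicates=True,
                   hide_offices=True, hide_swaps=True):
    """Keep rentals within budget, applying the three toggles."""
    seen = set()
    kept = []
    for rental in rentals:
        if effective_cost(rental) > budget:
            continue
        if hide_offices and rental.category == "office":
            continue
        if hide_swaps and rental.category == "swap":
            continue
        if hide_duplicates:
            key = duplicate_key(rental)
            if key in seen:
                continue
            seen.add(key)
        kept.append(rental)
    return kept
```

With these placeholder estimates, a $950 apartment without heat counts as $1010, so it drops out of a $1000 budget even though its sticker price fits.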

Problems:

  • That might be a lot of requests to make on a single page, leading to significant slowdowns. My hunch? Not as bad as waiting for dozens of different pages to load.
  • Getting distances might be tough as explained above.

Optional:

Is there a Walk Score API?

finally {

That about covers it! I'll write about my progress once I start doing research to (in)validate my assumptions.
