How to: Scrape Facebook Groups for Apartment Rentals

Michael Salaverry
Using React and Node.js to create reliable tools for others.

TL;DR

Use the Python facebook-scraper module together with the built-in sqlite3 and re (regex) modules to scrape, parse, filter, and store Facebook posts from apartment rental groups. You can run it in the browser at the bottom of this post!

Design

I am looking for a new apartment rental, but I don't have much patience for reading endless Facebook posts and manually parsing the details. Instead, I want to run a CLI program and get a list of relevant apartments to investigate further. I want to filter by number of rooms and price, and to have the pictures and the original post text to read.

Since this is built on top of many wonderful open-source libraries, I will mostly write "we" in this post. Consider the code below as MIT licensed.

We want a CLI application that fills a sqlite3 database with relevant Facebook posts, so we delegate the CLI documentation and flags to Python's built-in argparse module.
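
As a rough illustration of that delegation, here is a minimal argparse sketch. The flag names (--group-id, --min-rooms, --max-price, --db) and their defaults are made up for this example and are not necessarily the ones used in the full code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI. Every flag name here is illustrative, not the author's."""
    parser = argparse.ArgumentParser(
        description="Scrape apartment rental posts from a public Facebook group."
    )
    parser.add_argument("--group-id", required=True, help="Facebook group ID to scrape")
    parser.add_argument("--min-rooms", type=float, default=0.0, help="minimum number of rooms")
    parser.add_argument("--max-price", type=int, default=None, help="maximum monthly rent")
    parser.add_argument("--db", default="posts.db", help="path to the sqlite3 database file")
    return parser


# Parse an explicit argument list here; a real CLI would call parse_args() with no arguments.
args = build_parser().parse_args(["--group-id", "123", "--min-rooms", "3", "--max-price", "5000"])
print(args.group_id, args.min_rooms, args.max_price, args.db)
```

argparse generates the --help text from these declarations, which is the documentation we wanted to avoid writing by hand.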

We delegate the responsibility to scrape public Facebook groups to the excellent Facebook-Scraper module.

kevinzg / facebook-scraper

Scrape Facebook public pages without an API key

We can provide it with a Facebook group ID to scrape.
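
A minimal sketch of that call, assuming the module's get_posts function accepts group and pages keyword arguments and yields one dict per post. The scrape_group wrapper and its fetch parameter are hypothetical names added here so the function can be exercised without touching the network.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def scrape_group(
    group_id: str,
    pages: int = 2,
    fetch: Optional[Callable[..., Any]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield raw post dicts for one public group.

    `fetch` is a hypothetical injection point; by default we fall back to
    facebook-scraper's get_posts (third-party: pip install facebook-scraper).
    """
    if fetch is None:
        from facebook_scraper import get_posts  # imported lazily, only when needed
        fetch = get_posts
    # Assumption: get_posts takes `group` and `pages` and yields dicts.
    for raw in fetch(group=group_id, pages=pages):
        yield raw
```

Calling list(scrape_group("<your group id>", pages=1)) would then hit Facebook, so expect it to be slow and to break whenever the page layout changes.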

We convert the posts from that module into our own Data Transfer Object (DTO), a FacebookPost class. Using our own DTO class lets us add type annotations to all our functions and methods, which increases our velocity and our confidence in the code through static analysis and IntelliSense.
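
To make the DTO idea concrete, here is one possible shape for it using a dataclass. The field names, the to_dto helper, and the raw dict keys ("post_id", "text", "post_url", "image") are assumptions for illustration; check them against what the scraper really returns.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FacebookPost:
    """Typed container for the fields we care about from one scraped post."""
    post_id: str
    text: str
    url: Optional[str] = None
    image: Optional[str] = None
    price: Optional[int] = None    # filled in later by a regex step
    rooms: Optional[float] = None  # filled in later by a regex step


def to_dto(raw: Dict[str, Any]) -> FacebookPost:
    """Convert a raw post dict into a FacebookPost.

    The key names below are assumed; adjust them to the scraper's real output.
    """
    return FacebookPost(
        post_id=str(raw.get("post_id", "")),
        text=raw.get("text") or "",
        url=raw.get("post_url"),
        image=raw.get("image"),
    )
```

With the conversion isolated in one function, the rest of the program only ever sees FacebookPost, so a type checker can flag a misspelled field anywhere downstream.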

Once we have a FacebookPost DTO, we can pass it through "append" builder functions, which use regular expressions to extract the price and the number of rooms from the post text. Since I am looking for an apartment in Israel, the regexes look for the price and the number of rooms in Hebrew.
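
The "append" step could look roughly like the sketch below. The two patterns are hypothetical, not the ones from the full code: one looks for a number followed by חדרים ("rooms"), the other for a number followed by a shekel marker (₪, ש"ח or שח). Real posts are messier, so treat these as a starting point.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FacebookPost:
    text: str
    price: Optional[int] = None
    rooms: Optional[float] = None


# Hypothetical pattern: "3 חדרים" or "3.5 חדרים" (a number, then the word "rooms").
ROOMS_RE = re.compile(r'(\d+(?:\.\d+)?)\s*חדרים')
# Hypothetical pattern: "4500 ₪" or "4,500 ש"ח" (a number, then a shekel marker).
PRICE_RE = re.compile(r'(\d{1,3}(?:,\d{3})+|\d+)\s*(?:₪|ש"ח|שח)')


def append_rooms(post: FacebookPost) -> FacebookPost:
    """Set post.rooms from the text if a room count is found."""
    match = ROOMS_RE.search(post.text)
    if match:
        post.rooms = float(match.group(1))
    return post


def append_price(post: FacebookPost) -> FacebookPost:
    """Set post.price from the text if a price is found."""
    match = PRICE_RE.search(post.text)
    if match:
        post.price = int(match.group(1).replace(",", ""))
    return post
```

Each function returns the post it was given, so the calls can be chained: append_price(append_rooms(post)). Posts where nothing matches keep None in the corresponding field.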

We compose a set of filters, all of which run on instances of our FacebookPost DTO class. In this example, I filter on price and number of rooms. The CLI arguments control the number of rooms and the price we are looking for.
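
One way to compose such filters is as small predicate functions that are all applied to each post. This is a sketch: max_price, min_rooms, and apply_filters are illustrative names, and dropping posts whose price or room count could not be parsed is a design choice made here, not something the original code is known to do.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class FacebookPost:
    text: str
    price: Optional[int] = None
    rooms: Optional[float] = None


PostFilter = Callable[[FacebookPost], bool]


def max_price(limit: int) -> PostFilter:
    """Keep posts whose parsed price is at most `limit`."""
    return lambda post: post.price is not None and post.price <= limit


def min_rooms(minimum: float) -> PostFilter:
    """Keep posts with at least `minimum` rooms."""
    return lambda post: post.rooms is not None and post.rooms >= minimum


def apply_filters(posts: Iterable[FacebookPost], filters: Iterable[PostFilter]) -> List[FacebookPost]:
    """Return only the posts that pass every filter."""
    checks = list(filters)
    return [post for post in posts if all(check(post) for check in checks)]
```

The limits passed to max_price and min_rooms would come straight from the parsed CLI arguments, so adding a new criterion means adding one more small function to the list.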

We also created a post "printer" based on the FacebookPost DTO. One challenge was printing right-to-left Hebrew text in the console. With the wonderful python-bidi module, I was able to print mixed right-to-left and left-to-right text correctly.

MeirKriheli / python-bidi

BIDI algorithm related functions

However, because I eventually want to display the posts from a sqlite database, I removed the python-bidi requirement: console output is no longer relevant. I primarily use sqlitebrowser to view the data and run SQL analysis on it.
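
Storing the posts needs nothing beyond the built-in sqlite3 module. The sketch below assumes a single posts table with a made-up schema; the table and column names are illustrative and may differ from the full code.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FacebookPost:
    post_id: str
    text: str
    price: Optional[int] = None
    rooms: Optional[float] = None


def save_posts(conn: sqlite3.Connection, posts: Iterable[FacebookPost]) -> None:
    """Create the (hypothetical) posts table if needed and upsert the posts."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               post_id TEXT PRIMARY KEY,
               text    TEXT NOT NULL,
               price   INTEGER,
               rooms   REAL
           )"""
    )
    # INSERT OR REPLACE keeps re-runs from duplicating posts seen earlier.
    conn.executemany(
        "INSERT OR REPLACE INTO posts (post_id, text, price, rooms) VALUES (?, ?, ?, ?)",
        [(p.post_id, p.text, p.price, p.rooms) for p in posts],
    )
    conn.commit()


# An in-memory database for the demo; pass a file path to keep the data.
conn = sqlite3.connect(":memory:")
save_posts(conn, [FacebookPost("1", "example post", 4500, 3.0)])
print(conn.execute("SELECT post_id, price, rooms FROM posts").fetchall())
```

Using the post ID as the primary key means the scraper can be run repeatedly against the same group, and the resulting file can be opened directly in a viewer for ad hoc SQL queries.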

sqlitebrowser / sqlitebrowser

Official home of the DB Browser for SQLite (DB4S) project. Previously known as "SQLite Database Browser" and "Database Browser for SQLite". Website at:

The full code

Below is the full code. You can run it in the browser without installing anything, using the repl.it embed below.

If you have questions, let me know in a comment below!

Below is the same code as a GitHub Gist, which you can run locally using
$ git clone https://gist.github.com/barakplasma/34e8edf1640a4265479e9183fba38e47
$ cd 34e8edf1640a4265479e9183fba38e47
$ pip install -r requirements.txt

Next up (if there's interest): how to build a static site from the sqlite database!

Leave a comment below if you want to see this next!
