DEV Community 👩‍💻👨‍💻

chryzcode.dev

Tutorial - How to build your own LinkedIn Profile Scraper in 2022

Hello techies,

Now that we are halfway through 2022, I have found a way to build a LinkedIn profile scraper that fetches user data, and I want to share it with you in the hope that you find it useful.

In short, this tutorial will help you build a tool that fetches data about the LinkedIn users who interest you.

Here is a breakdown of what this tutorial covers.

  • What is LinkedIn?
  • Have an active LinkedIn account.
  • Set up your code workspace: code editor, Python programming language, Selenium WebDriver.
  • Tutorial (coding).

Firstly, what is LinkedIn?

LinkedIn is one of the largest professional networking platforms, where you can find job opportunities, connect with colleagues in your field, and build professional relationships.

Have an Active LinkedIn Account.

This is the first and most important step. If you already have an account, you can skip ahead. Otherwise, visit https://www.linkedin.com/ and look for the Join now button on the landing page.

Then click Join now. This opens the sign-up page, where you can create an account.

Fill in the form with your details, including your email and password, and submit it. You should receive a verification email from LinkedIn. Follow the instructions in it to verify your LinkedIn account.

Set Up Your Code Workspace.

Firstly, you need a code editor. There are many to choose from, but to save you time and stress I recommend Visual Studio Code (VS Code). Visit https://code.visualstudio.com/ and download the build that matches your operating system.

Secondly, as mentioned above, we will be using the Python programming language. Visit the official Python website at https://www.python.org/, hover over the Downloads menu, and download the build that is compatible with your operating system.

Here are resources for installing Python on different operating systems:

  • Mac Operating System
  • Windows Operating System
  • Linux Operating System (Ubuntu)

If you installed Visual Studio Code, here is also a resource on how to set up the Python extension for your editor.

Lastly, we need the Selenium WebDriver. In this tutorial, we use it to drive the Chrome browser for our LinkedIn profile scraper. Here is a tutorial to help you with the installation: https://www.youtube.com/watch?v=WnWQgUerR0c.

Tutorial (coding)

This is the last part of the tutorial, where we start writing code.

Firstly, we need to install the Python package we'll be using, linkedin_scraper, with pip, the package installer for Python. pip is used to install Python packages and libraries.

pip install --user linkedin_scraper

I mentioned the Selenium WebDriver earlier. Now it is time to set its path.

export CHROMEDRIVER=~/chromedriver

Here we export the path of the downloaded driver into the CHROMEDRIVER environment variable. To avoid path errors at this point, it is advisable to create a project folder that holds both the driver and your Python file, which keeps the path configuration simple.
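If you would like to double-check that path from Python before launching the browser, here is a minimal sketch. It only assumes the CHROMEDRIVER variable from the export command above. The helper name chromedriver_path is my own and is not part of Selenium or linkedin_scraper.

```python
import os


def chromedriver_path(env=None):
    """Return the expanded CHROMEDRIVER path, or None if the variable is unset.

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("CHROMEDRIVER")
    if not raw:
        return None
    # Expand a leading "~" in case the value was stored unexpanded.
    return os.path.expanduser(raw)


path = chromedriver_path()
if path is None:
    print("CHROMEDRIVER is not set")
elif not os.path.isfile(path):
    print(f"CHROMEDRIVER points to {path}, but no file exists there")
else:
    print(f"Using the driver at {path}")
```

Running this in the same terminal session where you ran the export command tells you right away whether the variable is visible to Python.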

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()

Before writing this code, make sure you have the linkedin_scraper package installed.

To confirm, run pip freeze in your terminal. It lists the Python packages installed in your environment in alphabetical order, so you can look for linkedin_scraper in the output.
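The same check can be done from Python instead of scanning the pip freeze output. This is a small sketch using the standard library's importlib.metadata (Python 3.8 or newer). The helper name installed_version is hypothetical, and I am assuming the distribution is registered under the name used in the pip install command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("linkedin_scraper")
if version is None:
    print("linkedin_scraper is not installed; run the pip install command above")
else:
    print(f"linkedin_scraper {version} is installed")
```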

In the snippet that creates the driver, we import the Person class and the actions module from linkedin_scraper, and webdriver from Selenium.

A variable, driver, is created to hold the Selenium WebDriver instance.

If no driver is passed to the scraper, one using Chrome is created by default. However, if a driver is passed in, that will be used instead.

email = "some-email@email.address"
password = "password123"

We created a LinkedIn account earlier. Now is the time to use it in the program.

The email and password variables must hold the details of a valid LinkedIn account, otherwise the login will fail.
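Hard-coding a real password in a script is easy to leak by accident, so you may prefer to read the credentials from environment variables. The sketch below is one way to do that. The variable names LINKEDIN_EMAIL and LINKEDIN_PASSWORD and the helper load_credentials are my own choices, not something linkedin_scraper defines.

```python
import os


def load_credentials(env=None):
    """Read the LinkedIn email and password from environment variables.

    LINKEDIN_EMAIL and LINKEDIN_PASSWORD are hypothetical names chosen for
    this sketch. Raises RuntimeError if either one is missing or empty.
    """
    env = os.environ if env is None else env
    email = env.get("LINKEDIN_EMAIL")
    password = env.get("LINKEDIN_PASSWORD")
    if not email or not password:
        raise RuntimeError(
            "Set LINKEDIN_EMAIL and LINKEDIN_PASSWORD before running the scraper"
        )
    return email, password
```

With both variables exported in your terminal, you would then write email, password = load_credentials() in place of the two hard-coded assignments.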

actions.login(driver, email, password)  
person = Person("https://www.linkedin.com/in/olanrewaju-alaba/", driver=driver)

If you recall, actions was imported from linkedin_scraper, not from Selenium. We use its login function to sign in with our LinkedIn account details through the WebDriver.

The Person class represents a particular LinkedIn profile, identified by the profile's URL.

Note:

  • If the email and password aren't given, you'll be prompted for them in your terminal.

  • The account used to log in should have its language set to English to make sure everything works as expected.
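Since Person is driven entirely by the profile URL, it can help to sanity-check that URL before opening a browser. Here is a minimal sketch that only uses the standard library. The helper profile_slug is hypothetical, and it assumes personal profiles follow the /in/<name> pattern seen in the URL above.

```python
from urllib.parse import urlparse


def profile_slug(url):
    """Return the profile identifier from a LinkedIn personal-profile URL.

    Raises ValueError if the URL is not an https linkedin.com address
    whose path starts with /in/.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    is_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
    if parsed.scheme != "https" or not is_linkedin:
        raise ValueError(f"not a LinkedIn URL: {url!r}")
    # Drop empty segments produced by leading or trailing slashes.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2 or parts[0] != "in":
        raise ValueError(f"not a personal profile URL: {url!r}")
    return parts[1]


print(profile_slug("https://www.linkedin.com/in/olanrewaju-alaba/"))  # prints olanrewaju-alaba
```

A company URL such as the one used later in this tutorial would raise ValueError here, which is a cheap way to catch a URL passed to the wrong class.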

You might also want to get data from a company's LinkedIn profile.

from linkedin_scraper import Company
company = Company("https://ca.linkedin.com/company/google")

Instead of importing the Person class, you import the Company class, or both. You then use the Company class to define a company variable, following the URL pattern in the snippet above with the company's name on LinkedIn.
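The snippet above does not pass a driver, so a new Chrome driver would be created for the company. If you would rather reuse the driver you already logged in with, something like the sketch below might work. It relies only on the driver and get_employees inputs that appear in the Company signature later in this article. The helper build_company and its company_cls parameter are hypothetical, and company_cls exists only so the helper can be tried with a stand-in class.

```python
def build_company(url, driver, company_cls=None):
    """Create a Company for `url` that reuses an existing driver.

    get_employees=False skips collecting the employee list. Per the
    signature in this article, scrape defaults to True, so the real class
    is expected to scrape as soon as the object is created.
    """
    if company_cls is None:
        # Imported lazily so the helper can be exercised without the package.
        from linkedin_scraper import Company as company_cls
    return company_cls(url, driver=driver, get_employees=False)
```

You would call it as company = build_company("https://ca.linkedin.com/company/google", driver) after logging in.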

Here is the final piece of code to get our LinkedIn profile scraper working.

person.scrape()
#or
company.scrape()

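The snippet above scrapes the data of the specified person or company profile. (As described under the scrape input later in this article, scraping also happens automatically when the object is created unless scrape=False is passed.) After scraping, the browser controlled by the WebDriver closes. To keep it open so you can continue working, use: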

person.scrape(close_on_complete=False)
#or
company.scrape(close_on_complete=False)

By default, close_on_complete is set to True, so you need to set it to False to keep the browser open.
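Keeping the browser open matters most when you want to scrape several profiles in one session. The following is a rough sketch of that idea, using only the scrape input and the close_on_complete argument described in this article. The helper scrape_profiles and its person_cls parameter are hypothetical, and person_cls exists only so the loop can be tried with a stand-in class.

```python
def scrape_profiles(urls, driver, person_cls=None):
    """Scrape several profiles one after another with the same driver."""
    if person_cls is None:
        # Imported lazily so the helper can be exercised without the package.
        from linkedin_scraper import Person as person_cls
    people = []
    for url in urls:
        # scrape=False defers scraping so close_on_complete can be set explicitly.
        person = person_cls(url, driver=driver, scrape=False)
        # Keep the browser open so the next profile can reuse the session.
        person.scrape(close_on_complete=False)
        people.append(person)
    return people
```

Once the loop is done, you are responsible for closing the browser yourself, for example with driver.quit().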

Our LinkedIn profile scraper is now ready to fetch data, so go ahead and test the program.

For those who want to learn more about the linkedin_scraper package, let's explore a little further.

Person

A Person object can be created with the following inputs:

Person(linkedin_url=None, name=None, about=[], experiences=[], educations=[], interests=[], accomplishments=[], company=None, job_title=None, driver=None, scrape=True)

  • linkedin_url: This is the LinkedIn URL of the person's profile

  • name: This is the name of the person

  • about: This is the short paragraph about the person

  • experiences: These are the past experiences they have. A list of linkedin_scraper.scraper.Experience

  • educations: This is their past education. A list of linkedin_scraper.scraper.Education

  • interests: These are the interests they have. A list of linkedin_scraper.scraper.Interest

  • accomplishments: These are the accomplishments they have. A list of linkedin_scraper.scraper.Accomplishment

  • company: This is the most recent company or institution they have worked at.

  • job_title: This is the most recent job title they have.

  • driver
    This is the driver used to scrape the LinkedIn profile. A driver using Chrome is created by default. However, if a driver is passed in, that will be used instead.

  • scrape
    When this is True, the scraping happens automatically when the object is created. To scrape later instead, set it to False and call the scrape() function of the Person object.

  • scrape(close_on_complete=True)
    This is the meat of the code: executing this function scrapes the profile. If close_on_complete is True (which it is by default), the browser closes upon completion. If you want to scrape other profiles afterwards, set it to False so you can keep using the same driver.
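Once a profile has been scraped, you will usually want to pull a few of these fields out. The sketch below assumes the scraped object exposes attributes with the same names as the inputs listed above, which is my assumption rather than something this article confirms. The helper person_summary is hypothetical, and the sample object is a stand-in with made-up values, not real scraped data.

```python
from types import SimpleNamespace


def person_summary(person):
    """Collect a few fields into a plain dict; missing fields become None or 0."""
    return {
        "name": getattr(person, "name", None),
        "job_title": getattr(person, "job_title", None),
        "company": getattr(person, "company", None),
        "experience_count": len(getattr(person, "experiences", None) or []),
        "education_count": len(getattr(person, "educations", None) or []),
    }


# Stand-in object with made-up values, used instead of a real scraped Person.
sample = SimpleNamespace(
    name="Ada Example",
    job_title="Engineer",
    company="Example Ltd",
    experiences=["job 1", "job 2"],
    educations=["school"],
)
print(person_summary(sample))
```

With a real session you would pass the person variable from earlier in the tutorial instead of sample.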

Company

A Company object can be created with the following inputs:

Company(linkedin_url=None, name=None, about_us=None, website=None, headquarters=None, founded=None, company_type=None, company_size=None, specialties=None, showcase_pages=[], affiliated_companies=[], driver=None, scrape=True, get_employees=True)

  • linkedin_url: This is the LinkedIn URL of the company's profile

  • name: This is the name of the company

  • about_us: The description of the company

  • website: The website of the company

  • headquarters: The headquarters location of the company

  • founded: When the company was founded

  • company_type: The type of the company

  • company_size: How many people are employed at the company

  • specialties: What the company specializes in

  • showcase_pages: Pages that the company owns to showcase its products

  • affiliated_companies: Other companies that are affiliated with this one

  • get_employees: Whether to get all the employees of the company.

  • driver
    This is the driver used to scrape the company's LinkedIn profile. A driver using Chrome is created by default. However, if a driver is passed in, that will be used instead.
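If you want to save what you scraped about a company, dumping the simple fields to JSON is one option. As with the Person example, this sketch assumes the scraped object exposes attributes named after the inputs above, which the article does not confirm. The helper company_to_json is hypothetical, and the sample object holds made-up values.

```python
import json
from types import SimpleNamespace

# Scalar fields taken from the Company inputs listed above.
COMPANY_FIELDS = (
    "name",
    "about_us",
    "website",
    "headquarters",
    "founded",
    "company_type",
    "company_size",
    "specialties",
)


def company_to_json(company):
    """Serialize the simple Company fields to a JSON string.

    Missing attributes are written as null; values JSON cannot encode
    directly are converted with str().
    """
    data = {field: getattr(company, field, None) for field in COMPANY_FIELDS}
    return json.dumps(data, indent=2, default=str)


# Stand-in object with made-up values, used instead of a real scraped Company.
sample_company = SimpleNamespace(name="Example Ltd", founded="1999")
print(company_to_json(sample_company))
```

The resulting string can be written to a file with the usual open(...).write(...) pattern.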

This is where the tutorial ends. I hope you learned a lot and had fun.


Thanks for reading through this article. I hope you found it useful.

Bye.
