Michael Hungbo

Scraping LinkedIn Data With Proxycurl Jobs API

Cover photo by Alexander Shatov on Unsplash

Table of Contents

  1. Overview
  2. Prerequisites
  3. Getting started
  4. Fetching listed jobs
  5. Getting a specific job details
  6. Limitations of the LinkedIn Job API

Overview

The LinkedIn Jobs API is part of the set of tools Proxycurl provides for working with processed, structured LinkedIn data in your applications. Specifically, it lets you list the jobs a company has posted on LinkedIn (through the Jobs Listing Endpoint) or request compact information about a particular job, such as its title, description, and employment type (through the Job Profile Endpoint).

Prerequisites

This tutorial is written in JavaScript (ES6) and Node.js, so I'm assuming you are comfortable writing and understanding code in both.

Create a new directory, cd into it and start a new Node.js project by running:

npm init -y

Next, we'll need to install the following packages to start our application.

  1. express - a Node.js framework to bootstrap our server.
  2. axios - a data fetching library to query the Jobs API endpoints.
  3. dotenv - to load environment variables into our app.

Run the following command to install the packages:

npm install express axios dotenv

or with Yarn

yarn add express axios dotenv

Getting started

To begin using the Jobs API, you'll need an API Key to make requests to the endpoints. Proxycurl gives you 10 free trial credits, and each successful request to the API costs 1 credit. You can get additional credits by topping up your account through your dashboard. In this tutorial, we'll use the 10 free credits to get started. To get your free API Key, sign up on the Proxycurl website.

Next, sign in to your dashboard and copy the API Key from the API Key and billing tab.

To protect your API Key, create a .env file in your project root directory and add the following line:

API_KEY = 'YOUR_API_KEY'
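
If the .env file is missing or the variable name is misspelled, process.env.API_KEY is undefined and every request is sent with the header value 'Bearer undefined'. Here is a minimal sketch of a guard that fails early instead. The helper name requireEnv is hypothetical; it is not part of dotenv or Proxycurl.

```javascript
// Hypothetical helper (not part of dotenv): read a required variable from an
// env-like object and throw if it is missing or empty.
const requireEnv = (name, env = process.env) => {
    const value = env[name];
    if (typeof value !== 'string' || value.trim() === '') {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value.trim();
};

// Usage in the project, after dotenv.config() has run:
// const API_KEY = requireEnv('API_KEY');
```

Calling it once at startup surfaces a configuration mistake before any request is made.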

Finally, add the following code to spin up our server:

import express from 'express';
import axios from 'axios';
import dotenv from 'dotenv';

const app = express();

dotenv.config();

app.listen(8000, () => {
    console.log('App connected successfully!');
});

NOTE: Remember to add "type": "module" to package.json so that Node.js treats our files as ES modules. This enables the import syntax above and the top-level await used later in this tutorial.

Fetching listed jobs

Let's say we need the list of jobs Twitter has posted on LinkedIn. Collecting this data by hand would be cumbersome, but with the Proxycurl API we can retrieve it in a couple of requests.

To achieve this task, we'll use the Jobs Listing Endpoint to query jobs listed by Twitter on LinkedIn and the Company Profile Endpoint to get Twitter's search_id. The search_id is a numerical string returned in the response of the Company Profile Endpoint which we'll use as a parameter in querying the Jobs Listing Endpoint.

To get started, add the following code right before the app.listen() code block:


// {...previous code omitted for brevity}

const TWITTER_URL = 'https://www.linkedin.com/company/twitter/';  // Line 1

const COMPANY_PROFILE_ENDPOINT = 'https://nubela.co/proxycurl/api/linkedin/company';

const JOBS_LISTING_ENDPOINT = 'https://nubela.co/proxycurl/api/v2/linkedin/company/job';

const JOB_PROFILE_ENDPOINT = 'https://nubela.co/proxycurl/api/linkedin/job';

const companyProfileConfig = {  // Line 2
    url: COMPANY_PROFILE_ENDPOINT,
    method: 'get',
    headers: {'Authorization': 'Bearer ' + process.env.API_KEY},
    params: {
        url: TWITTER_URL
    }
};

const getTwitterProfile = async () => {  // Line 3
    return await axios(companyProfileConfig);
}

const profile = await getTwitterProfile();

const twitterID = profile.data.search_id;

console.log('Twitter ID:', twitterID);


const jobListingsConfig = {
    url: JOBS_LISTING_ENDPOINT,
    method: 'get',
    headers: {'Authorization': 'Bearer ' + process.env.API_KEY},
    params: {
        search_id: twitterID // Line 4
    }
};

const getTwitterListings = async () => { // Line 5
    return await axios(jobListingsConfig);
};

const jobListings = await getTwitterListings();

const jobs = jobListings.data.job;

console.log(jobs);

Let's understand what is going on in the code above.

  1. Starting at Line 1, we define the LinkedIn URL of the company we want to query, followed by the endpoints of the Proxycurl APIs. You can find the endpoint URLs in the Proxycurl API documentation.

  2. In Line 2, we define the axios configuration for the Company Profile Endpoint. The url field takes the endpoint URL, the headers field carries the Authorization header with our API Key as a Bearer token, and the params field takes the LinkedIn URL of the company we'd like to query, which in our case is the TWITTER_URL variable.

  3. In Line 3, we create a function, getTwitterProfile, which calls axios with companyProfileConfig and returns the response containing the company's profile.

  4. In Line 4, we pass the search_id taken from the getTwitterProfile response as a parameter in the axios configuration used by the getTwitterListings function.

  5. Finally, in Line 5, we define a function, getTwitterListings, to get the list of jobs posted by Twitter on LinkedIn. We then assign the result to a jobs variable and log it.
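
The two configuration objects differ only in their endpoint and query parameters, so an optional refactor is to build them with one small helper. This is a sketch, not part of the tutorial's code: buildRequestConfig is a hypothetical name, and the 'YOUR_API_KEY' argument is a placeholder.

```javascript
// Hypothetical helper: builds an axios request config for an authenticated
// GET request, mirroring the shape of companyProfileConfig and
// jobListingsConfig.
const buildRequestConfig = (endpoint, params, apiKey = process.env.API_KEY) => ({
    url: endpoint,
    method: 'get',
    headers: { Authorization: `Bearer ${apiKey}` },
    params
});

// Same shape as companyProfileConfig; the key is a placeholder here.
// Omit the third argument to fall back to process.env.API_KEY.
const exampleConfig = buildRequestConfig(
    'https://nubela.co/proxycurl/api/linkedin/company',
    { url: 'https://www.linkedin.com/company/twitter/' },
    'YOUR_API_KEY'
);

console.log(exampleConfig.headers.Authorization); // Bearer YOUR_API_KEY
```

The returned object can be passed to axios exactly like the hand-written configurations.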

At this point, if you run the file with Node.js (for example, node index.js, assuming your entry file is named index.js), a response like the following should be logged to the console:

Twitter ID: 96622
[
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Content Designer, Content Moderation (Canada)',
    job_url: 'https://www.linkedin.com/jobs/view/3135150334',
    list_date: null,
    location: 'Toronto, ON'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Senior Machine Learning Engineer - Ads Predictions - Revenue',
    job_url: 'https://www.linkedin.com/jobs/view/3104474438',
    list_date: null,
    location: 'Canada'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Sr. Software Engineer, Realtime Storage - Key Value Storage',
    job_url: 'https://www.linkedin.com/jobs/view/3135386201',
    list_date: null,
    location: 'Toronto, ON'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Content Designer, Content Moderation (Canada)',
    job_url: 'https://www.linkedin.com/jobs/view/3135146767',
    list_date: null,
    location: 'Canada'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Software Engineer - Content Health',
    job_url: 'https://www.linkedin.com/jobs/view/3169270490',
    list_date: null,
    location: 'Toronto, ON'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Product Design Manager, Advertiser Experience',
    job_url: 'https://www.linkedin.com/jobs/view/3020369734',
    list_date: null,
    location: 'Toronto, ON'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Engineering Manager - Content Health (Child Safety)',
    job_url: 'https://www.linkedin.com/jobs/view/3165908037',
    list_date: null,
    location: 'Toronto, ON'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Senior Software Engineer - Observability',
    job_url: 'https://www.linkedin.com/jobs/view/3158647123',
    list_date: null,
    location: 'Toronto, ON'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Outbound Sales Representative - Customer Success, Agency',
    job_url: 'https://www.linkedin.com/jobs/view/3109712849',
    list_date: null,
    location: 'Toronto, ON'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Senior Software Engineer - Data Platform, Metadata Services (Permanently Remote!)',
    job_url: 'https://www.linkedin.com/jobs/view/2939759384',
    list_date: null,
    location: 'Toronto, ON'
  },
  {
    company: 'Twitter',
    company_url: 'https://www.linkedin.com/company/twitter',
    job_title: 'Client Account Manager',
    job_url: 'https://www.linkedin.com/jobs/view/3136650462',
    list_date: null,
    location: 'Toronto, ON'
  }
]
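
Because jobs is a plain array of objects, it can be filtered or grouped before you use it. In the output above, for instance, the same title appears under both 'Toronto, ON' and 'Canada'. The sketch below groups job titles by their location field. The helper name groupJobsByLocation is hypothetical, and sampleJobs holds trimmed copies of three entries from the output above.

```javascript
// Hypothetical helper: group job titles by the `location` field of each listing.
const groupJobsByLocation = (jobs) =>
    jobs.reduce((groups, job) => {
        const key = job.location || 'Unknown';
        if (!groups[key]) {
            groups[key] = [];
        }
        groups[key].push(job.job_title);
        return groups;
    }, {});

// Trimmed copies of entries from the response above.
const sampleJobs = [
    { job_title: 'Content Designer, Content Moderation (Canada)', location: 'Toronto, ON' },
    { job_title: 'Senior Machine Learning Engineer - Ads Predictions - Revenue', location: 'Canada' },
    { job_title: 'Content Designer, Content Moderation (Canada)', location: 'Canada' }
];

console.log(groupJobsByLocation(sampleJobs));
```

In the project you would pass the jobs variable instead of sampleJobs.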

Getting a specific job details

The Job Profile Endpoint returns processed and compact details of a specific job listed by a company on LinkedIn. To see this in action, we'll use one of the Twitter jobs returned by the Jobs Listing Endpoint in our previous code.

Add the following code to your project:

// {...previous code omitted for brevity}

const jobProfileConfig = {
    url: JOB_PROFILE_ENDPOINT,
    method: 'get',
    headers: { 'Authorization': 'Bearer ' + process.env.API_KEY },
    params: {
        url: jobs[0].job_url   // Line 1
    }
};

const getJobDetails = async () => {  // Line 2
    return await axios(jobProfileConfig);
};

const jobDetails = await getJobDetails(); 

console.log(jobDetails.data);  

Here's what we're doing above.

  1. In Line 1, we added the url of the first job in the jobs variable as a parameter to the axios configuration in jobProfileConfig.

  2. In Line 2, we defined a function getJobDetails to get the details of the first job in the jobs array.

Running the code logs the following response to the console:

{
    "apply_url": null,
    "company": {
        "logo": "https://media-exp1.licdn.com/dms/image/C4D0BAQHiNSL4Or29cg/company-logo_400_400/0/1519856215226?e=1661385600\u0026v=beta\u0026t=rUecQpduLPDavL3JswjLsJAUNgSu1Q2l3JS5sGp8nHk",
        "name": "Twitter",
        "url": "https://www.linkedin.com/company/twitter"
    },
    "employment_type": "Full-time",
    "industry": [
        "Internet"
    ],
    "job_description": "This role may also be remote. Note: By applying to this position you will have...",
    "job_functions": [],
    "linkedin_internal_id": "2400342303",
    "location": "Toronto, ON",
    "seniority_level": null,
    "title": "Content Designer, Content Moderation (Canada)",
    "total_applicants": null
}

NOTE: I have truncated the job_description field content for brevity.
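
The snippets in this tutorial assume that every request succeeds and that jobs contains at least one entry. Since a rejected axios promise would otherwise stop the script, one option is a small wrapper that reports the failure instead. This is a sketch under stated assumptions: safeRequest is a hypothetical name, and the two fake clients stand in for axios so the example runs without network access or credits.

```javascript
// Hypothetical wrapper: `client` is any function that takes a request config
// and returns a promise, such as axios itself.
const safeRequest = async (client, config) => {
    try {
        const response = await client(config);
        return { ok: true, data: response.data };
    } catch (error) {
        // axios attaches the HTTP response, when there is one, to error.response.
        const status = error.response ? error.response.status : null;
        return { ok: false, status, message: error.message };
    }
};

// Stand-in clients for demonstration only; pass axios in the real project.
const fakeSuccess = async () => ({ data: { title: 'Example job' } });
const fakeFailure = async () => {
    const error = new Error('Request failed with status code 401');
    error.response = { status: 401 };
    throw error;
};

safeRequest(fakeSuccess, {}).then((result) => console.log(result.ok)); // true
safeRequest(fakeFailure, {}).then((result) => console.log(result.status)); // 401
```

In the project, this would look like const result = await safeRequest(axios, jobProfileConfig), followed by a check of result.ok before reading result.data. It is also worth checking that jobs.length is greater than zero before reading jobs[0].job_url.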

Limitations of the LinkedIn Job API

In its present state, the LinkedIn Job API has some limitations. Here are some of the drawbacks you might run into while using it:

  1. The API does not provide the date a job was posted (the list_date field is null in every listing returned above).

  2. It does not provide the qualifications or skills required for a job.

Hopefully, future updates of the API will address these drawbacks.
