How I made a web scraper because LinkedIn

Ricardo A. Mercado on December 26, 2018

Having lots of LinkedIn connections can be convenient for many people. You and your connection agreed to be connected through the platform, thus ...
 

1) Why? What use is having all of their emails? Especially 2000+ of them at once? Maybe this is why LinkedIn stopped exporting that data?

2) You know you broke the User Agreement, right?

See linkedin.com/legal/user-agreement and search for "scrape".

I'm a big fan of scrapers. I've written tons of them too.

But you have to pay attention to TOS/EULAs/etc.

 

1) Just because you can't think of a use for having all of their emails doesn't mean there aren't uses for them.

2) I guess they'll have to suspend/ban me.

 

1) I didn't say there aren't uses. I asked what yours was. Since we're having a technical discussion, I figured the typical "why am I doing this" would be a good part of the back-and-forth. As you mentioned in the article, you think they used to export this info but stopped. So maybe this is a time to step back and ask "should I?". That's also a healthy part of the discussion.

2) I suppose. Rather, I think it'd be best to once again examine the why and note that you are purposefully breaking an agreement you signed up for. For a fun comparison, what are the terms of service or user agreement used by AccountBerry? Do you have a similar agreement that might not allow scraping either? And what if someone did it anyway? You might not notice, but what if you did, because their code had a bug and slammed your system?

Like I've said, I've created lots of spiders/bots/scrapers. It is fun. And there are great reasons to make them.

But the ethics of building them to scrape data from sites you agreed not to scrape is an interesting, article-worthy thing to think about. Hopefully an aspiring scraper-maker reads your article and this discussion and keeps it in mind.

1) I can't be too specific, but it's for data analytics purposes. Why wouldn't they want the emails to be exported if I could get them by visiting each connection one by one manually? The scraper basically automates that tedious process. I mean, connections agreed to share certain info, and email is just one piece of that information (they could even set it so the email is not shown).

2) I completely understand your point, and I agree completely. I did break the agreement unknowingly (until you pointed it out), but there was no malicious intent. I only automated a process I am allowed to do manually. I find that if you write some code to automate a process you can do manually, then there shouldn't be any restriction on it. It's like a post I read yesterday: a person had 400 unread messages and couldn't select them all to mark them as read, so he just opened the dev tools and wrote a simple script to loop through all the messages and click each one. My response "I guess they'll have to suspend/ban me." is based on the idea that what's done is done.

Maybe adding "For educational purposes" changes the whole context of what is written?

Lol.. "For educational purposes" & "Don't try this at home, especially in the kitchen"

 

I wanted to get all of my LinkedIn connections' emails.

That is exactly why I never accept connection requests from anybody I don’t know in person.

 

While I kind of agree, I also don’t agree.

I also don’t connect with people I don’t know, and it has nothing to do with his behaviour; it’s a matter of practising harm reduction for myself. If any one of my connections (minus the recruiters, of course) were to do the same as the author, I would assume it’s a reasonable use case and be fine with it.

Fundamentally, I have no issue with someone wanting the access they were granted, but if you connect with randoms, then you get what you get. Maybe it’s a little like dating ;)

 

I actually loved this. Nice article.

So do you perform a login with Nightmarejs and then just search from there?

I realize it's against TOS, but I do believe it's still legal.

arstechnica.com/tech-policy/2017/0...

The above article says you're good legally but I believe anything behind a password is where the line is drawn. I'm not sure if that means other people's passwords (hacking their accounts?) or your own. I've taken the former approach and I think the use you are doing is a perfect example of something that would be legal. You have access to all of the data already, this just speeds it up.

Anyway, great article!

 

Thanks!

Yeah, you are prompted to fill in your personal LinkedIn credentials. The script logs you in and gets the emails from your personal connections. It's basically automating a process I could do manually.

 

FYI, it seems that LinkedIn does actually allow you to download emails via the CSV you mentioned; however, each connection must opt in for that.

LinkedIn Email Settings

 
 

SORRY FOR THE DUMB QUESTION; AS YOU KNOW, NOT EVERYONE IS TECH SAVVY. I JUST NEED TO GET MY CONTACTS, AND GETTING THEM ONE AFTER THE OTHER IS STRESSFUL. I HAVE BEEN TRYING TO FIGURE OUT HOW TO MAKE THE CHANGES YOU TALKED ABOUT, BUT I HAVE NO IDEA WHAT STEPS TO TAKE.

PLEASE KINDLY WALK ME THROUGH THE PROCESS. DIRECTIONS ON WHERE TO CHECK FOR THE LINKEDIN CHANGES AND WHAT TO REPLACE WOULD BE REALLY APPRECIATED.

 

You seem to have accidentally enabled your Caps lock...

 

Nope, not really; I just wanted bold text. Any help, please? Thanks.

 

I had similar needs a few months ago :) I created a Chrome extension to accomplish several things for me:

  1. Search for people I would like to connect with, and connect
  2. Endorse all their skills

It was quite an interesting exercise for me, as I hadn't tried developing browser extensions before. Also, I have never encountered any rate limiting, so I deem browser extensions to be quite safe to use.

 
 

Email isn't a completely unused field, though it looks like they only provide publicly available emails rather than the ones you're privy to as a connection.

I downloaded my 216 connections and got 1 email address (a chronic startup founder, so he wants to be seen) and 1 completely empty line other than the connection date. I just reused that field to describe, manually, how I know each person, since for some awful reason LinkedIn removed the ability to tag people.
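For anyone checking their own export, here is a minimal Python sketch that lists which rows in the connections CSV actually include an email. It assumes the file has columns named `First Name`, `Last Name` and `Email Address`; those headers are an assumption (LinkedIn's export format may differ or change), so check the header row of your own file first. The sample rows are made up.

```python
import csv
import io

# Assumed column names; verify them against the header row of your own export.
FIRST, LAST, EMAIL = "First Name", "Last Name", "Email Address"


def connections_with_email(csv_text: str) -> list[tuple[str, str]]:
    """Return (full name, email) for every row whose email field is non-empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    results = []
    for row in reader:
        # DictReader fills missing cells with None, so fall back to "".
        email = (row.get(EMAIL) or "").strip()
        if email:
            first = (row.get(FIRST) or "").strip()
            last = (row.get(LAST) or "").strip()
            results.append((f"{first} {last}".strip(), email))
    return results


# Made-up sample data in the assumed format.
sample = """First Name,Last Name,Email Address,Company,Position,Connected On
Ada,Lovelace,ada@example.com,Analytical Engines,Founder,01 Jan 2018
Alan,Turing,,Bletchley,Researcher,02 Jan 2018
,,,,,03 Jan 2018
"""

found = connections_with_email(sample)
print(found)  # [('Ada Lovelace', 'ada@example.com')]
```

Rows with an empty email field, like the line described above that only had a connection date, are simply skipped.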

 

Hm... First, thanks a lot, Ricardo!

Some code did need to be changed to account for renamed fields, but then it started working.

The problem I'm having at the moment, however, is that it seems to get stuck after scraping about 180 records (see screen). It reports a few extraction errors (even though the emails exist on those profiles) and then just sits there.

Any ideas?

screen
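One possible cause of a run that stalls (not a diagnosis of this particular script) is a step that waits indefinitely for an element that never appears. Below is a minimal sketch of the general pattern in Python, rather than the original Nightmare.js code: wrap each profile's extraction in a timeout so a stuck or failing record is recorded and skipped instead of halting everything. `fetch_email` and `fake_fetch` are hypothetical stand-ins for the real per-profile extraction.

```python
import asyncio


async def scrape_all(profiles, fetch_email, timeout_s=10.0):
    """Run fetch_email on each profile; record and skip any that fail or exceed timeout_s."""
    emails, failures = {}, {}
    for profile in profiles:
        try:
            # wait_for cancels the inner call if it takes longer than timeout_s.
            emails[profile] = await asyncio.wait_for(fetch_email(profile), timeout=timeout_s)
        except asyncio.TimeoutError:
            failures[profile] = "timed out"
        except Exception as exc:  # log extraction errors and keep going
            failures[profile] = f"error: {exc}"
    return emails, failures


# Hypothetical fetcher for demonstration: one profile hangs, one fails.
async def fake_fetch(profile):
    if profile == "stuck":
        await asyncio.sleep(3600)  # simulates waiting for an element that never appears
    if profile == "broken":
        raise ValueError("email selector not found")
    return f"{profile}@example.com"


emails, failures = asyncio.run(
    scrape_all(["a", "stuck", "broken", "b"], fake_fetch, timeout_s=0.1)
)
print(emails)    # {'a': 'a@example.com', 'b': 'b@example.com'}
print(failures)  # {'stuck': 'timed out', 'broken': 'error: email selector not found'}
```

Collecting the failures separately also makes it easier to re-run just those profiles later.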

 
 
 

Either; use the one that works best for what you need to do. I used Nightmare because it was the first one that came to mind.

 

The script is no longer working. I tried it out and followed all the instructions, but the folder that is supposed to hold the scraped list is empty. Any tips on how to get it working would be great.

Thanks

 

I'll check it out. The issue is that this is a scraper, so if LinkedIn updates their page and changes the class of an element used in the script, it will stop working. You can check out the source code and verify whether any class has changed on LinkedIn.
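As a rough way to do that check, you could save a copy of the page's HTML from your browser and look for each class name the script relies on. The sketch below uses Python's standard `html.parser`; the class names are made-up placeholders, not LinkedIn's real ones, so substitute the selectors from the script's source.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name used anywhere in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute can hold several space-separated names.
                self.classes.update(value.split())


def missing_classes(html: str, expected: list[str]) -> list[str]:
    """Return the expected class names that no longer appear in the page."""
    collector = ClassCollector()
    collector.feed(html)
    return [c for c in expected if c not in collector.classes]


# Placeholder class names and HTML, for illustration only.
expected = ["contact-info-email", "connection-card-name"]
saved_page = (
    '<div class="connection-card-name">Ada</div>'
    '<section class="pv-email">ada@example.com</section>'
)
print(missing_classes(saved_page, expected))  # ['contact-info-email']
```

Any class reported as missing is a likely candidate for the selector that needs updating in the script.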

 

PLEASE POINT ME IN THE RIGHT DIRECTION SO I CAN CHANGE WHAT YOU MENTIONED. I HAVE SEARCHED, TRYING NOT TO BOTHER YOU, BUT I DON'T GET IT BECAUSE I AM NOT A PROGRAMMER, JUST A REGULAR USER.

PLEASE HELP ME OUT DISTINGUISHED
