Hey data science, web automation, web scraping, and data aggregation folks. Are you tired of buying proxy IP addresses that get blocked on your target site within a couple of days at most? Do you not yet have your own solution for cycling IP addresses and/or user agents? Do you like super salesy pitches like this one and tend to buy things from QVC after being asked stupid rhetorical questions!?! Well then have I got great news for you!
All kidding aside, I've finally gotten around to uploading my proxy-IP and user-agent cycling library, Slither, to PyPI! To check out the GitHub repo, go here; for the PyPI page, head here.
Only Python 3 is supported, and no Python 2 support is planned. This is my small way of doing my part to encourage Python 3 use over Python 2. To install it in your next project in a Python-3-only environment:
pip install slitherlib
for an environment where Python 2 and Python 3 are both installed:
pip3 install slitherlib
To actually use the library in your scraping projects:
from random import choice

import requests

from slitherlib.slither import Snake

# Grab a random proxy and User-Agent from Slither's curated lists.
s = Snake()
ip_address = choice(s.ips)   # "ip:port" string
user_agent = choice(s.uas)

headers = {
    "User-Agent": user_agent
}

# Route the request through the chosen proxy with the spoofed User-Agent.
r = requests.get('https://www.google.com',
                 proxies={'https': ip_address,
                          'http': ip_address},
                 headers=headers)
At this time, Slither pulls IP addresses and User-Agents from free sources around the web and dumps them into two list attributes, ips and uas. We add new proxy ip:port sources as we find them and verify, to the best of our ability, that they are not run by hackers looking to steal IP address information.
As this project grows, we hope to build it into a full web-scraping suite with easy concurrency and multiprocessing, robots.txt support, webdriver browser automation, dynamic mouse moves, and other goodies that will keep the data-collection enthusiast collecting data more and fighting 403 and 404 codes less!
If you like it, please give us a star on GitHub! I welcome bug reports, feature requests, and any comments or concerns you have so that I can make this library the best it can be! And, as always, I LOVE to collaborate so feel free to open a PR if you have improvements or ideas!
Top comments (4)
Does this package work with Selenium Firefox proxy authentication?
The framework returns proxy ip:port combinations and User-Agents as lists of strings, so yes, it can be used with Selenium Firefox wherever the proxy and User-Agent overrides/arguments accept a string.
Basically, treat the Snake() object as a curated set of IP and UA choices that can be used anywhere a string is accepted as an IP and/or UA argument; see the sketch just below this reply.
Were you running into a particular issue using Slither in your selenium Firefox project?
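To make that concrete, here's a minimal sketch of wiring one of Slither's proxies and User-Agents into Selenium Firefox through browser preferences. It assumes Selenium 4's Options.set_preference; note that plain preferences don't carry proxy credentials, so a proxy that actually requires authentication would need extra handling on top of this.

from random import choice

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

from slitherlib.slither import Snake

s = Snake()
proxy = choice(s.ips)          # "ip:port" string
user_agent = choice(s.uas)
host, port = proxy.rsplit(":", 1)

options = Options()
# Send HTTP and HTTPS traffic through the chosen proxy.
options.set_preference("network.proxy.type", 1)
options.set_preference("network.proxy.http", host)
options.set_preference("network.proxy.http_port", int(port))
options.set_preference("network.proxy.ssl", host)
options.set_preference("network.proxy.ssl_port", int(port))
# Spoof the User-Agent string for every request the browser makes.
options.set_preference("general.useragent.override", user_agent)

driver = webdriver.Firefox(options=options)
driver.get("https://www.google.com")
driver.quit()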
Beautiful, haven't used it yet, but looking forward to it. I stumbled upon your YouTube video on Reddit.
Great content, keep it up.
Thanks so much! I've got a few videos up and always love to hear when people enjoy my content!