Faster Than Requests with MultiThread Web Scraper

Juan Carlos ・1 min read

  • Alternative HTTP client, new version 0.9, with an API for humans.
  • Added a built-in multi-thread web scraper, usable as a one-liner.
  • Added a built-in multi-thread file downloader, usable as a one-liner.
  • 1 file, 0 dependencies, ~100 lines of code, Python 2.7 to 3.8, Alpine & ARM.
  • GitHub Actions CI building from scratch.
  • GitHub Actions CI running unit tests from scratch.
  • Examples for the web scraper and file downloader.
  • Extras for data science, web scraping, HTTP REST JSON APIs.
  • Examples, Dockerfile, tests, FAQ, CoC, debug helpers, JSON helpers.
  • Docs cover all functions, with detailed arguments and return values with types.

Library   Speed   Files  LOC   Dependency     Devs  Scraper
PyWGET    152.39  1      338   Wget           >17   No
Requests  15.58   >20    2558  >=7            >527  No
Urllib    4.00    ???    1200  0 (std lib)    ???   No
Urllib3   3.55    >40    5242  >5 (SSL)       >188  No
PyCurl    0.75    >15    5932  Curl, LibCurl  >50   No
FTR       0.45    1      99    0              1     Yes, 2

Hello World

import faster_than_requests as requests
requests.get("http://httpbin.org/get")
  • GET, POST, PATCH, PUT, DELETE and more.

Multi-Thread Web Scraper Built-in

requests.scrapper(["http://example.org", "http://example.io"], threads=True)
  • There are 2 ready-made web scrapers built in, each an easy-to-use one-liner.
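
For readers curious what a multi-threaded scraper does under the hood, here is a minimal standard-library sketch. It is not how Faster Than Requests implements `scrapper`. The names `LinkCollector`, `fetch_text` and `scrape_links` are hypothetical, and treating "scraping" as collecting `<a href>` values is an assumption made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(
                value for name, value in attrs if name == "href" and value
            )


def fetch_text(url):
    # Plain blocking GET using only the standard library.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_links(urls, fetch=fetch_text, max_workers=8):
    """Fetch every URL on a thread pool and return {url: [links]}."""

    def scrape_one(url):
        parser = LinkCollector()
        parser.feed(fetch(url))
        return parser.links

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(scrape_one, urls)))
```

Passing `fetch` as a parameter keeps the network call swappable, so the threading and parsing logic can be exercised without touching a real server.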

Multi-Thread File Downloader Built-in

requests.download2([("http://example.org/foo.jpg", "output.jpg"), ], threads=True)
  • delay=1000 sleeps for 1 second between downloads.
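
As a rough illustration of a delay given in milliseconds, the sketch below downloads a list of `(url, filename)` pairs one after another and sleeps between them. It is a simplified, single-threaded stand-in written with the standard library, not the library's own code; `fetch_bytes`, `download_all` and `delay_ms` are hypothetical names.

```python
import time
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url):
    # Plain blocking GET returning the raw response body.
    with urlopen(url, timeout=10) as response:
        return response.read()


def download_all(pairs, delay_ms=0, fetch=fetch_bytes):
    """Save each (url, filename) pair, sleeping delay_ms between downloads."""
    saved = []
    for index, (url, filename) in enumerate(pairs):
        if index and delay_ms:
            # Convert milliseconds to seconds; no sleep before the first file.
            time.sleep(delay_ms / 1000)
        Path(filename).write_bytes(fetch(url))
        saved.append(filename)
    return saved
```

A delay like this is mostly a courtesy to the remote server: it spaces requests out instead of firing them back to back.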

Multi-Thread Bulk GET

requests.get2str2(["http://example.org", "http://example.io"], threads=True)
  • threads=False disables multi-threading.
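
The effect of a `threads` switch can be sketched with the standard library alone: the same list of URLs is fetched either sequentially or on a thread pool, and the results come back in input order either way. This is an assumption-laden illustration, not the implementation of `get2str2`; `fetch_text` and `get_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_text(url):
    # Plain blocking GET using only the standard library.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def get_all(urls, threads=True, fetch=fetch_text):
    """Return one response body per URL, in the same order as urls."""
    if not threads:
        # Sequential path: one request at a time.
        return [fetch(url) for url in urls]
    workers = max(1, min(8, len(urls)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threaded path: requests overlap while each waits on the network.
        return list(pool.map(fetch, urls))
```

Threads help here because each request spends most of its time waiting on the network, so several can be in flight at once.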

GitHub

Discussion

I keep seeing the double-p in scrapper and think, "Like, a scrappy fighter?" Is it possible you intended to call the method scraper?

 

Cool, I like the color scheme of your text editor... animus. I'm going to pay attention to this library... animus

 

Great Post Juan!

If you think this is interesting, check out Async for HTTP requests, I bet it will blow your mind!

 

Nice, I see you implemented it in Nim eheh!

Have you tried httpx's async support as well?