DEV Community

Discussion on: Who's looking for open source contributors? (August 6 edition)

Dmitry Narizhnyhkh • Edited

dataflowkit.com, a web scraping platform for gophers, is always looking for contributors!
We launched recently and are looking for people who can help spread the word about our framework.

Here are some facts about DFK:

Dataflow kit is fast. It takes about 4-6 seconds to fetch and then parse 50 pages.
Dataflow kit is suitable for processing fairly large volumes of data. Our tests show that parsing approximately 4 million pages takes about 7 hours.

Headless Chrome is used for data extraction from JavaScript-driven web pages;
Data scraping from paginated websites;
Automatic processing of infinitely scrolling pages;
Scraping of websites behind a login form;
Cookie and session handling;
Following links and processing detail pages;
Managing delays between requests per domain;
Following robots.txt directives;
Support for various storage types. Diskv and Cassandra are currently available;
Saving results as CSV, JSON, or XML.