Anjan Kant

Posted on Jan 22, 2020

What is Web Scraping | Data Mining

#webscraping #datamining

Web scraping is a general term for techniques that extract metadata or other useful information from websites. It is usually done with dedicated software that simulates a person browsing the web and collects specific pieces of information from different sites.


Purpose of web scraping

With web scraping programs, professionals and businesses can gather web data to sell to other companies or users, often for promotional purposes. Web scraping is also known as screen scraping, data mining, web harvesting, or web data extraction.

Web scraping as data mining

Web scraping as data mining helps collect weather reports, auction details, market prices for a product, or any other list of published information. Many websites restrict scraping of their data, but the technique is still widely used to aggregate data from private and government sources, despite the legal challenges.


Types of data mining

Developers practice several types of data mining. Four approaches are described below.

1. Text pattern fetching

A simple yet powerful way to extract text from HTML pages is to use the UNIX grep command or the regular-expression facilities of a programming language (for instance, Perl or Python).
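As a rough illustration of text pattern matching, the sketch below uses Python's built-in `re` module. The HTML fragment, the `price` class name, and the `extract_prices` helper are all made up for this example; a real page would need its own pattern.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
html = """
<ul>
  <li class="item">Laptop <span class="price">$899.00</span></li>
  <li class="item">Monitor <span class="price">$149.50</span></li>
</ul>
"""

# Capture the number inside every <span class="price"> element.
PRICE_PATTERN = re.compile(r'<span class="price">\$([0-9]+\.[0-9]{2})</span>')


def extract_prices(page):
    """Return every price found in the page as a float."""
    return [float(match) for match in PRICE_PATTERN.findall(page)]


print(extract_prices(html))
```

Patterns like this are quick to write but brittle: a small change in the markup, such as an extra attribute on the `span`, is enough to stop them matching.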

2. HTML parsing (Wrapping)

In this method, a wrapper extracts information from web pages whose data is generated dynamically from a template. The wrapper's key feature is that it detects the template used by a specific information source, extracts its content, and translates it into a structured form. Wrapper-generation algorithms assume that the input pages of a wrapper induction system conform to a common template and can be easily identified by a common URL scheme.[3] In addition, semi-structured data query languages such as HTQL and XQuery can be used to parse HTML pages and to retrieve and transform their content.
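A hand-written wrapper for one template might look like the sketch below, which uses the standard library's `html.parser`. It does not detect templates automatically; the template (each record in an `li`, with `span` elements classed `name` and `price`) and the names `ProductWrapper` and `wrap` are assumptions made for illustration.

```python
from html.parser import HTMLParser


class ProductWrapper(HTMLParser):
    """Collects records from pages that share one hypothetical template."""

    FIELDS = ("name", "price")

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None   # template field currently being read
        self._current = {}   # record being assembled

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in self.FIELDS:
            self._field = css_class

    def handle_data(self, data):
        # Text may arrive in several pieces, so append instead of overwrite.
        if self._field is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # End of one record in the template.
            cleaned = {key: value.strip() for key, value in self._current.items()}
            self.records.append(cleaned)
            self._current = {}


def wrap(page):
    """Translate a template-based page into a list of dictionaries."""
    parser = ProductWrapper()
    parser.feed(page)
    parser.close()
    return parser.records


sample_page = """
<ul>
  <li><span class="name">Laptop</span> <span class="price">899.00</span></li>
  <li><span class="name">Monitor</span> <span class="price">149.50</span></li>
</ul>
"""
print(wrap(sample_page))
```

Because every page generated from the same template has the same structure, the same `wrap` function can be reused across all of them.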

3. HTTP programming

Static and dynamic web pages can be retrieved by sending HTTP requests to the remote web server using socket programming.
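The following is a minimal sketch of that idea using Python's `socket` module. It assumes plain HTTP on port 80 (no TLS, no redirects) and sends an HTTP/1.0 request so the server closes the connection when it is done. The function names `build_get_request`, `split_response`, and `fetch` are invented for this example.

```python
import socket


def build_get_request(host, path="/"):
    """Build the raw bytes of a simple HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw HTTP response into (status_code, body_bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    status_code = int(status_line.split(" ", 2)[1])
    return status_code, body


def fetch(host, path="/", port=80, timeout=10.0):
    """Send the request over a TCP socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return split_response(b"".join(chunks))
```

Calling `fetch("example.com")` would open a connection and return the status code together with the page body as bytes. In practice a higher-level HTTP client usually replaces this, but the sketch shows what happens underneath.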

4. DOM (Document Object Model) parsing

By embedding a full-fledged web browser, such as Internet Explorer, Chrome, or the Mozilla browser control, an application can retrieve the dynamic content generated by client-side scripts. These browsers also parse web pages into a DOM tree, from which a scraping program can retrieve parts of the pages.
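To give a feel for working with a DOM tree, here is a small sketch using the standard library's `xml.dom.minidom`. Note the assumptions: it does not embed a browser and runs no client-side scripts, and it only accepts well-formed markup. The sample page and the `titles_from_dom` helper are hypothetical; a real browser-based scraper would hand over a similar tree after the scripts have run.

```python
from xml.dom.minidom import parseString

# Hypothetical, well-formed page standing in for a browser-rendered document.
page = """<html><body>
<div id="listing">
  <p class="title">Laptop</p>
  <p class="title">Monitor</p>
  <p class="note">Prices change daily</p>
</div>
</body></html>"""


def titles_from_dom(markup):
    """Parse the markup into a DOM tree and collect the title paragraphs."""
    dom = parseString(markup)
    titles = []
    for node in dom.getElementsByTagName("p"):
        if node.getAttribute("class") == "title":
            # The first child of each paragraph is its text node.
            titles.append(node.firstChild.data.strip())
    return titles


print(titles_from_dom(page))
```

The scraper addresses elements by tag and attribute in the tree, so it is less sensitive to spacing and attribute order than a text pattern would be.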