DEV Community


Discussion on: How to crawl website using #bash script?

Sm0ke

Crawling means grabbing a page and extracting its data into a structured format.

Wget handles the first part: downloading the page. For the second phase, you can use Scrapy or BeautifulSoup.
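A minimal sketch of the two phases. The wget call is commented out because it needs network access, and the URL is only a placeholder; a local sample file stands in for the downloaded page. The grep/sed extraction is a quick-and-dirty illustration, not a substitute for a real parser like Scrapy or BeautifulSoup.

```shell
# Phase 1: download the page with wget (placeholder URL, needs network):
# wget -q -O page.html https://example.com/

# For this sketch, simulate the downloaded page with a local file.
cat > page.html <<'EOF'
<html><head><title>Example</title></head>
<body><h1>Hello</h1><a href="/about">About</a></body></html>
EOF

# Phase 2: extract structured data. Here, pull out link targets
# with grep/sed; real projects should use a proper HTML parser.
grep -o '<a href="[^"]*"' page.html | sed 's/<a href="//;s/"$//'
```

Running this prints `/about`, the single link target in the sample page.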

Arissk

You can also stay in Bash using hxselect and other HTML/XML command-line tools.
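For example, `hxnormalize` and `hxselect` (from the `html-xml-utils` package) let you query HTML with CSS selectors entirely from the shell. A hedged sketch on a local sample file, guarded in case the package is not installed:

```shell
# Sample page to query (stands in for a downloaded file).
cat > snippet.html <<'EOF'
<html><body><ul class="links"><li>one</li><li>two</li></ul></body></html>
EOF

if command -v hxselect >/dev/null 2>&1; then
  # hxnormalize -x makes the HTML well-formed XML;
  # hxselect -c prints only element content, -s sets the separator.
  hxnormalize -x snippet.html | hxselect -c -s '\n' 'ul.links li'
else
  echo "hxselect not installed (e.g. apt install html-xml-utils)"
fi
```

With the tools installed, this prints `one` and `two`, one per line.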

Ankit Dobhal Author

Thanks for your response.
You're right: to crawl, I can use some Python as you explain, or use some of these tools.

Mohammed Samgan Khan

That's what I was wondering: wget will only download the page. Crawling means going through the content of the page.

Ankit Dobhal Author

Thanks for your response. I know it's not pure crawling, but if I only want to grab one page, I'll use wget.
To download or crawl a whole site, I'll surely use some Python tools.