
How to crawl a website using a #bash script?

Ankit Dobhal ・1 min read

Bash is an amazing scripting language used to automate tasks on Linux & Unix, and it is one of my favourite scripting languages for automating tasks.

A few days ago I was searching for how to crawl a website page.
After finding a lot of material on the internet, I learned about the 'Wget' tool on Linux systems.

Wget is useful for downloading and crawling website pages.
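For example, a single page can be fetched directly, or a whole site can be mirrored for offline browsing (example.com is just a placeholder URL):

```bash
# Download a single page into the current directory
wget https://example.com/

# Recursively download a whole site, rewriting links for offline viewing
wget --mirror --convert-links --page-requisites https://example.com/
```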

So after this I started writing a bash script for website page crawling.

-> First, I opened up my favourite editor, vim.

-> Then I started writing the script with a case statement.

-> As you can see, I use a case statement to automate the wget tool in a simple bash script, and it is working code (a sketch of such a script follows below).
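Since the script itself appeared only as screenshots in the original post, here is a minimal sketch of a wget wrapper built around a case statement; the menu options and prompts are my own illustration, not the exact code from the post:

```bash
#!/bin/bash
# crawl.sh - prompt for a URL and an action, then run wget accordingly

read -p "Enter the URL to crawl: " url
read -p "Choose an option [single/mirror]: " choice

case "$choice" in
    single)
        # Download just the one page
        wget "$url"
        ;;
    mirror)
        # Recursively download the site, rewriting links for offline viewing
        wget --mirror --convert-links --page-requisites --no-parent "$url"
        ;;
    *)
        echo "Unknown option: $choice" >&2
        exit 1
        ;;
esac
```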

For more details about bash and automation,
visit my GitHub account.

Discussion (15)

Sm0ke

Crawling means grabbing a page and extracting its data into a structured format.

Wget does the first part, downloading the page. For the second phase, you can use Scrapy or BeautifulSoup.

Mohammed Samgan Khan

That's what I was wondering: wget will only download the page. Crawling means going through the content of the page.

Ankit Dobhal (Author)

I appreciate your response. I know it's not pure crawling, but if I only want to crawl one page, then I will use wget.
Or, to download or crawl a whole site, I will surely use some Python stuff.

Ankit Dobhal (Author)

I appreciate your response.
You are right that for crawling I can use some Python, like you explained.

Or I can also use some tools.

Arissk

You can also stay in bash using hxselect and other HTML and XML command-line tools.
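For instance, a minimal sketch of that approach, assuming the html-xml-utils package (which provides hxnormalize and hxselect) is installed:

```bash
# Fetch a page, normalize it to well-formed XML, then pull out the <title> text
wget -qO- https://example.com/ | hxnormalize -x | hxselect -c 'title'
```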

Vlastimil Pospichal
• Use quotation marks around the variable name: wget "$url".
• What does the line $url do?
• case wget in, case $wget in, or case "$wget" in? There are significant differences.
• case wget in always matches the literal string "wget".
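A small sketch illustrating those points; the $choice variable and its values are hypothetical:

```bash
#!/bin/bash

url="https://example.com/some page"   # a URL containing a space, for illustration
wget "$url"     # quoted: wget receives the URL as a single argument
# wget $url     # unquoted: word splitting would hand wget two broken arguments

choice=mirror

case wget in            # tests the literal string "wget", never the variable
    mirror) echo "never reached" ;;
    *)      echo "always matches here" ;;
esac

case "$choice" in       # tests the variable's value, safely quoted
    mirror) echo "matched mirror" ;;
esac
```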
Ankit Dobhal (Author)

-> wget $url will help me download the page, and the whole script is working very well.

-> Double quotes will make it a string.

sodonnell
Ankit Dobhal (Author)

Good suggestion

Shubhamkhapra

Working harder

Waleed Barakat

Nice idea, but what is the benefit of using such a crawl process?

Ankit Dobhal (Author)

It's just a start, sir. I enjoyed it a lot while doing this.

Vlastimil Pospichal

An unusable script with too many mistakes. Pure wget is better.

Ankit Dobhal (Author)

What kind of mistakes, sir? If you can explain, please, I would appreciate your comment.
It's just a fun script.
