powerexploit

How to crawl a website using a #bash script?

Bash is an amazing scripting language for automating tasks on Linux and Unix, and it is one of my favourite scripting languages for exactly that.

A few days ago I was searching for how to crawl a website page. After finding a lot of material on the internet, I learnt about the 'Wget' tool on Linux systems.

Wget is useful for downloading and crawling website pages.
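
For example (example.com below is just a placeholder), fetching a single page is a one-liner, and a couple of standard flags turn it into a small recursive crawl:

```bash
# Download a single page
wget https://example.com/

# Recursively fetch linked pages up to depth 2, without
# ascending to the parent directory
wget --recursive --level=2 --no-parent https://example.com/
```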

So after this I started writing a bash script for crawling a website page.

-> First, open up my favourite vim editor.

-> Then start writing the script with a case statement.

-> As you can see, I used a case statement to wrap the wget tool in a simple bash script, and it is working code; a sketch of the idea follows below.
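
A minimal sketch of such a script (the prompts and option names here are illustrative, not the exact code from the screenshots):

```bash
#!/bin/bash
# Sketch: a case statement wrapping wget for simple page crawling.

read -rp "Enter the URL to crawl: " url
read -rp "Choose an action [download/mirror/quit]: " choice

case "$choice" in
    download)
        # Fetch just the single page
        wget "$url"
        ;;
    mirror)
        # Recursively fetch linked pages up to depth 2,
        # staying below the starting directory
        wget --recursive --level=2 --no-parent "$url"
        ;;
    quit)
        echo "Bye!"
        exit 0
        ;;
    *)
        echo "Unknown option: $choice" >&2
        exit 1
        ;;
esac
```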

For more details about bash and automation, visit my GitHub account.

Top comments (15)

Sm0ke

Crawling means grabbing a page and extracting its data into a structured format.

Wget does the first part, downloading the page. For the second phase, you can use Scrapy or BeautifulSoup.

Mohammed Samgan Khan

That's what I was wondering; wget will only download the page. Crawling means going through the content of the page.

powerexploit

I appreciate your response. I know it's not pure crawling, but if I only want to grab one page, I will use wget. To download or crawl a whole site, I will surely use some Python.

powerexploit

I appreciate your response. You are right that to crawl I can use some Python, like you explained, or also use some dedicated tools.

Arissk

You can also stay in bash, using hxselect and the other HTML and XML bash tools.
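
For instance, a small sketch assuming the html-xml-utils package (which provides hxnormalize and hxselect) is installed; example.com is a placeholder:

```bash
# Fetch a page and extract its <title> with a CSS selector.
# hxnormalize -x first turns the HTML into well-formed XML,
# since hxselect expects clean input.
wget -qO- https://example.com/ | hxnormalize -x | hxselect -c 'title'
```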

Vlastimil Pospichal
• Use quotation marks around the variable name: wget "$url".
• What does the line $url do?
• case wget in, case $wget in, or case "$wget" in? There are significant differences.
• case wget in always matches the literal string "wget" (see the sketch below).
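
A small sketch of the difference (the URL value is illustrative):

```bash
url="https://example.com/some page.html"

# Unquoted: word splitting breaks the value into two arguments,
# so wget sees "https://example.com/some" and "page.html"
wget $url

# Quoted: the whole value is passed as a single argument
wget "$url"

# 'case wget in' matches against the literal string "wget";
# to branch on the variable's value, expand the variable:
case "$url" in
    https://*) echo "secure URL" ;;
    http://*)  echo "plain HTTP" ;;
    *)         echo "not a URL?" ;;
esac
```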

powerexploit

-> wget $url helps me download the page, and the whole script works very well.

-> Double quotes will make it a string.

Sean O'Donnell

powerexploit

Good suggestion

Shubhamkhapra

Working harder

powerexploit

Thanks

Waleed Barakat

Nice idea, but what is the benefit of using such a crawl process?

powerexploit

It's just a start, sir. I enjoyed it a lot while doing this.

Vlastimil Pospichal

An unusable script with too many mistakes. Pure wget is better.

powerexploit

What kind of mistakes, sir? If you can explain, please do; I'd appreciate it. It's just a fun script.