Connor Dillon

`wget` God Mode

If you're anything like me, you like to download things. And sometimes it's too cumbersome to right-click > Save As... each item on a webpage. The solution to your problem sits in your terminal: the wget utility. With a few options added, wget becomes a beast of a website downloader, capable of pulling an entire site for offline viewing, including all of its linked files.

All you have to do is copy & paste your desired URL into the following terminal command:

$ wget -mkEpnp WEBPAGE-URL

The options -mkEpnp are specified below (pulled from the man page):

-m (aka --mirror): Turns on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.

-k (aka --convert-links): Converts links for offline viewing.

-E (aka --adjust-extension): Adds proper filename extensions to downloaded files.

-p (aka --page-requisites): Downloads images, sounds, stylesheets, and other required files for proper offline site rendering.

-np (aka --no-parent): Prevents retrieval of the parent directory. Guarantees that only files below a certain hierarchy will be downloaded.
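
Since -m expands to -r -N -l inf --no-remove-listing, the same command can also be written entirely with long options, which makes it easier to see what each flag does (WEBPAGE-URL is a placeholder for your target site):

$ wget --recursive --timestamping --level=inf --no-remove-listing \
       --convert-links --adjust-extension --page-requisites --no-parent \
       WEBPAGE-URL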

More fun wget options:

--execute robots=off   # ignore robots.txt
--wait=30              # be gentle; wait 30 seconds between fetch requests
--random-wait          # wait a random amount of time between fetch requests
--user-agent=Mozilla   # send a mock user agent with each request
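
Putting it all together, a polite full-site mirror might look something like this (WEBPAGE-URL is again a placeholder):

$ wget -mkEpnp \
       --execute robots=off \
       --wait=30 --random-wait \
       --user-agent=Mozilla \
       WEBPAGE-URL

With --random-wait, wget multiplies the --wait value by a random factor between 0.5 and 1.5 for each request, so the fetch pattern looks less mechanical.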

Happy downloading! Oh and... I can't be held responsible if you suddenly find yourself investing in a home server setup, NAS drives, or the like.

Top comments (1)

kwabenasapong • Edited

How would you download a website that requires authentication using a username, password, and an authenticity token? I tried the script below, but I get stuck on the sign-in page, since that's all it ends up downloading for me:

#!/usr/bin/env bash

username=username
password=password

# Grab the hidden "lt" token from the sign-in page (the token starts with
# an underscore, so the leading "_" is split off and re-added below)
code=$(wget -qO- 'https://urlname/sign_in?service=https://urlname.io' | grep 'name="lt"' | cut -d'_' -f2)
hidden_code=_$code

# Log in and save the session cookies (note: --post-data needs double
# quotes so the shell expands $username, $password, and $hidden_code)
wget --save-cookies cookies.txt \
     --keep-session-cookies \
     --post-data "username=$username&password=$password&lt=$hidden_code&_eventId=submit" \
     --auth-no-challenge \
     --delete-after \
     "https://urlname/sign_in?service=https://ur..."

# Reuse the saved cookies for the authenticated download
wget --load-cookies cookies.txt \
     urlname.io