Prepare
- wget (32 bit) for SSL – https://eternallybored.org/misc/wget/1.20.3/32/wget.exe
- wget (64 bit) for SSL – https://eternallybored.org/misc/wget/1.20.3/64/wget.exe
- FileSeek – https://www.fileseek.ca/Download/
Retrieving websites
Mirror a full website
wget --mirror --no-check-certificate https://website.com
Mirror a full website including all the linked pages and files
wget --execute robots=off --recursive --no-parent --continue --no-clobber https://website.com
Mirror a full website with a rate limit and wait period
wget --limit-rate=20k --wait=60 --random-wait --mirror https://website.com
Mirror multiple websites
wget --mirror --no-check-certificate --input-file sites.txt
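The --input-file option reads its URLs from a plain text file, one URL per line. As a minimal sketch, sites.txt could be created as below. This assumes a POSIX shell (for example Git Bash or WSL on Windows), and the second URL is only a placeholder, not a site from these notes.

```shell
# Write a hypothetical URL list, one URL per line.
# Replace the entries with the sites you actually want to mirror.
cat > sites.txt <<'EOF'
https://website.com
https://example.org
EOF

# The mirror command from above then picks the URLs up from the file:
# wget --mirror --no-check-certificate --input-file sites.txt
```

On plain cmd.exe the file can just as well be written in a text editor; only the one-URL-per-line layout matters.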
Download all PDF, XLS, and XLSX files from a website
wget -r -l3 -e robots=off --no-check-certificate -A .pdf,.xls,.xlsx https://website.com
Send a custom user agent (here Googlebot, a crawler rather than a browser) and download all PDF and XLSX files
wget --user-agent="Googlebot/2.1 (+https://www.googlebot.com/bot.html)" -r -l4 -e robots=off --no-check-certificate -A .pdf,.xlsx https://website.com
Download a password-protected website
wget --mirror --no-check-certificate --http-user=USR --http-password=PASS https://website.com/directory
Check the last-modified date of a web page
wget --server-response --spider https://website.com
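With --server-response, wget prints every response header, so the Last-Modified line has to be picked out of the rest. The sketch below filters it, assuming a POSIX shell with grep and sed. The helper name last_modified and the sample headers are made up for illustration so that the snippet runs without a network connection.

```shell
# Made-up sample of the indented headers that --server-response prints.
sample='  HTTP/1.1 200 OK
  Date: Wed, 06 Mar 2019 08:00:00 GMT
  Last-Modified: Tue, 05 Mar 2019 10:00:00 GMT
  Content-Type: text/html'

# Hypothetical helper: keep only the Last-Modified header and
# strip everything up to and including the first colon.
last_modified() {
  grep -i 'Last-Modified:' | sed 's/^[^:]*:[[:space:]]*//'
}

printf '%s\n' "$sample" | last_modified
# prints: Tue, 05 Mar 2019 10:00:00 GMT

# Against a live site (wget writes the headers to stderr, hence 2>&1):
# wget --server-response --spider https://website.com 2>&1 | last_modified
```

Not every server sends a Last-Modified header; in that case the filter prints nothing.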
Batch script: collect.bat
wget --limit-rate=20k --wait=60 --random-wait --mirror https://website.com
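For machines where a POSIX shell is available (Git Bash, WSL, Linux), a rough counterpart to collect.bat might look like the sketch below. The function name collect and the DRY_RUN switch are assumptions added for illustration; they are not part of wget or of the batch file above. By default the sketch only prints the command so it can be checked before a long mirror run.

```shell
#!/bin/sh
# collect: hypothetical shell counterpart of collect.bat.
# Usage: collect URL
# DRY_RUN=1 (the default in this sketch) prints the command instead of running it.
collect() {
  url="$1"
  # Same options as collect.bat: throttle bandwidth and pause between requests.
  set -- wget --limit-rate=20k --wait=60 --random-wait --mirror "$url"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

collect "https://website.com"
# prints: wget --limit-rate=20k --wait=60 --random-wait --mirror https://website.com
```

Setting DRY_RUN=0 before calling collect starts the actual download.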