Linux – How do I make wget download only pages, not CSS, images, etc.?

linux wget

I want to download an entire website using wget but I don't want wget to download images, videos etc.

I tried

wget -bqre robots=off -A.html example.com --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"

but when I do that it doesn't download .php files, just downloads static .html files.

Is there a solution to this problem with wget?

Best Answer

You've explicitly told wget to only accept files which have .html as a suffix.

Assuming the PHP pages have a .php suffix, you can add it to the accept list:

wget -bqre robots=off -A.html,.php example.com --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"

Note that this will download the rendered HTML, not the PHP source. If the page is sufficiently dynamic, you might not get the rendered result you expect.
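
If the goal is to skip media and assets rather than whitelist page suffixes, one possible alternative is wget's reject list (-R), which is the inverse of -A. The sketch below is only an illustration: the suffix list and the helper name fetch_pages are assumptions, not something from the question, so adjust them to the site being mirrored.

```shell
#!/bin/sh
# Suffixes to skip. This list is an illustrative assumption; extend or trim it as needed.
REJECT='jpg,jpeg,png,gif,svg,ico,css,js,mp4,webm'

# Hypothetical helper: recursively fetch a site, skipping the suffixes above.
# -r  recurse into links, -e robots=off  ignore robots.txt, -R  reject list.
fetch_pages() {
    wget -r -e robots=off -R "$REJECT" "$1"
}

# Usage (not run here): fetch_pages example.com
```

With a reject list, pages served under any other suffix (.php, .asp, or no suffix at all) are kept without having to enumerate them.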

However, I'd suggest that another tool such as httrack may do a better job - it depends on exactly what you need to do.