Linux – wget: how to download a file whose URL params change dynamically, once only

linux, wget

I have a problem with wget: I need to download an entire site, including the images and other files linked from the main pages. I'm using these options:

wget --load-cookies /tmp/cookie.txt -r -l 1 -k -p -nc 'https://www.example.com/mainpage.do'

(-l 1 is used for testing; I may need to go to level 3 or even 4)

The problem is that I can't figure out how to bypass the 'random' GET parameter that is added after some recursion cycles, so my final result in the /tmp folder looks like this:

/tmp/www.example.com/mainpage.do
/tmp/www.example.com/mainpage.do?cx=0.0340590343408
/tmp/www.example.com/mainpage.do?cx=0.0348934786475
/tmp/www.example.com/mainpage.do?cx=0.0032878284787
/tmp/www.example.com/mainpage.do?cx=0.0266389459023
/tmp/www.example.com/mainpage.do?cx=0.0103290334732
/tmp/www.example.com/mainpage.do?cx=0.0890345378478

Since the page is always the same, I don't need to download it more than once. I tried the -nc option, but it doesn't work; I also tried -R (reject), but it only matches file extensions, not URL parameters.

I looked extensively through the wget manual but couldn't find a way to do it. Using wget is not mandatory; if you know how to do this another way, suggestions are welcome.

Best Answer

Write a local proxy server that modifies the responses sent to wget.

Assuming your URLs are in links such as:

<a href="/path/to/mainpage.do?cx=0.0123412341234">

Then you can run a Ruby proxy server like this:

require 'webrick/httpproxy'

# Rewrite every response body so links lose the random cx= parameter;
# wget then sees a single canonical URL for the page and -nc can work.
s = WEBrick::HTTPProxyServer.new(
  :Port => 2200,
  :ProxyContentHandler => Proc.new { |req, res|
    res.body.gsub!(/mainpage\.do\?cx=[0-9.]+/, "mainpage.do")
  }
)
trap("INT") { s.shutdown }
s.start
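The substitution the proxy applies can be sanity-checked in isolation; a minimal sketch, using a made-up sample link:

```ruby
# Demonstrates the rewrite the proxy performs on each response body:
# a link with a random cx= parameter collapses to the bare page URL.
body = '<a href="/path/to/mainpage.do?cx=0.0123412341234">'
body.gsub!(/mainpage\.do\?cx=[0-9.]+/, "mainpage.do")
puts body  # -> <a href="/path/to/mainpage.do">
```

To route wget through the proxy, set the proxy environment variable it honors, e.g. `http_proxy=http://localhost:2200 wget …`. Note that this rewriting only works for plain HTTP; HTTPS traffic is tunneled through the proxy encrypted, so the handler never sees the body.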
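Alternatively, if upgrading wget is an option: newer releases (1.14 and later, if I recall the version correctly) add --reject-regex, which, unlike -R, is matched against the whole URL including the query string. A sketch based on the original command:

```shell
# Assumes wget >= 1.14 (--reject-regex). The regex is applied to the
# full URL, so links carrying the random ?cx= parameter are skipped.
wget --load-cookies /tmp/cookie.txt -r -l 1 -k -p -nc \
     --reject-regex 'mainpage\.do\?cx=' \
     'https://www.example.com/mainpage.do'
```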