Copy a website and preserve the file & folder structure

web, web-crawler

I have an old website running on an ancient version of Oracle Portal that we need to convert to a flat HTML structure. Due to damage to the server we cannot access the administrative interface, and even if we could, there is no export functionality that works with modern software versions.

It would be enough to crawl the website and save all the pages and images to a folder, but the file structure needs to be preserved; that is, if a page is located at http://www.oldserver.com/foo/bar/baz/mypage.html then it needs to be saved to /foo/bar/baz/mypage.html so that the various JavaScript bits will continue to function.

None of the web crawlers I've found have been able to do this; they all want to rename the pages (page01.html, page02.html, etc.) and break the folder structure.

Is there any crawler out there that will recreate the site structure as it appears to a user accessing the site? It doesn't need to rewrite any of the content of the pages; once rehosted, the pages will all have the same names they did originally, so links will continue to work.

Best Answer

wget -r will recursively download an entire website and save it locally in the same directory structure, keeping the original folder layout and filenames.
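For example, a minimal invocation (using the URL from the question; the extra flags are optional refinements, not requirements):

    # -r  recurse through the site, preserving the original directory layout and filenames
    # -np stops wget from climbing into parent directories outside the start URL
    # -p  also fetches page requisites (images, CSS, scripts) referenced by each page
    wget -r -np -p http://www.oldserver.com/

Avoid --adjust-extension (-E) and --convert-links (-k) in this case, since those options rename files or rewrite links inside the pages, and the goal here is to keep the original paths untouched.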
