Counting the number of pages in a website

web-crawler website

What is the easiest way to get a count of the number of pages on a website? I don't want to actually download a local copy of the entire site, just get a count of pages on it. Is there a tool (or combination of tools) that can crawl all the pages and links and give a total?

Best Answer

The quick and dirty way is to go to Google and run a search like:

site:mydomain.com

This example shows 232 known pages for fronde.com: http://i47.tinypic.com/j0h003.jpg

That will return the number of pages Google is aware of on that site. You may need to adjust your Google preferences to include all content types (turn SafeSearch off) and click through the 'some results were omitted' notice before it gives you its most accurate count.

Doing it manually is harder. To discover all the pages on a particular website, you have to download the landing page, parse it for links that point to the same domain, then download and scan those HTML pages the same way, repeating until every discovered link has been checked (see the sketch after this explanation).

This method takes time (although with a tool like HTTrack, you can turn off non-HTML content downloading to save time).

This method will also miss orphaned pages that are not linked from the main page of the site.
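For reference, here is a minimal sketch of that manual approach in Python, using only the standard library. The start URL is a placeholder, and a real crawler would also need to respect robots.txt, handle redirects off the domain, and throttle its requests.

    # Minimal sketch: crawl one domain, following <a href> links, and count
    # the HTML pages reached. START_URL is a placeholder, not a real target.
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urldefrag, urlparse
    from urllib.request import urlopen

    START_URL = "https://example.com/"   # placeholder domain

    class LinkCollector(HTMLParser):
        """Collects the href values of <a> tags on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def count_pages(start_url):
        domain = urlparse(start_url).netloc
        seen = {start_url}       # URLs already queued or visited
        queue = [start_url]
        page_count = 0
        while queue:
            url = queue.pop()
            try:
                with urlopen(url, timeout=10) as resp:
                    if "text/html" not in resp.headers.get("Content-Type", ""):
                        continue                 # only HTML pages count
                    html = resp.read().decode("utf-8", errors="replace")
            except Exception:
                continue                         # unreachable URL: skip it
            page_count += 1
            parser = LinkCollector()
            parser.feed(html)
            for href in parser.links:
                # Resolve relative links and drop #fragments before comparing
                absolute, _ = urldefrag(urljoin(url, href))
                if urlparse(absolute).netloc == domain and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return page_count

    if __name__ == "__main__":
        print("Pages found:", count_pages(START_URL))

Like the Google method, this only counts what is reachable by following links, so orphaned pages still won't show up in the total.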
