How does a site owner change from “excluded” to “included” on archive.org?

archive.org

archive.org collects web pages for its archive. It has some method of allowing an organization to "exclude" their content. How does the owner of the site change from "excluded" to "included"?

Note that this appears not to be simply a robots.txt file issue. For example, digital.com's page shows:

Sorry.
This URL has been excluded from the Wayback Machine.

whereas fb.com's page displays:

Page cannot be crawled or displayed due to robots.txt.
See fb.com robots.txt page. Learn more about robots.txt.

Best Answer

Why isn't the site I'm looking for in the archive?

Some sites may not be included because the automated crawlers were unaware of their existence at the time of the crawl. It's also possible that some sites were not archived because they were password protected, blocked by robots.txt, or otherwise inaccessible to our automated systems. Siteowners might have also requested that their sites be excluded from the Wayback Machine. When this has occurred, you will see a "blocked site error" message. When a site is excluded because of robots.txt you will see a "robots.txt query exclusion error" message.

Exclusions are made at the request of site owners or in response to take-down notices. If you are asking whether you can decide to include what is currently excluded, I'm afraid the answer is no, except for your own site(s).
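The two messages quoted in the question correspond to the two cases the FAQ describes: an exclusion request versus a robots.txt block. As a minimal sketch of telling them apart by their text, assuming the wording stays as quoted here (the function name and labels are hypothetical, and nothing here contacts archive.org):

```python
def classify_wayback_message(page_text: str) -> str:
    """Roughly label a Wayback Machine error message by its wording.

    The phrases matched below are taken from the messages quoted in the
    question; they are an assumption about the wording, not a stable API.
    """
    text = page_text.lower()
    if "excluded from the wayback machine" in text:
        # Exclusion requested by the site owner or via a take-down notice.
        return "excluded"
    if "due to robots.txt" in text:
        # Blocked by the site's own robots.txt file.
        return "robots.txt"
    return "unknown"


digital_message = "Sorry.\nThis URL has been excluded from the Wayback Machine."
fb_message = "Page cannot be crawled or displayed due to robots.txt."

print(classify_wayback_message(digital_message))
print(classify_wayback_message(fb_message))
```

With the two quoted messages, the first is labeled "excluded" and the second "robots.txt". Only the second case is something a site owner can change directly by editing their own robots.txt file.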
