How to archive a website which reads: Page cannot be crawled or displayed due to robots.txt

archive.org

If a page isn't archived in the Wayback Machine, how can I add it to the archive? Or is there another service like it where you can submit custom sites to be archived "forever"?

I would like to be able to prove that the site existed in its current state, as if I had taken a screenshot today.

Does this also work for HTTPS sites?

I found this in the FAQ: My site’s not archived! How can I add it?

but it doesn't work for this page: https://github.com/ZachWick/TableCSVExport

For some other pages, the Wayback Machine shows this message:

Page cannot be crawled or displayed due to robots.txt.

like http://web.archive.org/web/*/https://github.com/gilbitron/WordPress-Settings-Framework

But http://web.archive.org/liveweb/https://github.com/ZachWick/TableCSVExport just shows the live page

Best Answer

As their FAQ states:

How can I get my site included in the Wayback Machine?

Much of our archived web data comes from our own crawls or from Alexa Internet's crawls. Neither organization has a "crawl my site now!" submission process. Internet Archive's crawls tend to find sites that are well linked from other sites. The best way to ensure that we find your web site is to make sure it is included in online directories and that similar/related sites link to you.

Alexa Internet uses its own methods to discover sites to crawl. It may be helpful to install the free Alexa toolbar and visit the site you want crawled to make sure they know about it.

Regardless of who is crawling the site, you should ensure that your site's 'robots.txt' rules and in-page META robots directives do not tell crawlers to avoid your site.

When a site is crawled, there is usually at least a 6-month lag, and sometimes as much as a 24-month lag, between the date that web pages are crawled and when they appear in the Wayback Machine.

In some cases, crawled content from certain projects may appear in a much shorter timeframe — as little as a few weeks from when it was crawled. Older material for the same pages and sites may still appear separately, months later.
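The FAQ's point about 'robots.txt' can be checked before waiting for a crawl. The following is a minimal sketch in Python, using the standard library's `urllib.robotparser`, that tests whether a given set of rules would block a particular crawler from a URL. The sample rules, the `example.com` URL, and the helper name `is_crawl_allowed` are made up for illustration, and `ia_archiver` is assumed here to be the user-agent name the archive's crawler matches against; the real rules of the site in question would have to be fetched from its own `/robots.txt`.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for illustration; it parses the rules from a string
    so that no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules: block the archive crawler everywhere (assumed user-agent
# name "ia_archiver"), and block everyone else only from /private/.
SAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    url = "https://example.com/some/page"  # placeholder URL
    print(is_crawl_allowed(SAMPLE_ROBOTS, "ia_archiver", url))   # False
    print(is_crawl_allowed(SAMPLE_ROBOTS, "SomeOtherBot", url))  # True
```

If a check like this returns False for the archive's crawler, that would be consistent with the "Page cannot be crawled or displayed due to robots.txt" message, although this sketch only evaluates the rules and does not show how the Wayback Machine itself applies them.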