Archive.org – How to Skip 404 Errors in the Wayback Machine

archive.org

The Wayback Machine seems to archive 404 pages as well as actual content.
For example, this page has a ton of archived 404 pages, and only the first few copies have actual content.

Is there any way to show only the non-404 pages in the calendar view? And/or is there any way to (automatically) go to the latest non-404 copy in the archive?


There is a JSON API that seems to return the latest successfully (non-404) archived copy. For my example it looks like this. It could possibly be utilized by a bookmarklet or addon or something.
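As a rough sketch of how that API might be used from a script: the two helpers below build the request URL and pull the snapshot URL out of a decoded response. The function names are hypothetical; the endpoint and the `archived_snapshots.closest.url` path are the ones used elsewhere in this thread, and the `available` flag is an assumption about the response shape.

```javascript
// Hypothetical helper: build the availability API request URL for a page.
// The page URL is percent-encoded so its own query string survives intact.
function buildAvailabilityUrl(pageUrl) {
  return 'https://archive.org/wayback/available?url=' + encodeURIComponent(pageUrl);
}

// Hypothetical helper: return the closest snapshot URL from a decoded
// response object, or null when no usable snapshot is listed.
function pickSnapshotUrl(response) {
  const snapshots = response && response.archived_snapshots;
  const closest = snapshots && snapshots.closest;
  if (!closest || !closest.url) {
    return null;
  }
  // Assumption: the response may carry an `available` flag; treat an
  // explicit false as "no snapshot".
  if (closest.available === false) {
    return null;
  }
  return closest.url;
}
```

For instance, `buildAvailabilityUrl('http://example.com/page?id=1')` returns `https://archive.org/wayback/available?url=http%3A%2F%2Fexample.com%2Fpage%3Fid%3D1`, and `pickSnapshotUrl({ archived_snapshots: {} })` returns `null`.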

This script uses that API, but it is for webmasters:
http://blog.archive.org/2013/10/24/web-archive-404-handler-for-webmasters/

Best Answer

I am currently using the following bookmarklet:

javascript:var f=function(jsn){var c=jsn.archived_snapshots.closest;if(c){window.location.href=c.url;}else{alert('No archived copy found.');}};var d=document,z=d.createElement('script'),b=d.body,l=d.location;z.setAttribute('src','https://archive.org/wayback/available?url='+encodeURIComponent(l.href)+'&callback=f');z.setAttribute('type','application/javascript');b.appendChild(z);void(0);

or, for sites where I haven't enabled JavaScript (in the NoScript addon), I use

javascript:(function(){window.location="https://archive.org/wayback/available?url="+encodeURIComponent(window.location.href);})()

followed by

javascript:(function(){var a=JSON.parse(document.getElementsByTagName("pre")[0].textContent);window.location=a.archived_snapshots.closest.url;})()

These use the API I mentioned above in the question. They were tested in Firefox 32.0 and probably contain multiple bugs.
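For readability, the first bookmarklet can be unpacked roughly as follows. This is only a sketch: `goToClosestSnapshot` and `__waybackCallback` are made-up names, the window object is passed in as a parameter so the logic can be exercised outside a browser, and the "no snapshot" branch assumes the API returns an empty `archived_snapshots` object when nothing is archived.

```javascript
// Unpacked sketch of the JSONP bookmarklet (hypothetical names).
// `win` is expected to behave like the browser's `window` object.
function goToClosestSnapshot(win) {
  const doc = win.document;

  // JSONP callback: the API response is delivered by calling this global.
  win.__waybackCallback = function (data) {
    const snapshots = data && data.archived_snapshots;
    const closest = snapshots && snapshots.closest;
    if (closest && closest.url) {
      win.location.href = closest.url; // jump to the archived copy
    } else {
      win.alert('No archived copy found for this page.');
    }
  };

  // Request the API by injecting a script tag, as the bookmarklet does.
  const script = doc.createElement('script');
  script.src = 'https://archive.org/wayback/available?url=' +
    encodeURIComponent(win.location.href) +
    '&callback=__waybackCallback';
  doc.body.appendChild(script);
  return script;
}
```

In a browser this would be invoked as `goToClosestSnapshot(window)`; wrapping that call in a `javascript:` URL gives a bookmarklet equivalent.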