How to archive a webpage to archive.today using wget or curl

archive.today archiving

To archive a webpage in the Internet Archive's Wayback Machine, I usually do:

wget --spider 'https://web.archive.org/save/https://example.com'

Is there a similar method that I can use to archive web pages to archive.today?

Best Answer

I analyzed the request that the browser sends when a page is saved manually (Firefox's developer tools have a handy 'Copy as cURL' function for this; see the bottom of the post for the actual request). It includes a lot of fluff (user agent, cookies, origin, etc.) that can be omitted, and percent-encoding the slashes in the URL isn't necessary either. Simply executing

curl -v 'https://archive.vn/submit/' \
  --data-raw 'url=https://webapps.stackexchange.com/users/218839/flux'

is already sufficient to archive your profile page. Initially, the response was some HTML containing a 'work in progress' link, https://archive.vn/wip/dk2xB, which you can use to monitor the progress and/or as the final link.
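Since the question also mentions wget, here is a rough sketch of the same POST wrapped in two small shell functions. The function names and the ARCHIVE_SUBMIT_URL variable are made up for illustration; only the endpoint and the url form field come from the command above. The curl variant uses --data-urlencode instead of --data-raw, on the assumption that you may want to submit URLs whose query string contains characters such as '&', which would otherwise be read as the start of another form field.

```shell
# Endpoint taken from the command above; the variable name is hypothetical.
ARCHIVE_SUBMIT_URL='https://archive.vn/submit/'

# Submit a page with curl (hypothetical helper name).
# -i prints the response headers together with the body, so both kinds of
# response (HTML or a redirect) are visible.
# --data-urlencode percent-encodes the value, so a target URL containing
# '&' is still sent as a single "url" form field.
archive_with_curl() {
  curl -sS -i --data-urlencode "url=$1" "$ARCHIVE_SUBMIT_URL"
}

# Same POST with wget (hypothetical helper name).
# --post-data sends the string as given, so this variant assumes the target
# URL needs no extra encoding. -O - writes the response body to stdout.
archive_with_wget() {
  wget -q -O - --post-data "url=$1" "$ARCHIVE_SUBMIT_URL"
}

# Example calls (these perform real requests, so they are left commented out):
#   archive_with_curl 'https://example.com'
#   archive_with_wget 'https://example.com'
```

I have only tested the plain curl command shown earlier; treat the wget variant as an untested equivalent.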

<html><body><script>setInterval(function(){document.location.replace("https://archive.vn/wip/dk2xB")},1000)</script><div>
      <img width="48" height="48" style="vertical-align:middle" src="https://archive.vn/loading.gif"/>
      <span style="vertical-align:middle;font-size:48px;padding-left:5px">Loading</span>
      <hr/>
    </div></body></html>
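If you want to pick the 'work in progress' link out of that HTML in a script, something like the following might do. It is only a sketch: the function name is made up, and the pattern assumes the snapshot ID consists of letters and digits, which is all I have seen in the response above.

```shell
# Read an HTML response on stdin and print the first archive.vn
# 'work in progress' link found in it (hypothetical helper name).
# Assumption: the ID after /wip/ is made of letters and digits only.
extract_wip_url() {
  grep -o 'https://archive\.vn/wip/[A-Za-z0-9]*' | head -n 1
}

# Example (performs a real request, so it is left commented out):
#   curl -sS 'https://archive.vn/submit/' \
#     --data-raw 'url=https://example.com' | extract_wip_url
```

If the response contains no such link, the function prints nothing.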

Now that I try it again, a couple of hours later, I don't get HTML as the response but an HTTP 302 (Found) with the final URL in the Location header: https://archive.vn/dk2xB.
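For the redirect case, the final URL can be read from the response headers. A minimal sketch, assuming you dump the headers with curl's -D option; the function name is made up for illustration.

```shell
# Read raw HTTP response headers on stdin and print the value of the first
# Location header (hypothetical helper name).
# tr strips the carriage returns that terminate HTTP header lines;
# the sed pattern matches "Location:" or "location:" at the start of a line.
extract_location() {
  tr -d '\r' | sed -n 's/^[Ll]ocation: *//p' | head -n 1
}

# Example (performs a real request, so it is left commented out):
# -D - writes the response headers to stdout, -o /dev/null discards the body.
#   curl -sS -o /dev/null -D - 'https://archive.vn/submit/' \
#     --data-raw 'url=https://example.com' | extract_location
```

When the server answers with HTML instead of a redirect, there is no Location header and the function prints nothing, so a script would have to fall back to the 'work in progress' link in the body.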

This is what the archived page looks like: [screenshot of the archived page]

The original cURL request was:

curl 'https://archive.vn/submit/'\
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:81.0) Gecko/20100101 Firefox/81.0'\
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'\
  -H 'Accept-Language: en-US,en;q=0.5'\
  --compressed\
  -H 'Content-Type: application/x-www-form-urlencoded'\
  -H 'Origin: https://archive.vn'\
  -H 'Connection: keep-alive'\
  -H 'Referer: https://archive.vn/'\
  -H 'Cookie: _ga=GA1.2.661111166.1603535444'\
  -H 'Upgrade-Insecure-Requests: 1'\
  -H 'TE: Trailers'\
  --data-raw 'submitid=1Z%2FjKja%2BtkGo%2BmykS2%2BrMYgTje4YZV9xk8OIlwY4NT2mLExajP7ZRmnTbJku2aMX&url=https%3A%2F%2Fwebapps.stackexchange.com%2Fquestions%2F148066%2Fhow-do-i-archive-a-webpage-to-archive-today-using-wget-or-curl'