There are plenty of ways to do backups; here are a few thoughts and options.
Incremental Backups
Ideally your backups should be incremental, meaning each run only stores what has changed since the previous backup. That lets you roll back to earlier versions if you need to, and it can significantly reduce CPU, bandwidth, and sometimes storage. You can also look at differential backups, which are slightly different: each one is a delta against the last full backup rather than against the previous backup.
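As a rough illustration of the idea (rsync isn't mentioned elsewhere in this article, and the paths are made up), a snapshot-style incremental backup on Linux can be done with rsync's --link-dest option: files that haven't changed are hard-linked to the previous snapshot, so each run only consumes space for what changed.
# Hypothetical example: incremental snapshots with rsync
# Unchanged files are hard-linked to yesterday's snapshot; only changed files use new space
rsync -a --delete --link-dest=/backups/2020-08-01 /data/ /backups/2020-08-02/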
On-Premises Backups to AWS
The AWS Storage Gateway virtual appliance could be useful for you. Install the VM and allocate it some disk. There are a few modes, and the names changed not long ago, but in essence it presents a local disk that's backed up to S3. It can keep all your data on-premises, or use the on-premises disk as a cache in front of S3.
Otherwise there are plenty of other options to get data into AWS.
AWS S3 Sync
I use the "aws s3 sync" command line utility to upload data to AWS S3, using the IA storage class. This keeps the backups safe on AWS S3. You can upload the files you want backed up direct to S3, you can upload the repository of a backup program that's stored locally, or your backup software may use S3 natively. Using the command line here's what you do.
aws s3 sync /path/to/backups s3://bucketname/backups --storage-class STANDARD_IA --delete --exclude ".sync/*"
You can turn on encryption and versioning within S3. Versioning keeps each version of a file separately; it's not incremental, so it can use more storage than incremental backup software that also does compression.
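If you prefer to do that from the command line, something like the following should work (the bucket name is a placeholder):
# Enable versioning and default encryption on the bucket
aws s3api put-bucket-versioning --bucket bucketname --versioning-configuration Status=Enabled
aws s3api put-bucket-encryption --bucket bucketname --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'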
Destination
S3 is a good place for backups. If your backups are fairly static you can use S3 Infrequent Access class storage, which is cheaper than standard storage class. You can use the S3 Glacier storage class if your backups are immutable (never change), but remember it takes hours to get those backups back.
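If you don't want to set the storage class on every upload, a lifecycle rule can transition objects automatically after a set number of days. Something along these lines should do it (bucket name, prefix, and timings are just examples):
aws s3api put-bucket-lifecycle-configuration --bucket bucketname --lifecycle-configuration '{
  "Rules": [{
    "ID": "archive-old-backups",
    "Status": "Enabled",
    "Filter": {"Prefix": "backups/"},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"}
    ]
  }]
}'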
Dropbox is a reasonable place for backups. I've used the Dropbox Uploader script. It's not as flexible though - it doesn't delete files that have been removed locally, wasting disk space. This is how you use it.
/opt/Dropbox-Uploader/dropbox_uploader.sh -s -q upload /path/to/files /dropbox/path
**Borg Backup (no longer using)**
I backup my Linux server (which happens to be on AWS) using Borg Backup. This creates an incremental, de-duplicated backup on a local disk. It has retention policies which tell it how long to save data - eg every night for a week, once a week for a month, monthly for a year. There are plenty of incremental backup programs you can use.
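For reference, a retention policy like that maps onto Borg's prune flags roughly as follows (the repository path and exact retention numbers are just examples):
# Create the repository once, then back up and prune on a schedule
borg init --encryption=repokey /backups/borg-repo
borg create /backups/borg-repo::'{hostname}-{now}' /data
borg prune /backups/borg-repo --keep-daily 7 --keep-weekly 4 --keep-monthly 12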
One thing I don't love about Borg Backup is that each time it runs it renames existing files. I think this behaviour changes for really large backups, but my 500MB backup definitely renames the file each night. You end up with hundreds of tiny files plus one large file that's new or renamed each day. Because of that, if your backups are remote you'll probably use a lot more bandwidth than you'd expect. I stopped using Borg because of this.
Borg also supports remote repositories natively, accessed via SSH. You could have an EC2 instance with an EBS disk come up, sync to it, then shut the instance down - but EBS is a lot more expensive than S3, so it's not a great option.
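A remote repository is just a different repository URL - something like this, with the host and path made up (Borg needs to be installed on the remote machine as well):
borg create ssh://user@backuphost/./borg-repo::'{hostname}-{now}' /data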
**Restic Backup (now using for PC / web server backups)**
I've been using Restic Backup for a couple of years now. It aims to be easy, fast, verifiable, secure, efficient, and free. It works on most platforms (*nix, Windows, etc), and it's compiled to a single binary so installation is easy.
It's block based and efficient with disk / network. I've done multiple restore tests and it's always restored everything perfectly. I'm happy with restic. It's under slow but steady development; features like compression are planned - it does de-duplication but not compression as of August 2020.
Initialise the repository
set RESTIC_PASSWORD=abcdefg
restic_0.9.1_windows_amd64.exe init --repo x:\repository
Back up c:\data to the repository:
set RESTIC_PASSWORD=abcdefg
restic_0.9.1_windows_amd64.exe --repo x:\repository backup c:\data --exclude c:\data\exclude
You can have Restic keep backups for configurable periods - for example keep daily backups for a week, weekly backups for 8 weeks, and monthly backups for 24 months.
restic_0.9.1_windows_amd64.exe --repo x:\repository forget --keep-daily 7 --keep-weekly 8 --keep-monthly 24
restic_0.9.1_windows_amd64.exe --repo x:\repository prune
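The restore tests mentioned above are easy to run the same way - something along these lines, restoring into whatever scratch directory you like:
REM verify the repository, then restore the latest snapshot to a test directory
restic_0.9.1_windows_amd64.exe --repo x:\repository check
restic_0.9.1_windows_amd64.exe --repo x:\repository restore latest --target c:\restore-test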
If you want Restic to back up to S3 you just define your keys and do a backup like this. One thing to consider is that Restic may read data back from S3 even though it has a local cache, so the IA storage class (which charges per GB retrieved) could end up more expensive than standard in some cases - though I suspect those cases would be rare.
Here's the basic setup of Restic:
REM setup S3 (once)
set AWS_ACCESS_KEY_ID=ABCDEFGHIJK
set AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXX/AAA
set RESTIC_PASSWORD=abcdefg
restic_0.9.1_windows_amd64.exe --repo s3:s3.amazonaws.com/s3-bucket-name init
This is how you do the backup:
REM backup
restic_0.9.1_windows_amd64.exe --repo s3:s3.amazonaws.com/s3-bucket-name backup c:\data
Prior to these changes you had to create a vault within Glacier and upload archives into it. The link you referenced details how Glacier is now a storage class of S3. You no longer need to move files into Glacier; you can simply upload them with the storage class GLACIER or DEEP_ARCHIVE. You can also change the storage class of existing files via the Properties tab in the console or from the command line.
From the AWS CLI, you can use a command similar to this:
aws s3 cp /etc/hosts s3://faketest/hosts --storage-class GLACIER
You can see the storage class using s3api:
aws s3api list-objects --bucket faketest
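If the full output is too verbose, the CLI's built-in JMESPath filtering can show just the keys and their storage class:
aws s3api list-objects --bucket faketest --query 'Contents[].[Key,StorageClass]' --output table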
To change the storage class from the console, click on the object's Properties tab and select GLACIER.
You can similarly set the storage class if you upload a file through the console.
For existing files you can change their storage class through the CLI using something similar to:
aws s3api copy-object --copy-source faketest/temp.txt --bucket faketest --storage-class GLACIER --key temp.txt
The above command copies an existing file from the bucket back to the same bucket with a change to storage class. There may be alternative methods to this.
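One alternative that should achieve the same result is the higher-level s3 cp command, copying the object over itself with a new storage class:
aws s3 cp s3://faketest/temp.txt s3://faketest/temp.txt --storage-class GLACIER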
References
Glacier FAQ
S3 CLI cp
S3 Storage Classes
s3api copy-object
Best Answer
Per the Glacier FAQ, each file you upload is assigned a unique ID. Upload the same file twice and each copy of the file gets its own ID. This gives you the ability to restore previous versions of the file if desired.
To avoid the surcharge for deleting data less than 3 months old, this is likely the best approach. But it won't just be the data that no longer exists that you need to track and delete. As mentioned above, any time a file changes and you re-upload it to Glacier, you get a new ID for it. You'll eventually want to delete the older versions of the file as well, assuming you don't want the ability to restore to those older versions.
That's the tradeoff you really have to decide for yourself: do you tar/zip everything and then have to track those files and everything in them, or is it worth it to you to upload files individually so you can purge them individually as they're no longer needed?
A couple of other approaches you might consider:
Having said all that, however, Glacier just may not be the best approach for your needs. Glacier is really meant for data archiving, which is different from just backing up servers. If you just want to do incremental backups of a server, using S3 instead of Glacier might be a better approach. A tool like Duplicity or rdiff-backup (in conjunction with something like s3fs) would let you take incremental backups to an S3 bucket and manage them very easily. I've used rdiff-backup on a few Linux systems over the years and found it worked quite nicely.
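A rough sketch of the rdiff-backup plus s3fs combination might look like this - the bucket name, mount point, and retention window are made up, and s3fs reads credentials from ~/.passwd-s3fs or environment variables by default rather than from the command line:
# Mount the S3 bucket as a filesystem, then back up into it incrementally
s3fs bucketname /mnt/s3-backups
rdiff-backup /data /mnt/s3-backups/data
# Expire increments older than 8 weeks
rdiff-backup --remove-older-than 8W /mnt/s3-backups/data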