AWS CLI
See the "AWS CLI Command Reference" for more information.
AWS recently released their Command Line Tools, which work much like boto and can be installed using
sudo easy_install awscli
or
sudo pip install awscli
Once installed, you can then simply run:
aws s3 sync s3://<source_bucket> <local_destination>
For example:
aws s3 sync s3://mybucket .
will download all the objects in mybucket to the current directory and will output:
download: s3://mybucket/test.txt to test.txt
download: s3://mybucket/test2.txt to test2.txt
This will download all of your files using a one-way sync. It will not delete any existing files in your current directory unless you specify --delete, and it won't change or delete any files on S3.
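For example, to make the local directory exactly mirror the bucket, removing local files that no longer exist in S3:
aws s3 sync s3://mybucket . --delete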
You can also do S3 bucket to S3 bucket, or local to S3 bucket sync.
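For example, to sync one bucket to another, or the current directory up to a bucket (the bucket names here are placeholders):
aws s3 sync s3://source-bucket s3://destination-bucket
aws s3 sync . s3://destination-bucket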
Check out the documentation and other examples.
While the above example shows how to download a full bucket, you can also download a folder recursively by running:
aws s3 cp s3://BUCKETNAME/PATH/TO/FOLDER LocalFolderName --recursive
This will instruct the CLI to download all files and folder keys recursively within the PATH/TO/FOLDER directory of the BUCKETNAME bucket.
Best Answer
Sync Your S3 Bucket to an EC2 Server Periodically
This can be achieved with one of several command-line utilities that make it possible to sync a remote S3 bucket to the local filesystem.
s3cmd
At first, s3cmd looked extremely promising. However, after trying it on my enormous S3 bucket, it failed to scale, erroring out with a Segmentation fault. It did work fine on small buckets, though. Since it did not work for huge buckets, I set out to find an alternative.

s4cmd
The newer, multi-threaded alternative to s3cmd. It looked even more promising; however, I noticed that it kept re-downloading files that were already present on the local filesystem. That is not the kind of behavior I was expecting from a sync command, which should check whether the remote file already exists locally (hash/filesize checking would be neat) and skip it on the next sync run against the same target directory. I opened an issue (bloomreach/s4cmd/#46) to report this strange behavior. In the meantime, I set out to find another alternative.

awscli
And then I found awscli. This is Amazon's official command-line interface for interacting with their different cloud services, S3 included. It provides a useful sync command that quickly and easily downloads the remote bucket files to your local filesystem.
Benefits:
Protection Against Accidental Deletion
Conveniently, the sync command won't delete files in the destination folder (local filesystem) if they are missing from the source (S3 bucket), and vice versa. This is perfect for backing up S3: in case files get deleted from the bucket, re-syncing will not delete them locally, and in case you delete a local file, it won't be deleted from the source bucket either.

Setting up awscli on Ubuntu 14.04 LTS
Let's begin by installing awscli. There are several ways to do this; however, I found it easiest to install it via apt-get.
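On Ubuntu 14.04, that should be as simple as the following, assuming the awscli package is available in the release's repositories:
sudo apt-get update
sudo apt-get install awscli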
Configuration
Next, we need to configure awscli with our Access Key ID & Secret Key, which you must obtain from IAM by creating a user and attaching the AmazonS3ReadOnlyAccess policy. This will also prevent you or anyone who gains access to these credentials from deleting your S3 files. Make sure to enter your S3 region, such as us-east-1.
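Configuration happens interactively through the CLI's built-in configure command, which prompts for the Access Key ID, Secret Access Key, default region, and output format:
aws configure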
Preparation
Let's prepare the local S3 backup directory, preferably in /home/ubuntu/s3/{BUCKET_NAME}. Make sure to replace {BUCKET_NAME} with your actual bucket name.
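For example:
mkdir -p /home/ubuntu/s3/{BUCKET_NAME}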
Initial Sync
Let's go ahead and sync the bucket for the first time with the following command:
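Given the directory layout above, the command would look along these lines:
aws s3 sync s3://{BUCKET_NAME} /home/ubuntu/s3/{BUCKET_NAME}/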
Assuming the bucket exists, the AWS credentials and region are correct, and the destination folder is valid, awscli will start to download the entire bucket to the local filesystem.

Depending on the size of the bucket and your Internet connection, it could take anywhere from a few seconds to hours. When that's done, we'll go ahead and set up an automatic cron job to keep the local copy of the bucket up to date.
Setting up a Cron Job
Go ahead and create a sync.sh file in /home/ubuntu/s3:
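For instance, with nano (any editor works):
nano /home/ubuntu/s3/sync.sh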
Copy and paste the following code into sync.sh:
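A minimal sketch of such a script, matching the description above (a one-way aws s3 sync, with {BUCKET_NAME} appearing twice):
#!/bin/sh
# Print a timestamp so the cron log shows when each run started.
echo 'Started sync at' $(date)
# One-way sync: downloads new and changed objects, never deletes local files.
/usr/bin/aws s3 sync s3://{BUCKET_NAME}/ /home/ubuntu/s3/{BUCKET_NAME}/
echo 'Finished sync at' $(date)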
Make sure to replace {BUCKET_NAME} with your S3 bucket name, twice throughout the script.
Next, make sure to chmod the script so it can be executed by crontab.
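For example:
chmod +x /home/ubuntu/s3/sync.sh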
Let's try running the script to make sure it actually works:
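/home/ubuntu/s3/sync.sh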
The output should be similar to this:
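Assuming the sketch script above, expect something along these lines, with one download: line per transferred file, as in the earlier example:
Started sync at <timestamp>
download: s3://{BUCKET_NAME}/path/to/file to /home/ubuntu/s3/{BUCKET_NAME}/path/to/file
...
Finished sync at <timestamp>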
Next, let's edit the current user's crontab by executing the following command:
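crontab -e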
If this is your first time executing crontab -e, you'll need to select a preferred editor. I'd recommend selecting nano as it's the easiest for beginners to work with.

Sync Frequency
We need to tell crontab how often to run our script and where the script resides on the local filesystem by writing a command. The format for this command is as follows:
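minute hour day_of_month month day_of_week command_to_execute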
The following command configures crontab to run the sync.sh script every hour (specified via the minute: 0 and hour: * parameters) and to have it pipe the script's output to a sync.log file in our s3 directory:
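0 * * * * /home/ubuntu/s3/sync.sh > /home/ubuntu/s3/sync.log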
You should add this line to the bottom of the crontab file you are editing. Then, go ahead and save the file to disk by pressing Ctrl + O and then Enter. You can then exit nano by pressing Ctrl + X. crontab will now run the sync task every hour.

All set! Your S3 bucket will now get synced to your EC2 server every hour automatically, and you should be good to go. Do note that over time, as your S3 bucket gets bigger, you may have to increase your EC2 server's EBS volume size to accommodate new files. You can always increase your EBS volume size by following this guide.