Cloud Storage – Architecture for Uploading Large Files from Multiple Endpoints

cloud, storage, upload

I am working on a desktop app that offers uploading to cloud storage. Storage providers make uploading easy: you get an accessKeyId and a secretAccessKey and you are ready to upload. I am trying to come up with the optimal way to upload files.

Option 1. Pack each app instance with the access keys. This way files can be uploaded directly to the cloud without a middleman. Unfortunately, I cannot execute any logic before the upload reaches the cloud. For example, if each user has 5 GB of storage available, I cannot verify this constraint at the storage provider. I could send a request to my own server before the upload to verify it, but since the keys are hardcoded in the app, I am sure this check is easy to bypass.

Option 2. Send each uploaded file to a server, where the constraint logic can be executed, and then forward the file to the final cloud storage. This approach suffers from a bottleneck at the server. For example, if 100 users start uploading (or downloading) a 1 GB file and the server has a bandwidth of 1000 Mb/s, then each user uploads at only 10 Mb/s = 1.25 MB/s.
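The arithmetic in Option 2 can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the server's bandwidth is split evenly across users and that units are decimal (1 GB = 8000 Mb).

```python
def per_user_mbps(server_mbps: float, users: int) -> float:
    """Even share of the server's bandwidth per user, in megabits per second."""
    return server_mbps / users


def upload_seconds(file_gb: float, rate_mbps: float) -> float:
    """Time to move file_gb gigabytes at rate_mbps (decimal units: 1 GB = 8000 Mb)."""
    return file_gb * 8000 / rate_mbps


rate = per_user_mbps(1000, 100)      # 10.0 Mb/s per user
rate_mbytes = rate / 8               # 1.25 MB/s per user
seconds = upload_seconds(1, rate)    # 800.0 s for a 1 GB file
```

Under those assumptions each 1 GB upload takes a little over 13 minutes, before any protocol overhead.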

Option 2 seems to be the way to go, because I get control over who can upload. I am looking for tips to minimise the bandwidth bottleneck. What approach is recommended for handling simultaneous uploads of large files to cloud storage? I am thinking of deploying many low-CPU, low-memory instances and streaming each file through instead of buffering the whole file first and sending it afterwards.
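To make "streaming instead of buffering" concrete, here is a minimal sketch of a relay loop that forwards a file chunk by chunk and stops once the user's remaining quota would be exceeded. The names (stream_upload, QuotaExceeded, CHUNK_SIZE) are hypothetical, and plain file-like objects stand in for the incoming request body and the outgoing connection to the storage provider.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your workload


class QuotaExceeded(Exception):
    """Raised when an upload would push the user past their remaining quota."""


def stream_upload(src: BinaryIO, dst: BinaryIO, remaining_quota: int,
                  chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst chunk by chunk and return the number of bytes forwarded.

    Only one chunk is held in memory at a time. The copy stops as soon as the
    next chunk would push the running total past remaining_quota.
    """
    forwarded = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return forwarded
        if forwarded + len(chunk) > remaining_quota:
            raise QuotaExceeded(
                f"upload exceeds remaining quota of {remaining_quota} bytes")
        dst.write(chunk)
        forwarded += len(chunk)


# Usage with in-memory streams standing in for the network on both sides.
source = io.BytesIO(b"x" * 200_000)
sink = io.BytesIO()
sent = stream_upload(source, sink, remaining_quota=1_000_000)
```

Memory use per upload stays at roughly one chunk, which is what makes small instances viable. The bandwidth limit is unchanged, though, and a partially forwarded object would still need to be aborted or deleted at the provider when the quota check fails.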

Best Answer

Option 1 vs Option 2

Any validation you do on your server is pointless if you then let the user upload the file directly to your cloud storage with keys embedded in the app (Option 1).

Going through your server (Option 2) may be a good approach at the beginning, if you don't expect to have large numbers of concurrent users right from the start. But your question was about how to move files to the cloud directly...

Alternative Solution

You don't want to give your users the secretAccessKey; that's why it's called secret. Instead, validate your users on your server and provide them with temporary, restricted credentials for your cloud storage (e.g. via AWS STS). The client then uses these credentials to upload the file directly.
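The server-side part of that flow might look roughly like the sketch below: check the quota first, and only then hand out a short-lived grant scoped to the user's own prefix. Everything named here (UploadGrant, authorize_upload, the uploads/<user>/ prefix, the 900-second lifetime) is an assumption for illustration. The issue_credentials callable is a placeholder for whatever your provider offers, for example a function wrapping AWS STS.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

QUOTA_BYTES = 5 * 10**9  # the 5 GB per-user limit from the question


@dataclass
class UploadGrant:
    credentials: dict   # whatever the provider returns (opaque here)
    key_prefix: str     # the only location the client may write to
    max_bytes: int      # largest upload the grant should allow
    expires_at: float   # Unix timestamp after which the grant is invalid


def authorize_upload(user_id: str, used_bytes: int, requested_bytes: int,
                     issue_credentials: Callable[[str, int], dict],
                     ttl_seconds: int = 900,
                     now: Optional[float] = None) -> Optional[UploadGrant]:
    """Return a short-lived grant if the upload fits the quota, else None."""
    if requested_bytes <= 0 or used_bytes + requested_bytes > QUOTA_BYTES:
        return None
    prefix = f"uploads/{user_id}/"
    issued_at = time.time() if now is None else now
    return UploadGrant(
        credentials=issue_credentials(prefix, ttl_seconds),
        key_prefix=prefix,
        max_bytes=requested_bytes,
        expires_at=issued_at + ttl_seconds,
    )


def fake_issuer(prefix: str, ttl: int) -> dict:
    # Stand-in for a real provider call; returns no usable credentials.
    return {"scope": prefix, "ttl": ttl}


grant = authorize_upload("alice", used_bytes=4 * 10**9, requested_bytes=10**9,
                         issue_credentials=fake_issuer, now=0.0)
denied = authorize_upload("alice", used_bytes=4 * 10**9, requested_bytes=2 * 10**9,
                          issue_credentials=fake_issuer, now=0.0)
```

The file itself never passes through your server in this design; only the small authorization request does, so the bandwidth bottleneck from Option 2 goes away.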

It should be possible to set up basic restrictions on file size, etc. with your storage provider. For more complex verification (e.g. only cat pics are allowed), you'll likely have to run the validation after the upload has completed and then remove invalid files.
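As one example of a provider-side size restriction, S3 browser-style POST uploads accept a policy document whose content-length-range condition bounds the upload size. The sketch below only builds and encodes such a policy. The bucket name, key prefix and lifetime are made-up values, and the signing step is deliberately left out because an SDK normally handles it (boto3's generate_presigned_post, for instance).

```python
import base64
import json
from datetime import datetime, timedelta, timezone


def build_post_policy(bucket: str, key_prefix: str, max_bytes: int,
                      now: datetime, ttl_seconds: int = 900) -> dict:
    """Build an S3-style POST policy limiting where and how much a client may upload."""
    expiration = now + timedelta(seconds=ttl_seconds)
    return {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],      # confine writes to one prefix
            ["content-length-range", 1, max_bytes],   # reject empty or oversized bodies
        ],
    }


def encode_policy(policy: dict) -> str:
    """Base64-encode the policy JSON, the form in which it accompanies the upload."""
    return base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")


# Hypothetical values: a 1 GB cap for one user's prefix, valid for 15 minutes.
policy = build_post_policy("example-bucket", "uploads/alice/", 10**9,
                           now=datetime(2024, 1, 1, tzinfo=timezone.utc))
encoded = encode_policy(policy)
```

Content checks such as "is this really a cat picture" cannot be expressed in a policy like this, which is why they have to run after the upload.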
