I was hoping there is some "append only" or "mostly append" service by Amazon that is designed for logging.
Like Amazon Kinesis, maybe?
"With Amazon Kinesis you can have producers push data directly into an Amazon Kinesis stream. For example, system and application logs can be submitted to Amazon Kinesis and be available for processing in seconds. This prevents the log data from being lost if the front end or application server fails. Amazon Kinesis provides accelerated data feed intake because you are not batching up the data on the servers before you submit them for intake."
— http://aws.amazon.com/kinesis
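To make that concrete, here is a minimal sketch of pushing a log line into a Kinesis stream. The stream name, region, and payload fields are all assumptions on my part; the actual `put_record` call needs boto3 and AWS credentials, so it is shown only as a comment.

```python
import json


def make_log_record(host, line):
    """Build the keyword arguments for a Kinesis put_record call.

    Partitioning by host keeps one host's log lines ordered within a
    single shard (field names and stream name are illustrative).
    """
    return {
        "StreamName": "app-logs",  # hypothetical stream name
        "Data": json.dumps({"host": host, "line": line}).encode(),
        "PartitionKey": host,
    }


# With boto3 and credentials configured (not executed here):
# import boto3
# kinesis = boto3.client("kinesis")
# kinesis.put_record(**make_log_record("web-1", "GET /index.html 200"))
```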
I haven't tried this yet, because I have a homebrew supervisory process that uses S3 and SQS. At the beginning of a stream, it creates unique names for the temporary files (on the instance) that will capture the logs, and sends a message via SQS that results in information about the process and its log file locations being stored in a database. When the process stops (these are scheduled or event-driven jobs, rather than continuously-running ones), another SQS message is sent, which contains redundant information about where the temporary files were, and gives me the exit status of the process. Then both logs (out and error) are compressed and uploaded to S3, with each of those uploads generating an additional SQS message reporting on its status.
The SQS messages, as you might observe, are largely redundant, but this design virtually eliminates the chance that I wouldn't know something about the existence of the process: all four messages (start, stop, stdout-upload-info, stderr-upload-info) contain enough information to identify the host, the process, the arguments, and where the log files will go, have gone, or should have gone in S3. In practice, all of this redundancy has been almost entirely unnecessary, since the process and SQS/S3 are very stable, but it is there if it's ever needed.
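The four redundant notifications described above can be sketched roughly as follows. The queue name, field names, and S3 key layout are all my own assumptions, not the actual implementation; the boto3 send is commented out since it needs AWS credentials.

```python
import json
import socket
import time


def job_message(kind, job_id, args, s3_prefix, exit_status=None):
    """Build one of the four notifications:
    start, stop, stdout-upload-info, or stderr-upload-info.

    Every message deliberately repeats enough context (host, job,
    arguments, S3 destinations) to stand on its own if the others
    are lost.
    """
    msg = {
        "kind": kind,
        "host": socket.gethostname(),
        "job_id": job_id,
        "args": args,
        "s3_stdout": f"{s3_prefix}/{job_id}.out.gz",  # hypothetical key layout
        "s3_stderr": f"{s3_prefix}/{job_id}.err.gz",
        "ts": time.time(),
    }
    if exit_status is not None:  # only the "stop" message carries this
        msg["exit_status"] = exit_status
    return json.dumps(msg)


# With boto3 (not executed here):
# import boto3
# sqs = boto3.client("sqs")
# sqs.send_message(QueueUrl=QUEUE_URL,
#                  MessageBody=job_message("start", "job42", ["--foo"], "logs/2024"))
```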
I don't need real-time logging for these jobs, but if I did, another option would be to modify the log collector so that instead of saving up the logs and then sending them en bloc to S3, it "flushed" the accumulated data into an SQS message for every "x" bytes of log collected or every "y" seconds of runtime, whichever occurred first. There would be no need to send an SQS message for every line.
Best Answer
An amazing recipe is given in the nginx Dockerfile:
Simply put, the app can continue writing to it as a file, but as a result the lines will go to stdout and stderr!
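The trick in the official nginx Dockerfile is to symlink the log files to the container's stdout/stderr, roughly as `RUN ln -sf /dev/stdout /var/log/nginx/access.log && ln -sf /dev/stderr /var/log/nginx/error.log` (quoting from memory, so check the image's Dockerfile). A tiny reproduction of the idea, with an illustrative temp path, Linux only:

```python
import os
import tempfile

# Create the "log file" as a symlink to the process's stdout,
# just like the nginx image does for access.log.
log_path = os.path.join(tempfile.mkdtemp(), "access.log")
os.symlink("/dev/stdout", log_path)

# The app keeps appending to what it believes is an ordinary file...
with open(log_path, "a") as log:
    log.write("GET / 200\n")  # ...but the line lands on stdout
```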