AWS Athena – Query Results Cleanup


Each time you run a query against Athena using the AWS CLI, two files are created in the query results location: the results file itself and a companion .metadata file. Over time, this location will contain a LOT of files unless they're cleaned up.

Is there a way to clean them up automatically? If not, what's the best approach? The delete-named-query CLI command only applies to named (saved) queries, and it removes the saved query rather than its result files, so it doesn't look like you can use it to clean up your results once you're finished with them.

Relevant part of the AWS documentation here: https://docs.aws.amazon.com/athena/latest/ug/querying.html
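For what it's worth, the two files for a single finished query can be located from its execution record and deleted by hand. Below is a minimal Python sketch, assuming boto3 Athena and S3 clients are passed in, and assuming the metadata file's key is the result key with ".metadata" appended. The function names are hypothetical, and this has to be run once per query.

```python
from urllib.parse import urlparse


def result_object_keys(output_location):
    """Split an Athena OutputLocation such as s3://bucket/path/<id>.csv
    into (bucket, [result key, metadata key])."""
    parsed = urlparse(output_location)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {output_location}")
    key = parsed.path.lstrip("/")
    # Assumption: the metadata file is the result key plus ".metadata".
    return parsed.netloc, [key, key + ".metadata"]


def delete_query_results(athena, s3, query_execution_id):
    """Delete both result files of one finished query (hypothetical helper).

    `athena` and `s3` are expected to be boto3 clients, e.g.
    boto3.client("athena") and boto3.client("s3").
    """
    execution = athena.get_query_execution(QueryExecutionId=query_execution_id)
    location = execution["QueryExecution"]["ResultConfiguration"]["OutputLocation"]
    bucket, keys = result_object_keys(location)
    # Remove the result file and its metadata file in one request.
    s3.delete_objects(
        Bucket=bucket,
        Delete={"Objects": [{"Key": k} for k in keys], "Quiet": True},
    )
    return bucket, keys
```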

Best Answer

We use S3 lifecycle policies to clean up Athena's temporary result files.

Our AthenaStagingDir is s3://.../tmp/, and we have a lifecycle rule on that /tmp/ prefix that:

  • expires current objects after 1 day, and then
  • permanently deletes the previous (noncurrent) versions one day later (i.e. the objects expired the day before)

[Screenshot: Lifecycle policy]
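A rule like the one described above can be sketched in Python for boto3 roughly as follows. This assumes the results live under a tmp/ prefix at the root of the bucket (the real path is elided above) and uses a hypothetical rule ID. The NoncurrentVersionExpiration part only has an effect on a versioned bucket; on an unversioned bucket, the Expiration action alone removes the objects.

```python
def athena_tmp_lifecycle(prefix="tmp/", days=1):
    """Build a lifecycle configuration that expires current objects under
    `prefix` after `days` days, then permanently deletes the resulting
    noncurrent versions `days` days after they become noncurrent."""
    return {
        "Rules": [
            {
                "ID": "athena-query-results-cleanup",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # assumed prefix; match your staging dir
                "Status": "Enabled",
                "Expiration": {"Days": days},
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


def apply_lifecycle(s3, bucket, prefix="tmp/"):
    """Attach the rule to `bucket` using a boto3 S3 client.

    Note: this call replaces the bucket's entire lifecycle configuration,
    so merge in any existing rules before calling it.
    """
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=athena_tmp_lifecycle(prefix),
    )
```

Usage would look something like apply_lifecycle(boto3.client("s3"), "my-athena-results-bucket"), where the bucket name is a placeholder.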

I haven't found a way to permanently delete the objects in a single step after 1 day, but to be honest I haven't tried very hard. This two-step, two-day approach works well.

Hope that helps :)