AWS Athena – Rolling Off Old Partitions in a Table

amazon-web-services, lifecycle

I've got some rollup data that I create every night and store in an S3 bucket, partitioned by date. I execute an ALTER TABLE foo ADD PARTITION... to add each new partition to Athena as it's created. I've been able to verify that this successfully adds the data and that I can query it in Athena. So far, so good.
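For reference, the nightly statement could be generated along these lines. This is only a sketch: the partition column name `dt`, the Hive-style `dt=YYYY-MM-DD/` prefix layout, and the bucket `s3://example-bucket/rollups` are assumptions made for illustration, not details from the setup described above. Only the table name `foo` comes from the question.

```python
from datetime import date


def add_partition_sql(table: str, day: date, base_location: str, column: str = "dt") -> str:
    """Build an Athena ALTER TABLE ... ADD PARTITION statement for one day.

    The column name and the S3 prefix layout are hypothetical; adjust them
    to match how the table is actually partitioned.
    """
    value = day.isoformat()  # e.g. "2024-01-31"
    location = f"{base_location.rstrip('/')}/{column}={value}/"
    # IF NOT EXISTS keeps the statement safe to re-run for the same day.
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({column} = '{value}') LOCATION '{location}'"
    )


if __name__ == "__main__":
    print(add_partition_sql("foo", date(2024, 1, 31), "s3://example-bucket/rollups"))
```

The function only builds the SQL string; submitting it to Athena (console, CLI, or SDK) is left out on purpose.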

What I'd like to do now is have data older than 30 days automatically roll off. I can set up a lifecycle rule to drop the old data from S3; will this also automatically remove it from Athena, or do I need to take direct action in Athena itself to remove it from the table as well?

It seems to work as I expect when I simply remove the partition in S3, but I can't find anything definitive that says this is the recommended way to handle this.

Best Answer

IIRC, we use a Glue crawler to rescan S3 and recreate the Athena tables every night. I'm not 100% sure it's needed; it may not be. It doesn't hurt, though :)
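If you'd rather not rely on a crawler, an alternative (not what we do, just a sketch) is to drop the old partitions explicitly. An S3 lifecycle rule only deletes objects; the partition entries Athena uses are stored separately in the table's metadata catalog, so they stay registered until something removes them. The example below builds an `ALTER TABLE ... DROP PARTITION` statement for the days that have just aged out. The column name `dt`, the date format, and the `lookback_days` window are assumptions for illustration.

```python
from datetime import date, timedelta


def drop_partition_sql(
    table: str,
    today: date,
    retention_days: int = 30,
    lookback_days: int = 7,
    column: str = "dt",
) -> str:
    """Build an Athena ALTER TABLE ... DROP PARTITION statement.

    Covers the `lookback_days` daily partitions immediately older than the
    retention window, so a missed nightly run is caught by the next one.
    The column name and date format are hypothetical.
    """
    cutoff = today - timedelta(days=retention_days)  # newest day still kept
    specs = [
        f"PARTITION ({column} = '{(cutoff - timedelta(days=i)).isoformat()}')"
        for i in range(1, lookback_days + 1)
    ]
    # IF EXISTS makes it harmless to name partitions that are already gone.
    return f"ALTER TABLE {table} DROP IF EXISTS " + ", ".join(specs)


if __name__ == "__main__":
    print(drop_partition_sql("foo", date(2024, 3, 1), lookback_days=2))
```

Running this nightly alongside the lifecycle rule would keep the table's partition list in step with what is actually left in S3.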
