The answer, apparently, was 1 hour, 20 minutes. I'm not sure if that's a "normal" time or if it was delayed by some inefficiency in my setup.
I am not sure your assumption that an ORDER BY clause would require an index on f1 is actually correct.
I created such a table and ran
explain SELECT DISTINCT T.f1 as result FROM rowtest T WHERE f2=10 order by result LIMIT 0,30
And I got back this:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | T | ref | idx_f2 | idx_f2 | 4 | const | 3 | Using where; Using temporary; Using filesort
Now the fact that the server will be using a temporary table and filesort is not hinting at a particularly fast or efficient way of doing this. However, there is nothing in there that says you need an index on f1. Ignore the fact that in my case there will only be 3 rows in the result set (I couldn't afford to create a table with 320 million rows).
Now: if I add an index to the table on column f1, the result of the explain doesn't change at all, which means whether you do or don't have an index doesn't matter.
The reason for this lies in the fact that the server first retrieves all the rows that satisfy the where condition (using the index on f2), and then orders them using a temporary file. During the retrieval of the rows the index on f1 is of no help, and during the ordering stage it is not present.
Considering that your result set is never larger than 30 rows, the ordering in a temporary file will not take up any time at all. Try it for yourself.
EDIT Forget that last sentence, that was nonsense. I just realised that the LIMIT clause is applied AFTER the sorting takes place. So: Yes, the sorting will take some time. If your query really only returns one numerical column, it should be quite fast, though. And one truth remains: An index on f1 doesn't make any difference. Plus: AFAIK, once the rows have all been retrieved, the table is not locked for any other access. And because that doesn't change, there is no impact on other users whether you use the ORDER BY clause or not.
Best Answer
I finally had time to revisit this project and for completeness sake wanted to post what I ended up doing. The original problem with the DELETE FROM taking so long, I believe, is related to how I was specifying rows to be deleted. I was using a statement similar to
The starttime field is actually a DATETIME. I ended up creating a new table using
and then changing the new table's engine
and then renaming
After this I needed to get 6 months worth of data from the old table into the new table
Thank you Martin for the suggestion. I'm new on serverfault so I'm not sure how I would mark his comment as an answer.