Sql – AWS Athena (Presto) OFFSET support

amazon-athenaamazon-web-servicesprestosql

I would like to know if there is support for OFFSET in AWS Athena. For mysql the following query is running but in athena it is giving me error. Any example would be helpful.

select * from employee where empSal >3000 LIMIT 300 OFFSET 20

Best Answer

Athena is basically managed Presto. Since Presto 311 you can use OFFSET m LIMIT n syntax or ANSI SQL equivalent: OFFSET m ROWS FETCH NEXT n ROWS ONLY.

You can read more in Beyond LIMIT, Presto meets OFFSET and TIES.

For older versions (and this includes AWS Athena as of this writing), you can use row_number() window function to implement OFFSET + LIMIT.

For example, instead of

SELECT * FROM elb_logs
OFFSET 5 LIMIT 5 -- this doesn't work, obviously

You can execute

SELECT * FROM (
    SELECT row_number() over() AS rn, * FROM elb_logs)
WHERE rn BETWEEN 5 AND 10;

Note: the execution engine will still need to read offset+limit rows from the underlying table, but this is still much better than sending all these rows back to the client and taking a sublist there.

Warning: see https://stackoverflow.com/a/45114359/65458 for explanation why avoiding OFFSET in queries is generally a good idea.

Related Solutions

Mysql – Multiple Updates in MySQL

Yes, that's possible - you can use INSERT ... ON DUPLICATE KEY UPDATE.

Using your example:

INSERT INTO table (id,Col1,Col2) VALUES (1,1,1),(2,2,3),(3,9,3),(4,10,12)
ON DUPLICATE KEY UPDATE Col1=VALUES(Col1),Col2=VALUES(Col2);

Sql – What are the options for storing hierarchical data in a relational database

My favorite answer is as what the first sentence in this thread suggested. Use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.

The problem up until now has been that the coversion method from an Adjacecy List to Nested Sets has been frightfully slow because most people use the extreme RBAR method known as a "Push Stack" to do the conversion and has been considered to be way to expensive to reach the Nirvana of the simplicity of maintenance by the Adjacency List and the awesome performance of Nested Sets. As a result, most people end up having to settle for one or the other especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to do the conversion on what MLM'ers would consider to be a small million node hierarchy.

I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.

Duration for     1,000 Nodes = 00:00:00:870 
Duration for    10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
Duration for   100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100) 
Duration for 1,000,000 Nodes = 'Didn't even try this'

And here's the duration for the new method (with the push stack method in parenthesis).

Duration for     1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
Duration for    10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
Duration for   100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)

Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.

You can read about the new method and get a copy of the code at the following URL. http://www.sqlservercentral.com/articles/Hierarchy/94040/

I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article. http://www.sqlservercentral.com/articles/T-SQL/94570/

If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.

Best Answer

Related Solutions

Mysql – Multiple Updates in MySQL

Sql – What are the options for storing hierarchical data in a relational database

Related Topic