Sql – Nested Query Alternatives in AWS Athena

amazon-athenaamazon-web-servicesprestosql

I am running a query that gives a non-overlapping set of first_party_id's – ids that are associated with one third party but not another. This query does not run in Athena, however, giving the error: Correlated queries not yet supported.

Was looking at prestodb docs, https://prestodb.io/docs/current/sql/select.html (Athena is prestodb under the hood), for an alternative to nested queries. The with statement example given doesn't seem to translate well for this not in clause. Wondering what the alternative to a nested query would be – Query below.

SELECT 
         COUNT(DISTINCT i.third_party_id) AS uniques
FROM 
         db.ids i
WHERE
         i.third_party_type = 'cookie_1'
         AND i.first_party_id NOT IN (
           SELECT
             i.first_party_id
           WHERE 
             i.third_party_id = 'cookie_2'
         )

Best Answer

There may be a better way to do this - I would be curious to see it too! One way I can think of would be to use an outer join. (I'm not exactly sure about how your data is structured, so forgive the contrived example, but I hope it would translate ok.) How about this?

with 
  a as (select * 
       from (values 
       (1,'cookie_n',10,'cookie_2'), 
       (2,'cookie_n',11,'cookie_1'),
       (3,'cookie_m',12,'cookie_1'),
       (4,'cookie_m',12,'cookie_1'),
       (5,'cookie_q',13,'cookie_1'),
       (6,'cookie_n',13,'cookie_1'),
       (7,'cookie_m',14,'cookie_3')
       ) as db_ids(first_party_id, first_party_type, third_party_id, third_party_type)
      ),
  b as (select first_party_type 
        from a where third_party_type = 'cookie_2'),
  c as (select a.third_party_id, b.first_party_type as exclude_first_party_type 
        from a left join b on a.first_party_type = b.first_party_type 
        where a.third_party_type = 'cookie_1')
select count(distinct third_party_id) from c 
       where exclude_first_party_type is null;

Hope this helps!