Google-sheets – Finding duplicates of emails at domain level

formulasgoogle sheetsgoogle-sheets-query

Need some formula help with the following –

I want to compile a list of same domain name emails. Sample sheet here https://docs.google.com/spreadsheets/d/1BAB9YyPbvDgNzbT-tx-JoOHZDThcAdkX8GKBBQ5AeVo/edit?usp=sharing

My Current State:
I have 2 email lists list "A" of 200 emails of different people and list "B" with 2000 email ids of different people.

I'd like to find which emails (at domain level) in list "A" are in the larger list "B"

Eg: So if list "A" has emails – ee@yahoo.com, jd@123.com, abc@gmail.com and list "B" bc@xy.com, tt@gt.com, rf@gmail.com, tt@123.com, I want to find out through the domain level (@gmail/yahoo etc) how many of list A are in list B. I don't want to find exact matches

What formula do I use? Can I use the QUERY function here?? if I can please aid in understanding how.

Thanks for the help in advance

Best Answer

Here's a way to pick out duplicate at a domain level.

=filter(
         A2:A6,
         ArrayFormula(
              REGEXMATCH(
                     A2:A6,
                     join("|",unique(ArrayFormula(REGEXEXTRACT(B2:B11,"@(.*)"))))
                       )
                   )
)

First I used regextract to extract the domain of Col Band join them with a pipe | to create a regexp.
Then filter col A with the array that has been created with regexmatch + arrayformula