Google-sheets – Select Unique on 1 Column in Google Sheets – Distinct Doesn’t

google sheetsgoogle-sheets-query

I'm trying to set up a kind of a bid thing, where the top 5 respondents receive a prize. I was hoping to use SQL to get the top 5, but I can't remove duplicate bidders and only take their top bid. I wanted to do something like this:

=QUERY(range, "SELECT timestamp, bid, DISTINCT name, department ORDER BY bid DESC LIMIT 5")

But DISTINCT doesn't work in Google Sheets. Any idea as to what I should do in this situation?

Here is my dummy data set (ignore the timestamps):

Timestamp Bid ($) Name Department
2/16/2021 10:45:07 18 Jim Accounting
2/16/2021 9:42:40 22 Bob Accounting
2/16/2021 9:26:15 21 Samantha Sales
2/16/2021 9:26:15 30 Jim Accounting
2/16/2021 10:45:07 43 Jim Accounting
2/16/2021 9:42:40 23 Susan Engineering
2/16/2021 9:26:15 45 Susan Engineering
2/16/2021 9:26:15 26 Alex HR
2/16/2021 10:45:07 12 Geoff Sales
2/16/2021 9:42:40 12 Sarah Sales

This is my desired outcome:

Timestamp Bid ($) Name Department
2/16/2021 9:26:15 45 Susan Engineering
2/16/2021 10:45:07 43 Jim Accounting
2/16/2021 9:26:15 26 Alex HR
2/16/2021 9:42:40 22 Bob Accounting
2/16/2021 9:26:15 21 Samantha Sales

This is what I have so far. You can see that Jim and Susan are in the table twice and that Bob and Samantha have missed out on placing as a result of this.
Using the query:

=QUERY(A3:D12, "SELECT A, B, C, D ORDER BY B DESC LIMIT 5")
Timestamp Bid ($) Name Department
2/16/2021 9:26:15 45 Susan Engineering
2/16/2021 10:45:07 43 Jim Accounting
2/16/2021 9:26:15 30 Jim Accounting
2/16/2021 9:26:15 26 Alex HR
2/16/2021 9:42:40 23 Susan Engineering

Best Answer

Assuming that you don't have anything unrelated underneath your main A:D data, try this formula somewhere in Column E or rightward:

=ArrayFormula({"Timestamp","Bid ($)","Name","Department";VLOOKUP(ARRAY_CONSTRAIN(UNIQUE(SORT(C3:C,B3:B,0)),5,1),QUERY({C3:C,A3:D},"Select * Order By Col3 Desc"),{2,3,4,5},FALSE)})

How It Works:

{"Timestamp","Bid ($)","Name","Department"; ... }

Headers are generated. The semicolon means what follows will be placed below that in the virtual array formed within the curly brackets.

SORT(C3:C,B3:B,0)

Creates a virtual array of all names in order from highest to lowest bid.

UNIQUE(SORT(C3:C,B3:B,0))

Reduces that list of names to only UNIQUE entries (but still in order from highest to lowest bid).

ARRAY_CONSTRAIN(UNIQUE(SORT(C3:C,B3:B,0)),5,1)

Constrains that list to only five entries (i.e., the top five names)

QUERY({C3:C,A3:D},"Select * Order By Col3 Desc")

Forms a virtual array that starts with the names column followed by all four columns (five columns total), ordered by the bids in descending order (i.e., highest to lowest).

VLOOKUP([array one],[array two],{2,3,4,5},FALSE)

This will look up the first array (i.e. the five top names) within in the second array (i.e., the QUERY) and return the matching data from columns 2,3,4,5 (i.e., the original four columns of data, recalling that the virtual first column being used by the lookup is a repeat of all the names). FALSE simply means we're looking for exact matches on the five top names.

Your returned dates may appear as numbers in the 40000 range with decimal components. That is Google Sheets raw format for date/time (the number of days and partial days since December 31, 1899). Just format that column in the date format you prefer using Format > Number.