Google Sheets – How to Find Duplicates Across Multiple Columns

google sheets

My spreadsheet contains information about companies and is structured ledger-like in this way:

Company   Quarter   Revenues   EBITDA
A         2010.1    5          1
A         2010.2    6          1
B         2010.1    10         2
B         2010.2    12         3
B         2010.3    20         5
A         2010.3    8          2
B         2011.1    14         3
A         2010.3    8          2
A         2010.4    7          2
A         2011.1    9          1

I want:

  • a formula (not GAS) that will show me a list of duplicates for the combination Company-Quarter. In the above example, there are two entries for Company A for Quarter 2010.3. In the data there could be multiple duplicates.

  • a formula (not GAS) that give me missing Company-quarters for a given quarters range for all companies that have at least one entry in the date range (or alternatively, from a pre-determined list of Companies). In the above example, for the range 2010.1 to 2011.1, there is no entry for Company B in 2010.4. In the data there could be multiple missing information.

Ideally, I won't need to create auxiliary columns and is all contained in one formula, so as to make the spreadsheet more elegant.

— xxx —

Thanks to @Tedinoz answer below, I came down to a one-cell formula, copied here for future reference:
=unique(sort(arrayformula(if(arrayformula(isna(vlookup(unique(arrayformula(transpose(split(join("", arrayformula(rept(filter(UNIQUE(A2:A), len(UNIQUE(A2:A)))&char(9999), counta(transpose(split(join("", arrayformula(rept(filter(arrayformula(unique(mid(query(B2:B),1,4))), len(arrayformula(unique(mid(query(B2:B),1,4)))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),6,1))))))), char(9999))) )))), char(9999))) &" "&transpose(split(rept(join(char(9999), filter(arrayformula((transpose(split(join("", arrayformula(rept(filter(arrayformula(unique(mid(query(B2:B),1,4))), len(arrayformula(unique(mid(query(B2:B),1,4)))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),6,1))))))), char(9999))) )&"."&(transpose(split(rept(join(char(9999), filter(arrayformula(unique(mid(query(B2:B),6,1))), len(arrayformula(unique(mid(query(B2:B),6,1))))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),1,4))))), char(9999))))), len(arrayformula((transpose(split(join("", arrayformula(rept(filter(arrayformula(unique(mid(query(B2:B),1,4))), len(arrayformula(unique(mid(query(B2:B),1,4)))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),6,1))))))), char(9999))) )&"."&(transpose(split(rept(join(char(9999), filter(arrayformula(unique(mid(query(B2:B),6,1))), len(arrayformula(unique(mid(query(B2:B),6,1))))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),1,4))))), char(9999))))))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),1,4))))), char(9999))))),arrayformula(A2:A&" "&B2:B),1,0))),unique(arrayformula(transpose(split(join("", arrayformula(rept(filter(UNIQUE(A2:A), len(UNIQUE(A2:A)))&char(9999), counta(transpose(split(join("", arrayformula(rept(filter(arrayformula(unique(mid(query(B2:B),1,4))), len(arrayformula(unique(mid(query(B2:B),1,4)))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),6,1))))))), char(9999))) )))), char(9999))) &" "&transpose(split(rept(join(char(9999), filter(arrayformula((transpose(split(join("", arrayformula(rept(filter(arrayformula(unique(mid(query(B2:B),1,4))), len(arrayformula(unique(mid(query(B2:B),1,4)))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),6,1))))))), char(9999))) )&"."&(transpose(split(rept(join(char(9999), filter(arrayformula(unique(mid(query(B2:B),6,1))), len(arrayformula(unique(mid(query(B2:B),6,1))))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),1,4))))), char(9999))))), len(arrayformula((transpose(split(join("", arrayformula(rept(filter(arrayformula(unique(mid(query(B2:B),1,4))), len(arrayformula(unique(mid(query(B2:B),1,4)))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),6,1))))))), char(9999))) )&"."&(transpose(split(rept(join(char(9999), filter(arrayformula(unique(mid(query(B2:B),6,1))), len(arrayformula(unique(mid(query(B2:B),6,1))))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),1,4))))), char(9999))))))))&char(9999), counta(arrayformula(unique(mid(query(B2:B),1,4))))), char(9999))))),""))))

Best Answer

DUPLICATE COMPANY-QUARTERS

  • Insert this formula in Cell E2:

=ARRAYFORMULA(COUNTIFS(A2:A11&B2:B11&C2:C11,A2:A11&B2:B11&C2:C11,ROW(A2:A11),"<="&ROW(A2:A11)))

  • There is only one duplicate in the OP data: "A 2010.3"

Snapshot


MISSING COMPANY-QUARTERS

You want a formula that gives missing Company-quarters for a given quarters range for all companies that have at least one entry in the date range.

  • Existing Company/Quarter

    • Cell E2, enter this formula:

    =arrayformula(A2:A11&" "&B2:B11)

    • This is a helper cell/column. It builds a column of values of the existing Company/Quarter.
  • All combinations of Company/Year/Quarter

    • Cell F2, enter this formula:

=arrayformula(transpose(split(join("", arrayformula(rept(filter(UNIQUE(A2:A11), len(UNIQUE(A2:A11)))&char(9999), counta(transpose(split(join("", arrayformula(rept(filter(arrayformula(unique(mid(query(B2:B11),1,4))), len(arrayformula(unique(mid(query(B2:B11),1,4)))))&char(9999), counta(arrayformula(unique(mid(query(B2:B11),6,1))))))), char(9999))) )))), char(9999))) &" "&transpose(split(rept(join(char(9999), filter(arrayformula((transpose(split(join("", arrayformula(rept(filter(arrayformula(unique(mid(query(B2:B11),1,4))), len(arrayformula(unique(mid(query(B2:B11),1,4)))))&char(9999), counta(arrayformula(unique(mid(query(B2:B11),6,1))))))), char(9999))) )&"."&(transpose(split(rept(join(char(9999), filter(arrayformula(unique(mid(query(B2:B11),6,1))), len(arrayformula(unique(mid(query(B2:B11),6,1))))))&char(9999), counta(arrayformula(unique(mid(query(B2:B11),1,4))))), char(9999))))), len(arrayformula((transpose(split(join("", arrayformula(rept(filter(arrayformula(unique(mid(query(B2:B11),1,4))), len(arrayformula(unique(mid(query(B2:B11),1,4)))))&char(9999), counta(arrayformula(unique(mid(query(B2:B11),6,1))))))), char(9999))) )&"."&(transpose(split(rept(join(char(9999), filter(arrayformula(unique(mid(query(B2:B11),6,1))), len(arrayformula(unique(mid(query(B2:B11),6,1))))))&char(9999), counta(arrayformula(unique(mid(query(B2:B11),1,4))))), char(9999))))))))&char(9999), counta(arrayformula(unique(mid(query(B2:B11),1,4))))), char(9999))))

  • This creates a list of every combination of the Company, Year and Quarter.

  • Detecting mismatches

    • Cell G2, enter this formula:

=arrayformula(isna(vlookup(F2:F17,$E$2:$E$11,1,0)))

  • Values that return FALSE represent combinations that exist in the data list.
  • Values that return TRUE represent a Coy/Year/Quarter that is missing from the data list.

Snapshot


Credit:

Combine two columns into a list of all possible combinations of entries (webapps.stackexchange)