Google Sheets Worksheet Function – Show All Combinations for a Selection of Columns

google sheetsworksheet-function

I have a sequence of letters, in Google Spreadsheets:

  | A   B   C   D   E   F
--|----------------------
1 | U   Y   0   4   X   N
2 | H   X           Y   
3 | W

The first row contains a character pattern that serves as a template. Each subsequent row contains variants of the values in some of the columns. The first row values are used for columns that don't contain variants.

There are between 1 and 2 variants per column that has variants, and these columns do not necessarily have the same number of variants. In the future, the problem could potentially have three or more variants per column.

How can I get all variations of the sequence? Such as:

UY04XN
HY04XN
UY04YN
HY04YN
...

I believe this is something to do with 'Cartesian Products', and I suppose the SQL would be roughly of the form:

SELECT table1.column1, table2.column2...
FROM  table1, table2...

… but I have no idea how that would translate into flexible spreadsheet function code.

Best Answer

Short answer

QUERY() allows the use of a subset of SQL but this subset does not include the FROM clause.

QUERY() requires a single array as the source table, so it could not be used. Instead use ARRAYFORMULA(), TRANSPOSE(), JOIN(), SPLIT(), REPT(), FILTER(), COUNTA(), SORT() and the division (/) and concatenate (&) operators to build a "flexible formula" as its performance is better than a custom function.

Procedure

Assuming that for "flexible spreadsheet function code" the OP means a formula,

  1. To make things simpler, add an apostrophe before each number value in order to format them as text.

Note: Another way, would require to use COUNT instead of COUNTA for numeric columns. This will make the following procedure less flexible as this will only work in the "future" provided all of the column values added later are numbers.

  1. Calculate the size for the cartesian product.

H1:

=COUNTA(A:A)*COUNTA(B:B)*COUNTA(C:C)*COUNTA(D:D)*COUNTA(E:E)*COUNTA(F:F)

Note: There are five columns to match the sample data provided by the OP.

  1. Calculate the cartesian product:

J1:

=ArrayFormula(
  {
     SORT(
       TRANSPOSE(         
       split(REPT(JOIN(",",TRANSPOSE(filter(A:A,LEN(A:A))))&",",$H$1/COUNTA(A:A)),","))
       &TRANSPOSE(         
       split(REPT(JOIN(",",TRANSPOSE(filter(B:B,LEN(B:B))))&",",$H$1/COUNTA(B:B)),","))
    ,1,TRUE)           
    &TRANSPOSE(         
       split(REPT(JOIN(",",TRANSPOSE(filter(C:C,LEN(C:C))))&",",$H$1/COUNTA(C:C)),","))
    &TRANSPOSE(         
       split(REPT(JOIN(",",TRANSPOSE(filter(D:D,LEN(D:D))))&",",$H$1/COUNTA(D:D)),","))         
    &TRANSPOSE(         
       split(REPT(JOIN(",",TRANSPOSE(filter(E:E,LEN(E:E))))&",",$H$1/COUNTA(E:E)),","))
    &TRANSPOSE(         
       split(REPT(JOIN(",",TRANSPOSE(filter(F:F,LEN(F:F))))&",",$H$1/COUNTA(F:F)),","))
  }
)

Explanation

f1(A): $H$1/COUNTA(A:A) number of times the A column values should be repeated.

f2(A): filter(A:A,LEN(A:A)) values in column A.

f3(A): TRANSPOSE(split(REPT(JOIN(",",TRANSPOSE(f2))&",",f1),",")) repeats the column values

f4(A-B): SORT(f3(A)&f3(B),1,TRUE) Concatenates the first pair of columns and sorts them in ascending order.

f5(): f4(A-B)&f3(C)&f3(D)&f3(E)&f3(F) Concatenates the other columns rows. As only one of this columns has more than one element, it's not required to sort them again.

Note: There is one f3() formula for each column. There are five to match the sample data provided by the OP.

f6(): ArrayFormula(f5()): "Enables the display of values returned from an array formula into multiple rows and/or columns and the use of non-array functions with arrays"1.

Demo spreadsheet

References