Google-sheets – Convert jagged array to 2D array in Google Sheets

google sheets

I have two columns in a Google Sheet: an ID column with a single value per cell and a Color column with an arbitrary number of comma-separated values per cell. I'd like to expand the Color column so that each value is in its own cell while retaining its associated ID. This is illustrated in the attached image and in this worksheet.

I'm looking for a script or formulas to perform this operation on an array with ~30,000 rows. My clumsy solution (below and in the second tab of the worksheet) fails beyond about 1,000 rows, because the length of a string made by join() is limited to 50,000 characters:

In column C, create a list of repeated IDs with the same number of elements as the list of colors:

=rept(A4&",",COUNTA(split(B4,", ",false)))

Concatenate each column then split each column:

=transpose(split(join(",",C4:C),",",false))  
=transpose(split(join(", ",B4:B),", ",false))


Best Answer

For the OP's example, try the following formula:

=ArrayFormula(
IFERROR(
REGEXEXTRACT(","&
TRANSPOSE(TRIM(SPLIT(QUERY(TRANSPOSE(QUERY(
IFERROR(A4:A6&","&
REGEXEXTRACT(","&
SUBSTITUTE(B4:B6,", ",","),"^"&REPT(",+[^,]+",
COLUMN(OFFSET(A1,,,1,6))-1)&",+([^,]+)")&";")
,,1000))
,,1000),";"))),"^"&REPT(",+[^,]+",
COLUMN(OFFSET(A1,,,1,2))-1)&",+([^,]+)"
)
)
)
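Both REGEXEXTRACT calls rely on the same trick: a comma is prepended to the text, and the pattern "^" & REPT(",+[^,]+", n-1) & ",+([^,]+)" then skips n-1 items and captures the n-th. As a rough illustration only (it is not part of the formula), the same pattern can be sketched in JavaScript. The function name nthItem is hypothetical, and the sketch assumes JavaScript's regex engine handles this simple pattern the same way Sheets does:

```javascript
// Hypothetical helper: returns the n-th (1-based) comma-separated item,
// or null when the text has fewer than n items.
function nthItem(text, n) {
  // Same shape as "^" & REPT(",+[^,]+", n-1) & ",+([^,]+)" in the formula.
  const pattern = new RegExp("^" + ",+[^,]+".repeat(n - 1) + ",+([^,]+)");
  // A comma is prepended so that every item, including the first,
  // is preceded by a comma.
  const match = ("," + text).match(pattern);
  return match ? match[1] : null;
}

console.log(nthItem("red,green,blue", 2)); // green
console.log(nthItem("red,green,blue", 4)); // null
```

In the formula, IFERROR plays the role of the null check: when a row has fewer than n items, REGEXEXTRACT returns an error and IFERROR turns it into a blank.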

For the 30,000 rows:

  • Change the references accordingly.
  • Replace 1000 with 30000 or a larger number.
    • The 1000 is the headers argument of QUERY(data,query,[headers]). We want to use a number large enough to include all the rows of the source data.
  • If any row in the second column of the source data has more than 6 items, replace the 6 in OFFSET(A1,,,1,6) with the largest number of items in a row.
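Since the question also allows a script, a plain loop over the rows is another option, and it does not build one long joined string. Below is a minimal sketch, not taken from the answer above. It assumes the code is pasted into the Apps Script editor and called as a custom function, for example =EXPAND_ROWS(A4:B), so that the range arrives as a 2D array of [ID, colors] rows. The name EXPAND_ROWS is made up for illustration:

```javascript
// Hypothetical custom function: expands [[id, "a, b, c"], ...] into
// [[id, "a"], [id, "b"], [id, "c"], ...], one output row per color.
function EXPAND_ROWS(rows) {
  const out = [];
  for (const [id, colors] of rows) {
    // Skip blank rows, which open-ended ranges such as A4:B include.
    if (id === "" || id === null || id === undefined) continue;

    // Split on commas, trim surrounding spaces, drop empty pieces.
    const items = String(colors == null ? "" : colors)
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s !== "");

    // Assumption: an ID without any colors is kept with a blank color.
    if (items.length === 0) {
      out.push([id, ""]);
      continue;
    }
    for (const item of items) out.push([id, item]);
  }
  return out;
}
```

The sketch emits rows in source order, which may differ from the order produced by the formula, and it has not been timed on 30,000 rows.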

See the reference for further details.

References

JOINing code-points after CHAR conversion in Google Spreadsheets