Google-sheets – JOINing code-points after CHAR conversion in Google Spreadsheets

formulasgoogle sheets

I have a spreadsheet with hexadecimal Unicode code-points in one column. Usually it’s just a single number, which I can convert to a character with =CHAR(HEX2DEC(XY123)). In some cases, though, there are multiple code-points separated by spaces, e.g. 1F449 1F44C, which I can split into an array and rejoin with =JOIN(; SPLIT(XY123; " ")) or =CONCATENATE(SPLIT(XY123; " ")).
How can I convert each of the numbers into a character before concatenating them (like in a for each … in … loop in many programming languages)?

Naive =CONCATENATE(CHAR(HEX2DEC(SPLIT(XY123; " ")))) results in a #VALUE! error. I can easily get a string of numeric xML character references by using ="&#x" & JOIN(";&#x"; SPLIT(XY123; " ")) & ";", but there seems to be no predefined function to convert these and neither for other escape notation like ="\u" & JOIN("\u"; SPLIT(Codepoint!F8; " ")).

PS: I know it should be pretty simple with a Custom Function in Google Sheets, but I’m hoping for a more generic solution.

Best Answer

Short answer

  • Use ARRAYFORMULA and the following to build a single formula solution:

    • Use REGEXEXTRACT and the array handling feature of Google Sheets to split column values

    • Do the required conversions

    • Use QUERY, TRANSPOSE & SUBSTITUTE to join two or several columns instead of CONCATENATE, JOIN & SPLIT.

Explanation

ARRAYFORMULA could work in a similar way than "for each item in array" loop, but not all Google Sheets functions are able to handle arrays as is the case of CONCATENATE, SPLIT and JOIN. Fortunately, REGEXTRACT could be used to split an indeterminate number of row values into columns and QUERY could be used to join two or several columns by using the headers argument to include all the rows as headers.

As QUERY will add a space character as separator of the joined headers, we will need to remove it.

  1. Split the row values into columns by using REGEXTRACT and the array handling feature of Google Sheets.
  2. Convert the values
  3. TRANSPOSE the previous result
  4. Join all the rows by using QUERY
  5. Remove the space by using SUBSTITUTE
  6. TRANSPOSE the previous result to get one column

Example

Assume that the column with hexadecimal Unicode code-points is column A and the values starts at A1 and ends at A7

=ARRAYFORMULA(TRANSPOSE(SUBSTITUTE(QUERY(TRANSPOSE(IFERROR(CHAR(HEX2DEC(REGEXEXTRACT(" "&A1:A7,REPT(" +[^ ]+",COLUMN(OFFSET(A1,,,1,6))-1)&" +([^ ]+)"))))),"SELECT *",COUNTA(A1:A7))," ","")))

Notes:

  • The 6 in COLUMN(OFFSET(A1,,,1,6))-1 gives as result {0,1,2,3,4,5} because it's inside of ARRAYFORMULA. This determines the maximum number of elements by row in the split. Calculating this number inside the formula will make the spreadsheet slow to recalculate.

  • Using open ended references (A:A) will work but also will make the spreadsheet slow to recalculate.

References