I have a spreadsheet with hexadecimal Unicode code-points in one column. Usually it’s just a single number, which I can convert to a character with =CHAR(HEX2DEC(XY123))
. In some cases, though, there are multiple code-points separated by spaces, e.g. 1F449 1F44C
, which I can split into an array and rejoin with =JOIN(; SPLIT(XY123; " "))
or =CONCATENATE(SPLIT(XY123; " "))
.
How can I convert each of the numbers into a character before concatenating them (like in a for each … in …
loop in many programming languages)?
Naive =CONCATENATE(CHAR(HEX2DEC(SPLIT(XY123; " "))))
results in a #VALUE!
error. I can easily get a string of numeric xML character references by using ="&#x" & JOIN(";&#x"; SPLIT(XY123; " ")) & ";"
, but there seems to be no predefined function to convert these and neither for other escape notation like ="\u" & JOIN("\u"; SPLIT(Codepoint!F8; " "))
.
PS: I know it should be pretty simple with a Custom Function in Google Sheets, but I’m hoping for a more generic solution.
Best Answer
Short answer
Use ARRAYFORMULA and the following to build a single formula solution:
Use REGEXEXTRACT and the array handling feature of Google Sheets to split column values
Do the required conversions
Use QUERY, TRANSPOSE & SUBSTITUTE to join two or several columns instead of CONCATENATE, JOIN & SPLIT.
Explanation
ARRAYFORMULA could work in a similar way than "for each item in array" loop, but not all Google Sheets functions are able to handle arrays as is the case of CONCATENATE, SPLIT and JOIN. Fortunately, REGEXTRACT could be used to split an indeterminate number of row values into columns and QUERY could be used to join two or several columns by using the headers argument to include all the rows as headers.
As QUERY will add a space character as separator of the joined headers, we will need to remove it.
Example
Assume that the column with hexadecimal Unicode code-points is column A and the values starts at A1 and ends at A7
=ARRAYFORMULA(TRANSPOSE(SUBSTITUTE(QUERY(TRANSPOSE(IFERROR(CHAR(HEX2DEC(REGEXEXTRACT(" "&A1:A7,REPT(" +[^ ]+",COLUMN(OFFSET(A1,,,1,6))-1)&" +([^ ]+)"))))),"SELECT *",COUNTA(A1:A7))," ","")))
Notes:
The 6 in COLUMN(OFFSET(A1,,,1,6))-1 gives as result {0,1,2,3,4,5} because it's inside of ARRAYFORMULA. This determines the maximum number of elements by row in the split. Calculating this number inside the formula will make the spreadsheet slow to recalculate.
Using open ended references (A:A) will work but also will make the spreadsheet slow to recalculate.
References