Google-sheets – How to remove duplicate words in a cell only if longer than a certain number of characters

concatenateformulasgoogle sheetsgoogle-sheets-arrayformulagoogle-sheets-query

I need to remove a duplicate text in a cell but only if it is longer than 4 characters.

I have this formula that only works for any types of duplicate words:

=join(" ",unique(transpose(split(A1,", "))))

In this case, if a cell contains:

W3-X500 Samsung Galaxy W3-X500 5 inches and 5 different colors

It returns me:

W3-X500 Samsung Galaxy 5 inches and different colors

I will lose the second 5 character that instead I need.

How can I do it?

Best Answer

=REGEXREPLACE(ARRAYFORMULA(JOIN(" ", 
 UNIQUE(IF(LEN(TRANSPOSE(SPLIT(B1, ", "))), 
        IF(LEN(TRANSPOSE(SPLIT(B1, ", ")))>4, 
               TRANSPOSE(SPLIT(B1, ", ")), 
        IF(LEN(TRANSPOSE(SPLIT(B1, ", ")))<=4, 
               TRANSPOSE(SPLIT(B1, ", "))&
 "ᅇ"&CHAR(RANDBETWEEN(SIGN(ROW($A:$A))*1041, 1071))&
      CHAR(RANDBETWEEN(SIGN(ROW($A:$A))*1041, 1071))&
      CHAR(RANDBETWEEN(SIGN(ROW($A:$A))*1041, 1071))&
      CHAR(RANDBETWEEN(SIGN(ROW($A:$A))*1041, 1071))&
      CHAR(RANDBETWEEN(SIGN(ROW($A:$A))*1041, 1071))&"ᅇ", )), )))), 
 "\ᅇ([Б-Я]+)\ᅇ", "")

0