Google Sheets – More Compact Version of Nested SUBSTITUTE Function

google sheetsgoogle-sheets-arrayformulagoogle-sheets-arrays

I want to make about a hundred changes in a column on a spreadsheet. These changes have to be remade each time a new copy of the original comes. The changes are spread over several thousand rows.

The first solution that came to mind was to just use VLookup.
Bad column in A, Good column in B.
Start by copying A to B. Fix the entries in B.
This requires fixing some mistakes multiple times. Ok, Search and replace in a limited range.

This works, but isn't very robust.

If I fixed Eleagnus to Elaeagnus for
Eleagnus commutata and for Eleagnus augustifolia, I have to fix it again when my supplier introduces Eleagnus 'Silverleaf'


I found a partial solution to my make a mass of changes in a column problem.

enter image description here

If I use the yellow formula, it implements the yellow changes.
Nesting that using B10,C10 — the pink formula — adds the changes in pink.
Nesting that in another substitution() applies adds the changes in green.

This gets unwieldy.

At present my solution is messy, and requires at minimum a new tab. Assume the neat stuff starts in Row 4, and we use the first 3 for labels and explanations.

  • Copy the column from the source into column A, use an array formula e.g.

A4=ArrayFormula(SheetX!C4:C)

  • Create the substitute expressions in columns B and C
  • Put the messy substitute array formula in D4
  • If N is the number of substitutions in my formula, then columns E,F are copies of BC, but starting N rows further down. For N=10

E = ArrayFormula(B14:B) F = ArrayFormula (C14:C)

  • Copy D4,E4,F4 to G4,H4,I4
  • Repeat until you have exhausted the expression list.

Copy the last column to where you want it.

Whew

(In passing N=40 is quite reasonable. I create it in a text editor.)

Is there an alternative in google sheets to the nested substitutions.

In essence I want to:

FOR $counter = 4 to N
ArrayFormula(substitute(A4:A, B$counter, C$counter))
End FOR

One fairly straight forward solution is do it in perl. This makes my solution non-portable. Few of my people are perl users.

Is there an elegant way to do this in sheets?

Best Answer

The following method works, and is not too arcane:

SCREENSHOT

A: I defined 3 named ranges, just to make my formulas easier. This sheet is called 'Fix' so the named ranges use Fix as a prefix.

  • FixDataToChange is the stuff with the errors in it.
  • FixRegex (Yellow) are the patterns to use. Most of them are simple text patterns.
  • FixChangeTo is the material to replace each pattern with.

B: The formula in Column D is where the magic happens. Look at the blue box D3

=query(
    {ArrayFormula(regexmatch(A3,FixRegex)),FixRegex,FixChangeTo},
    "select Col2 WHERE Col1 = TRUE")

Line 2 constructs a temporary array looking for each pattern in FixRegex. This constructs a column of mostly false results, but usually just 1 true result. More on that later.

Line 3 brings back the regex to look for.

E3 is exactly the same, but brings back the replacement.

F3=if(ISNA(D3),A3,RegexReplace(A3,D3,E3)) 

If no regex was found, just copy the data over. Otherwise run the replacement. This formula is copied down. You could wrap this in ArrayFormula, but you have to copy columns D and E down anyway. They can't be wrapped. No nesting. I'm open to suggestions on this.

Since these are regexes it's easy to come up with something that matches more than one regex. This causes an error, because the query returns more than one value, it tries to stomp on the cell below.

This is a feature, not a bug. I could have used VLookup instead of Query, but it would have returned the first match to True.

What I did was to search for #REF, then blank out the cell below, find out which multiple patterns had matched, then go back are rewrite one or both of the patterns.

In one case I had cultivar name in smart single quotes. The two were specified as two regexes. Combined them both into a character class. This left me with a blank cell.

That caused all hell to break loose. Matched everything. Unused cells in FixRegex have to be filled with something that is guaranteed not to match.

In practical use, FixRegex and FixChangeTo can be put on a Reference tab, and the intermediate columns can be hidden