Google-sheets – Categorize text in Google Sheets based on matching partial text in the Category index

google sheetsvlookup

I'm looking to match, with VLOOKUP, part of the words in a column in the main sheet to an index of keywords/categories where partial words are stored for ease in another sheet.

Here's an example worksheet:
https://docs.google.com/spreadsheets/d/13QrS1nDjTN6oxIoXhvdyoiqAa4lf8x3eSY3566_eTJ8/edit?usp=sharing

  1. The Expenses sheet has the data as we receive it, in columns A to C.
  2. In Column D, we'd like to auto-assign a category to these expenses. If we do this with exact text in the column B, for example, BANK VISA PREAUTH PYMT, then it's easy. The VLOOKUP formula in column D works by getting this precise reference from the 'Categories' tab (cell A1 is worded exactly this way). This formula works: =IFERROR(VLOOKUP(B2,Categories!A:B,2,FALSE),"")
  3. The challenge is that our expenses of a similar type won't always follow the exact text. For example: SEND E-TFR US***ke5 and SEND E-TFR US***xja. As such, we'd like to match just the text SEND E-TFR.

This means the VLOOKUP needs to match partials in the 'Categories' tab first column. How does adjust the VLOOKUP formula to achieve this?

PS. My question is quite similar to this one with the vital distinction of wanting to match substrings.
Categorize cells in Google Spreadsheets

EDIT: Answered.

Best Answer

Thanks to @Ruben for the answer. Eventually we decided to take a simpler approach. Just to check the regexp of the search string in VLOOKUP instead, and have those substrings in the key indexes in Categories tab.

So if an expense has this text in the cell B1:

SEND E-TFR US***ke5
SEND E-TFR US***abc
SEND E-TFR US***xyz
...

Which was causing issues with exact matches, we just truncate the string to the substring that occurs before the second space:

REGEXEXTRACT(B1, "[^\ ]*\ [^\ ]*")

To give us just this:

SEND E-TFR

And our renewed VLOOKUP function:

=IFERROR(VLOOKUP(TRIM(REGEXEXTRACT(B1, "[^\ ]*\ [^\ ]*")),Categories!A:B,2,FALSE), "")

The categories key index then refers to that truncated text for the lookup:

Truncated text to check

Which therefore matches and populates the right category:

Correct match for substring

A test file to see this working is: https://docs.google.com/spreadsheets/d/13QrS1nDjTN6oxIoXhvdyoiqAa4lf8x3eSY3566_eTJ8/edit?usp=sharing

Thanks!