Google-sheets – Sum column value if any of several criteria is met in the adjacent column

google sheetsgoogle-sheets-arrayformulagoogle-sheets-arrays

I have a sheet in which I have 2441 words (hashtags) on their own rows in the first column and in the next (adjacent) column, a number related to them (occurrences of the hashtag in our data).

I want to sum together the occurrences of hashtags, which mention any word from a certain list. The word can be either as part of the hashtag or the whole hashtag for the number in the adjacent column to be counted.

I searched the forum and found this helpful thread I tried to incorporate, but unfortunately I am still doing something wrong.

My current formula is:

=ArrayFormula(sum(sumif(A2:A20,{"*business*"; "*entrepeneur*";"*shop*";"*biz*";"*marketing*";"*hustle*"},B2:B20)))

As you can see from the screenshot, the formula doesn't sum all of the instances from the given range. Have I formatted something in the formula wrong? Or well, I obviously have, but I wonder which part it is.

Sheet screenshot


Sample data – text format

Hashtag Count
internationalwomensday 1148
womensupportingwomen 81
womeninbusiness 77
girlpower 72
womenempowerment 69
bossbabe 50
women 48
love 47
womensday 47
smallbusiness 40
entrepreneur 39
strongwomen 36
fashion 34
iwd2021 33
motivation 33
femaleentreprencur 26
beauty 25
bosslady 24

Best Answer

Please note the comments made directly to your original post, noting that you have spelling errors in both the formula and the Col-A data.

However, assuming that your spelling were correct in Column A, you could use this in F1:

=SUM(FILTER(B2:B,NOT(ISERROR(REGEXEXTRACT(A2:A,"business|entrepreneur|shop|biz|marketing|hustle")))))

Here, SUM acts only on the FILTERed subset where REGEXEXTRACT was able to find any of the patters found between the 'or' pipes (e.g., "x|y|z|...").

If you think there may be spelling errors, you can reduce some words (like "entrepreneur") to a unique portion of the word that you think will most likely be spelled correctly in all of (or the majority of) the raw data: perhaps "neur" or "entre". But this method would still require that you spell that portion correctly in the formula.