I have a Google spreadsheet with items and prices. Column D
has the prices, column C
has item names, some of which contain multiple words. When an item has multiple words, only the first one is really significant.
What I'd like to achieve is this:
- Process the item list to strip off anything after the non-alpha character,
- SORT(UNIQUE()) the results,
- SUM() all the items that were matched to create this new list.
I will then generate a pie chart (this is food, after all) with the results. So for example, I might have:
1 C | D
-------------------+------------
2 Cheese | $9.08
3 Milk, 2L | $3.89
4 Avocado | $0.99
5 Avocado 3/$5 | $5.00
6 Apples Fuji | $2.75
7 Avocado | $1.49
8 Milk 4L | $4.75
9 Apples | $2.25
The end result will be:
Cheese | $9.08
Milk | $8.64
Avocado | $7.48
Apples | $5.00
I know that I can use REGEXEXTRACT()
to parse an individual field:
=REGEXEXTRACT(C3; "^([A-Za-z]+)")
returns Milk
, as I would expect. As well, I know that I can add up totals for a column using FILTER()
:
L | M
--------------------+----------
=UNIQUE( C2:C ) | =SUM( FILTER( D$2:D; L2=C$2:C ) )
=CONTINUE(C2, 2, 1) | =SUM( FILTER( D$2:D; L3=C$2:C ) )
...
But I want to combine these, and add up the totals of the items as modified by REGEXEXTRACT()
. I was hoping I'd be able to do something like:
=UNIQUE( REGEXEXTRACT( C2:C; "^([A-Za-z]+)" ) )
and
=SUM( FILTER( D$2:D; L2=REGEXEXTRACT(C$2:C, "^([A-Za-z]+)") ) )
But when presented with a range, the REGEXEXTRACT()
function delivers just a string, not an array. This is confirmed, because when I try the formula:
=REGEXEXTRACT(UNIQUE(C2:C); "^([A-Za-z]+)")
the error I get is "Not a string: Cheese".
So my question is … How do I make this work?
Do I need to make a "scratch" worksheet as a bridge? Is it possible to REGEXEXTRACT()
each element of a UNIQUE()
result set within a formula?
Best Answer
My approach would be to "normalise" the data first:
=ArrayFormula(IF({1,0};IFERROR(REGEXEXTRACT(A2:A;"^([a-zA-Z]+)"));B2:B))
and then you can actually wrap this directly in a QUERY function to produce the desired output:
=ArrayFormula(QUERY(IF({1,0};IFERROR(REGEXEXTRACT(A2:A;"^([a-zA-Z]+)"));B2:B);"select Col1, sum(Col2) where Col1 != '' group by Col1 label sum(Col2) ''";0))
You can even output a header row as well:
=ArrayFormula(QUERY(IF({1,0};IFERROR(REGEXEXTRACT(A2:A;"^([a-zA-Z]+)"));B2:B);"select Col1, sum(Col2) where Col1 != '' group by Col1 label Col1 'Food Item', sum(Col2) 'Total'";0))