Google-sheets – Finding all word pairs from range of strings in Google Sheets

google sheetsgoogle-sheets-arrayformula

Let's say I have a column of values, each of which is a string.

COL A
This is a string
This is longer than that
My favorite is short
But sometimes short is longer
How is a short longer

I would like to extract all of the two word combinations from this range and then count them.
The unique word pairs would be (ignoring capitalization).

this is – 2
is a – 2
a string – 1
is longer -1
longer than -1
than that -1
my favorite -1
favorite is -1
is short – 1 (etc.)
but sometimes
sometimes short
short is
is longer
how is
a short
short longer

I've tried split/join combinations, but I can't figure out how to sequence the discovery of each pair.

Best Answer

See test sheet.

Unique pairs of 2 word from column A are in column D as a result of following formula:

=ARRAYFORMULA(
 UNIQUE(
  FILTER(
  {"#";TRANSPOSE(SPLIT(CONCATENATE(A2:A&" ")," "))}&" "&{TRANSPOSE(SPLIT(CONCATENATE(A2:A&" ")," "));"#"},
  NOT(REGEXMATCH({"#";TRANSPOSE(SPLIT(CONCATENATE(A2:A&" ")," "))}&" "&{TRANSPOSE(SPLIT(CONCATENATE(A2:A&" ")," "));"#"},".*#")))))

Idea is make column of single words and concatenate is with the same column offset by 1 row.

Following part returns column of single words.

TRANSPOSE(SPLIT(CONCATENATE(A2:A&" ")," "))

Then we shift it down, please note that I used "#" symbol for shifting purposes and further filter and if original text contains "#" it won't be included in final result, so you can change that symbol if you need.

{"#";TRANSPOSE(SPLIT(CONCATENATE(A2:A&" ")," "))}

concatenate with other column and filter rows that would contain only one word.