Google-sheets – How to get most frequent terms from a set of lists

google sheetsgoogle-sheets-charts

I have a list on Google Spreadsheets. Each line is a mission, and one of the columns indicates who I've worked together on the mission (which is between 1 and 3 people). The format is "Aquaman, Batman, Cyclops", but I can change it if needed.

Is there a way to use the plot/chart tool to create a graph which shows the frequency of each collaborator (essentially a histogram, perhaps sorted by frequency)?

In case a graphical representation is not possible, a textual representation will do, e.g., a sorted list of all collaborators, with the number of collaborations next to each person's name.

I'll be very glad to hear any ideas!

Best Answer

The comma-separated representation is really inconvenient in a spreadsheet. So the first thing I'd do is to split it. Suppose column A has the data; put =split(A1,",") in cell B1 and copy down. This fills some of columns B,C,D with the names.

Then you probably want a single column (E) listing each name only once. This can be achieved with

=unique({filter(B:B, len(B:B)); filter(C:C, len(C:C)); filter(D:D, len(D:D))})

Then you need to count how many times each appeared. Like this:

=countif(B$1:D,E1)

copied down the F column.

Now you have a table with names and their frequencies. This can already be plotted. But if you want to sort by decreasing frequency, use

=sort(E:F, 2, False)

which sorts by the 2nd column of the range, descending.