Google Sheets REGEX Challenge – Mastering REGEX in Google Sheets

google sheetsregexregexextract

I have a long Google Sheet column of cells with content looking like this:

#text: Jacob Zuma
@domain: https://www.theguardian.com/world/zuma

#text: World news
@domain: https://www.theguardian.com/world/world

#text: ANC (African National Congress)
@domain: https://www.theguardian.com/world/anc-african-national-congress

#text: South Africa
@domain: https://www.theguardian.com/world/southafrica

#text: Africa
@domain: https://www.theguardian.com/world/africa

I need it to look like this:

Jacob Zuma, World news, ANC (African National Congress), South Africa, Africa

Basically, i'm trying to find a way to get rid of all the headers (#text+@domain) and the URLs, and have only the subjects themselves, all lined-up, separated with commas.

Would appreciate help in figuring out how to do it.

Best Answer

Here is a possible solution.

First, I am assuming that your data is all in column a of the sheet and each two lines of text seperated by a blank line represent one cell. Also I am assuming that the pattern in column a repeats exactly after 5 cells.

  1. Paste the following formula down column b:

    =trim(REGEXEXTRACT(A1,"(?:\s)[^\n]*"))

  2. Paste the following formula down column c:

    =JOIN(", ",indirect(ADDRESS((row()-1)*5+1,2)&":"&ADDRESS((row())*5,2)))

Your list of desired texts will be in column c.