Google-sheets – Remove UTM code from the end of HTML link in Google Sheets

google sheetsworksheet-function

I have a link that I pull in elsewhere. Let's say it looks like this:
https://www.mysite.com/site/interesting-content/name-of-article.html?utm_source=source&utm_campaign=ab&utm_medium=medium&utm_term=paid

I've been able to truncate it down to /site/interesting-content/name-of-article.html?utm_source=source&utm_campaign=ab&utm_medium=medium&utm_term=paid
using a substitute function.

But I'd like to further truncate it down to /site/interesting-content/name-of-article.html

The problem is, there's quite a few UTM campaigns that I can pull links from. So ideally, I'd like to remove anything after .html, without having to specify exactly what needs to be removed.

I tried using a right() function to get a copy of what I want to remove, so that in another cell I could do a substitute function on it to remove the unneeded bits:

=RIGHT(W8,SEARCH(".html",W8))

But it ended up looking like this:
ent/name-of-article.html?utm_source=source&utm_campaign=ab&utm_medium=medium&utm_term=paid

Best Answer

Try replacing with REGEXREPLACE:

=REGEXREPLACE(W8,"\?[^?]*$","")

Explanation

REGEXREPLACE takes string (first parameter), then using pattern (second parameter) finds text that starts with ? until end of string, and replaces it with empty string (third parameter).