How can I get a subrange of second to last element in a formula under Google Sheets?
I'm trying to extract all occurrences of https://example.com/*
in a cell which may contain multiple URLs scattered across a string like:
abc https://example.com/abc https://google.com/ghi https://example.com/def/abc
so that I get the paths in a single string, separated by spaces as follows:
abc def/abc
As SPLIT looks the easiest way for that, I need to get rid of the first item in a formula like:
=join("
",
arrayformula(
regexextract(
split(A5, "https://example.com/", false, true),
"[a-zA-Z0-9/\?=_]+")
)
)
Best Answer
You have the basic concept right just piece it together step by step. This does the job:
split
the string at whitespaces, this is safe because URLs don't contain white space so you're definitely not breaking them up. This gives you an array of stringsexample.com
youREGEXEXTRACT
exactly those paths and only those paths using a capturing group(\S+)
.IFERROR
to simply replace those strings with empty strings, we don't care about those.ARRAYFORMULA
since the output of split is an array. So now we have an array of empty strings and paths.JOIN
them together using white space, again, paths don't contain white space so we're not losing any structure.REGEXREPLACE
. Done!The result is a string of all the paths for the correct domain only, separated by one whitespace each.
Tip: I built this formula step by step, not as one big formula but piecewise in exactly the sequence outlined above to keep things manageable.