Regex – Non greedy (reluctant) regex matching in sed

Tags: greedy, pcre, regex, regex-greedy, sed

I'm trying to use sed to clean up lines of URLs to extract just the domain.

So from:

http://www.suepearson.co.uk/product/174/71/3816/

I want:

http://www.suepearson.co.uk/

(either with or without the trailing slash, it doesn't matter)

I have tried:

 sed 's|\(http:\/\/.*?\/\).*|\1|'

and (escaping the non-greedy quantifier)

sed 's|\(http:\/\/.*\?\/\).*|\1|'

but I cannot get the non-greedy quantifier (?) to work, so it always ends up matching the whole string.

Best Answer

Neither basic nor extended POSIX/GNU regex recognizes the non-greedy quantifier; you need a later regex dialect. Fortunately, Perl regex is easy to get in this context:

perl -pe 's|(http://.*?/).*|$1|'
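
If staying with sed is preferable, a negated character class can stand in for the missing non-greedy quantifier. The following is a minimal sketch, assuming each input line starts with http:// as in the question; the function name extract_domain is hypothetical and only there for illustration:

```shell
# Hypothetical helper: keep only the scheme and host of each URL on stdin.
# [^/]* cannot cross a slash, so the match stops at the first "/" after
# "http://" even though * itself is still greedy.
extract_domain() {
  sed 's|\(http://[^/]*/\).*|\1|'
}

printf '%s\n' 'http://www.suepearson.co.uk/product/174/71/3816/' | extract_domain
# prints: http://www.suepearson.co.uk/
```

This uses only basic regular expression syntax, so it should behave the same in GNU and BSD sed.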

Related Solutions

How to replace a newline (\n) using sed

sed is intended to be used on line-based input, although it can do what you need.
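
For completeness, one way sed can do it is to pull the whole input into the pattern space with a loop and then substitute the embedded newlines. This is a sketch rather than a recommendation; the function name join_lines is hypothetical, and the separate -e expressions are used on the assumption that they are more portable than a single semicolon-separated script:

```shell
# Hypothetical helper: join all lines of stdin with single spaces.
#   :a    define a label named "a"
#   N     append the next input line to the pattern space
#   $!ba  if this is not the last line, branch back to "a"
#   s/\n/ /g  replace every embedded newline with a space
join_lines() {
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
}

printf 'one\ntwo\nthree\n' | join_lines
# prints: one two three
```

Note that this holds the entire input in memory, which is one reason a stream-oriented tool is usually the simpler choice.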

A better option here is to use the tr command as follows:

tr '\n' ' ' < input_filename

or remove the newline characters entirely:

tr -d '\n' < input.txt > output.txt

or, if you have the GNU version (with its long options):

tr --delete '\n' < input.txt > output.txt

Regex – How to make the match non greedy in vim

Instead of .* use .\{-}.

%s/style=".\{-}"//g

Also, see :help non-greedy
