C# – Need regex (using C#) to condense all whitespace into single spaces

c# html regex

I need to replace every run of whitespace in a document with a single space. It doesn't matter whether the run consists of spaces, tabs or newlines: any combination of any kind of whitespace needs to be collapsed to a single space.

Let's say we have the string: "Hello,\t \t\n  \t    \n world", (where \t and \n represent tabs and newlines respectively) then I'd need it to become "Hello, world".

I'm so completely bewildered by regex more generally that I ended up just asking.

Considerations:

  • I have no control over the document, since it could be any document on the internet.

  • I'm using C#, so if anyone knows how to do this in C# specifically, that would be even more awesome.

  • I don't really have to use regex (before someone asks), but I figured it's probably the optimal way, since regex is designed for this sort of stuff, and my own strpos/str_replace/substr soup would probably not perform as well. Performance is important on this one so what I'm essentially looking for is an efficient way to do this to any random text file on the internet (remember, I can't predict the size!).

Thanks in advance!
– Helgi

Best Answer

// requires: using System.Text.RegularExpressions;
string newString = Regex.Replace(oldString, @"\s+", " ");

The "\s" is a regex character class that matches any whitespace character, and the + means "one or more". Each run of whitespace is therefore matched as a whole and replaced with a single space character.
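
Since the question stresses performance and allows non-regex solutions, here is a minimal sketch of both routes. The class and method names (WhitespaceCondenser, CondenseWithRegex, CondenseManually) are made up for illustration and are not part of the original answer. The first method reuses one precompiled Regex instead of re-parsing the pattern on every call; the second is a hand-written single pass that relies on char.IsWhiteSpace, which agrees with \s for ordinary spaces, tabs and newlines (treat exact equivalence on exotic Unicode characters as an assumption). Which one is faster on your documents is something to measure rather than assume.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class WhitespaceCondenser
{
    // Built once and reused across calls; RegexOptions.Compiled trades
    // a one-time startup cost for faster matching afterwards.
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Same technique as the answer: each run of whitespace becomes one space.
    public static string CondenseWithRegex(string input)
    {
        return WhitespaceRun.Replace(input, " ");
    }

    // Non-regex alternative: a single pass that emits one space
    // for the first character of each whitespace run and skips the rest.
    public static string CondenseManually(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool inRun = false;
        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
            {
                if (!inRun)
                {
                    sb.Append(' ');
                    inRun = true;
                }
            }
            else
            {
                sb.Append(c);
                inRun = false;
            }
        }
        return sb.ToString();
    }
}
```

With the string from the question, both WhitespaceCondenser.CondenseWithRegex("Hello,\t \t\n  \t    \n world") and the manual version return "Hello, world". Note that leading or trailing whitespace is reduced to one space rather than removed; call Trim() on the result if that is not wanted.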