Linux – Removing Windows newlines on Linux (sed vs. awk)

awk, linux, sed

I have some delimited files with improperly placed newline characters in the middle of fields (not at line ends), which appear as ^M in Vim. They originate from freebcp (on CentOS 6) exports of an MSSQL database. Dumping the data in hex shows \r\n patterns:

$ xxd test.txt | grep 0d0a
0000190: 3932 3139 322d 3239 3836 0d0a 0d0a 7c43

I can remove them with awk, but am unable to do the same with sed.

This works in awk, removing the line breaks completely:

awk 'gsub(/\r/,""){printf "%s", $0; next} {print}'

But this sed command does not; it leaves the line feeds in place:

sed -i 's/\r//g'

and this appears to have no effect at all:

sed -i 's/\r\n//g'

Typing a literal ^M in the sed expression (Ctrl+V, Ctrl+M) does not seem to work either.

For this sort of task, sed is easier to grok, but I am working on learning more about both. Am I using sed improperly, or is there a limitation?

Best Answer

You can use the command-line tool dos2unix, which converts CRLF line endings to LF in place:

dos2unix input

Or use the tr command:

tr -d '\r' <input >output
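Note that tr only deletes the carriage returns; each line feed stays where it was, so a record that was split by a stray \r\n remains split. A minimal sketch of that behavior, assuming a made-up two-record sample (the field values and variable names are purely illustrative):

```shell
# Hypothetical sample: one record broken by a stray CRLF, then a normal record.
sample=$(printf 'id|note\r\ncontinued|x\nid2|ok')

# tr deletes every CR byte but leaves each LF in place,
# so the broken record is still spread over two lines.
converted=$(printf '%s\n' "$sample" | tr -d '\r')

# Inspect the result byte by byte: no \r should remain, but three \n do.
printf '%s\n' "$converted" | od -c
```

This is the right tool when the goal is a plain DOS-to-Unix conversion, but not when the \r\n pairs themselves must disappear.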

You can also switch the file format in Vim:

Method A:
:e ++ff=dos
:w ++ff=unix
:e!

Method B:
:e ++ff=dos
:set ff=unix
:w

EDIT

If you want to delete the \r\n sequences in the file, try these commands in vim:

:e ++ff=unix           " <-- make sure open with UNIX format
:%s/\r\n//g            " <-- remove all \r\n
:w                     " <-- save file

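Your awk solution works fine. Your sed attempts fail because sed reads one line at a time and strips the trailing newline before applying the expression, so s/\r\n//g never sees a \n to match, and s/\r//g removes the \r only for sed to put the newline back when it prints the line. The lines have to be joined in the pattern space first. Two sed solutions that do this: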

sed '1h;1!H;$!d;${g;s/\r\n//g}' input
sed ':A;/\r$/{N;bA};s/\r\n//g' input
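The first command collects the whole file in the hold space and substitutes once at the end; the second appends the next line only when the current one ends in \r, so it does not need to keep the entire file in memory. A quick way to compare the line-at-a-time attempt with the second command is sketched below. It assumes GNU sed (for the \r and \n escapes and for ';' after a label), and the sample data and variable names are made up for the demonstration:

```shell
# Hypothetical sample: one record broken by a stray CRLF, then a normal record.
sample=$(printf 'id|note\r\ncontinued|x\nid2|ok')

# Line-at-a-time substitution: the pattern space never contains the \n,
# so nothing matches and the input passes through unchanged.
unchanged=$(printf '%s\n' "$sample" | sed 's/\r\n//g')

# Append the next line whenever the current one ends in CR,
# then delete the CR/LF pair that now sits inside the pattern space.
joined=$(printf '%s\n' "$sample" | sed ':A;/\r$/{N;bA};s/\r\n//g')

# The broken record should now be on a single line.
printf '%s\n' "$joined"
```

On other sed implementations the escapes or the one-line label syntax may not be accepted, in which case the awk or Vim approaches are the safer choice.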