Bash – Explaining sed, grep and cut syntax

bash cygwin

I am analyzing a batch file, and one of its lines reads a text file (input) and produces another text file (output).

The batch file uses three helper executables: grep, sed and cut. I tried to read their manuals, but it wasn't easy.

The line is:

type input.txt | sed "s#""#'#g" | grep -o "class='name[^>]*" | sed -n "/id=/p" | grep -o "surname=[^>]*" | cut -d"'" -f2 >output.txt

How is this line interpreted? What are the rules? Is there a smarter way of doing this (for example, using one tool instead of all three)?

Best Answer

I'll add to jeb's answer, although it covers most of what you asked.
These three commands are Windows ports of standard Linux command-line tools, and they do the following:

  1. sed: a stream editor for filtering and transforming text.
  2. grep: a tool for printing lines matching a pattern.
  3. cut: a tool for cutting out selected portions of each line of a file.

I recommend that you read more about these three commands by either typing man <command name> in Linux, or Googling that same string (for instance, "man grep").
Also, look up regular expressions. Though they are usually confusing for beginners, they are a common and compact way of representing patterns.

Regarding the specific usage in the question:

sed "s#""#'#g"

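For each line, this replaces every double quote (") with an apostrophe ('). The # characters are only the delimiters of the s (substitute) command, used here in place of the usual /, and the doubled "" appears to be how the batch file passes a literal double quote inside a quoted argument. The trailing g applies the replacement to every occurrence on the line, not just the first. This is why the later patterns look for class='name rather than class="name.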
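
To try this substitution in a Unix-style shell (Cygwin's bash, for instance), a minimal sketch follows. The sample line is made up for illustration, and the quoting differs from the batch file: inside a double-quoted bash string the literal double quote is written \" rather than "".

```shell
# Made-up sample line with double-quoted attributes (an assumption, not from the question).
line='<td class="name" id="7" surname="Smith">'

# s#"#'#g : substitute every " with ', using # as the delimiter.
converted=$(printf '%s\n' "$line" | sed "s#\"#'#g")

printf '%s\n' "$converted"
# prints: <td class='name' id='7' surname='Smith'>
```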

grep -o "class='name[^>]*"

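For each line, this prints only the matched text (that is what -o does) instead of the whole line: the part starting with class='name and continuing up to, but not including, the next >. Lines without a match produce no output.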
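
As a small illustration of -o, here is a sketch run on a made-up line (the sample data is an assumption). Note that [^>]* is greedy, so the match runs all the way to the character before the first >.

```shell
# Made-up sample line, already using apostrophes as after the first sed step.
line="<td class='name' id='7' surname='Smith'>text</td>"

# -o prints only the matching part: from class='name up to (not including) the next >.
matched=$(printf '%s\n' "$line" | grep -o "class='name[^>]*")

printf '%s\n' "$matched"
# prints: class='name' id='7' surname='Smith'
```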

sed -n "/id=/p"

By default, sed prints every line. The -n option turns that automatic printing off, so sed -n "/<some pattern>/p" prints only the lines that match the specified pattern. In this case, sed prints only the lines containing id=.
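
The filtering behaviour can be checked with a sketch like the one below, using two made-up lines of which only the first contains id=. For comparison, the same selection is also done with plain grep, which for a simple pattern like this gives the same result.

```shell
# Two made-up input lines (assumptions for illustration); only the first contains id=.
with_sed=$(printf '%s\n' "class='name' id='7' surname='Smith'" "class='name' surname='Jones'" |
  sed -n "/id=/p")

# The same selection written with grep.
with_grep=$(printf '%s\n' "class='name' id='7' surname='Smith'" "class='name' surname='Jones'" |
  grep "id=")

printf '%s\n' "$with_sed"
# prints: class='name' id='7' surname='Smith'
```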

grep -o "surname=[^>]*"

This prints only the part of the line that starts with surname= and continues up to, but not including, the next >.

cut -d"'" -f2

This parses each line as successive fields separated by an apostrophe ('), and picks the second one.
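
To see how the fields fall out, consider this sketch with a made-up line (the age attribute is purely hypothetical). Splitting on the apostrophe gives surname= as field 1 and Smith as field 2, so any attributes that the previous grep left after the surname are discarded.

```shell
# Made-up line as the previous grep might emit it; age is a hypothetical extra attribute.
line="surname='Smith' age='30'"

# -d"'" sets the delimiter to an apostrophe; -f2 selects the second field.
surname=$(printf '%s\n' "$line" | cut -d"'" -f2)

printf '%s\n' "$surname"
# prints: Smith
```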

Everything is piped, meaning that the output of each command serves as the input of the next command to its right. The type command prints the contents of "input.txt", which are fed into the first sed command, whose output is then fed into the grep command, and so on. The final output is redirected (>) into a file named "output.txt".
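
Putting the stages together, the sketch below runs the same pipeline in a Unix-style shell on a made-up three-line input. The file contents are assumptions for illustration, cat stands in for the batch type command, and the double quote in the first sed script is escaped as \" because of bash quoting rules.

```shell
# Create a temporary input file with made-up sample data.
input=$(mktemp)
cat > "$input" <<'EOF'
<td class="name" id="1" surname="Smith">a</td>
<td class="name" surname="Jones">b</td>
<td class="other" id="3" surname="Brown">c</td>
EOF

# Same stages as the batch line: quotes -> apostrophes, keep the class='name tag,
# keep only tags with id=, isolate surname=..., then take the value between apostrophes.
result=$(cat "$input" |
  sed "s#\"#'#g" |
  grep -o "class='name[^>]*" |
  sed -n "/id=/p" |
  grep -o "surname=[^>]*" |
  cut -d"'" -f2)

rm -f "$input"
printf '%s\n' "$result"
# prints: Smith
```

Only the first line survives: the second has no id=, and the third does not have class='name.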

And yes, as jeb mentioned, this looks like an awkward solution, because everything here can be done with sed alone, presumably with only one or two commands.
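
One possible sed-only version is sketched below. It is not taken from either answer, and it assumes the simple case of at most one matching tag per line with the surname value always quoted; for input outside those assumptions it may behave differently from the original pipeline. The sample data is made up.

```shell
# Made-up sample input (an assumption for illustration).
input=$(mktemp)
cat > "$input" <<'EOF'
<td class="name" id="1" surname="Smith">a</td>
<td class="name" surname="Jones">b</td>
<td class="other" id="3" surname="Brown">c</td>
EOF

# First expression: turn every " into '.
# Second expression: on lines where id= follows class='name inside the same tag,
# replace the whole line with the quoted surname value and print it (-n hides the rest).
result=$(sed -n \
  -e "s/\"/'/g" \
  -e "/class='name[^>]*id=/s/.*class='name[^>]*surname='\([^']*\)'.*/\1/p" \
  "$input")

rm -f "$input"
printf '%s\n' "$result"
# prints: Smith
```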