Bash – Explaining sed, grep and cut syntax

bash cygwin

I am analyzing a batch file, and one of its lines edits a text file (input) and produces another text file (output).

The batch file uses three helper tools (.exe files): grep, sed and cut. I tried reading their manuals, but it wasn't easy.

The line is:

type input.txt | sed "s#""#'#g" | grep -o "class='name[^>]*" | sed -n "/id=/p" | grep -o "surname=[^>]*" | cut -d"'" -f2 >output.txt

I want to know how this line is interpreted. What are the rules? Is there a smarter way of doing this (for example, using one tool instead of all three)?

Best Answer

I'll add to jeb's answer, although it covers most of what you asked.
These three commands are Windows ports of standard Unix/Linux tools, and they do the following:

sed: a stream editor for filtering and transforming text.
grep: a tool for printing lines matching a pattern.
cut: a tool for cutting out selected portions of each line of a file.

I recommend that you read more about these three commands by either typing man <command name> in Linux, or Googling that same string (for instance, "man grep").
Also, look up regular expressions. Though they are often confusing for beginners, they are a common and compact way of representing patterns.

Regarding the specific usage in the question:

sed "s#""#'#g"

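For each line, this replaces every double quotation mark (") with an apostrophe ('). The doubled "" inside the quoted argument appears to be the batch file's way of escaping a single literal " on the Windows command line, so the script that sed receives is s#"#'#g, with # used as the delimiter instead of the usual /.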
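As a small illustration, here is the same substitution run in a POSIX shell such as Cygwin's bash. The sample line is made up, and the quoting differs from the batch file: in bash the double quote is escaped as \" rather than "".

```shell
# Hypothetical sample line, not taken from the real input.txt.
line='<a class="name" id="7" surname="Smith">'

# Same sed script as in the batch file; only the shell escaping differs.
converted=$(printf '%s\n' "$line" | sed "s#\"#'#g")

printf '%s\n' "$converted"
# prints: <a class='name' id='7' surname='Smith'>
```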

grep -o "class='name[^>]*"

This prints only the matching part of each line (that is what -o does): the text class='name followed by every character up to, but not including, the next >.
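A quick sketch of what -o keeps, again in a POSIX shell and with a made-up sample line. Note that -o is an extension offered by GNU and BSD grep rather than part of POSIX.

```shell
# Hypothetical line as it would look after the quote replacement.
line="<a class='name' id='7' surname='Smith'>text</a>"

# -o prints only the matched text, not the whole line.
matched=$(printf '%s\n' "$line" | grep -o "class='name[^>]*")

printf '%s\n' "$matched"
# prints: class='name' id='7' surname='Smith'
```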

sed -n "/id=/p"

By default, sed prints every line. The -n option suppresses that automatic printing, so sed -n "/<some pattern>/p" prints only the lines that match the specified pattern. In this case, sed prints only the lines containing id=.
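The filtering effect can be seen on two made-up lines, only one of which contains id=:

```shell
# Two hypothetical lines; only the first contains id=.
filtered=$(printf '%s\n' \
  "class='name' id='7' surname='Smith'" \
  "class='name' surname='Jones'" | sed -n "/id=/p")

printf '%s\n' "$filtered"
# prints: class='name' id='7' surname='Smith'
```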

grep -o "surname=[^>]*"

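This prints only the matching part of each line: the text surname= followed by every character up to, but not including, the next >. Because the earlier grep already dropped the >, this is everything from surname= to the end of the line.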

cut -d"'" -f2

This parses each line as successive fields separated by an apostrophe ('), and picks the second one.
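To see why the second field is the wanted value, consider a made-up line in which another attribute follows surname. Splitting on apostrophes gives surname= as field 1 and the value as field 2:

```shell
# Hypothetical output of the preceding grep; the trailing attribute is invented.
line="surname='Smith' lang='en'"

# Fields split on ': [surname=] [Smith] [ lang=] [en] []
value=$(printf '%s\n' "$line" | cut -d"'" -f2)

printf '%s\n' "$value"
# prints: Smith
```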

Everything is piped, meaning that the output of each command serves as the input of the next command to its right. The contents of "input.txt" are fed into the first sed command, the output of which is then fed into the grep command, and so on. The final output is redirected into a new file named "output.txt".
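Putting the stages together, the following sketch runs the whole pipeline in a POSIX shell on three invented lines. It assumes bash-style quoting and uses cat in place of the Windows type command; the file contents are hypothetical.

```shell
# Build a small, hypothetical input file.
input=$(mktemp)
printf '%s\n' \
  '<a class="name" id="7" surname="Smith">first</a>' \
  '<a class="name" surname="Jones">second</a>' \
  '<a class="other" id="9" surname="Brown">third</a>' > "$input"

# Same stages as the batch line, with bash escaping for the double quote.
result=$(cat "$input" \
  | sed "s#\"#'#g" \
  | grep -o "class='name[^>]*" \
  | sed -n "/id=/p" \
  | grep -o "surname=[^>]*" \
  | cut -d"'" -f2)

rm -f "$input"
printf '%s\n' "$result"
# prints: Smith
```

The second line is dropped because it has no id=, and the third because its class does not start with name.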

And yes, as jeb mentioned, this looks like an awkward solution, because everything here can be done with sed alone, presumably with only one or two commands.
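One possible sed-only version is sketched below. It is an approximation, not a drop-in replacement: it assumes at most one matching tag per line and that the surname value is quoted, whereas grep -o would report several matches on the same line. The sample lines are invented.

```shell
# Hypothetical input lines.
input=$(mktemp)
printf '%s\n' \
  '<a class="name" id="7" surname="Smith">first</a>' \
  '<a class="name" surname="Jones">second</a>' \
  '<a class="other" id="9" surname="Brown">third</a>' > "$input"

# 1st expression: turn every " into '.
# 2nd expression: on lines whose class='name tag contains id=,
#   replace the whole line with the quoted surname value and print it.
single=$(sed -n \
  -e "s#\"#'#g" \
  -e "/class='name[^>]*id=/s#.*class='name[^>]*surname='\([^']*\)'.*#\1#p" \
  "$input")

rm -f "$input"
printf '%s\n' "$single"
# prints: Smith
```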

Related Solutions

Linux – Quick unix command to display specific lines in the middle of a file

I found two other solutions if you know the line number but nothing else (no grep possible):

Assuming you need lines 20 to 40,

sed -n '20,40p;41q' file_name

awk 'FNR>=20 && FNR<=40' file_name

When using sed, it is more efficient to quit after printing the last wanted line than to keep processing until the end of the file. This matters most for large files when the wanted lines are near the beginning. To do so, the sed command above adds the instruction 41q, which stops processing at line 41, because in the example only lines 20-40 are of interest. Change the 41 to the number of the last line you want, plus one.
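The two commands can be compared on generated input. This sketch assumes the seq utility is available (it is common on Linux and Cygwin but not required by POSIX). The third variant is an addition, not part of the original answer: it shows one way to give awk the same early exit.

```shell
# Lines 20-40 of a 100-line stream, three ways.
from_sed=$(seq 1 100 | sed -n '20,40p;41q')
from_awk=$(seq 1 100 | awk 'FNR>=20 && FNR<=40')

# awk with an early exit once line 41 is reached.
from_awk_exit=$(seq 1 100 | awk 'FNR > 40 { exit } FNR >= 20')

# Each variable holds the numbers 20 through 40, one per line.
printf '%s\n' "$from_sed" | wc -l
```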

Windows – the difference between Cygwin and MinGW

As a simplification, it's like this:

Compile something in Cygwin and you are compiling it for Cygwin.
Compile something in MinGW and you are compiling it for Windows.

About Cygwin

Cygwin is a compatibility layer that makes it easy to port simple Unix-based applications to Windows, by emulating many of the basic interfaces that Unix-based operating systems provide, such as pipes, Unix-style file and directory access, and so on as documented by the POSIX standards. If you have existing source code that uses these interfaces, you may be able to compile it for use with Cygwin after making very few or even no changes, greatly simplifying the process of porting simple IO based Unix code for use on Windows.

When you distribute your software, the recipient will need to run it along with the Cygwin run-time environment (provided by the file cygwin1.dll). You may distribute this with your software, but your software will have to comply with its open source license. Even just linking your software with it, but distributing the dll separately, can still impose license restrictions on your code.

About MinGW

MinGW aims to simply be a port of GNU's development tools for Windows. It does not attempt to emulate or provide comprehensive compatibility with Unix, other than to provide a version of the GNU Compiler Collection, GNU Binutils and GNU Debugger that can be used natively in Windows. It also includes header files allowing the use of Windows' native API in your code.

As a result your application needs to specifically be programmed for Windows, using the Windows API, which may mean significant alteration if it was created to rely on being run in a standard Unix environment and use Unix-specific features. By default, code compiled in MinGW's GCC will compile to a native Windows X86 target, including .exe and .dll files, though you could also cross-compile with the right settings, since you are basically using the GNU compiler tools suite.

MinGW is a free and open source alternative to using the Microsoft Visual C++ compiler and its associated linking/make tools on Windows. It may be possible in some cases to use MinGW to compile something that was intended for compiling with Microsoft Visual C++ without too many modifications.

Even though MinGW includes some header files and interface code allowing your code to interact with the Windows API, as with the regular standard libraries this doesn't impose licensing restrictions on software you have created.

Other considerations

For any non-trivial software application, such as one that uses a graphical interface, multimedia or accesses devices on the system, you leave the boundary of what Cygwin can do for you and further work will be needed to make your code cross-platform. But, this task can be simplified by using cross-platform toolkits or frameworks that allow coding once and having your code compile successfully for any platform. If you use such a framework from the start, you can not only reduce your headaches when it comes time to port to another platform but you can use the same graphical widgets - windows, menus and controls - across all platforms if you're writing a GUI app, and have them appear native to the user.

For instance, the open source Qt framework is a popular and comprehensive cross-platform development framework, allowing the building of graphical applications that work across operating systems including Windows. There are other such frameworks too. In addition to the large frameworks, there are thousands of more specialized software libraries that support multiple platforms, allowing you to worry less about writing different code for different platforms.

When you are developing cross-platform software from the start, you would not normally have any reason to use Cygwin. When compiling on Windows, you would usually aim to make your code able to be compiled with either MinGW or Microsoft Visual C/C++, or both. When compiling on Linux/*nix, you'd most often compile it with the GNU compilers and tools directly.
