Regex – Regular Expression with findstr (ms-dos)

dos regex

I am trying to use the MS-DOS command findstr to find a string and remove it from a file.

At the moment I can find an explicit string but I am really struggling with regular expressions.
The file looks something like the below:

PLs - TULIP Report  
Output_Format, PLS - TULIP REPORT  
NUMLINES,    110907
VARIABLE_TYPES,T1,T8,I,T9,T2,N,N,N  
[[data below]]

The file is an export from some system and annoyingly has that header in it – so I would like to clean it before using SQL Loader to bring it into an Oracle database.

There is more than one file, and each has the same type of header, though it differs slightly from file to file.
I am happy to remove the first 2 lines using hardcoded values, e.g.:

findstr /v "PLs - TULIP Report" "c:\myfiles\file1.PRO" > "c:\myfiles\file1.csv"
findstr /v "Output_Format, PLS - TULIP REPORT" "c:\myfiles\file1.csv" > "c:\myfiles\file2.csv"

(Note that I do this in 2 steps; any suggestions for doing it in a single step would be massively appreciated.)

The third line is more complicated for me. It will always be in this format:

NUMLINES,    110907

except that the number at the end would be different for each file. So how do I get to find this entire line using a regular expression? I have tried:

findstr /v /b /r "\D+ \s+ \d+"

but without any luck.

FYI, the data in [[data below]] looks like

*,"00000161",456823,"017896532","FU",23.95,3.34,20.61

etc ..
Obviously, I do not want to modify the data area.

I hope the above makes sense,

Thanks

Best Answer

You must exclude the lines one at a time, because findstr cannot match across multiple lines. Just separate the different regexes with a space:

findstr /r /b /v "NUMLINES PLs Output_Format" *.txt 
                  ^regex1  ^2  ^3

Specifying /b matches each regex only at the beginning of a line, and /v prints only the lines that do not match, which excludes the header lines.
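
To make the role of the space clearer, here is a small sketch contrasting the space-separated form above with the /c: option, which keeps the quoted text together as one search string. yourfile is a placeholder name; the header text is taken from the question.

```batch
rem Space-separated: excludes lines that start with "PLs" OR with "Output_Format".
findstr /b /v "PLs Output_Format" yourfile

rem /c: keeps the spaces, so only lines that start with the full text
rem "PLs - TULIP Report" are excluded.
findstr /b /v /c:"PLs - TULIP Report" yourfile
```

Note that findstr is case-sensitive unless /i is given, so "PLs" does not match a line starting with "PLS".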

EDIT:

Of course the usage is

 findstr /r /b /v "NUMLINES PLs Output_Format" yourfile > yourtarget

And in yourtarget you will find the data of yourfile except the lines excluded by the regex.

EDIT 2:

Based on your comments, you just need to add VARIABLE_TYPES to the regex, making it:

findstr /r /b /v "NUMLINES PLs Output_Format VARIABLE_TYPES" yourfile > yourtarget

This is the way to complete the whole operation in one single instruction.
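
Matching the bare word NUMLINES at the start of the line is already enough for the sample shown. If a tighter match on the number is wanted, the sketch below is one possible way, not the only one. findstr's regex syntax has no \d, \s, \D, or +, which is why the pattern attempted in the question finds nothing; a digit is written [0-9], and "one or more" is written by repeating the class before *. Combining /r with /c: keeps the spaces inside each pattern instead of treating them as separators. yourfile and yourtarget are the same placeholder names used above.

```batch
rem Each /c: is one regex because /r is given; /b anchors each at the line start.
rem " *"          = zero or more spaces
rem "[0-9][0-9]*" = one or more digits
findstr /r /b /v /c:"NUMLINES, *[0-9][0-9]*" /c:"PLs " /c:"Output_Format," /c:"VARIABLE_TYPES," yourfile > yourtarget
```

The data lines in the sample all start with *, so none of these patterns touches them.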