PowerShell – Using PowerShell to clean up a text file

powershell

I've been struggling with this and wonder if someone can help. I have a large text file that has extra data in it that I want to strip out. Here is a sample of the input file:

Text In Page - 1

S
Dept
l<m RKB)
"1915
slightly 234234
"sil dsf 56
"gr
gl
1920 100
1925 100
1930 100 Cls
"1935 100 Cl


Text In Page - 2

l<m RKB)
"1915
slightly
"sil
"gr
glauc
1920 100
1925 100
1930 100 Cls
"1935 100 Cl

I want to remove the following:

  • Any blank lines
  • Any " at the beginning of lines
  • Any lines that begin with a letter A-Z, a-z

So with the above example I'd be left with:

1915
1920 100
1925 100
1930 100 Cls
1935 100 Cl
1915
1920 100
1925 100
1930 100 Cls
1935 100 Cl

Best Answer

I'm thinking:

(gc D:\test.txt) -replace '^"' | sls '\S' | sls -NotMatch '^[A-Za-z]' | sc out.txt

Which does:

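  • gets the lines of the file (gc is an alias for Get-Content) and, if the first character of a line is a double quote, replaces it with nothing
  • selects lines that match "not whitespace" (sls is an alias for Select-String), so empty and whitespace-only lines get filtered out
  • selects lines that don't start with A-Za-z; because the quote has already been stripped, a line like "gr is dropped here too
  • writes the results to out.txt (sc is an alias for Set-Content; the alias was removed in PowerShell 6 and later, so spell out Set-Content there)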

There are various ways to write the long version, depending on how much you like chaining things in the pipeline versus reassigning a variable at each step, but it's doing this:

$lines = Get-Content D:\test.txt
$lines = $lines -replace '^"'
$lines = $lines | Select-String '\S'
$lines = $lines | Select-String -NotMatch '^[A-Za-z]'
$lines | Set-Content out.txt
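
If you'd rather keep plain strings all the way through (Select-String emits MatchInfo objects, which is fine for writing to a file as above but can be awkward if you want to process the lines further), the same filtering can be sketched with the -match and -notmatch operators inside Where-Object. This is only an illustration, not a replacement for the answer above: the sample array is copied from the question so the snippet runs on its own, and $cleaned is just a name picked for the example.

```powershell
# Sample lines copied from the question; for a real file, use
# $lines = Get-Content D:\test.txt instead.
$lines = @(
    'Text In Page - 1'
    ''
    'S'
    'Dept'
    'l<m RKB)'
    '"1915'
    'slightly 234234'
    '"sil dsf 56'
    '"gr'
    'gl'
    '1920 100'
    '1925 100'
    '1930 100 Cls'
    '"1935 100 Cl'
    ''
)

# Strip a single leading double quote from every line.
# -replace applied to an array works on each element.
$cleaned = $lines -replace '^"'

# Keep only lines that contain a non-whitespace character
# and do not start with a letter (both operators are case-insensitive).
$cleaned = $cleaned | Where-Object { $_ -match '\S' -and $_ -notmatch '^[A-Za-z]' }

# $cleaned now holds ordinary strings.
$cleaned
```

For the real file you would pipe $cleaned to Set-Content out.txt as the last step, the same as in the long version above.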