PowerShell 2: How to strip a specific character from a body of ASCII text

ascii, powershell, regular expressions, unicode

I am trying to strip odd characters from strings using PowerShell. I used the following output to attempt to learn on my own:

get-help about_regular_expressions

I am trying to take a string that is mostly ASCII, but that has one anomalous character that needs to be removed. (The registered trademark symbol; the R with a circle around it.) I'd like to strip any occurrence of that character out of a string, leaving everything else intact. What is the cleanest expression to accomplish this using PowerShell 2.0?

[EDIT]

I have done a little further digging, and I believe the problem is stemming from the Import-CSV call I'm using.

When I cut-and-paste this symbol from within notepad into the PS prompt, and assign it to a string, I match just fine:

# This code yields 'True'
$string -match "\u00ae"

However, when I use Import-CSV on a CSV file where one of the fields contains the special symbol, I believe somehow the raw bytes are getting converted, because doing something like this doesn't work:

# This code yields 'False'
$source = Import-CSV -path testing.csv
# The following extracts the entry / line containing the special symbol that was
# copy-and-pasted above
$culprit = $source[5].COMMITTEE_NAME
$culprit -match "\u00ae"

However, the following DOES work:

# This yields True
$filedata = get-content testing.csv
$filedata[6] -match "\u00ae"

So I think my followup question to all of this is:

How can I keep the strings intact through the import-csv call so that calls to -match for the individual fields will still work?

Best Answer

Note that the PowerShell console doesn't display Unicode well, so you'll have to use the ISE to "see" what's happening. Have a look at this related SO question for some additional reading. Regardless, you can use the ® character in PowerShell if you don't need to watch the script in action.

In the ISE:

PS C:\Users\jscott> $string = "This string contains the ® character"
PS C:\Users\jscott> $string
This string contains the ® character

PS C:\Users\jscott> $string.Replace("®","")
This string contains the  character

PS C:\Users\jscott> $string ="This ® string ® contains ® many ® characters ®®®®"
PS C:\Users\jscott> $string
This ® string ® contains ® many ® characters ®®®®

PS C:\Users\jscott> $string.Replace("®","")
This  string  contains  many  characters 

To use the character code instead of the literal:

PS C:\Users\jscott> $string.Replace("$([char]0x00AE)","")
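
Since the question asks about regular expressions, the same removal can also be sketched with the -replace operator. This is only an illustration: the sample text and the variable names $noMark and $asciiOnly are made up, and the second pattern is a broader variant that drops every non-ASCII character, which may be more than you want.

```powershell
# U+00AE is the registered trademark sign; building it from its code point
# keeps the script file itself pure ASCII. The sample text is hypothetical.
$string = "Acme" + [char]0x00AE + " Widgets" + [char]0x00AE

# -replace takes a .NET regular expression; \u00AE matches that code point.
$noMark = $string -replace '\u00AE', ''

# Broader variant: drop every character outside the 7-bit ASCII range.
$asciiOnly = $string -replace '[^\x00-\x7F]', ''

$noMark      # Acme Widgets
$asciiOnly   # Acme Widgets
```

Unlike the .Replace() method, -replace treats its first argument as a pattern, so regex metacharacters in it would need escaping.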

Per your question update:

You need to convert the ASCII file (more likely ANSI, since ® is not an ASCII character) to Unicode/UTF-8 before running it through Import-Csv -- I didn't realize you were using this. Have a look at this and this for other examples.
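
One way to check whether the import is really mangling the character is to print the code points of the non-ASCII characters in the field. The helper below is a hypothetical sketch (Get-NonAsciiCodePoint is not a built-in cmdlet). If the bytes were decoded with the wrong encoding, the field will often contain U+FFFD (the replacement character) or some other code point instead of U+00AE.

```powershell
# Hypothetical helper: list each non-ASCII character in a string as U+XXXX.
function Get-NonAsciiCodePoint([string]$Text) {
    foreach ($c in $Text.ToCharArray()) {
        if ([int]$c -gt 127) { 'U+{0:X4}' -f [int]$c }
    }
}

# An intact string reports the registered trademark sign...
Get-NonAsciiCodePoint ('Acme' + [char]0x00AE)   # U+00AE

# ...while a mangled one reports whatever the decoder substituted.
Get-NonAsciiCodePoint ('Acme' + [char]0xFFFD)   # U+FFFD
```

In the question's terms, you would call it as Get-NonAsciiCodePoint $culprit and compare that with the result for the line read by Get-Content.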

You may just want to pipe the initial CSV file through Get-Content or Export-Csv -Encoding Unicode to pre-process the file and make life easier.
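
A minimal sketch of that pre-processing, assuming Get-Content decodes the file correctly (as the question's own test suggests): read the lines with Get-Content and hand them to ConvertFrom-Csv, which parses strings that are already decoded rather than re-reading the file. The sample file, its path and its contents are made up for illustration; only the COMMITTEE_NAME column name comes from the question.

```powershell
# Hypothetical sample file standing in for testing.csv. It is written in the
# system default encoding, which on Windows PowerShell is the ANSI code page.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'testing-sample.csv'
$mark = [string][char]0x00AE
[string[]]$lines = 'ID,COMMITTEE_NAME', ('1,Acme' + $mark + ' Committee')
[System.IO.File]::WriteAllLines($path, $lines, [System.Text.Encoding]::Default)

# Get-Content decodes the text; ConvertFrom-Csv parses the decoded lines
# into the same kind of objects Import-Csv would return.
$source = Get-Content $path | ConvertFrom-Csv

# Strip the symbol from the field in every row.
foreach ($row in $source) {
    $row.COMMITTEE_NAME = $row.COMMITTEE_NAME.Replace($mark, '')
}

$names = @($source | ForEach-Object { $_.COMMITTEE_NAME })
$names

Remove-Item $path
```

As far as I know, Import-Csv only gained an -Encoding parameter in PowerShell 3.0, which is why this sketch goes through Get-Content instead. Alternatively, the output of Get-Content could be re-saved with Set-Content -Encoding Unicode and the new file passed to Import-Csv.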