I've got a bunch of csv files that I'm reading into R and including in a package/data folder in .rdata format. Unfortunately the non-ASCII characters in the data fail the check. The tools package has two functions to check for non-ASCII characters (showNonASCII and showNonASCIIfile) but I can't seem to locate one to remove/clean them.
Before I explore other UNIX tools, it would be great to do this all in R so I can maintain a complete workflow from raw data to final product. Are there any existing packages/functions to help me get rid of the non-ASCII characters?
Best Answer
These days, a slightly better approach is to use the stringi package, which provides a function for general Unicode conversion. This lets you preserve as much of the original text as possible.
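As a minimal sketch of that approach, assuming the stringi package is installed: stri_trans_general() with the "Latin-ASCII" transliterator replaces accented Latin characters with their closest ASCII equivalents. The sample vector x is hypothetical, and the iconv() fallback is an addition of mine for characters the transliterator cannot map, not something the approach requires.

```r
library(stringi)

# Hypothetical sample data standing in for a column read from a CSV file.
x <- c("Ekstr\u00f8m", "J\u00f6reskog", "bi\u00dfchen Z\u00fcrcher", "plain ascii")

# Transliterate Latin characters with diacritics to their closest ASCII form.
x_ascii <- stri_trans_general(x, "Latin-ASCII")

# Optional fallback (an assumption, not part of the approach above): drop any
# characters the transliterator could not map, such as non-Latin scripts.
x_clean <- iconv(x_ascii, from = "UTF-8", to = "ASCII", sub = "")

# Confirm that every element is now pure ASCII.
stopifnot(all(stri_enc_isascii(x_clean)))
```

To apply this to a whole data frame before saving it, one option is to run the same two steps over each character column, and then re-check the result with tools::showNonASCII().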