Convert ISO-8859-1 to UTF-8 using groovy


i need to convert a ISO-8859-1 file to utf-8 encoding, without loosing content intormations…

i have a file which looks like this:

<?xml version="1.0" encoding="ISO-8859-1" ?> 
<HelloEncodingWorld>Üöäüßßß Test!!!</HelloEncodingWorld>

Not i want to encode it into UTF-8.
I tried following:

f=new File('c:/temp/myiso88591.xml').getText('ISO-8859-1')
ts=new String(f.getBytes("UTF-8"), "UTF-8")
g=new File('c:/temp/myutf8.xml').write(ts)

didnt work due to String incompatibilities.
Then i read something about bytestreamreaders/writers/streamingmarkupbuilder and other…

then i tried

f=new File('c:/temp/myiso88591.xml').getText('ISO-8859-1')
mb = new groovy.xml.StreamingMarkupBuilder()
mb.encoding = "UTF-8"

new OutputStreamWriter(new FileOutputStream('c:/temp/myutf8.xml'),'utf-8') << mb.bind {
    out << f

this was totally not that what i wanted..

I just want to get the content of an xml read with an ISO-8859-1 reader and then put it into a new (old) file… why this is so complicated :-/

The result should just be, and the file should be really encoded in utf-8:

<?xml version="1.0" encoding="UTF-8" ?> 
<HelloEncodingWorld>Üöäüßßß Test!!!</HelloEncodingWorld>

Thanks for any answers

Best Answer

def f=new File('c:/data/myiso88591.xml').getText('ISO-8859-1')
new File('c:/data/myutf8.xml').write(f,'utf-8')

(I just gave it a try, it works :-)

same as in java: the libraries do the conversion for you... as deceze said: when you specify an encoding, it will be converted to an internal format (utf-16 afaik). When you specify another encoding when you write the string, it will be converted to this encoding.

But if you work with XML, you shouldn't have to worry about the encoding anyway because the XML parser will take care of it. It will read the first characters <?xml and determines the basic encoding from those characters. After that, it is able to read the encoding information from your xml header and use this.

Related Topic