Python – xml.parsers.expat.ExpatError on parsing XML

python, xml

I am trying to parse XML with Python but I'm not getting very far. I think the problem is that the XML this API returns is malformed.

So this is what is returned by the GET request:

<codigo>3</codigo><valor></valor><operador>Dummy</operador>

The GET request goes here:

http://69.36.9.147:8090/clientes/SMS_API_OUT.jsp?codigo=ABCDEFGH&cliente=XX

This is the Python code I am using without any luck:

import urllib
from xml.dom import minidom

url = urllib.urlopen('http://69.36.9.147:8090/clientes/SMS_API_OUT.jsp?codigo=ABCDEFGH&cliente=XX')
doc = minidom.parse(url)
code = doc.getElementsByTagName('codigo')

print code[0].data

And this is the error I get:

xml.parsers.expat.ExpatError: junk after document element: line 1, column 18

What I need to do is retrieve the value inside the <codigo> element and place it in a variable (same for the others).

Best Answer

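The main problem is that the XML returned by that service has no root element, so it is not a well-formed XML document: a document must have exactly one top-level element. The parser treats <codigo>3</codigo> as the complete document, and the <valor> element that starts right after it, at column 18, is reported as "junk after document element". I fixed this by simply wrapping the output in a <root> node.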

import urllib
from xml.etree import ElementTree

url = 'http://69.36.9.147:8090/clientes/SMS_API_OUT.jsp?codigo=ABCDEFGH&cliente=XX'
xmldata = '<root>' + urllib.urlopen(url).read() + '</root>'
tree = ElementTree.fromstring(xmldata)
codigo = tree.find('codigo').text

print codigo
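The snippet above uses Python 2 calls (urllib.urlopen and the print statement). As a rough Python 3 sketch of the same wrapping idea, the code below parses the sample response from the question as a literal string instead of calling the service, and collects all three values at once. The names response_body and parse_fragment are made up for this illustration, and treating an empty element as an empty string is an assumption about what the caller wants.

```python
from xml.etree import ElementTree

# Sample response body copied from the question. A live call would read it
# from the URL (e.g. with urllib.request.urlopen) and decode it; the
# service's encoding is not stated in the question.
response_body = '<codigo>3</codigo><valor></valor><operador>Dummy</operador>'


def parse_fragment(body):
    """Wrap a root-less XML fragment in <root> and map each child tag to its text."""
    root = ElementTree.fromstring('<root>' + body + '</root>')
    # An empty element such as <valor></valor> has text None; normalise it to ''.
    return {child.tag: (child.text or '') for child in root}


fields = parse_fragment(response_body)
print(fields['codigo'])
print(fields['operador'])
```

Each value is then available as fields['codigo'], fields['valor'] and fields['operador'].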

Once the output is wrapped, you can use whatever parser you wish; here I used ElementTree to get the value.
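For instance, here is a minimal sketch of the same fix with minidom, the parser used in the question, written for Python 3 and run against the sample response rather than the live service. Note that a minidom element has no data attribute; the text sits in a child text node, so it is read through firstChild. The helper element_text is a hypothetical name introduced for this example, and returning '' for an empty element is an assumption.

```python
from xml.dom import minidom

# Sample response body copied from the question (not fetched from the service).
response_body = '<codigo>3</codigo><valor></valor><operador>Dummy</operador>'

# Wrapping the fragment gives the parser the single root element it requires.
doc = minidom.parseString('<root>' + response_body + '</root>')


def element_text(doc, tag):
    """Return the text of the first <tag> element, or '' if it has no content."""
    node = doc.getElementsByTagName(tag)[0]
    # The text lives in a child text node, not on the element itself.
    return node.firstChild.data if node.firstChild is not None else ''


codigo = element_text(doc, 'codigo')
valor = element_text(doc, 'valor')
operador = element_text(doc, 'operador')
print(codigo, operador)
```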