Python – How to parse XML file from European Central Bank with Python

I am trying to parse an XML file from the European Central Bank with the Euro rates.
Unfortunately, I am stuck parsing the XML file. When I remove the difficult part (everything related to "gesmes"), I have no problem iterating through the "Cube" elements, but I cannot handle the "gesmes" part of the XML file.
I used the ElementTree API for this.

Sample XML file: http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml

<?xml version="1.0" encoding="UTF-8"?>
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <gesmes:Sender>
        <gesmes:name>European Central Bank</gesmes:name>
    </gesmes:Sender>
    <Cube>
        <Cube time='2013-06-21'>
            <Cube currency='USD' rate='1.3180'/>
            <Cube currency='JPY' rate='128.66'/>
            <Cube currency='BGN' rate='1.9558'/>
            <Cube currency='CZK' rate='25.825'/>
            <Cube currency='DKK' rate='7.4582'/>
            <Cube currency='GBP' rate='0.85330'/>
            <Cube currency='HUF' rate='298.87'/>
            <Cube currency='LTL' rate='3.4528'/>
            <Cube currency='LVL' rate='0.7016'/>
            <Cube currency='PLN' rate='4.3289'/>
            <Cube currency='RON' rate='4.5350'/>
            <Cube currency='SEK' rate='8.6927'/>
            <Cube currency='CHF' rate='1.2257'/>
            <Cube currency='NOK' rate='7.9090'/>
            <Cube currency='HRK' rate='7.4905'/>
            <Cube currency='RUB' rate='43.2260'/>
            <Cube currency='TRY' rate='2.5515'/>
            <Cube currency='AUD' rate='1.4296'/>
            <Cube currency='BRL' rate='2.9737'/>
            <Cube currency='CAD' rate='1.3705'/>
            <Cube currency='CNY' rate='8.0832'/>
            <Cube currency='HKD' rate='10.2239'/>
            <Cube currency='IDR' rate='13088.24'/>
            <Cube currency='ILS' rate='4.7891'/>
            <Cube currency='INR' rate='78.1200'/>
            <Cube currency='KRW' rate='1521.52'/>
            <Cube currency='MXN' rate='17.5558'/>
            <Cube currency='MYR' rate='4.2222'/>
            <Cube currency='NZD' rate='1.7004'/>
            <Cube currency='PHP' rate='57.707'/>
            <Cube currency='SGD' rate='1.6790'/>
            <Cube currency='THB' rate='41.003'/>
            <Cube currency='ZAR' rate='13.4906'/>
        </Cube>
    </Cube>
</gesmes:Envelope>

What I want is to search for a specific currency (from user input) and get the rate back so I can use the result.

Best Answer

You have a namespaced XML file. ElementTree does not resolve namespace prefixes on its own, so you need to give the .find(), .findall() and .iterfind() methods an explicit namespace dictionary. This is not documented very well:

namespaces = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'} # add more as needed

for cube in root.findall('.//ex:Cube[@currency]', namespaces=namespaces):
    print(cube.attrib['currency'], cube.attrib['rate'])

This uses a simple XPath query: './/' searches all descendants of the current element (not only direct children), ex:Cube limits the search to <Cube> tags in the namespace bound to the ex prefix (from the namespaces mapping), and [@currency] limits the search to elements that have a currency attribute.
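
To see why the prefix mapping is needed, it may help to look at how ElementTree stores tag names. The following is a minimal sketch that uses a trimmed-down copy of the sample document; the SAMPLE string and the variable names plain, prefixed and braced are only illustrative. It compares an un-namespaced query, the prefixed query from above, and the equivalent query written with the full namespace URI in braces:

```python
from xml.etree import ElementTree as ET

# Trimmed-down copy of the ECB sample, inlined so the sketch runs offline.
SAMPLE = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <gesmes:subject>Reference rates</gesmes:subject>
    <Cube>
        <Cube time='2013-06-21'>
            <Cube currency='USD' rate='1.3180'/>
            <Cube currency='JPY' rate='128.66'/>
        </Cube>
    </Cube>
</gesmes:Envelope>"""

root = ET.fromstring(SAMPLE)

# ElementTree stores namespaced tags in "{uri}localname" form.
print(root.tag)  # {http://www.gesmes.org/xml/2002-08-01}Envelope

ns = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}

# No namespace given: looks for <Cube> in no namespace, so nothing matches.
plain = root.findall('.//Cube[@currency]')
# Prefix resolved through the mapping.
prefixed = root.findall('.//ex:Cube[@currency]', namespaces=ns)
# Same query with the URI spelled out; no mapping needed.
braced = root.findall('.//{http://www.ecb.int/vocabulary/2002-08-01/eurofxref}Cube[@currency]')

print(len(plain), len(prefixed), len(braced))  # 0 2 2
```

The prefix in the mapping (ex here) is arbitrary; only the URI has to match the one declared in the document.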

Demo:

>>> import requests
>>> from xml.etree import ElementTree as ET
>>> r = requests.get('http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml', stream=True)
>>> r.raw.decode_content = True  # decompress gzip/deflate responses before parsing
>>> tree = ET.parse(r.raw)
>>> root = tree.getroot()
>>> namespaces = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}
>>> for cube in root.findall('.//ex:Cube[@currency]', namespaces=namespaces):
...     print(cube.attrib['currency'], cube.attrib['rate'])
... 
USD 1.3180
JPY 128.66
BGN 1.9558
CZK 25.825
DKK 7.4582
GBP 0.85330
HUF 298.87
LTL 3.4528
LVL 0.7016
PLN 4.3289
RON 4.5350
SEK 8.6927
CHF 1.2257
NOK 7.9090
HRK 7.4905
RUB 43.2260
TRY 2.5515
AUD 1.4296
BRL 2.9737
CAD 1.3705
CNY 8.0832
HKD 10.2239
IDR 13088.24
ILS 4.7891
INR 78.1200
KRW 1521.52
MXN 17.5558
MYR 4.2222
NZD 1.7004
PHP 57.707
SGD 1.6790
THB 41.003
ZAR 13.4906

You can use this information to search for the specific rate too; either build a dictionary, or search the XML document directly for matching currencies:

currency = input('What currency are you looking for? ')
match = root.find('.//ex:Cube[@currency="{}"]'.format(currency.upper()), namespaces=namespaces)
if match is not None:
    print('The rate for {} is {}'.format(currency, match.attrib['rate']))
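
The dictionary alternative could look roughly like the sketch below. It assumes an inlined sample instead of a live download, and the helper name build_rate_table and the use of Decimal for the rates are choices made for this illustration only. The currency code is hard-coded where a real script would call input():

```python
from decimal import Decimal
from xml.etree import ElementTree as ET

# Trimmed-down copy of the ECB sample, inlined so the sketch runs offline.
SAMPLE = """<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
    <Cube>
        <Cube time='2013-06-21'>
            <Cube currency='USD' rate='1.3180'/>
            <Cube currency='JPY' rate='128.66'/>
            <Cube currency='GBP' rate='0.85330'/>
        </Cube>
    </Cube>
</gesmes:Envelope>"""

NAMESPACES = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}


def build_rate_table(root):
    """Map each currency code to its rate (hypothetical helper)."""
    return {
        cube.attrib['currency']: Decimal(cube.attrib['rate'])
        for cube in root.iterfind('.//ex:Cube[@currency]', namespaces=NAMESPACES)
    }


rates = build_rate_table(ET.fromstring(SAMPLE))

# Stand-in for: input('What currency are you looking for? ')
currency = 'usd'.strip().upper()

rate = rates.get(currency)
if rate is None:
    print('No rate found for {}'.format(currency))
else:
    print('The rate for {} is {}'.format(currency, rate))  # The rate for USD is 1.3180
```

Building the table once is worthwhile when several lookups are made against the same document; for a single lookup, the direct .find() call above is just as good.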