I have a python script that exports data from a mysql utf-8 table into an textfile. Here is the code that does the job
csvDatei = codecs.open( csvDateiName, "w", "utf-8" )
...
cursor = db.cursor();
sql = "select * from %s.%s;" % (dbAusgang, tabelle)
cursor.execute(sql);
...
daten = cursor.fetchall();
for i in xrange(len(daten)):
line = '';
for j in xrange(len(daten[i])):
line += '"%s";' % unicode(daten[i][j]);
line = line[:-1];
line += '\n';
csvDatei.write(line);
csvDatei.close();
I have also tried this
line += '"%s";' % str(daten[i][j]);
and
line += '"%s";' % daten[i][j];
And now the part I do not understand:
Normally this script should be invoked by a cron job. But when I read a varchar from a table that contains an umlaut like ä,ö or ü the script simply terminates. I checked that by piping the output of the script into a file.
Therefore I tested the script by invoking it manually on the shell by simply typing "python myscript.py" and it runs perfectly fine without any problems.
So my guess is that the problem does not really lie in the script itself but rather in the cron enviroment somehow.
Hopefully anybody of you can give me advice. I'm completly confused.
Any help is appreciated.
—————- Answer to comment 1:
Thanks for the hint with the locale.
First I wrote "locale" on the standard shell. It gave me the following output:
dhl@srv1093:~$ locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=de_DE.UTF-8
Then I edited the cron file with "crontab -e" and added the folloing line
*/1 * * * * locale > /home/user/locale.ouput
The output of this cronjob is:
dhl@srv1093:~$ cat locale.ouput
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
So, this might be the problem? How can I fix that?
Best Answer
You decode the lines to unicode with
unicode(daten[i][j])
. When you give no encoding then Python uses the system default, which is probably ascii when you run the script through cron.In either way, you must give the actual encoding used by the database. You can use
unicode(daten[i][j], dbencoding)
instead, or get your database adapter to give you unicode directly.Btw: There are probably a million tools that generate cvs files from database queries, MySQL has that even built-in. Your code on the other hand is rather fragile because you do no escaping whatsoever.