How to make imp use right charachter set

charsetimportoracle

I am trying to import a dmp file from one database to another. The problem is that some special danish characters are not imported right because some character map conversion is done during import.

This is the info when I start the import with imp (see the note about possible character conversion):

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Export file created by EXPORT:V10.02.01 via conventional path

import done in WE8ISO8859P1 character set and AL16UTF16 NCHAR character set
import server uses AL32UTF8 character set (possible charset conversion)
. importing USERNAME's objects into USERNAME    
. . importing table                      "TABLE2" 
IMP-00019: row rejected due to ORACLE error 12899
IMP-00003: ORACLE error 12899 encountered
ORA-12899: værdi er for stor for kolonnen "USERNAME"."TABLE2"."NAME" (faktisk: 32, maksimum: 30)
Column 1 408261
Column 2 KUBEN FÆLLES MÅLER            

Each special character (typically Æ, Ø and Å) are using 2 charactes instead of 1, and the data are padded with spaces to fill up, so this causes an error that the data contains 31 caharacters for this field, and the field is defined as size 30.

The export that generated the dmp file was executed like this:

SET CHARACTERSET=WE8PC850
SET NLS_LANG=DANISH
SET NLS_NUMERIC_CHARACTERS=.,
exp username/password@server1 dumpfile.dmp

The import was executed like this:
How do I make this work correctly?

SET CHARACTERSET=WE8PC850
SET NLS_LANG=DANISH
SET NLS_NUMERIC_CHARACTERS=.,
imp username/password@server2 fromuser=username touser=username commit=y ignore=y file=dumpfile.dmp log=dumpfile.log

What can be done to get this right? It seems like the imp ignores the charset used, as the import server uses AL32UTF8 (which is wrong).

Best Answer

You are importing data into a database using the AL32UTF8 characterset (this is the default). The importer must therefore convert the characters to fit that and some will become multi-byte characters as you have found. There are two ways to deal with this:

1) If you don't need Unicode in the new database recreate it in the same characterset as the old database. Run this code on the old database to get the characterset and use that

SELECT parameter, value
FROM nls_database_parameters
WHERE parameter
LIKE '%CHARACTERSET';

Once the new and old database have the same characterset the import won't have to do the conversion.

2) If you can pre-create the tables, you could use the NLS_LENGTH_SEMANTICS parameter. If you set that to CHAR rather than the default of BYTE, a VARCHAR2(5) will be allocated enough space to store 5 characters in the database character set (potentially up to 20 bytes) rather than 5 bytes (which could allow just 1 character). Or instead you could modify the table creation DDL to add CHAR to every VARCHAR2 column declaration. e.g.

CREATE TABLE xyz (column_x VARCHAR2(10 CHAR) NOT NULL);

This way you can convert the data to Unicode and have a better characterset going forward, and so long as your application can support it this is perhaps the preferred method.