SQL Performance – Internationalization Through SQL Database and Performance Issues

internationalization, performance, sql

I'm using .NET technologies, and "instant translation" in ASP.NET is not really easy to do with ResX, since the resources have to be recompiled after every change. There are a few workarounds available (http://www.onpreinit.com/2009/06/updatable-aspnet-resx-resource-provider.html), but as they are, they're hacks… and they seem to have problems of their own.

So I decided a while ago (in 2006, before I had even seen solutions like the one linked above) to store all translations in a SQL database and cache the translations I need in memory for performance.

The real problem begins when I want to join tables that have multiple translated fields. Basically, I store the ID of the translation in each field, so I can add or edit translations for each language independently.

For example (it's not the real schema, but it shows the concept):

ITEM(ItemID, TranslationReference_Name, TranslationReference_Use, ...)
TRANSLATION(TranslationReference, IDLanguage, Content)
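
To make that layout more concrete, here is a rough sketch of what the two tables might look like (hypothetical names and types, not my actual schema):

CREATE TABLE TRANSLATION (
    TranslationReference INT           NOT NULL,
    IDLanguage           CHAR(2)       NOT NULL,  -- e.g. 'EN', 'FR'
    Content              NVARCHAR(MAX) NOT NULL,
    PRIMARY KEY (TranslationReference, IDLanguage)
);

CREATE TABLE ITEM (
    ItemID                    INT PRIMARY KEY,
    TranslationReference_Name INT NOT NULL,  -- points at TRANSLATION.TranslationReference
    TranslationReference_Use  INT NOT NULL   -- points at TRANSLATION.TranslationReference
);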

As you can guess, whenever I want a SQL query to return the translated content, I need joins. And this is where it gets dirty, because I haven't found any other way to do it: for each field of ITEM that needs a translation into a specific IDLanguage, I need a separate join, as in the sketch below.
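
For the hypothetical tables above, fetching just two translated fields already means two joins (a sketch, with @IDLanguage as the requested language):

SELECT i.ItemID,
       tName.Content AS Name,
       tUse.Content  AS Use
FROM   ITEM i
JOIN   TRANSLATION tName ON tName.TranslationReference = i.TranslationReference_Name
                        AND tName.IDLanguage = @IDLanguage
JOIN   TRANSLATION tUse  ON tUse.TranslationReference  = i.TranslationReference_Use
                        AND tUse.IDLanguage  = @IDLanguage;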

This is bad for performance, because the TRANSLATION table contains thousands of rows, and every column that needs a translation means another join against that whole table.

The other problem is that I'd like to fall back to the basic English translation when a specific language hasn't been translated yet, which, as you can guess, makes the queries even dirtier…
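
The only way I can think of doing that fallback is a second LEFT JOIN per field plus a COALESCE, roughly like this sketch (assuming 'EN' is the default language; this is not code I'm actually running):

SELECT i.ItemID,
       COALESCE(tName.Content, tNameEn.Content) AS Name
FROM   ITEM i
LEFT JOIN TRANSLATION tName   ON tName.TranslationReference   = i.TranslationReference_Name
                             AND tName.IDLanguage   = @IDLanguage
LEFT JOIN TRANSLATION tNameEn ON tNameEn.TranslationReference = i.TranslationReference_Name
                             AND tNameEn.IDLanguage = 'EN';

So each translated column now costs two joins instead of one.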

I have no idea how to improve this. Should I simply give up the database joins and translate each database result in my .NET code, using a dictionary of translations that I already have in memory, so that no joins are required at all?

And these are only some of the drawbacks of this solution… because in real use I also need users to upload their own dynamically translated content and to manage versions of those translations.

Keep in mind that I do not want N rows in the ITEM table (N being the number of languages a record is available in). I know I could have one row per language, with the translated content stored in nvarchar columns, but that's not what I want, because I need the same ID for every record regardless of which languages it is translated into.

I've searched across the web and haven't found any good article about this.

I talked to another developer who had translations in several languages for his data-heavy website. If I remember correctly, he used multiple tables for the translations: instead of storing IDLanguage in a global TRANSLATION table, he split the translations into one table per language and injected the language into the SQL itself, something like:

SELECT Content FROM Translation_EN ...
SELECT Content FROM Translation_FR ...

That would surely be more efficient than joining against one big table, but it wouldn't simplify my SQL, which would still be full of joins…

So I probably have to think about alternatives, and I'd appreciate your ideas and comments on this.

Thank you very much.

Best Answer

Joins are good; use joins. For them to be effective, though, the join fields must be indexed, as well as any fields you intend to use in WHERE clauses. Foreign keys are not indexed automatically in SQL Server, and I imagine they aren't in most other databases either. Primary keys are generally indexed, but only if you formally declare them as a PK (which of course you should do).
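
For the layout in your question, that would mean something along these lines (names are just illustrative):

-- The composite primary key on TRANSLATION(TranslationReference, IDLanguage)
-- already gives the joins an index on that side.
-- The foreign-key columns on ITEM get no index automatically, so add them yourself:
CREATE INDEX IX_Item_TranslationReference_Name ON ITEM (TranslationReference_Name);
CREATE INDEX IX_Item_TranslationReference_Use  ON ITEM (TranslationReference_Use);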

Thousands of records is teeny tiny for a database. Our medium-sized database has around 20 million records in one of its main tables and we don't have performance problems, and we often join 15-20 tables.
