It has been a while, so this is not comprehensive.
Character Sets
Unicode is great, but you can't get away with ignoring other character sets. The default character set on Windows XP (English) is Cp1252. On the web, you don't know what a browser will send you (though hopefully your container will handle most of this). And don't be surprised when there are bugs in whatever implementation you are using. Character sets can have interesting interactions with filenames when they move to between machines.
Translating Strings
Translators are, generally speaking, not coders. If you send a source file to a translator, they will break it. Strings should be extracted to resource files (e.g. properties files in Java or resource DLLs in Visual C++). Translators should be given files that are difficult to break and tools that don't let them break them.
Translators do not know where strings come from in a product. It is difficult to translate a string without context. If you do not provide guidance, the quality of the translation will suffer.
While on the subject of context, you may see the same string "foo" crop up in multiple times and think it would be more efficient to have all instances in the UI point to the same resource. This is a bad idea. Words may be very context-sensitive in some languages.
Translating strings costs money. If you release a new version of a product, it makes sense to recover the old versions. Have tools to recover strings from your old resource files.
String concatenation and manual manipulation of strings should be minimized. Use the format functions where applicable.
Translators need to be able to modify hotkeys. Ctrl+P is print in English; the Germans use Ctrl+D.
If you have a translation process that requires someone to manually cut and paste strings at any time, you are asking for trouble.
Dates, Times, Calendars, Currency, Number Formats, Time Zones
These can all vary from country to country. A comma may be used to denote decimal places. Times may be in 24hour notation. Not everyone uses the Gregorian calendar. You need to be unambiguous, too. If you take care to display dates as MM/DD/YYYY for the USA and DD/MM/YYYY for the UK on your website, the dates are ambiguous unless the user knows you've done it.
Especially Currency
The Locale functions provided in the class libraries will give you the local currency symbol, but you can't just stick a pound (sterling) or euro symbol in front of a value that gives a price in dollars.
User Interfaces
Layout should be dynamic. Not only are strings likely to double in length on translation, the entire UI may need to be inverted (Hebrew; Arabic) so that the controls run from right to left. And that is before we get to Asia.
Testing Prior To Translation
- Use static analysis of your code to locate problems. At a bare minimum, leverage the tools built into your IDE. (Eclipse users can go to Window > Preferences > Java > Compiler > Errors/Warnings and check for non-externalised strings.)
- Smoke test by simulating translation. It isn't difficult to parse a resource file and replace strings with a pseudo-translated version that doubles the length and inserts funky characters. You don't have to speak a language to use a foreign operating system. Modern systems should let you log in as a foreign user with translated strings and foreign locale. If you are familiar with your OS, you can figure out what does what without knowing a single word of the language.
- Keyboard maps and character set references are very useful.
- Virtualisation would be very useful here.
Non-technical Issues
Sometimes you have to be sensitive to cultural differences (offence or incomprehension may result). A mistake you often see is the use of flags as a visual cue choosing a website language or geography. Unless you want your software to declare sides in global politics, this is a bad idea. If you were French and offered the option for English with St. George's flag (the flag of England is a red cross on a white field), this might result in confusion for many English speakers - assume similar issues will arise with foreign languages and countries. Icons need to be vetted for cultural relevance. What does a thumbs-up or a green tick mean? Language should be relatively neutral - addressing users in a particular manner may be acceptable in one region, but considered rude in another.
Resources
C++ and Java programmers may find the ICU website useful: http://www.icu-project.org/
JSP is a Java view technology running on the server machine which allows you to write template text in client side languages (like HTML, CSS, JavaScript, ect.). JSP supports taglibs, which are backed by pieces of Java code that let you control the page flow or output dynamically. A well-known taglib is JSTL. JSP also supports Expression Language, which can be used to access backend data (via attributes available in the page, request, session and application scopes), mostly in combination with taglibs.
When a JSP is requested for the first time or when the web app starts up, the servlet container will compile it into a class extending HttpServlet
and use it during the web app's lifetime. You can find the generated source code in the server's work directory. In for example Tomcat, it's the /work
directory. On a JSP request, the servlet container will execute the compiled JSP class and send the generated output (usually just HTML/CSS/JS) through the web server over a network to the client side, which in turn displays it in the web browser.
Servlet is a Java application programming interface (API) running on the server machine, which intercepts requests made by the client and generates/sends a response. A well-known example is the HttpServlet
which provides methods to hook on HTTP requests using the popular HTTP methods such as GET
and POST
. You can configure HttpServlet
s to listen to a certain HTTP URL pattern, which is configurable in web.xml
, or more recently with Java EE 6, with @WebServlet
annotation.
When a Servlet is first requested or during web app startup, the servlet container will create an instance of it and keep it in memory during the web app's lifetime. The same instance will be reused for every incoming request whose URL matches the servlet's URL pattern. You can access the request data by HttpServletRequest
and handle the response by HttpServletResponse
. Both objects are available as method arguments inside any of the overridden methods of HttpServlet
, such as doGet()
and doPost()
.
JSF is a component based MVC framework which is built on top of the Servlet API and provides components via taglibs which can be used in JSP or any other Java based view technology such as Facelets. Facelets is much more suited to JSF than JSP. It namely provides great templating capabilities such as composite components, while JSP basically only offers the <jsp:include>
for templating in JSF, so that you're forced to create custom components with raw Java code (which is a bit opaque and a lot of tedious work) when you want to replace a repeated group of components with a single component. Since JSF 2.0, JSP has been deprecated as view technology in favor of Facelets.
Note: JSP itself is NOT deprecated, just the combination of JSF with JSP is deprecated.
Note: JSP has great templating abilities by means of Taglibs, especially the (Tag File) variant. JSP templating in combination with JSF is what is lacking.
As being a MVC (Model-View-Controller) framework, JSF provides the FacesServlet
as the sole request-response Controller. It takes all the standard and tedious HTTP request/response work from your hands, such as gathering user input, validating/converting them, putting them in model objects, invoking actions and rendering the response. This way you end up with basically a JSP or Facelets (XHTML) page for View and a JavaBean class as Model. The JSF components are used to bind the view with the model (such as your ASP.NET web control does) and the FacesServlet
uses the JSF component tree to do all the work.
Related questions
Best Answer
In case of a basic JSP/Servlet webapplication, the basic approach would be using JSTL
fmt
taglib in combination with resource bundles. Resource bundles contain key-value pairs where the key is a constant which is the same for all languages and the value differs per language. Resource bundles are usually properties files which are loaded byResourceBundle
API. This can however be customized so that you can load the key-value pairs from for example a database.Here's an example how to internationalize the login form of your webapplication with properties file based resource bundles.
Create the following files and put them in some package, e.g.
com.example.i18n
(in case of Maven, put them in the package structure insidesrc/main/resources
).text.properties
(contains key-value pairs in the default language, usually English)text_nl.properties
(contains Dutch (nl
) key-value pairs)text_es.properties
(contains Spanish (es
) key-value pairs)The resource bundle filename should adhere the following pattern
name_ll_CC.properties
. The_ll
part should be the lowercase ISO 693-1 language code. It is optional and only required whenever the_CC
part is present. The_CC
part should be the uppercase ISO 3166-1 Alpha-2 country code. It is optional and often only used to distinguish between country-specific language dialects, like American English (_en_US
) and British English (_en_GB
).If not done yet, install JSTL as per instructions in this answer: How to install JSTL? The absolute uri: http://java.sun.com/jstl/core cannot be resolved.
Create the following example JSP file and put it in web content folder.
login.jsp
The
<c:set var="language">
manages the current language. If the language was supplied as request parameter (by language dropdown), then it will be set. Else if the language was already previously set in the session, then stick to it instead. Else use the user supplied locale in the request header.The
<fmt:setLocale>
sets the locale for resource bundle. It's important that this line is before the<fmt:setBundle>
.The
<fmt:setBundle>
initializes the resource bundle by its base name (that is, the full qualified package name until with the sole name without the_ll_CC
specifier).The
<fmt:message>
retrieves the message value by the specified bundle key.The
<html lang="${language}">
informs the searchbots what language the page is in so that it won't be marked as duplicate content (thus, good for SEO).The language dropdown will immediately submit by JavaScript when another language is chosen and the page will be refreshed with the newly chosen language.
You however need to keep in mind that properties files are by default read using ISO-8859-1 character encoding. You would need to escape them by unicode escapes. This can be done using the JDK-supplied
native2ascii.exe
tool. See also this article section for more detail.A theoretical alternative would be to supply a bundle with a custom
Control
to load those files as UTF-8, but that's unfortunately not supported by the basic JSTLfmt
taglib. You would need to manage it all yourself with help of aFilter
. There are (MVC) frameworks which can handle this in a more transparent manner, like JSF, see also this article.