1. JSP pages must include the header:

 <%@ page
 contentType="text/html; charset=UTF-8"
%> 

2. In the catalina.bat (Windows), catalina.sh (Unix), or apache$jakarta_config.com (OpenVMS) file, a switch must be added to the call to java (java.exe on Windows). The switch is:

-Dfile.encoding=UTF-8

This is a Java system property rather than an environment variable; it sets the default character encoding the JVM uses. I cannot find documentation for it anywhere, but it is essential.
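
One way to confirm that the switch took effect is to print the encoding the JVM reports once it is running. The following is only a minimal sketch; the class name DefaultEncodingCheck is made up for this page and is not part of Tomcat.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    // Name of the charset the JVM uses when no encoding is given explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Shows the value passed with -Dfile.encoding=..., if the switch was given.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 on the java command line shows whether the startup script really passes the switch through.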

3. Input coming back from the browser must be translated from ISO-8859-1 to UTF-8. It seems to me that ISO-8859-1 is used in all regions: I have had people in countries such as Greece and Bulgaria test this, and input always comes back in ISO-8859-1 encoding. The method, which you will use constantly, should go something like this:

 import java.io.UnsupportedEncodingException;

 /**
  * Convert an ISO-8859-1 format string (which is the default sent by IE)
  * to the UTF-8 format that the database is in.
  */
 public String toUTF8(String isoString)
 {
  String utf8String = null;
  if (null != isoString && !isoString.equals(""))
  {
   try
   {
    // Recover the raw bytes, then decode them as UTF-8.
    byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
    utf8String = new String(stringBytesISO, "UTF-8");
   }
   catch(UnsupportedEncodingException e)
   {
    // As we can't translate just send back the best guess.
    System.out.println("UnsupportedEncodingException is: " + e.getMessage());
    utf8String = isoString;
   }
  }
  else
  {
   utf8String = isoString;
  }
  return utf8String;
 }
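
The repair that toUTF8 performs can be checked in isolation. The sketch below first simulates the damage (UTF-8 bytes decoded as ISO-8859-1) and then undoes it. The class and method names are invented for illustration, and it assumes Java 7 or later for StandardCharsets, which also removes the need to catch UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    // Simulates the damage: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Same repair as toUTF8: recover the raw bytes, then decode them as UTF-8.
    static String repair(String isoString) {
        if (isoString == null || isoString.isEmpty()) {
            return isoString;
        }
        byte[] rawBytes = isoString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A Greek word written with Unicode escapes so the source file stays ASCII.
        String greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac";
        String damaged = misdecode(greek);
        System.out.println(damaged.length());              // 16: two chars per Greek letter
        System.out.println(repair(damaged).equals(greek)); // true
    }
}
```

The round trip is lossless because ISO-8859-1 maps each of the 256 byte values to exactly one character, so the original bytes can always be recovered.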

I have found that these three steps are all that is necessary to make your site accept any language that UTF-8 can work with. I extend my thanks to those of you on the Tomcat users list who helped me find these little gems.

(from the tomcat-user mailing list)

Alternative solution

The solution suggested above works fine, but only steps (1) and (2) are really needed. From an architectural perspective, the correct way to handle the rest is to add a filter to Tomcat that makes the necessary correction for the deployed application, with no additional changes to the rest of the code.

1. Make sure the JSP header is set as suggested above:

<%@ page contentType="text/html; charset=UTF-8"%>

2. Example of filter:

import java.io.IOException;
import javax.servlet.*;

public class CharsetFilter implements Filter
{
    private String encoding;

    // Read the encoding from the filter's init parameters, defaulting to UTF-8.
    public void init(FilterConfig config) throws ServletException
    {
        encoding = config.getInitParameter("requestEncoding");
        if (encoding == null) encoding = "UTF-8";
    }

    // Set the request encoding before any parameter is read, then continue the chain.
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException
    {
        request.setCharacterEncoding(encoding);
        next.doFilter(request, response);
    }

    public void destroy() {}
}
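
To illustrate why the request encoding matters, the sketch below decodes the same percent-encoded parameter value under two charsets. It uses java.net.URLDecoder as a stand-in for the container's parameter parsing, which is an assumption made for illustration; Tomcat's own parser is not shown, and the class name RequestEncodingDemo is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class RequestEncodingDemo {
    // Decodes a percent-encoded parameter value using the given charset name.
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%CE%95"; // UTF-8 bytes of U+0395, as a form on a UTF-8 page submits them
        System.out.println(decode(raw, "UTF-8").length());      // 1: one Greek letter
        System.out.println(decode(raw, "ISO-8859-1").length()); // 2: two unrelated characters
    }
}
```

Setting the encoding on the request before any parameter is read gives the first result for every parameter, which is why no per-parameter conversion is needed afterwards.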

The corresponding portion of the web.xml configuration looks like this:

  <!--CharsetFilter start-->

  <filter>
    <filter-name>Charset Filter</filter-name>
    <filter-class>CharsetFilter</filter-class>
    <init-param>
      <param-name>requestEncoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
  </filter>

  <filter-mapping>
    <filter-name>Charset Filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

  <!--CharsetFilter end-->

The suggested solution originates from Sergey Astakhov (sergeya@comita.spb.ru): http://people.comita.spb.ru/users/sergeya/java/ruschars.html (all texts are in Russian).