1. JSP pages must include the header:

 <%@ page
 contentType="text/html; charset=UTF-8"
%> 

2. In catalina.bat (Windows), catalina.sh (Unix), or apache$jakarta_config.com (OpenVMS), a switch must be added to the command that launches Java. The switch is:

-Dfile.encoding=UTF-8

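Strictly speaking this is a Java system property, not an environment variable. It sets the JVM's default character encoding, which is used whenever code converts between bytes and characters without naming a charset. I have found little documentation for it, but it is essential.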
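To check that the switch took effect, a small program can print the property and the resulting default charset. This is only a sketch: the class name `DefaultCharsetCheck` is made up for illustration, and on recent JDKs (18 and later) the default charset is UTF-8 even without the switch.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Tomcat.
class DefaultCharsetCheck {
    /** Returns the name of the charset the JVM uses when none is given explicitly. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The raw system property, as set by -Dfile.encoding=... (may differ by JDK version)
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset actually used by calls such as String.getBytes() with no argument
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Running it once with `java -Dfile.encoding=UTF-8 DefaultCharsetCheck` and once without the switch shows whether the platform default differs from UTF-8.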

3. Input coming back from the browser must be translated from ISO-8859-1 to UTF-8. ISO-8859-1 seems to be used in all regions: I have had people in countries such as Greece and Bulgaria test this, and their input always arrived in ISO-8859-1 encoding. The method, which you will use constantly, should look something like this:

  /**
  * Convert an ISO-8859-1 format string (the default sent by IE)
  * to the UTF-8 format that the database is in.
  * Requires: import java.io.UnsupportedEncodingException;
  */
 public String toUTF8(String isoString)
 {
  String utf8String = null;
  if (null != isoString && !isoString.equals(""))
  {
   try
   {
    byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
    utf8String = new String(stringBytesISO, "UTF-8");
   }
   catch(UnsupportedEncodingException e)
   {
    // As we can't translate just send back the best guess.
    System.out.println("UnsupportedEncodingException is: " + e.getMessage());
    utf8String = isoString;
   }
  }
  else
  {
   utf8String = isoString;
  }
  return utf8String;
 } 
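The following sketch shows why this conversion works, assuming the browser actually sent UTF-8 bytes and the container decoded them as ISO-8859-1. The class and method names (`MojibakeRoundTrip`, `misdecode`, `repair`) are hypothetical and only illustrate the round trip.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates the wrong decoding and its repair.
class MojibakeRoundTrip {
    /** Decodes UTF-8 bytes with the wrong charset, as a misconfigured container would. */
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Same repair as toUTF8 above: recover the raw bytes, then decode them as UTF-8. */
    static String repair(String misdecoded) {
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u0391\u03b8\u03ae\u03bd\u03b1"; // Greek text, 5 characters
        String received = misdecode(typed);              // 10 garbled Latin-1 characters
        System.out.println(received.length());              // prints 10
        System.out.println(repair(received).equals(typed)); // prints true
    }
}
```

ISO-8859-1 maps every byte to exactly one character, so no information is lost in the wrong decoding and the repair is exact. Applying the conversion to a string that was already decoded correctly damages it, so it should be used only when the input really was decoded as ISO-8859-1.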

I have found that these three steps are all that is necessary to make your site accept any language that UTF-8 can work with. I extend my thanks to those of you on the Tomcat users list who helped me find these little gems.

(from the tomcat-user mailing list)

Alternative solution

Steps (1) and (2) of the solution above can be kept as they are, but step (3) requires calling a conversion method throughout the code. From an architectural perspective, a cleaner approach is to add a filter that performs the necessary correction for the deployed application, without any additional changes to the rest of the code.

Example of filter:

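{{{
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharsetFilter implements Filter
{
    private String encoding;

    // Read the encoding from the filter's init parameters, defaulting to UTF-8.
    public void init(FilterConfig config) throws ServletException
    {
        encoding = config.getInitParameter("requestEncoding");

        if (encoding == null) encoding = "UTF-8";
    }

    // Set the request encoding before anything further down the chain reads parameters.
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
        throws IOException, ServletException
    {
        request.setCharacterEncoding(encoding);
        next.doFilter(request, response);
    }

    public void destroy() {}
}
}}}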

Corresponding portion of web.xml configuration will look like:

  <!--CharsetFilter start-->

  <filter>
    <filter-name>Charset Filter</filter-name>
    <filter-class>CharsetFilter</filter-class>
      <init-param>
        <param-name>requestEncoding</param-name>
        <param-value>UTF-8</param-value>
      </init-param>
  </filter>

  <filter-mapping>
    <filter-name>Charset Filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

  <!--CharsetFilter end-->
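To see what the `requestEncoding` setting changes, the sketch below decodes the same percent-encoded form value with two different charsets. It only models the effect with the JDK's `URLDecoder` and is not how Tomcat parses requests internally; the class name `RequestEncodingDemo` is hypothetical. Note that `setCharacterEncoding` applies to the request body and has to be called before any parameter is read.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: models parameter decoding with a chosen charset.
class RequestEncodingDemo {
    /** Decodes a percent-encoded form value using the given charset name. */
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Greek alpha and beta as a browser would submit them from a UTF-8 page
        String raw = "%CE%B1%CE%B2";

        // Decoded as UTF-8 (what the filter requests): the two original letters
        System.out.println(decode(raw, "UTF-8").equals("\u03b1\u03b2")); // prints true

        // Decoded as ISO-8859-1: four garbled Latin-1 characters instead
        System.out.println(decode(raw, "ISO-8859-1").length()); // prints 4
    }
}
```

With the filter in place the parameters arrive already decoded as UTF-8, so a manual conversion from ISO-8859-1 should not be applied on top of it.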

The suggested solution originates from [http://people.comita.spb.ru/users/sergeya/java/ruschars.html Sergey Astakhov (all texts are in Russian)] (sergeya@comita.spb.ru).
