1. JSP pages must include the header:

 <%@ page
 contentType="text/html; charset=UTF-8"
%> 

2. Inputs coming back from the browser need a method that re-decodes them from ISO-8859-1 to UTF-8. When a request does not declare a charset, the server decodes it as ISO-8859-1, which is the default character encoding for servers and browsers according to the
[http://www.ietf.org/rfc/rfc2616.txt HTTP specification], section 3.4.1.

  /**
  * Convert an ISO-8859-1 format string (which is the default sent by IE)
  * to the UTF-8 format that the database is in.
  */
 public String toUTF8(String isoString)
 {
  String utf8String = null;
  if (null != isoString && !isoString.equals(""))
  {
   try
   {
    byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
    utf8String = new String(stringBytesISO, "UTF-8");
   }
   catch(UnsupportedEncodingException e)
   {
    //  TODO: This should never happen. The UnsupportedEncodingException
    // should be propagated instead of swallowed. This error would indicate
    // a severe misconfiguration of the JVM.

    // As we can't translate just send back the best guess.
    System.out.println("UnsupportedEncodingException is: " + e.getMessage());
    utf8String = isoString;
   }
  }
  else
  {
   utf8String = isoString;
  }
  return utf8String;
 } 
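
To illustrate what the method above corrects, here is a minimal, self-contained sketch. It assumes Java 7 or later and uses `StandardCharsets` instead of charset names, so no `UnsupportedEncodingException` can occur. The class name `Utf8RoundTrip` and the helper `misdecode` are hypothetical and exist only for this demonstration; `misdecode` simulates a container that reads UTF-8 bytes as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    /** Re-decodes a string that was wrongly decoded as ISO-8859-1 from UTF-8 bytes. */
    public static String toUTF8(String isoString) {
        if (isoString == null || isoString.isEmpty()) {
            return isoString;
        }
        // ISO-8859-1 maps every char 0..255 to exactly one byte, so this
        // recovers the original bytes that the browser sent.
        byte[] raw = isoString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Hypothetical helper: simulates decoding UTF-8 bytes as ISO-8859-1. */
    public static String misdecode(String original) {
        byte[] sent = original.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";          // "café", 4 characters
        String garbled = misdecode(original);   // 5 characters, one per UTF-8 byte
        System.out.println(garbled.length());                 // prints 5
        System.out.println(toUTF8(garbled).equals(original)); // prints true
    }
}
```

The conversion is only safe when the input really was mis-decoded; applying it to a string that was already decoded correctly would corrupt non-ASCII characters.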

I have found that these steps are all that is necessary to make your site accept any language that UTF-8 can represent. I extend my thanks to those of you on the Tomcat users list who helped me find these little gems.

(from the tomcat-user mailing list)

Alternative solution

The solution suggested above works, but from an architectural perspective the cleaner approach is to add a filter that sets the request encoding for the deployed application, so that no other code has to change.

1. Make sure the JSP header is set as suggested:
<%@ page contentType="text/html; charset=UTF-8"%>

2. Example of filter:

{{{
import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class CharsetFilter implements Filter
{
    private String encoding;

    public void init(FilterConfig config) throws ServletException
    {
        encoding = config.getInitParameter("requestEncoding");

        if (encoding == null) encoding = "UTF-8";
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
        throws IOException, ServletException
    {
        // Respect the client-specified character encoding
        // (see HTTP specification section 3.4.1)
        if (null == request.getCharacterEncoding())
            request.setCharacterEncoding(encoding);

        next.doFilter(request, response);
    }

    public void destroy() {}
}
}}}

The corresponding portion of the web.xml configuration looks like this:

  <!--CharsetFilter start-->

  <filter>
    <filter-name>Charset Filter</filter-name>
    <filter-class>CharsetFilter</filter-class>
      <init-param>
        <param-name>requestEncoding</param-name>
        <param-value>UTF-8</param-value>
      </init-param>
  </filter>

  <filter-mapping>
    <filter-name>Charset Filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

  <!--CharsetFilter end-->

The suggested solution originates from [http://people.comita.spb.ru/users/sergeya/java/ruschars.html Sergey Astakhov (all texts are in Russian)] (sergeya@comita.spb.ru)

Important note: This filter should be as far towards the front of your filter chain as possible. If some other code calls request.getParameter (or a similar method) before this filter is invoked, the parameters will already have been parsed with the default encoding, so setting the encoding afterwards has no effect and your parameters will still be decoded improperly.

Tip:

To have the connectors decode request URIs as UTF-8, update the file $CATALINA_HOME/conf/server.xml. Example:

<Connector port="8080"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
debug="0" connectionTimeout="20000"
disableUploadTimeout="true"
URIEncoding="UTF-8"/>

Note that this only changes how GET parameters are read from the request URI; it does not affect POST parameters at all.
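
The effect of the URI encoding on a GET parameter can be sketched with the JDK alone. This is an analogy using `java.net.URLDecoder`, not Tomcat's internal decoder; the class name `UriEncodingDemo` and the sample value `caf%C3%A9` are assumptions made for the illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriEncodingDemo {

    /** Decodes a percent-encoded query value using the given charset name. */
    public static String decode(String rawValue, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(rawValue, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "café" as a browser would send it from a UTF-8 page: é becomes %C3%A9
        String raw = "caf%C3%A9";

        // Decoded as UTF-8, the two bytes form one character.
        System.out.println(decode(raw, "UTF-8").length());      // prints 4

        // Decoded as ISO-8859-1, each byte becomes its own character.
        System.out.println(decode(raw, "ISO-8859-1").length()); // prints 5
    }
}
```

The same percent-encoded bytes yield a different string depending on the charset used to decode them, which is why the query string needs its own setting separate from the request body encoding.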
