1. JSP pages must include the header:
{{{
<%@ page contentType="text/html; charset=UTF-8" %>
}}}
2. Inputs coming back from the browser must be converted to UTF-8. When a request does not declare a charset, the server decodes it as ISO-8859-1, so a method is needed that re-decodes such strings as UTF-8. ISO-8859-1 is the default character encoding for servers and browsers according to section 3.4.1 of the [http://www.ietf.org/rfc/rfc2616.txt HTTP specification].
{{{
// Requires: import java.io.UnsupportedEncodingException;

/**
 * Convert an ISO-8859-1 format string (which is the default sent by IE)
 * to the UTF-8 format that the database is in.
 */
public String toUTF8(String isoString)
{
    String utf8String = null;
    if (null != isoString && !isoString.equals(""))
    {
        try
        {
            // Recover the raw bytes, then decode them again as UTF-8.
            byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
            utf8String = new String(stringBytesISO, "UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            // TODO: This should never happen. The UnsupportedEncodingException
            // should be propagated instead of swallowed. This error would indicate
            // a severe misconfiguration of the JVM.
            // As we can't translate, just send back the best guess.
            System.out.println("UnsupportedEncodingException is: " +
                    e.getMessage());
            utf8String = isoString;
        }
    }
    else
    {
        // Null or empty input is returned unchanged.
        utf8String = isoString;
    }
    return utf8String;
}
}}}
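To see why re-decoding works, the sketch below simulates the failure and the repair in plain Java. The class name `Utf8RoundTrip` and the helpers `misdecode` and `repair` are hypothetical names chosen for this illustration; `repair` follows the same idea as `toUTF8` above but uses `StandardCharsets` (Java 7 or later), which avoids the checked exception.

```java
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {
    /** Simulates a server decoding the browser's UTF-8 bytes as ISO-8859-1. */
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the raw bytes and decodes them as UTF-8 (same idea as toUTF8). */
    static String repair(String isoString) {
        if (isoString == null || isoString.isEmpty()) {
            return isoString;
        }
        byte[] raw = isoString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";          // "cafe" with an acute accent on the e
        String garbled = misdecode(original);   // one character per UTF-8 byte
        System.out.println(garbled.length());                 // 5
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

The repair is only safe on strings that really were mis-decoded. Applying it to text that was already decoded correctly damages any non-ASCII characters, so it should not be called blindly on every input.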
I have found that these steps are all that is necessary to make your site accept any language that UTF-8 can represent. My thanks to those on the Tomcat users list who helped me find these little gems.
(from the tomcat-user mailing list)
Alternative solution
The solution suggested above works, but from an architectural perspective the cleaner way is to add a servlet filter that sets the request encoding for the deployed application, so that no other code has to change.
1. Make sure the JSP header is set as suggested above:
{{{
<%@ page contentType="text/html; charset=UTF-8" %>
}}}
2. Example of a filter:
{{{
import java.io.IOException;
import javax.servlet.*;

public class CharsetFilter implements Filter
{
    private String encoding;

    public void init(FilterConfig config) throws ServletException
    {
        // Read the encoding from web.xml; fall back to UTF-8.
        encoding = config.getInitParameter("requestEncoding");
        if (encoding == null)
        {
            encoding = "UTF-8";
        }
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException
    {
        // Respect the client-specified character encoding
        // (see HTTP specification section 3.4.1)
        if (null == request.getCharacterEncoding())
        {
            request.setCharacterEncoding(encoding);
        }
        next.doFilter(request, response);
    }

    public void destroy() {}
}
}}}
The corresponding portion of the web.xml configuration looks like this:
{{{
<!--CharsetFilter start-->
<filter>
    <filter-name>Charset Filter</filter-name>
    <filter-class>CharsetFilter</filter-class>
    <init-param>
        <param-name>requestEncoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>Charset Filter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
<!--CharsetFilter end-->
}}}
The suggested solution originates from [http://people.comita.spb.ru/users/sergeya/java/ruschars.html Sergey Astakhov (all texts are in Russian)] (sergeya@comita.spb.ru).
Important note: this filter should be placed as close to the front of your filter chain as possible. If any other code calls request.getParameter (or a similar method) before this filter is invoked, the encoding is set too late to take effect, and your parameters will still be decoded incorrectly.
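The ordering issue can be illustrated with a toy model in plain Java. This is not the servlet API: `LazyRequestModel` is a hypothetical stand-in that assumes parameters are parsed once, on first access, using whatever encoding is set at that moment, and that the encoding cannot be changed afterwards.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a request object; NOT the real javax.servlet API. */
final class LazyRequestModel {
    private final String rawBody;            // e.g. "name=caf%C3%A9"
    private String encoding = "ISO-8859-1";  // assumed default when no charset is declared
    private Map<String, String> parsed;      // filled on first parameter access

    LazyRequestModel(String rawBody) {
        this.rawBody = rawBody;
    }

    void setCharacterEncoding(String enc) {
        // Modeled behavior: ignored once the parameters have been parsed.
        if (parsed == null) {
            encoding = enc;
        }
    }

    String getParameter(String name) {
        if (parsed == null) {
            parsed = new HashMap<>();
            for (String pair : rawBody.split("&")) {
                String[] kv = pair.split("=", 2);
                String value = kv.length > 1 ? kv[1] : "";
                parsed.put(decode(kv[0]), decode(value));
            }
        }
        return parsed.get(name);
    }

    private String decode(String s) {
        try {
            return URLDecoder.decode(s, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LazyRequestModel early = new LazyRequestModel("name=caf%C3%A9");
        early.setCharacterEncoding("UTF-8");       // set before any parameter is read
        System.out.println("caf\u00e9".equals(early.getParameter("name"))); // true

        LazyRequestModel late = new LazyRequestModel("name=caf%C3%A9");
        late.getParameter("name");                 // some earlier code reads a parameter
        late.setCharacterEncoding("UTF-8");        // too late, parsing already happened
        System.out.println("caf\u00e9".equals(late.getParameter("name")));  // false
    }
}
```

In the second case the value stays mis-decoded for the rest of the request, which is why the filter needs to run before anything else touches the parameters.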
TIP
To have the connectors decode request URIs as UTF-8, update the file $CATALINA_HOME/conf/server.xml. Example:
{{{
<Connector port="8080"
           maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
           enableLookups="false" redirectPort="8443" acceptCount="100"
           debug="0" connectionTimeout="20000"
           disableUploadTimeout="true"
           URIEncoding="UTF-8"/>
}}}
Note that this only changes how GET parameters are read from the request URI; it does not affect POST parameters at all.
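What the URI encoding changes can be sketched with the JDK's own percent-encoding helpers. This is an illustration of the decoding step only, not Tomcat's implementation; the class name `QueryDecodingDemo` and its helpers are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class QueryDecodingDemo {
    /** Percent-encodes text the way a browser would on a page in the given charset. */
    static String encode(String text, String charset) {
        try {
            return URLEncoder.encode(text, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Percent-decodes a query value, interpreting the bytes in the given charset. */
    static String decode(String escaped, String charset) {
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String escaped = encode("\u00fc", "UTF-8");   // u with diaeresis, sent as two bytes
        System.out.println(escaped);                                  // %C3%BC
        System.out.println(decode(escaped, "UTF-8").length());        // 1
        System.out.println(decode(escaped, "ISO-8859-1").length());   // 2
    }
}
```

Decoding with the wrong charset turns one character into two, which is the kind of garbling that setting URIEncoding="UTF-8" is meant to prevent for query strings.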
See Also
...