...
If a character encoding is not specified, the Servlet specification requires that ISO-8859-1 be used. The character encoding for the body of an HTTP message (request or response) is specified in the Content-Type header field. An example of such a header is
Content-Type: text/html; charset=ISO-8859-1
which explicitly states that the default (ISO-8859-1) is being used.
References: HTTP 1.1 Specification, Section 3.7.1
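The fallback rule above can be sketched in Java. This is only an illustration, not how Tomcat implements it: the class name ContentTypeCharset and the method charsetOf are hypothetical, and the parsing is deliberately naive (it splits on ';' and does not handle quoted values that themselves contain a semicolon).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class ContentTypeCharset {

    // Returns the charset named in a Content-Type header value,
    // or ISO-8859-1 when the header is absent or has no charset parameter.
    // Charset.forName throws an unchecked exception for unknown or illegal names.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            String[] parts = contentType.split(";");
            // parts[0] is the media type; parameters start at index 1.
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                int eq = param.indexOf('=');
                if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                    String value = param.substring(eq + 1).trim();
                    // Strip optional surrounding double quotes.
                    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    return Charset.forName(value);
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html")); // no charset given, falls back to the default
    }
}
```

In a servlet, the container already does this work: request.getCharacterEncoding() returns null when the client sent no charset, and that is the case in which the ISO-8859-1 default applies.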
...
- Set the URIEncoding attribute on the <Connector> element in server.xml to something specific (e.g. URIEncoding="UTF-8").
- Set the useBodyEncodingForURI attribute on the <Connector> element in server.xml to true. This will cause the Connector to use the request body's encoding for GET parameters.
References: Tomcat 6 HTTP Connector, Tomcat 6 AJP Connector
...
- Java Servlet Specification 2.5
- Java Servlet Specification 2.4
- HTTP 1.1 Protocol (hyperlinked version)
- URI Syntax
- ARPA Internet Text Messages
- HTML 4
Default encoding for request and response bodies
See 'Default Encoding for POST' below.
Default encoding for GET
The character set for HTTP query strings (that's the technical term for 'GET parameters') is defined in sections 2 and 2.1 of the "URI Syntax" specification. The character set is defined to be US-ASCII. Any character that does not map to US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax specification says that characters outside of US-ASCII must be encoded using % escape sequences: each octet (byte) is encoded as a literal % followed by the two hexadecimal digits which indicate its value. Thus, a (US-ASCII character code 0x61, decimal 97) is equivalent to %61. There is no default encoding for URIs specified anywhere, so nothing says which bytes a non-ASCII character should turn into before escaping, which is why there is a lot of confusion when it comes to decoding these values.
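The confusion can be reproduced with the JDK alone. The sketch below is an illustration under stated assumptions: the class name QueryEncodingDemo and the method roundTrip are hypothetical, and it needs Java 10 or later for the Charset overloads of URLEncoder.encode and URLDecoder.decode. Those classes implement HTML form encoding (a space becomes +), which is close enough to query-string escaping for this purpose.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class QueryEncodingDemo {

    // Percent-encodes text using one charset, then decodes it using another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        String escaped = URLEncoder.encode(text, encodeWith);
        return URLDecoder.decode(escaped, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        // The same character produces different escape sequences per charset.
        System.out.println(URLEncoder.encode(original, StandardCharsets.UTF_8));      // caf%C3%A9
        System.out.println(URLEncoder.encode(original, StandardCharsets.ISO_8859_1)); // caf%E9

        // Decoding with the charset the sender used recovers the text.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Decoding UTF-8 escapes as ISO-8859-1 yields two wrong characters instead of one.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

The escaped string carries no record of the charset that produced it, so the receiver has to be told, or has to guess, which one to decode with.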
Some notes about the character encoding of URIs:
...