Background

Today we observe two cases in our logging:

1. In some places, we creates too much not-so-useful logging content which pollute the logging files. For example, below is a snippet of the server log under normal operations (setting log4j level to INFO):

2014/08/24 00:00:12.145 INFO [Processor] [kafka-network-thread-10251-1] [kafka-server] [] Closing socket connection to /172.17.198.21.
2014/08/24 00:00:12.152 INFO [Processor] [kafka-network-thread-10251-6] [kafka-server] [] Closing socket connection to /172.24.25.165.
2014/08/24 00:00:12.191 INFO [Processor] [kafka-network-thread-10251-4] [kafka-server] [] Closing socket connection to /172.17.198.24.
2014/08/24 00:00:12.209 INFO [Processor] [kafka-network-thread-10251-4] [kafka-server] [] Closing socket connection to /172.17.232.135.
2014/08/24 00:00:12.218 INFO [Processor] [kafka-network-thread-10251-6] [kafka-server] [] Closing socket connection to /172.17.198.41.
2014/08/24 00:00:12.235 INFO [Processor] [kafka-network-thread-10251-5] [kafka-server] [] Closing socket connection to /172.17.198.30.
2014/08/24 00:00:12.326 INFO [Processor] [kafka-network-thread-10251-1] [kafka-server] [] Closing socket connection to /172.17.198.27.
2014/08/24 00:00:12.362 INFO [Processor] [kafka-network-thread-10251-2] [kafka-server] [] Closing socket connection to /172.17.198.23.
2014/08/24 00:00:12.455 INFO [Processor] [kafka-network-thread-10251-5] [kafka-server] [] Closing socket connection to /172.17.198.94.
2014/08/24 00:00:12.488 INFO [Processor] [kafka-network-thread-10251-3] [kafka-server] [] Closing socket connection to /172.17.198.20.

This is from the socket server, which logs closing a socket each time as INFO. However, since most clients use a temporary socket to refresh their metadata, this result in tons of closing socket INFO entries which are useless most of the time.

Another example is that when there is a network connection issue causing clients / follower-replicas not able to connect to servers / leader replicas, an ERROR / WARN log entry will be recorded. But since most of the clients will retry frequently upon failures, and usually log4j will be configured to at least include ERROR / WARN entries, this can cause the clients logs to be swamped with ERROR / WARN along with stack traces:

ERROR kafka.producer.SyncProducer  - Producer connection to localhost:1025 unsuccessful
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:465)
    at sun.nio.ch.Net.connect(Net.java:457)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:639)
    at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
    at kafka.producer.SyncProducer.connect(SyncProducer.scala:146)
    at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:161)
    at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
    at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
    at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
    at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:69)
    at kafka.utils.Utils$.swallow(Utils.scala:186)
    at kafka.utils.Logging$class.swallowError(Logging.scala:105)
    at kafka.utils.Utils$.swallowError(Utils.scala:45)
    at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:69)
    at kafka.producer.Producer.send(Producer.scala:74)
    at kafka.javaapi.producer.Producer.send(Producer.scala:32)
    ...
WARN kafka.client.ClientUtils$  - Fetching topic metadata with correlation id 0 for topics [Set(test-topic)] from broker [id:0,host:localhost,port:1025] failed
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:465)
    at sun.nio.ch.Net.connect(Net.java:457)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:639)
    at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
    at kafka.producer.SyncProducer.connect(SyncProducer.scala:146)
    at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:161)
    at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
    at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
    at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
    at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:69)
    at kafka.utils.Utils$.swallow(Utils.scala:186)
    at kafka.utils.Logging$class.swallowError(Logging.scala:105)
    at kafka.utils.Utils$.swallowError(Utils.scala:45)
    at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:69)
    at kafka.producer.Producer.send(Producer.scala:74)
    at kafka.javaapi.producer.Producer.send(Producer.scala:32)
    ...

2) in some other places, some important log entries are missing, which make some debugging more difficult. Some more example stories can be found here:

KAFKA-1066

KAFKA-1122

LIKAFKA-776

LIKAFKA-550

Here is just a summarized list of logging styles we have been implicitly following to just inspire the thinking and discussion. Existing code styles can be found here.

We do not need to force ourselves to some strict coding conventions: it would be great if after the discussion people can get a better feeling of when / what to log in general in future development.

When to Log

1. At the beginning / end of startup(), shutdown(),

2. Background tread logic such as cleanupLogs(), etc.

3. Entrance of branches like "if-else", "case None", etc.

4. At the beginning of special case handling logic, such as Controller.onBrokerFailure(), etc.

5. At the end of an important and self-contained code block.

6. While handling checked exceptions.

7. etc..

What to Log

1. Related local / global attribute values with the function / code block.

2. Description of the current state, possible explanation, etc.

3. Stack trace for Throwable handling OR error logging.

4. Exception name and attached message for known exception type and clear calltree AND warning or below-level logging.

5. Swallow exceptions and hence no logging during shutting down.

6. etc..

Logging Format

1. Logging indent to be concise and self-distinctive (this is usually also related to client id naming).

2. Class.toString to be concise, and avoid multi-line if possible.

3. Descriptive content to be concise.

4. Break possible long logging (for example, "moved partition ..... to online" in controller log that can take hundreds of lines) to multiple lines.

5. etc..

Space shortcuts

Child pages

Background

When to Log

What to Log

Logging Format

Space shortcuts

Child pages

Kafka Error Handling and Logging

Background

When to Log

What to Log

Logging Format