Background

...

Code Block
ERROR kafka.producer.SyncProducer  - Producer connection to localhost:1025 unsuccessful
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:465)
    at sun.nio.ch.Net.connect(Net.java:457)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:639)
    at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
    at kafka.producer.SyncProducer.connect(SyncProducer.scala:146)
    at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:161)
    at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
    at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
    at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
    at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:69)
    at kafka.utils.Utils$.swallow(Utils.scala:186)
    at kafka.utils.Logging$class.swallowError(Logging.scala:105)
    at kafka.utils.Utils$.swallowError(Utils.scala:45)
    at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:69)
    at kafka.producer.Producer.send(Producer.scala:74)
    at kafka.javaapi.producer.Producer.send(Producer.scala:32)
    ...
WARN kafka.client.ClientUtils$  - Fetching topic metadata with correlation id 0 for topics [Set(test-topic)] from broker [id:0,host:localhost,port:1025] failed
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:465)
    at sun.nio.ch.Net.connect(Net.java:457)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:639)
    at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
    at kafka.producer.SyncProducer.connect(SyncProducer.scala:146)
    at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:161)
    at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:68)
    at kafka.producer.SyncProducer.send(SyncProducer.scala:112)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:53)
    at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
    at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:69)
    at kafka.utils.Utils$.swallow(Utils.scala:186)
    at kafka.utils.Logging$class.swallowError(Logging.scala:105)
    at kafka.utils.Utils$.swallowError(Utils.scala:45)
    at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:69)
    at kafka.producer.Producer.send(Producer.scala:74)
    at kafka.javaapi.producer.Producer.send(Producer.scala:32)
    ...

 

2. On the other hand, some important logging information is missing, which makes debugging / troubleshooting much more difficult than it should be, for example:

...

Some related JIRAs have also been filed for these issues, for example KAFKA-1066 and KAFKA-1122.

...

Also, the logging identifier needs to be concise and self-distinguishing (this is usually also related to client id naming).

...

Logging Cleaning Actions

Based on discussions around these points, we have proposed several actions so far to clean up our logging; the following list may expand as the discussion continues:


1. Compress request logs.

The request handling logs have been a pain point for us for a long time. Since they record one entry for every request received and handled, and because of the sheer number of clients (some of which do not follow any backoff mechanism when sending metadata refresh requests, etc.), the log files can easily grow to hundreds of GBs in a day. The plan is to change the request logs to default to DEBUG level (a per-request summary), while the TRACE-level request details are written simultaneously in a binary format, with compression / rolling of the binary logs. The DEBUG-level information is sufficient under normal operations, and we only need to decompress and convert the binary request-detail logs back into text when troubleshooting is needed.

Details can be found in KAFKA-1590.
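
For the text side of this plan, a minimal log4j sketch is shown below. It assumes the kafka.request.logger name and the hourly-rolling appender layout that Kafka's shipped log4j.properties already uses; the binary TRACE sink would be a new component and is not shown.

Code Block
# Human-readable request log kept at DEBUG (summary) level, rolled
# hourly so older segments can be compressed or pruned.
log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.requestAppender.File=${kafka.logs.dir}/kafka-request.log
log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

# Summaries only by default; the TRACE-level request details would be
# routed to the separate binary sink instead of this text file.
log4j.logger.kafka.request.logger=DEBUG, requestAppender
log4j.additivity.kafka.request.logger=false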

2. Clean up unnecessary stack traces.

This one is fairly obvious: we need to go through the Kafka code base, check every place a full stack trace is printed, and ask whether it is really necessary.

Details can be found in KAFKA-1591.
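
As a hedged sketch of the target pattern (the object and method names here are hypothetical, not existing Kafka code): log a concise one-line message at WARN for an expected, retriable failure such as the connection error shown earlier, and emit the full stack trace only when DEBUG is enabled.

Code Block
import java.net.ConnectException
import org.apache.log4j.Logger

// Hypothetical helper, not existing Kafka code: keep WARN output to a
// single line and reserve the full stack trace for DEBUG.
object ConciseLogging {
  private val logger = Logger.getLogger(ConciseLogging.getClass)

  def logConnectFailure(brokerHostPort: String, e: ConnectException) {
    // One line is enough under normal operation...
    logger.warn("Producer connection to %s unsuccessful: %s".format(brokerHostPort, e.getMessage))
    // ...and the trace is only written when DEBUG is turned on.
    if (logger.isDebugEnabled)
      logger.debug("Stack trace for failed connection to " + brokerHostPort, e)
  }
}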

3. Clean up unnecessary INFO level log4j entries.

Some of the INFO level logging entries should really be at DEBUG or even TRACE; we need to demote them so that the server / client logs stay clean and concise under normal operations.
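
Until such code changes land, operators can already demote particularly chatty loggers with per-logger log4j overrides; a small sketch follows (the two logger names below are illustrative picks from the code base, not a vetted list).

Code Block
# Keep the server log at INFO overall...
log4j.rootLogger=INFO, kafkaAppender
# ...but demote illustratively chosen chatty components to WARN until
# their entries are moved to DEBUG / TRACE in the code itself.
log4j.logger.kafka.network.Processor=WARN
log4j.logger.kafka.producer.async.DefaultEventHandler=WARN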

...