Ecosystem

Here is a list of tools we have been told about that integrate with Kafka outside the main distribution. We haven't tried them all, so they may not work!

Clients, of course, are listed separately here.

Distributions & Packaging

Cloudera Kafka source https://github.com/cloudera-labs/kafka/tree/cdh5-0.8.2_1.1.0 and release http://www.cloudera.com/content/cloudera/en/developers/home/cloudera-labs/apache-kafka.html
Hortonworks Kafka source ??? and release http://hortonworks.com/hadoop/kafka/
Stratio Kafka source ??? for Ubuntu http://repository.stratio.com/sds/1.1/ubuntu/13.10/binary/ and for RHEL http://repository.stratio.com/sds/1.1/RHEL/

Stream Processing

Storm - A stream-processing framework.
Samza - A YARN-based stream processing framework.
Storm Spout - Consume messages from Kafka and emit as Storm tuples
Kafka-Storm - Kafka 0.8, Storm 0.9, Avro integration
SparkStreaming - Kafka receiver supports Kafka 0.8 and above

Hadoop Integration

Camus - LinkedIn's Kafka=>HDFS pipeline. This one is used for all data at LinkedIn, and works great.
Kafka Hadoop Loader - A different take on Hadoop loading functionality from what is included in the main distribution.
Flume - Contains Kafka Source (consumer) and Sink (producer)
KaBoom - A high-performance HDFS data loader

Search and Query

ElasticSearch - This project, Kafka Standalone Consumer, reads messages from Kafka, processes them, and indexes them in ElasticSearch.
Presto - The Presto Kafka connector allows you to query Kafka in SQL using Presto.
Hive - Hive SerDe that allows querying Kafka (Avro only for now) using Hive SQL

Web Management Consoles

Kafka Web Console - Displays information about your Kafka cluster including which nodes are up and what topics they host data for.
Kafka Offset Monitor - Displays the state of all consumers and how far behind the head of the stream they are.
Capillary - Displays the state and deltas of Kafka-based Apache Storm topologies. Supports Kafka >= 0.8. It also provides an API for fetching this information for monitoring purposes.

AWS Integration

Automated AWS deployment
Kafka -> S3 Mirroring tool from Pinterest.
Alternative Kafka->S3 Mirroring tool

Logging

Syslog producer - A syslog producer that supports both raw data and protobuf with metadata for deep analytics usage.
klogd - A Python syslog publisher
klogd2 - A Java syslog publisher
Tail2Kafka - A simple log tailing utility
Fluentd plugin - Integration with Fluentd
Remote log viewer
LogStash integration - Integration with LogStash and Fluentd
Syslog Collector written in Go
Klogger - A simple proxy service for Kafka.
fuse-kafka - A file system logging agent based on Kafka

Flume - Kafka plugins

Flume Kafka Plugin - Integration with Flume
Kafka as a sink and source in Flume - Integration with Flume

Metrics

Mozilla Metrics Service - A Kafka and Protocol Buffers based metrics and logging system
Ganglia Integration
SPM for Kafka
Coda Hale Metric Reporter to Kafka

Packaging and Deployment

Kafka Camel Integration

Misc.

Kafka Websocket - A proxy that interoperates with websockets for delivering Kafka data to browsers.
KafkaCat - A native, command line producer and consumer.
Kafka Mirror - An alternative to the built-in mirroring tool
Ruby Demo App
Apache Camel Integration
Infobright integration
Riemann Consumer of Metrics
stormkafkamom - A curses-based tool that displays the state of Apache Storm-based Kafka consumers (Kafka 0.7 only).
