Apache Kafka is a powerful distributed streaming platform designed for processing real-time event data at scale. When paired with Confluent, whether the fully managed Confluent Cloud service or the self-managed Confluent Platform, developers gain access to enhanced tools such as Schema Registry, Kafka Connect, and Stream Governance. In this guide, you’ll learn how to connect a Kafka Streams application to Confluent, using a simple, practical approach suited to building real-time data pipelines and event-driven architectures. Whether you’re deploying on-premises or in Confluent Cloud, this step-by-step tutorial walks you through the integration, from producer setup to stream consumption.

Step-by-Step Approach: How to Connect Kafka to Confluent

1. Set Up a Kafka Cluster

To begin, you need a Kafka cluster. You can choose to:

  • Confluent Cloud: a fully managed service where you can quickly spin up a Kafka cluster.
  • Self-hosted Kafka: install Kafka locally or on your own servers by downloading it from the Apache Kafka website (a minimal local start is sketched below).
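
If you go the self-hosted route, a minimal local start looks roughly like the commands below. This sketch assumes a ZooKeeper-based Kafka download unpacked into the current directory; newer KRaft-mode releases use a slightly different startup procedure.

# Start ZooKeeper, then the Kafka broker, each in its own terminal
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties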

2. Install Confluent Platform

If you opt for a self-hosted setup, download and install the Confluent Platform. It bundles Apache Kafka with additional components such as Schema Registry, Kafka Connect, and Control Center.
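
With the platform installed, you can bring up the local services for development. This is a sketch assuming the Confluent CLI is on your PATH and CONFLUENT_HOME points at your installation directory; the exact subcommand varies slightly between CLI versions.

# Start Kafka, Schema Registry, Kafka Connect, Control Center, etc. locally
export CONFLUENT_HOME=/path/to/confluent
confluent local services start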

3. Configure Your Kafka Broker

Once Kafka and the Confluent Platform are installed, edit the broker's server.properties file to set the core configuration values, such as the following (a sample snippet appears after the list):

  • Broker ID
  • Listeners
  • Log directories
  • Replication factor
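
As a rough illustration, a minimal development-oriented snippet of server.properties covering those settings might look like this (the values are placeholders; adjust them for your environment):

# config/server.properties (illustrative values only)
broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/var/lib/kafka/logs
default.replication.factor=1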

4. Create a Kafka Topic

Create a topic in your Kafka cluster where your streaming data will be published.

bin/kafka-topics.sh --create --topic your_topic_name --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1

5. Develop Your Kafka Streams Application

Use the Kafka Streams library in your application. Here’s a simple Java example that builds a minimal passthrough topology and starts the Streams client:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class KafkaStreamingApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Minimal passthrough topology: read records from the input topic created
        // in step 4 and forward them unchanged to an output topic. Replace the
        // topic names and this logic with your own processing.
        builder.stream("your_topic_name").to("your_output_topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the Streams client cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

6. Use Confluent Schema Registry

To manage schemas for your Kafka data, configure Confluent Schema Registry. Add the Avro serde dependency to your project and define a schema for your data:

<dependency>
    <groupId>io.confluent</groupId>
    <artifactId>kafka-streams-avro-serde</artifactId>
    <version>7.0.1</version>
</dependency>
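
As a rough sketch of how the pieces fit together, the Streams configuration from step 5 can be pointed at Schema Registry by switching the default value serde to the Avro serde from this dependency and supplying the registry URL (http://localhost:8081 is the default for a local Confluent Platform install; Confluent Cloud additionally requires API-key credentials). The class and application names here are illustrative:

import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class SchemaRegistryConfig {
    // Builds Streams properties that use Avro values backed by Schema Registry.
    public static Properties avroStreamsConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-avro-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // GenericAvroSerde (from kafka-streams-avro-serde) serializes values as Avro.
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
        // "schema.registry.url" tells the serde where to register and fetch schemas.
        props.put("schema.registry.url", "http://localhost:8081");
        return props;
    }
}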

7. Publish and Consume Data

Use Kafka producers to send data to your topic and consumers to read from it. Confluent’s serializers, deserializers, and Schema Registry integration handle serialization and schema management for you.
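
For example, a bare-bones Java producer that writes plain strings to the topic from step 4 might look like this (a sketch; in practice you would typically use the Avro serializers from the previous step so records are validated against the registered schema):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // try-with-resources flushes and closes the producer when done.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send one record to the topic created in step 4.
            producer.send(new ProducerRecord<>("your_topic_name", "key-1", "hello, kafka"));
        }
    }
}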

8. Monitoring

Use Confluent Control Center (or the Confluent Cloud console, if you are on the managed service) to monitor your topics, producers, and consumers. It provides insight into the health and throughput of your Kafka Streams application.

9. Deploy and Scale

Depending on your needs, scale your Kafka Streams application by running additional instances with the same application.id or by increasing the number of partitions in your topic; the partition count sets the upper bound on parallelism.
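
For example, to raise the partition count of the topic from step 4 (partitions can only be increased, never decreased), you could run something like:

bin/kafka-topics.sh --alter --topic your_topic_name --partitions 3 --bootstrap-server localhost:9092

You can then start additional instances of the Streams application with the same application.id, and Kafka will rebalance the partitions across them.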

Conclusion

By following these steps, you can connect a Kafka Streams application to Confluent and take advantage of Schema Registry, Control Center monitoring, and straightforward scaling for processing real-time data streams.