Load streaming data from AutoMQ Kafka to Timeplus

AutoMQ for Kafka is a cloud-native version of Kafka redesigned for cloud environments. AutoMQ Kafka is open source and fully compatible with the Kafka protocol, taking full advantage of the cloud. Compared to self-managed Apache Kafka, AutoMQ Kafka's cloud-native architecture offers features such as automatic capacity scaling, self-balancing of network traffic, and partition reassignment in seconds. These features contribute to a significantly lower Total Cost of Ownership (TCO) for users.

This article will guide you through importing data from AutoMQ Kafka into Timeplus using the Timeplus console. Since AutoMQ Kafka is 100% compatible with the Apache Kafka protocol, you can also create an external stream in Timeplus to analyze the data in AutoMQ without moving it, as sketched below.
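If you prefer the external stream approach, here is a minimal sketch in Timeplus SQL, assuming the broker address 10.0.96.4:9092 and the topic example_topic used in the examples below; the stream name automq_ext_stream is hypothetical:

-- Create an external stream that reads each Kafka message as a raw string
CREATE EXTERNAL STREAM automq_ext_stream (raw string)
SETTINGS type='kafka',
         brokers='10.0.96.4:9092',
         topic='example_topic';

-- Query the AutoMQ topic in place, without copying data into Timeplus
SELECT raw FROM automq_ext_stream;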

Prepare AutoMQ Kafka and test data

To prepare your AutoMQ Kafka environment and test data, follow the AutoMQ Quick Start guide to deploy your AutoMQ Kafka cluster. Ensure that Timeplus can directly connect to your AutoMQ Kafka server. You can use tools like ngrok to securely expose your local AutoMQ Kafka proxy on the internet, allowing Timeplus Cloud to connect to it. For more details, see the blog.

Note
If you maintain an IP whitelist, you'll need to include our static IP in it:
52.83.159.13 for cloud.timeplus.com.cn

To quickly create a topic named example_topic in AutoMQ Kafka and write a test JSON data into it, follow these steps:

Create a topic:

Use Kafka’s command-line tools to create a topic. Ensure you have access to the Kafka environment and that the Kafka service is running. Here is the command to create a topic:

./kafka-topics.sh --create --topic example_topic --bootstrap-server 10.0.96.4:9092 --partitions 1 --replication-factor 1

Note: Replace the topic name and bootstrap-server address with your actual topic name and Kafka server address.

To check the result of the topic creation, use this command:

./kafka-topics.sh --describe --topic example_topic --bootstrap-server 10.0.96.4:9092

Generate test data:

Generate simple test data in JSON format:

{
  "id": 1,
  "name": "test user",
  "timestamp": "2023-11-10T12:00:00",
  "status": "active"
}

Write test data:

Use Kafka’s command-line tools or programming methods to write test data into example_topic. Here is an example using command-line tools:

echo '{"id": 1, "name": "test user", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | sh kafka-console-producer.sh --broker-list 10.0.96.4:9092 --topic example_topic

Note: Replace the topic name and broker address with your actual topic name and Kafka server address.

To view the recently written topic data, use the following command:

sh kafka-console-consumer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic --from-beginning

AutoMQ Kafka Data Source

  1. In the left navigation menu, click Data Ingestion, then click the Add Data button in the upper right corner.
  2. In the pop-up window, you will see the data sources you can connect to, as well as other ways to add data. Since AutoMQ Kafka is fully compatible with Apache Kafka, simply click on Apache Kafka here.
  3. Enter the broker URL. Since a default AutoMQ Kafka deployment does not enable TLS or authentication, turn off both options here.
  4. Enter the name of the AutoMQ Kafka topic and specify the ‘read as’ data format. We currently support JSON, AVRO, and Text formats.
    1. If the data in the AutoMQ Kafka topic is in JSON format but the schema may change over time, we recommend choosing Text. This way, the entire JSON document is saved as a string, and you can apply JSON-related functions to extract values even if the schema changes (see the query sketch after this list).
    2. If you choose AVRO, there is an “auto-extract” option. By default, this option is off, which means the entire message will be saved as a string. If you turn it on, the top-level attributes in the AVRO message will be put into different columns. This is more convenient for your queries but does not support schema evolution. When choosing AVRO, you also need to specify the address of the schema registry, API key, and secret.
  5. In the next “Preview” step, we will show you at least one event from your specified AutoMQ Kafka data source.
  6. By default, your new data source will automatically generate a new stream in Timeplus. Name this new stream and verify the column information, including the column names and data types. You can also designate a column as the event time column; if you don't, we will use the ingestion time as the event time. Alternatively, you may select an existing stream from the dropdown menu.
  7. After previewing your data, you can assign a name and an optional description to the source, and review the configuration. Upon clicking “Finish,” the stream data will be immediately available in the specified stream.
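Once the source is created, you can verify that events are flowing by querying the stream with Timeplus SQL. A minimal sketch, assuming the stream was named automq_stream (a hypothetical name) and the topic was read as Text into a single string column named raw:

-- Streaming query: returns new events as they arrive
SELECT raw FROM automq_stream;

-- Extract values from the JSON string with the shortcut syntax
SELECT raw:id AS id, raw:name AS name, raw:status AS status
FROM automq_stream;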

AutoMQ Kafka Source Description

Please note:

  1. Currently, we support messages in AutoMQ Kafka topics in JSON and AVRO formats.
  2. Top-level JSON attributes will be converted to stream columns. Nested attributes will be saved as String columns, and you can later query them with one of the JSON functions (see the sketch after this list).
  3. Values in number or boolean types in the JSON message will be converted to corresponding types in the stream.
  4. Datetime or timestamp values will be saved as String columns. You can convert them back to DateTime via the to_time function, as shown below.
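For example, given the test event above, a stream created from it would have columns id, name, timestamp, and status. A minimal sketch, assuming the stream is named automq_stream (a hypothetical name):

-- Parse the timestamp string back into a datetime value
SELECT id, name, status, to_time(timestamp) AS event_time
FROM automq_stream;

-- Nested attributes saved as String columns can be queried with JSON functions,
-- for example json_extract_string(nested_column, 'key')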