Linux – Kafka multiple listeners

amazon-web-services, kafka, linux

Initial apologies for the long post (this is also on Superuser, as I wasn't sure of the best place for it: https://superuser.com/questions/1404421/kafka-multiple-listeners so let me know if one needs closing)…

I have set up a Kafka cluster in AWS with the following listeners and advertised listeners:

KAFKA_ADVERTISED_LISTENERS:           PLAINTEXT://ds-kafka-broker0.service.local:9092,INTERNAL://:9093,PRIVATE://ds-kafka-broker0.private.awscloud.co.uk:6000,EXTERNAL://ds-kafka-broker0.dev.awscloud.co.uk:7000
KAFKA_LISTENERS:                      PLAINTEXT://:9092,INTERNAL://:9093,PRIVATE://:6000,EXTERNAL://:7000
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,INTERNAL:PLAINTEXT,PRIVATE:PLAINTEXT,EXTERNAL:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME:     INTERNAL
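
(For anyone not running the Confluent Docker images: those environment variables correspond to the following server.properties entries for broker 0; this is just the same configuration restated using the images' KAFKA_ prefix and underscore-to-dot convention.)

advertised.listeners=PLAINTEXT://ds-kafka-broker0.service.local:9092,INTERNAL://:9093,PRIVATE://ds-kafka-broker0.private.awscloud.co.uk:6000,EXTERNAL://ds-kafka-broker0.dev.awscloud.co.uk:7000
listeners=PLAINTEXT://:9092,INTERNAL://:9093,PRIVATE://:6000,EXTERNAL://:7000
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,INTERNAL:PLAINTEXT,PRIVATE:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL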

I am having to do this because we run an AWS/on-prem hybrid environment over Direct Connect.

Within AWS:

We use VPC endpoints (VPCEs) for connectivity to the Kafka cluster within accounts, so the hostname for the PRIVATE listener is the same in every account and resolves via a local private Route 53 zone.

On-Prem:

The private zone does not work on-prem because we cannot resolve the private.awscloud.co.uk zones that sit in every AWS account, so I have to use another zone, which in Kafka forces me onto another listener and port range. This is the EXTERNAL listener.
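
As a quick sanity check of the split DNS (a sketch using the hostnames above), resolution should behave like this from each side:

# From inside an AWS account, where the private Route 53 zone is attached to the VPC:
dig +short ds-kafka-broker0.private.awscloud.co.uk   # resolves (to the VPC endpoint)
# From on-prem, over Direct Connect:
dig +short ds-kafka-broker0.private.awscloud.co.uk   # returns nothing
dig +short ds-kafka-broker0.dev.awscloud.co.uk       # resolves from both sides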

All listeners currently use plaintext, as I am still at the implementation stage; both paths will eventually use the same encryption, but for my current connectivity testing this should be fine. The running cluster with all its accessories (Connect, KSQL, Schema Registry, etc.) works fine from within the Kafka cluster's AWS account.

The problem:

When I connect to the EXTERNAL ports using the following producer.config settings:

bootstrap.servers=EXTERNAL://ds-kafka-broker0.dev.awscloud.co.uk:7000,EXTERNAL://ds-kafka-broker1.dev.awscloud.co.uk:7001,EXTERNAL://ds-kafka-broker2.dev.awscloud.co.uk:7002
#security.protocol=EXTERNAL   # commented out: EXTERNAL is a listener name, not a security protocol (valid values are PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL)
compression.type=snappy
max.block.ms=5000
linger.ms=5
max.in.flight.requests.per.connection=1
retries=5
batch.size=1000
max.request.size=10000000
acks=1
buffer.memory=67108864

and run the following console producer test command:

bin/kafka-console-producer --producer.config etc/producer.properties --topic test-create-remote --broker-list EXTERNAL://ds-kafka-broker0.dev.awscloud.co.uk:7000,EXTERNAL://ds-kafka-broker1.dev.awscloud.co.uk:7001,EXTERNAL://ds-kafka-broker2.dev.awscloud.co.uk:7002

the initial connection happens on port 7000, but Kafka then tells the client to use the PRIVATE listener, and the traffic reconnects on the 6000 PRIVATE port range (confirmed with tcpdump).

This is fine when connecting from within an AWS account, as that is exactly what this port range and listener are for, but from a client perspective I don't seem to have any control over which listener is used. In this case the connection from on-prem fails because I cannot resolve the PRIVATE address, and even if I could, I couldn't connect on that port anyway.
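
One way to see exactly which advertised listener the broker hands back, without involving a producer at all, is to request metadata directly, e.g. with kafkacat (a diagnostic sketch, not part of my setup):

# List cluster metadata via the EXTERNAL listener; the broker addresses printed
# are the advertised listeners the client is told to reconnect to.
kafkacat -b ds-kafka-broker0.dev.awscloud.co.uk:7000 -L

If the brokers come back listed with the PRIVATE hostnames and the 6000 port range, that matches what tcpdump showed here.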

This also makes me wonder why I am getting the PRIVATE listener… why not the INTERNAL or PLAINTEXT ones, if I have no control?

I hope all this makes sense; any pointers appreciated.

Best Answer

Just thought I would post my solution to this. It had nothing to do with the Kafka configuration!

This was running on AWS ECS (EC2 launch type, not Fargate), and because there is currently a limit of one target group per task, a single target group was being used behind the scenes for both listeners (6000 and 7000). That target group pointed at port 6000, so it was translating 7000 to 6000, hence my always getting back the same listener.

This blog post (https://rmoff.net/2018/08/02/kafka-listeners-explained/) was quite helpful; it didn't go far enough to cover my problem, but it contains one key quote that helped:

When connecting to a broker, the listener that will be returned to the client will be the listener to which you connected (based on the port).

Then I was talking the problem through with someone, and while describing the single load balancer I had a light-bulb moment: because the load balancer was forwarding port 7000 to the broker's port 6000, every connection arrived on the PRIVATE listener, so that was the listener advertised back. The cluster is now on port-dedicated target groups and all is well.
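
For anyone hitting the same thing, the fix amounts to giving each listener port its own target group and NLB listener, so no port translation happens. A rough AWS CLI sketch (the VPC ID and ARNs are placeholders):

# One target group per Kafka listener port, forwarding to the same port on the broker
aws elbv2 create-target-group --name kafka-broker0-private --protocol TCP --port 6000 \
    --vpc-id vpc-xxxxxxxx --target-type instance
aws elbv2 create-target-group --name kafka-broker0-external --protocol TCP --port 7000 \
    --vpc-id vpc-xxxxxxxx --target-type instance

# One NLB listener per port, each forwarding to its matching target group, so traffic
# arriving on 7000 stays on 7000 and the broker answers with the EXTERNAL listener
aws elbv2 create-listener --load-balancer-arn <nlb-arn> --protocol TCP --port 6000 \
    --default-actions Type=forward,TargetGroupArn=<private-tg-arn>
aws elbv2 create-listener --load-balancer-arn <nlb-arn> --protocol TCP --port 7000 \
    --default-actions Type=forward,TargetGroupArn=<external-tg-arn>

Note that the one-target-group-per-task ECS limit was the original constraint, so depending on how your tasks register themselves, the extra target group may need its targets registered outside the ECS service definition; treat this as the shape of the fix rather than a drop-in.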
