Kafka-node does not recover from UnknownTopicOrPartition error (when topic has been reassigned to a different broker) #319

Closed
hyperlink opened this issue Feb 1, 2016 · 5 comments
@hyperlink
Collaborator

This issue happened in our production environment running version 0.2.29, after devops tried to balance the load across our Kafka cluster by moving some topics around. I would expect it to be an issue in 0.3.1 as well.

Test setup

Please see issue #277 for details on how to set up your environment for this test.

Update docker-compose.yml:

zookeeper:
  image: hyperlink/zookeeper
  privileged: true
  ports:
    - "2181:2181"
kafka:
  image: wurstmeister/kafka:0.9.0.0
  ports:
    - "9092"
  links:
    - zookeeper:zk
  environment:
    KAFKA_ADVERTISED_HOST_NAME: 192.168.99.100
    KAFKA_CREATE_TOPICS: "KafkaConnectivityTest:1:1"
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true

Add move.json:

{"version":1,"partitions":[{"topic":"KafkaConnectivityTest","partition":0,"replicas":[1002]}]}
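If the topic already lives on broker 1002, the replicas field has to point somewhere else. A minimal sketch of generating the file contents instead of editing them by hand follows; `buildReassignment` is a hypothetical helper name, not part of Kafka or kafka-node, and the field layout is copied from the move.json shown above.

```javascript
// Build the JSON document that kafka-reassign-partitions.sh reads.
// buildReassignment is a hypothetical helper, not a Kafka or kafka-node API.
function buildReassignment(topic, partition, replicas) {
  return JSON.stringify({
    version: 1,
    partitions: [{ topic: topic, partition: partition, replicas: replicas }]
  });
}

// Move partition 0 of the test topic to broker 1002, as in move.json.
const moveJson = buildReassignment('KafkaConnectivityTest', 0, [1002]);
console.log(moveJson);
```

Redirect the output to move.json, changing the broker id in the last argument if step 5 shows the topic is already on that broker.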

Steps to reproduce

  1. Start docker containers
  2. Run ./kafkatest.js
  3. Send a test message through the console producer and verify it's echoed in the consumer
  4. Scale up Kafka: docker-compose scale kafka=3
  5. Run ./kafka-topics.sh --zookeeper 192.168.99.100:2181 --topic KafkaConnectivityTest --describe to verify that the topic's current leader is not the broker you are moving it to. If it is, update the replicas field in move.json to a different broker.
  6. Move the topic to a different broker by running ./kafka-reassign-partitions.sh --zookeeper 192.168.99.100:2181 --reassignment-json-file move.json --execute
  7. You will see the following repeated error echoed:
an error happened { topic: 'KafkaConnectivityTest',
  partition: 0,
  message: 'UnknownTopicOrPartition' }
  8. Send a test message through the console producer. We expect the test message to be echoed, but it is not.

Comments

According to the error description on the Kafka protocol page, "This request is for a topic or partition that does not exist on this broker." This appears to be a recoverable error. I'm currently working on a PR to fix this issue.
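To illustrate what "recoverable" could mean here, the following is a rough sketch and not the code from the PR. It assumes the error object has the shape shown in the log output above, and that the caller supplies a `refreshMetadata(topics, cb)` function that reloads partition leaders. `createFetchErrorHandler` and `isUnknownTopicOrPartition` are hypothetical names. Because the error repeats on every fetch, the sketch keeps at most one refresh in flight per topic.

```javascript
// Error shape taken from the log output in this issue:
// { topic: '...', partition: 0, message: 'UnknownTopicOrPartition' }
function isUnknownTopicOrPartition(err) {
  return Boolean(err) && err.message === 'UnknownTopicOrPartition';
}

// createFetchErrorHandler is hypothetical. refreshMetadata(topics, cb) is
// injected by the caller and is assumed to reload partition leaders.
function createFetchErrorHandler(refreshMetadata) {
  const refreshing = new Set(); // topics with a refresh already in flight

  return function handleFetchError(err) {
    if (!isUnknownTopicOrPartition(err)) return false; // caller handles other errors
    if (refreshing.has(err.topic)) return true;        // a refresh is already pending
    refreshing.add(err.topic);
    refreshMetadata([err.topic], function () {
      // Allow a later refresh whether or not this one succeeded.
      refreshing.delete(err.topic);
    });
    return true;
  };
}
```

The handler returns true when it has taken responsibility for the error, so a caller can fall back to its normal error reporting when it returns false.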

hyperlink added a commit to hyperlink/kafka-node that referenced this issue Feb 2, 2016
@mingfang

This problem also happens when I manually delete a topic.

@mingfang

@hyperlink Your code does prevent kafka-node from going into an endless error loop, but for some reason it also prevents a topic from being deleted on the broker. I'm using kafka-manager to manually delete topics.

@itamarwe

itamarwe commented Sep 8, 2016

It seems like the fix only handles the fetch function.
How about fixing it in the sendProduceRequest as well?

I'm encountering this problem when reassigning partitions. When I'm producing I get the UnknownTopicOrPartition error and no brokersChanged event is firing.
@hyperlink
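For the produce path, one possible shape of a workaround is sketched below. It is not kafka-node code: `sendWithRefresh` is a hypothetical wrapper, `send(payloads, cb)` and `refreshMetadata(topics, cb)` are supplied by the caller, and it assumes the error reaches the send callback with a `message` of UnknownTopicOrPartition.

```javascript
// sendWithRefresh is a hypothetical wrapper, not a kafka-node API.
// send(payloads, cb) and refreshMetadata(topics, cb) are injected by the caller.
function sendWithRefresh(send, refreshMetadata, payloads, cb, retries) {
  const remaining = retries === undefined ? 1 : retries;
  send(payloads, function (err, result) {
    const recoverable = err && err.message === 'UnknownTopicOrPartition';
    if (!recoverable || remaining <= 0) return cb(err, result);

    // The partition leader has probably moved: reload metadata, then retry.
    const topics = Array.from(new Set(payloads.map(function (p) { return p.topic; })));
    refreshMetadata(topics, function (refreshErr) {
      if (refreshErr) return cb(refreshErr);
      sendWithRefresh(send, refreshMetadata, payloads, cb, remaining - 1);
    });
  });
}
```

By default the wrapper retries once, so a topic that really has been deleted still surfaces the error to the caller instead of looping.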

@hyperlink
Collaborator Author

@itamarwe Would you mind creating an issue for this? PRs are also welcome if you want to fix it.

@yunnysunny

I encountered this problem too.
