-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
-
GSoC - Coding Phase 3
Remoting Kafka Agent classloading breaks in some cases:
1) Remoting class fetching is an asynchronous operation in the Channel instance. If a fetch fails, loading of that class is not retried until the channel is reset.
2) The Remoting Kafka Agent holds the connection Channel, so it keeps waiting for classloading forever, even if the master receives the command and then fails to respond.
3) One of the response failures I see in debug (the classloader fetch3 command was received, but the response was not sent back):
INFO: [Consumer clientId=consumer-1, groupId=hello] Setting newly assigned partitions [test2-localhost-8080-connect-0]
Jun 10, 2018 1:53:38 PM io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport read
INFO: Received a command: Unexport
Jun 10, 2018 1:53:45 PM io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport read
INFO: Received a command: RPCRequest:hudson.remoting.RemoteClassLoader$IClassLoader.fetch3[java.lang.String](2)
Jun 10, 2018 1:53:55 PM org.apache.kafka.clients.consumer.internals.AbstractCoordinator markCoordinatorUnknown
INFO: [Consumer clientId=consumer-1, groupId=hello] Group coordinator 127.0.0.1:9092 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
Jun 10, 2018 1:53:55 PM org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler onSuccess
INFO: [Consumer clientId=consumer-1, groupId=hello] Discovered group coordinator 127.0.0.1:9092 (id: 2147482646 rack: null)
Jun 10, 2018 1:53:55 PM org.apache.kafka.clients.consumer.internals.AbstractCoordinator markCoordinatorUnknown
INFO: [Consumer clientId=consumer-1, groupId=hello] Group coordinator 127.0.0.1:9092 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
Jun 10, 2018 1:53:55 PM io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport write
INFO: Sent a command=Response:RPCRequest:hudson.remoting.RemoteClassLoader$IClassLoader.fetch3[java.lang.String](2)(java.util.HashMap), in topic=localhost-8080-test2-connect, with key=launch
Jun 10, 2018 1:53:56 PM org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler onSuccess
INFO: [Consumer clientId=consumer-1, groupId=hello] Discovered group coordinator 127.0.0.1:9092 (id: 2147482646 rack: null)
Jun 10, 2018 1:53:56 PM org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler handle
SEVERE: [Consumer clientId=consumer-1, groupId=hello] Offset commit failed on partition test2-localhost-8080-connect-0 at offset 31: The coordinator is not aware of this member.
Jun 10, 2018 1:53:56 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: Unexpected error in channel test2
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:775)
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:726)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:822)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:802)
	at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
	at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:563)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:390)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:293)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:209)
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:597)
	at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1218)
	at io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport.read(KafkaClassicCommandTransport.java:84)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Jun 10, 2018 1:53:56 PM hudson.remoting.SynchronousCommandTransport$ReaderThread lambda$new$0
SEVERE: Uncaught exception in SynchronousCommandTransport.ReaderThread Thread[Channel reader thread: test2,5,main]
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:713)
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:596)
	at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1218)
	at io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport.closeRead(KafkaClassicCommandTransport.java:67)
	at hudson.remoting.Channel.terminate(Channel.java:1031)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:99)
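
To illustrate points 1 and 2 above, here is a minimal sketch of how a caller might avoid waiting forever on an asynchronous class fetch: each attempt is bounded by a timeout and a failed attempt is retried a limited number of times. This is not code from Jenkins Remoting or the Remoting Kafka plugin. `BoundedClassFetch`, `fetchWithRetry`, and the `Supplier` standing in for the real fetch call are hypothetical names, and only JDK classes are used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Remoting or the plugin.
final class BoundedClassFetch {

    /**
     * Starts a new asynchronous fetch per attempt and waits at most
     * timeoutMillis for it, instead of blocking indefinitely on one request.
     */
    static byte[] fetchWithRetry(Supplier<CompletableFuture<byte[]>> fetch,
                                 int maxAttempts,
                                 long timeoutMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<byte[]> pending = fetch.get();
            try {
                return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // No response, or the fetch failed: give up on this request and retry.
                pending.cancel(true);
                last = e;
            }
        }
        throw new IllegalStateException(
                "class fetch failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated fetch: the first request never gets a response, the second succeeds.
        byte[] bytes = fetchWithRetry(() -> {
            calls[0]++;
            if (calls[0] == 1) {
                return new CompletableFuture<byte[]>();
            }
            return CompletableFuture.completedFuture(new byte[] {1, 2, 3});
        }, 3, 100);
        System.out.println("attempts=" + calls[0] + ", bytes=" + bytes.length);
        // prints "attempts=2, bytes=3"
    }
}
```

Whether a retry is safe in the real channel depends on how Remoting tracks pending RPC requests, which this sketch does not model.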