Jenkins / JENKINS-51842

Unreliable classloading chain

    • GSoC - Coding Phase 3

      Remoting Kafka Agent classloading breaks in some cases:

      1) Remoting class fetching is an asynchronous operation in the Channel instance. If a fetch fails, classloading for that class is not retried until the channel is reset.
      2) Remoting Kafka Agent holds the connection Channel, so it keeps waiting for classloading forever, even if the master receives the command and then fails to respond.
      3) One of the response failures I see in debug (the classloader fetch3 command was received, but the response was not sent back):

      INFO: [Consumer clientId=consumer-1, groupId=hello] Setting newly assigned partitions [test2-localhost-8080-connect-0]
      Jun 10, 2018 1:53:38 PM io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport read
      INFO: Received a command: Unexport
      Jun 10, 2018 1:53:45 PM io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport read
      INFO: Received a command: RPCRequest:hudson.remoting.RemoteClassLoader$IClassLoader.fetch3[java.lang.String](2)
      Jun 10, 2018 1:53:55 PM org.apache.kafka.clients.consumer.internals.AbstractCoordinator markCoordinatorUnknown
      INFO: [Consumer clientId=consumer-1, groupId=hello] Group coordinator 127.0.0.1:9092 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
      Jun 10, 2018 1:53:55 PM org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler onSuccess
      INFO: [Consumer clientId=consumer-1, groupId=hello] Discovered group coordinator 127.0.0.1:9092 (id: 2147482646 rack: null)
      Jun 10, 2018 1:53:55 PM org.apache.kafka.clients.consumer.internals.AbstractCoordinator markCoordinatorUnknown
      INFO: [Consumer clientId=consumer-1, groupId=hello] Group coordinator 127.0.0.1:9092 (id: 2147482646 rack: null) is unavailable or invalid, will attempt rediscovery
      Jun 10, 2018 1:53:55 PM io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport write
      INFO: Sent a command=Response:RPCRequest:hudson.remoting.RemoteClassLoader$IClassLoader.fetch3[java.lang.String](2)(java.util.HashMap), in topic=localhost-8080-test2-connect, with key=launch
      Jun 10, 2018 1:53:56 PM org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler onSuccess
      INFO: [Consumer clientId=consumer-1, groupId=hello] Discovered group coordinator 127.0.0.1:9092 (id: 2147482646 rack: null)
      Jun 10, 2018 1:53:56 PM org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler handle
      SEVERE: [Consumer clientId=consumer-1, groupId=hello] Offset commit failed on partition test2-localhost-8080-connect-0 at offset 31: The coordinator is not aware of this member.
      Jun 10, 2018 1:53:56 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
      SEVERE: Unexpected error in channel test2
      org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
      	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:775)
      	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:726)
      	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:822)
      	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:802)
      	at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
      	at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
      	at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:563)
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:390)
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:293)
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
      	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:209)
      	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:597)
      	at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1218)
      	at io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport.read(KafkaClassicCommandTransport.java:84)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
      
      Jun 10, 2018 1:53:56 PM hudson.remoting.SynchronousCommandTransport$ReaderThread lambda$new$0
      SEVERE: Uncaught exception in SynchronousCommandTransport.ReaderThread Thread[Channel reader thread: test2,5,main]
      org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
      	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:713)
      	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:596)
      	at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1218)
      	at io.jenkins.plugins.remotingkafka.commandtransport.KafkaClassicCommandTransport.closeRead(KafkaClassicCommandTransport.java:67)
      	at hudson.remoting.Channel.terminate(Channel.java:1031)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:99)
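
Points 1) and 2) come down to an unbounded wait with no retry. One possible mitigation is to bound each wait and retry a limited number of times. The sketch below only illustrates that idea with plain java.util.concurrent; BoundedFetch and fetchWithRetry are hypothetical names, not part of Remoting or the Remoting Kafka plugin, and the Callable stands in for whatever performs the remote class fetch.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not an existing Remoting API. */
public class BoundedFetch {

    /**
     * Runs the fetch up to maxAttempts times, waiting at most timeoutMs per attempt.
     * Returns the first successful result, or throws IOException once all attempts
     * have timed out or failed.
     */
    public static <T> T fetchWithRetry(Callable<T> fetch, int maxAttempts, long timeoutMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        // Daemon threads so an attempt that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "bounded-fetch");
            t.setDaemon(true);
            return t;
        });
        try {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> pending = executor.submit(fetch);
                try {
                    return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    // No response in time: give up on this attempt instead of waiting forever.
                    pending.cancel(true);
                    last = e;
                } catch (ExecutionException e) {
                    // The fetch itself failed: remember the failure and try again.
                    last = e;
                }
            }
            throw new IOException("Fetch failed after " + maxAttempts + " attempts", last);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would wrap the blocking fetch, for example `BoundedFetch.fetchWithRetry(() -> loadClassBytes(name), 3, 10_000)`, where loadClassBytes is again a placeholder. Whether a retry is safe depends on the remote side handling a repeated request, which this sketch does not address.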
      

            pvtuan10 Pham Vu Tuan
            oleg_nenashev Oleg Nenashev