Jenkins / JENKINS-53397

ChannelClosedException when using kafka nodes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Component/s: remoting-kafka-plugin
    • Labels:
      None
    • Environment:
      Jenkins Instance (2.135) with 5 agents all connected using Kafka (no security)
      remoting Kafka plugin 1.1, built with remoting 3.23
      Description

      I'm using Remoting Kafka Plugin version 1.1 (I changed the pom file to use remoting.version=3.23 to match the version in my setup).

      I've set up a Jenkins instance with 5 nodes, all connected through Kafka. It ran jobs for about 3 days under considerable load without failures, but today I had the following failure (full log attached):

       

      Error when executing always post condition:
      hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on daybreakk failed. The channel is closing down or has closed down
      

       

      The process that I started on the Kafka agent with:

       

      > java -jar remoting-kafka-agent.jar -name daybreakk -master http://jenkins:8081/ -secret 6e6c09c25550bd721cef2efba02bfbd54d6306fc711c2482450abb5ee454cd7f -kafkaURL kafka.al.com.au:9093 -noauth

       

      exited without any log.

      When I restarted the process, after all the initial Kafka logs, it printed (full log attached):

       

      Sep 04, 2018 8:44:47 AM hudson.remoting.ExportTable unexportByOid
      SEVERE: Trying to unexport an object that's already unexported
      java.util.concurrent.ExecutionException: Invalid object ID 750 iota=2
          at hudson.remoting.ExportTable.diagnoseInvalidObjectId(ExportTable.java:478)
          at hudson.remoting.ExportTable.unexportByOid(ExportTable.java:516)
          at hudson.remoting.Channel.unexport(Channel.java:803)
          at hudson.remoting.Channel.unexport(Channel.java:793)
          at hudson.remoting.UnexportCommand.execute(UnexportCommand.java:49)
          at hudson.remoting.Channel$1.handle(Channel.java:565)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:87)
      Sep 04, 2018 8:44:47 AM hudson.remoting.ExportTable unexportByOid
      SEVERE: 2nd unexport attempt is here
      Command Unexport created at
          at hudson.remoting.Command.<init>(Command.java:79)
          at hudson.remoting.Command.<init>(Command.java:62)
          at hudson.remoting.UnexportCommand.<init>(UnexportCommand.java:35)
          at hudson.remoting.RemoteInvocationHandler$PhantomReferenceImpl.cleanup(RemoteInvocationHandler.java:393)
          at hudson.remoting.RemoteInvocationHandler$PhantomReferenceImpl.access$1000(RemoteInvocationHandler.java:352)
          at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:610)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:112)
          at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.Exception: Proxy hudson.remoting.RemoteInvocationHandler@2ee was created for interface hudson.plugins.git.IGitAPI
          at hudson.remoting.RemoteInvocationHandler.<init>(RemoteInvocationHandler.java:142)
          at hudson.remoting.RemoteInvocationHandler.wrap(RemoteInvocationHandler.java:161)
          at hudson.remoting.Channel.export(Channel.java:768)
          at hudson.remoting.Channel.export(Channel.java:731)
          at org.jenkinsci.plugins.gitclient.LegacyCompatibleGitAPIImpl.writeReplace(LegacyCompatibleGitAPIImpl.java:198)
          at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1218)
          at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
          at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
          at hudson.remoting.UserRequest._serialize(UserRequest.java:264)
          at hudson.remoting.UserRequest.serialize(UserRequest.java:273)
          at hudson.remoting.UserRequest.perform(UserRequest.java:223)
          at hudson.remoting.UserRequest.perform(UserRequest.java:54)
          at hudson.remoting.Request$2.run(Request.java:369)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at io.jenkins.plugins.remotingkafka.Engine$1.lambda$newThread$0(Engine.java:47)
          at java.lang.Thread.run(Thread.java:748)
      ...
      ...
      ...
      Sep 04, 2018 8:44:47 AM hudson.remoting.Request$2 run
      INFO: Failed to send back a reply to the request hudson.remoting.Request$2@7da8bc0d: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@51a38854:daybreakk": channel is already closed
      [Thread-2] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 1.1.0
      [Thread-2] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : fdcf75ea326b8e07

      It printed this repeatedly and eventually connected successfully.

       

      There were no meaningful logs in the support folder of the remote root directory (full logs attached).

       

      My expectation was that even with an exception like this, the job would not fail because of it. Instead, it would keep waiting until the node reconnects and, in the worst case, fail due to a timeout.
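
      The expected behaviour could be sketched roughly as below. This is a plain-Java illustration only: `ReconnectWait`, `awaitOnline`, and the `isOnline` check are hypothetical names, and nothing here is an existing Jenkins, Remoting, or Kafka plugin API.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: not part of Jenkins, Remoting, or the Kafka plugin. */
public final class ReconnectWait {
    private ReconnectWait() {}

    /**
     * Polls isOnline until it returns true or the timeout elapses.
     * Throws TimeoutException if the agent never comes back in time.
     */
    public static void awaitOnline(BooleanSupplier isOnline, Duration timeout, Duration pollInterval)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!isOnline.getAsBoolean()) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException("agent did not reconnect within " + timeout);
            }
            // Sleep for the poll interval, but never past the deadline.
            long remainingMillis = Math.max(1, remainingNanos / 1_000_000);
            Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
        }
    }
}
```

      With something like this, a step that hits a closed channel would call `awaitOnline(...)` and retry, and only fail the job once the `TimeoutException` is thrown.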

       

       

        Attachments

          Activity

          fnaum Federico Naum added a comment -

          I'm having problems uploading the logs. I will upload them soon.

          oleg_nenashev Oleg Nenashev added a comment -

          The unexport pattern looks similar to https://issues.jenkins-ci.org/browse/JENKINS-42533?focusedCommentId=294094&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-294094 . It should be fixed by upgrading Remoting to 3.26. CC Jeff Thompson just in case.

          Pham Vu Tuan my recommendation would be to update Remoting and spin another release. I am not sure it fixes the root cause of this issue, but at least it will prevent known issues.

           

          fnaum Federico Naum added a comment -

          I'll try building the Kafka plugin against Remoting 3.26 and updating Jenkins to 2.141.

          pvtuan10 Pham Vu Tuan added a comment -

          Thanks for reporting, Federico Naum! As Oleg Nenashev suggested, I will first update the Jenkins and Remoting versions for the plugin to see whether you still face a similar issue.

          jthompson Jeff Thompson added a comment -

          As Oleg mentions, upgrading to Remoting 3.26 will probably clear up that log message, "Trying to unexport an object that's already unexported". That's a good first step. There are still probably other issues after that.

          pvtuan10 Pham Vu Tuan added a comment -

          Yes, I merged the PR (https://github.com/jenkinsci/remoting-kafka-plugin/pull/43). I will try to release by the end of this week.

          fnaum Federico Naum added a comment -

          Hi Pham Vu Tuan, thanks for the quick turnaround.

          I do not know much about Jenkins/Remoting dependencies, but I noticed that your PR has jenkins.version=2.129, while Remoting 3.26 is only bundled with Jenkins 2.141 (https://jenkins.io/changelog/). Shouldn't this be updated as well?

          F

          fnaum Federico Naum added a comment -

          Disregard my last comment, I just saw Oleg Nenashev's comments on the PR.

          fnaum Federico Naum added a comment -

          Just wanted to mention that since I built the plugin against Remoting 3.26, I haven't seen this issue again.
           

          pvtuan10 Pham Vu Tuan added a comment -

          Thanks for trying it, Federico Naum. The last time I upgraded to Remoting 3.26 on my machine, it threw a strange exception. I will debug it and spin a release to fix these problems soon.

          jthompson Jeff Thompson added a comment -

          That's good news, Federico Naum.

          Pham Vu Tuan, I would be surprised at Remoting 3.26 causing a new exception. Let me know if you continue to see issues there.

          pvtuan10 Pham Vu Tuan added a comment -

          Hi Jeff Thompson, I'm getting this error log when connecting the master to a Kafka agent with Remoting 3.26; it doesn't happen with 3.25.

          java.util.concurrent.CancellationException
          java.util.concurrent.CancellationException
              at java.util.concurrent.FutureTask.report(FutureTask.java:121)
              at java.util.concurrent.FutureTask.get(FutureTask.java:192)
              at io.jenkins.plugins.remotingkafka.KafkaComputerLauncher.launch(KafkaComputerLauncher.java:119)
              at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
              at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
              at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
          Oct 02, 2018 3:28:58 PM hudson.slaves.ChannelPinger install
          SEVERE: Failed to set up a ping for test
          java.lang.InterruptedException
              at java.lang.Object.wait(Native Method)
              at hudson.remoting.Request.call(Request.java:177)
              at hudson.remoting.Channel.call(Channel.java:954)
              at hudson.slaves.ChannelPinger.install(ChannelPinger.java:112)
              at hudson.slaves.ChannelPinger.preOnline(ChannelPinger.java:95)
              at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:660)
              at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:465)
              at io.jenkins.plugins.remotingkafka.KafkaComputerLauncher$1.call(KafkaComputerLauncher.java:88)
              at io.jenkins.plugins.remotingkafka.KafkaComputerLauncher$1.call(KafkaComputerLauncher.java:74)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)

          Did anything change in 3.26 that is related to this?

          Also cc Oleg Nenashev: I can still connect the Kafka agent to the master and run builds despite this exception, so it seems like another plugin issue.
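
          For reference, this pairing of a CancellationException from FutureTask.get with an InterruptedException inside Object.wait is what plain Java produces when a running task is cancelled with interruption. The standalone sketch below reproduces only that JDK behaviour; `CancelDemo` is a made-up class, and whether the launcher actually cancels its task this way is an assumption that is not confirmed in this thread.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Standalone illustration; not code from Remoting or the Kafka plugin. */
public final class CancelDemo {
    private CancelDemo() {}

    /** Returns the exception class names seen by the waiter [0] and the worker [1]. */
    public static String[] run() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        String[] seen = new String[2];
        Object lock = new Object();

        Future<?> task = pool.submit(() -> {
            try {
                synchronized (lock) {
                    started.countDown();
                    while (true) {
                        lock.wait(); // blocks, like a request waiting for a reply
                    }
                }
            } catch (InterruptedException e) {
                seen[1] = e.getClass().getName();
            } finally {
                finished.countDown();
            }
        });

        started.await();
        task.cancel(true); // cancels the task and interrupts the worker thread
        try {
            task.get();
        } catch (CancellationException e) {
            seen[0] = e.getClass().getName();
        }
        finished.await(5, TimeUnit.SECONDS);
        pool.shutdownNow();
        return seen;
    }
}
```

          The caller of `get()` sees the CancellationException, while the worker thread that was blocked in `wait()` sees the InterruptedException, matching the shape of the two traces.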

          pvtuan10 Pham Vu Tuan added a comment -

          Btw, Federico Naum, are you using remoting-kafka-plugin in production?

          pvtuan10 Pham Vu Tuan added a comment -

          Released version 1.1.1 with Remoting 3.26; this should not happen again.

          jthompson Jeff Thompson added a comment -

          Pham Vu Tuan I have no idea why you would see that stack trace. There shouldn't be anything in 3.26 that would cause it. But, there was another long-standing issue that showed up for someone in 3.26 so something may have tweaked some timing in Remoting.

          Looks like you must have gotten it figured out well enough to proceed, though. Great!

          fnaum Federico Naum added a comment -

          Hey Pham Vu Tuan, we have 2 different instances of Jenkins in production. Currently, I'm using it on the more contained, lower-risk instance, which has only 5 nodes and about 2 dozen jobs.

          I'm planning to move the other instance, which has 12 nodes and 400 jobs, soon.

           


            People

            Assignee:
            pvtuan10 Pham Vu Tuan
            Reporter:
            fnaum Federico Naum
            Votes:
            0
            Watchers:
            5
