
Repeated channel/timeout errors from Jenkins slave

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker
    • jenkins-1.509.4 with remoting-2.36
      ssh-slaves-1.2

      The issue appears on my custom build of the Jenkins core, but it seems it could be reproduced on the newest versions as well.

      We experienced a network overload, which led to an exception in the PingThread on the Jenkins master, which in turn closed the communication channel. However, the slave still appears online and accepts jobs, but any remote action fails (see logs above), so all scheduled builds fail with an error.

      The issue affects ssh-slaves only:

      • Linux SSH slaves are "online", but all jobs on them fail with the error above
      • Windows services have reconnected automatically...
      • Windows JNLP slaves have reconnected as well

          [JENKINS-14332] Repeated channel/timeout errors from Jenkins slave

          Olivier Lamy created issue -

          Olivier Lamy added a comment -

          Happened with a recent version, Jenkins ver. 1.475-SNAPSHOT. Stack trace:

          Started by timer
          Building remotely on ubuntu1 in workspace /home/jenkins/jenkins-slave/workspace/Qpid-Java-Java-Test-0.16
          hudson.util.IOException2: remote file operation failed: /home/jenkins/jenkins-slave/workspace/Qpid-Java-Java-Test-0.16 at hudson.remoting.Channel@22397ea8:ubuntu1
          	at hudson.FilePath.act(FilePath.java:838)
          	at hudson.FilePath.act(FilePath.java:824)
          	at hudson.FilePath.mkdirs(FilePath.java:890)
          	at hudson.model.AbstractProject.checkout(AbstractProject.java:1243)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:589)
          	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494)
          	at hudson.model.Run.execute(Run.java:1488)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:236)
          Caused by: hudson.remoting.ChannelClosedException: channel is already closed
          	at hudson.remoting.Channel.send(Channel.java:492)
          	at hudson.remoting.Request.call(Request.java:129)
          	at hudson.remoting.Channel.call(Channel.java:663)
          	at hudson.FilePath.act(FilePath.java:831)
          	... 10 more
          Caused by: java.io.IOException
          	at hudson.remoting.Channel.close(Channel.java:895)
          	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
          	at hudson.remoting.PingThread.ping(PingThread.java:114)
          	at hudson.remoting.PingThread.run(PingThread.java:81)
          Caused by: java.util.concurrent.TimeoutException: Ping started on 1341557279424 hasn't completed at 1341557519424
          	... 2 more
          Caused by: java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:275)
          	at hudson.remoting.Request$1.get(Request.java:210)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
          	at hudson.remoting.PingThread.ping(PingThread.java:107)
          	... 1 more
          Retrying after 10 seconds
          hudson.util.IOException2: remote file operation failed: /home/jenkins/jenkins-slave/workspace/Qpid-Java-Java-Test-0.16 at hudson.remoting.Channel@22397ea8:ubuntu1
          	at hudson.FilePath.act(FilePath.java:838)
          	at hudson.FilePath.act(FilePath.java:824)
          	at hudson.FilePath.mkdirs(FilePath.java:890)
          	at hudson.model.AbstractProject.checkout(AbstractProject.java:1243)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:589)
          	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494)
          	at hudson.model.Run.execute(Run.java:1488)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:236)
          Caused by: hudson.remoting.ChannelClosedException: channel is already closed
          	at hudson.remoting.Channel.send(Channel.java:492)
          	at hudson.remoting.Request.call(Request.java:129)
          	at hudson.remoting.Channel.call(Channel.java:663)
          	at hudson.FilePath.act(FilePath.java:831)
          	... 10 more
          Caused by: java.io.IOException
          	at hudson.remoting.Channel.close(Channel.java:895)
          	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
          	at hudson.remoting.PingThread.ping(PingThread.java:114)
          	at hudson.remoting.PingThread.run(PingThread.java:81)
          Caused by: java.util.concurrent.TimeoutException: Ping started on 1341557279424 hasn't completed at 1341557519424
          	... 2 more
          Caused by: java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:275)
          	at hudson.remoting.Request$1.get(Request.java:210)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
          	at hudson.remoting.PingThread.ping(PingThread.java:107)
          	... 1 more
          Retrying after 10 seconds
          hudson.util.IOException2: remote file operation failed: /home/jenkins/jenkins-slave/workspace/Qpid-Java-Java-Test-0.16 at hudson.remoting.Channel@22397ea8:ubuntu1
          	at hudson.FilePath.act(FilePath.java:838)
          	at hudson.FilePath.act(FilePath.java:824)
          	at hudson.FilePath.mkdirs(FilePath.java:890)
          	at hudson.model.AbstractProject.checkout(AbstractProject.java:1243)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:589)
          	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494)
          	at hudson.model.Run.execute(Run.java:1488)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:236)
          Caused by: hudson.remoting.ChannelClosedException: channel is already closed
          	at hudson.remoting.Channel.send(Channel.java:492)
          	at hudson.remoting.Request.call(Request.java:129)
          	at hudson.remoting.Channel.call(Channel.java:663)
          	at hudson.FilePath.act(FilePath.java:831)
          	... 10 more
          Caused by: java.io.IOException
          	at hudson.remoting.Channel.close(Channel.java:895)
          	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
          	at hudson.remoting.PingThread.ping(PingThread.java:114)
          	at hudson.remoting.PingThread.run(PingThread.java:81)
          Caused by: java.util.concurrent.TimeoutException: Ping started on 1341557279424 hasn't completed at 1341557519424
          	... 2 more
          Caused by: java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:275)
          	at hudson.remoting.Request$1.get(Request.java:210)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
          	at hudson.remoting.PingThread.ping(PingThread.java:107)
          	... 1 more
          Recording test results
          ERROR: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception
          hudson.remoting.ChannelClosedException: channel is already closed
          	at hudson.remoting.Channel.send(Channel.java:492)
          	at hudson.remoting.Request.call(Request.java:129)
          	at hudson.remoting.Channel.call(Channel.java:663)
          	at hudson.EnvVars.getRemote(EnvVars.java:202)
          	at hudson.model.Computer.getEnvironment(Computer.java:843)
          	at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:28)
          	at hudson.model.Run.getEnvironment(Run.java:1938)
          	at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:843)
          	at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:131)
          	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:692)
          	at hudson.model.Build$BuildExecution.post2(Build.java:183)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:639)
          	at hudson.model.Run.execute(Run.java:1513)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:236)
          Caused by: java.io.IOException
          	at hudson.remoting.Channel.close(Channel.java:895)
          	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
          	at hudson.remoting.PingThread.ping(PingThread.java:114)
          	at hudson.remoting.PingThread.run(PingThread.java:81)
          Caused by: java.util.concurrent.TimeoutException: Ping started on 1341557279424 hasn't completed at 1341557519424
          	... 2 more
          Caused by: java.util.concurrent.TimeoutException
          	at hudson.remoting.Request$1.get(Request.java:275)
          	at hudson.remoting.Request$1.get(Request.java:210)
          	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
          	at hudson.remoting.PingThread.ping(PingThread.java:107)
          	... 1 more
          Finished: FAILURE
          

          I don't understand this:

          Caused by: java.util.concurrent.TimeoutException: Ping started on 1341557279424 hasn't completed at 1341557519424
          

          Olivier Lamy made changes -
          Priority Original: Major [ 3 ] New: Blocker [ 1 ]

          Richard Mortimer added a comment -

          The message looks OK to me. There is a 240-second difference between the two timestamps (they are in milliseconds), which would be consistent with a timeout.

          However, I have a feeling that this is due to some additional changes that Kohsuke made when he merged my remoting pull-3. See

          https://github.com/jenkinsci/remoting/commit/aceaf5c038e8179723f9a83be6f994f37e7cfa46

          Previously the TimeoutException (ping failed) message would only have been observed if a TimeoutException was received. Now the message will be logged when the loop ends. Maybe there is a real timeout/delay here that was previously hidden.

          One other note: the while(remaining>0) condition is evaluated after a timeout exception, but remaining is only recalculated on the next pass, so there will be an extra pass around the loop.
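
The extra pass described in the last note can be illustrated with a small standalone sketch. This is not the remoting source: the class name DeadlineLoopSketch, the method passesUntilTimeout and the loop shape are assumptions made only to show how a stale remaining value lets the loop run once more after the deadline has passed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a deadline loop; NOT copied from hudson.remoting.PingThread.
public class DeadlineLoopSketch {

    /** Waits for f until the deadline and returns how many passes the loop made. */
    public static int passesUntilTimeout(Future<?> f, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        long end = System.currentTimeMillis() + timeoutMillis;
        long remaining;
        int passes = 0;
        do {
            passes++;
            // remaining is recomputed only here, at the top of each pass
            remaining = end - System.currentTimeMillis();
            try {
                f.get(Math.max(0, remaining), TimeUnit.MILLISECONDS);
                return passes; // the reply arrived in time
            } catch (TimeoutException e) {
                // fall through: the condition below still sees the value
                // computed before the wait, so it is positive on the first pass
            }
        } while (remaining > 0);
        return passes; // deadline passed without a reply
    }

    public static void main(String[] args) throws Exception {
        // A future that never completes stands in for a ping that gets no reply.
        Future<?> neverCompletes = new CompletableFuture<Void>();
        System.out.println("passes: " + passesUntilTimeout(neverCompletes, 100));
    }
}
```

With a future that never completes, the first pass waits out the whole timeout, the loop condition then tests the value computed before that wait, and at least one more pass runs with a zero (or near-zero) wait before the loop exits.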

          Richard Mortimer added a comment -

          Ah I take some of that back. I didn't check the main jenkins build closely enough. Jenkins is using remoting 1.16 and this does not include the latest changes made by Kohsuke/me.

          The PingThread logic in 1.475-SNAPSHOT is unchanged since Sept/2011

          Richard Mortimer added a comment -

          Similar symptoms at JENKINS-14307

          Alexander Likulin added a comment - edited

          The same problem happens on 1.477:

          hudson.util.IOException2: remote file operation failed: /home/slave/.jenkins/slave/workspace/4.0-SNAPSHOT-POSTGRESQL2 at hudson.remoting.Channel@7e053ab4:slave
          at hudson.FilePath.act(FilePath.java:838)
          at hudson.FilePath.act(FilePath.java:824)
          at hudson.FilePath.mkdirs(FilePath.java:890)
          at hudson.model.AbstractProject.checkout(AbstractProject.java:1254)
          at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:589)
          at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
          at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494)
          at hudson.model.Run.execute(Run.java:1502)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:236)
          Caused by: hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:492)
          at hudson.remoting.Request.call(Request.java:129)
          at hudson.remoting.Channel.call(Channel.java:663)
          at hudson.FilePath.act(FilePath.java:831)
          ... 10 more
          Caused by: java.io.IOException
          at hudson.remoting.Channel.close(Channel.java:895)
          at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
          at hudson.remoting.PingThread.ping(PingThread.java:114)
          at hudson.remoting.PingThread.run(PingThread.java:81)
          Caused by: java.util.concurrent.TimeoutException: Ping started on 1346241264458 hasn't completed at 1346241504458
          ... 2 more
          Caused by: java.util.concurrent.TimeoutException
          at hudson.remoting.Request$1.get(Request.java:275)
          at hudson.remoting.Request$1.get(Request.java:210)
          at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
          at hudson.remoting.PingThread.ping(PingThread.java:107)
          ... 1 more


          Julien Carsique added a comment - edited

          Maybe not a Jenkins issue.

          Having the same issue, reproduced at every build for a given slave (but with nothing relevant in its logs), I tried to disconnect and reconnect the slave:

          [05/06/13 14:06:04] Launching slave agent
          $ ssh slavedns java -jar ~/bin/slave.jar
          <===[JENKINS REMOTING CAPACITY]===<===[JENKINS REMOTING CAPACITY]===>>channel started
          channel started
          Slave.jar version: 2.22
          This is a Unix slave
          Slave.jar version: 2.22
          This is a Unix slave
          Copied maven-agent.jar
          Copied maven3-agent.jar
          Copied maven3-interceptor.jar
          Copied maven-agent.jar
          Copied maven-interceptor.jar
          Copied maven2.1-interceptor.jar
          Copied plexus-classworld.jar
          Copied maven3-agent.jar
          Copied maven3-interceptor.jar
          Copied classworlds.jar
          Copied maven-interceptor.jar
          Copied maven2.1-interceptor.jar
          Copied plexus-classworld.jar
          Copied classworlds.jar
          Evacuated stdout
          Evacuated stdout
          ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins
          (...)java.lang.IllegalStateException: Already connected
          	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:459)
          	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:339)
          	at hudson.slaves.CommandLauncher.launch(CommandLauncher.java:122)
          	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:662)
          Connection terminated
          channel stopped
          ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins
          (...)java.lang.NullPointerException
          	at org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl.onOnline(ComputerListenerImpl.java:32)
          	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:471)
          	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:339)
          	at hudson.slaves.CommandLauncher.launch(CommandLauncher.java:122)
          	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:662)
          channel stopped
          Connection terminated

          Then the slave successfully reconnected itself.

          It appeared there was a looping thread consuming 100% CPU. Killing the process solved the issue.

          Strangely, some system and Java commands were not working (ps, cat, less, jstack, trace, ...) until it had been killed, whereas other commands worked (top, jps, renice, kill, ...). That could explain the weird Jenkins 4-minute timeout log (java.util.concurrent.TimeoutException: Ping started on 1367743028681 hasn't completed at 1367743268682): the system was partially frozen, with really unstable response times.
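
The two numbers in the ping message are epoch timestamps in milliseconds, so the interval can be checked directly. A minimal sketch follows; the class name PingTimestamps and the helper elapsed are hypothetical and only java.time from the standard library is used.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for reading the "Ping started on X hasn't completed at Y" message.
public class PingTimestamps {

    /** Elapsed time between two epoch-millisecond timestamps. */
    public static Duration elapsed(long startMillis, long endMillis) {
        return Duration.between(Instant.ofEpochMilli(startMillis),
                                Instant.ofEpochMilli(endMillis));
    }

    public static void main(String[] args) {
        // Values taken from the TimeoutException quoted in this comment.
        Duration d = elapsed(1367743028681L, 1367743268682L);
        System.out.println(d.toMillis() + " ms elapsed");
        // Print the start time as a UTC instant for correlation with other logs.
        System.out.println("ping started at " + Instant.ofEpochMilli(1367743028681L));
    }
}
```

For the values in this comment the interval is 240001 ms, and for the pair in the original report (1341557279424 and 1341557519424) it is 240000 ms, so both are about four minutes.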

          Note that the looping thread came from a previous job that stopped badly on a timeout:

          03:00:16.868 Build timed out (after 180 minutes). Marking the build as aborted.
          03:00:16.873 Build was aborted
          03:00:16.874 Archiving artifacts
          03:00:16.874 ERROR: Failed to archive artifacts: **/log/*.log, tomcat*/nxserver/config/distribution.properties
          03:00:16.875 hudson.remoting.ChannelClosedException: channel is already closed
          03:00:16.876 	at hudson.remoting.Channel.send(Channel.java:494)
          03:00:16.876 	at hudson.remoting.Request.call(Request.java:129)
          03:00:16.876 	at hudson.remoting.Channel.call(Channel.java:672)
          03:00:16.876 	at hudson.EnvVars.getRemote(EnvVars.java:212)
          03:00:16.876 	at hudson.model.Computer.getEnvironment(Computer.java:882)
          03:00:16.876 	at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:28)
          03:00:16.876 	at hudson.model.Run.getEnvironment(Run.java:2028)
          03:00:16.876 	at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:927)
          03:00:16.876 	at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:115)
          03:00:16.876 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          03:00:16.876 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:798)
          03:00:16.876 	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:770)
          03:00:16.876 	at hudson.model.Build$BuildExecution.post2(Build.java:183)
          03:00:16.876 	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:720)
          03:00:16.876 	at hudson.model.Run.execute(Run.java:1600)
          03:00:16.876 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          03:00:16.876 	at hudson.model.ResourceController.execute(ResourceController.java:88)
          03:00:16.876 	at hudson.model.Executor.run(Executor.java:237)
          03:00:16.876 Caused by: java.io.IOException
          03:00:16.876 	at hudson.remoting.Channel.close(Channel.java:910)
          03:00:16.876 	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
          03:00:16.876 	at hudson.remoting.PingThread.ping(PingThread.java:120)
          03:00:16.876 	at hudson.remoting.PingThread.run(PingThread.java:81)
          03:00:16.876 Caused by: java.util.concurrent.TimeoutException: Ping started on 1367743028681 hasn't completed at 1367743268682
          03:00:16.876 	... 2 more

          Julien Carsique added a comment - - edited

          Maybe not a Jenkins issue.
          Having the same issue, reproduced at every build for a given slave (but with nothing relevant in its logs), I tried to disconnect and reconnect the slave:

          [05/06/13 14:06:04] Launching slave agent
          $ ssh slavedns java -jar ~/bin/slave.jar
          <===[JENKINS REMOTING CAPACITY]===<===[JENKINS REMOTING CAPACITY]===>>channel started
          channel started
          Slave.jar version: 2.22
          This is a Unix slave
          Slave.jar version: 2.22
          This is a Unix slave
          Copied maven-agent.jar
          Copied maven3-agent.jar
          Copied maven3-interceptor.jar
          Copied maven-agent.jar
          Copied maven-interceptor.jar
          Copied maven2.1-interceptor.jar
          Copied plexus-classworld.jar
          Copied maven3-agent.jar
          Copied maven3-interceptor.jar
          Copied classworlds.jar
          Copied maven-interceptor.jar
          Copied maven2.1-interceptor.jar
          Copied plexus-classworld.jar
          Copied classworlds.jar
          Evacuated stdout
          Evacuated stdout
          ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins (...)
          java.lang.IllegalStateException: Already connected
          	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:459)
          	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:339)
          	at hudson.slaves.CommandLauncher.launch(CommandLauncher.java:122)
          	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:662)
          Connection terminated
          channel stopped
          ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins (...)
          java.lang.NullPointerException
          	at org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl.onOnline(ComputerListenerImpl.java:32)
          	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:471)
          	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:339)
          	at hudson.slaves.CommandLauncher.launch(CommandLauncher.java:122)
          	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:662)
          channel stopped
          Connection terminated

          Then the slave successfully reconnected by itself.

          It turned out there was a looping thread consuming 100% CPU. Killing the process solved the issue.
          Strangely, some system and Java commands were not working (ps, cat, less, jstack, trace, ...) until it had been killed, whereas other commands worked (top, jps, renice, kill, ...).
          That could explain the weird Jenkins 4 min timeout log ( java.util.concurrent.TimeoutException: Ping started on 1367743028681 hasn't completed at 1367743268682 ): the system was partially frozen, with very unstable response times.

          Note that the looping thread came from a previous job that did not stop cleanly on timeout:

          03:00:16.868 Build timed out (after 180 minutes). Marking the build as aborted.
          03:00:16.873 Build was aborted
          03:00:16.874 Archiving artifacts
          03:00:16.874 ERROR: Failed to archive artifacts: **/log /*.log, tomcat*/ nxserver/config/distribution.properties
          03:00:16.875 hudson.remoting.ChannelClosedException: channel is already closed
          03:00:16.876 	at hudson.remoting.Channel.send(Channel.java:494)
          03:00:16.876 	at hudson.remoting.Request.call(Request.java:129)
          03:00:16.876 	at hudson.remoting.Channel.call(Channel.java:672)
          03:00:16.876 	at hudson.EnvVars.getRemote(EnvVars.java:212)
          03:00:16.876 	at hudson.model.Computer.getEnvironment(Computer.java:882)
          03:00:16.876 	at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:28)
          03:00:16.876 	at hudson.model.Run.getEnvironment(Run.java:2028)
          03:00:16.876 	at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:927)
          03:00:16.876 	at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:115)
          03:00:16.876 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          03:00:16.876 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:798)
          03:00:16.876 	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:770)
          03:00:16.876 	at hudson.model.Build$BuildExecution.post2(Build.java:183)
          03:00:16.876 	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:720)
          03:00:16.876 	at hudson.model.Run.execute(Run.java:1600)
          03:00:16.876 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          03:00:16.876 	at hudson.model.ResourceController.execute(ResourceController.java:88)
          03:00:16.876 	at hudson.model.Executor.run(Executor.java:237)
          03:00:16.876 Caused by: java.io.IOException
          03:00:16.876 	at hudson.remoting.Channel.close(Channel.java:910)
          03:00:16.876 	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
          03:00:16.876 	at hudson.remoting.PingThread.ping(PingThread.java:120)
          03:00:16.876 	at hudson.remoting.PingThread.run(PingThread.java:81)
          03:00:16.876 Caused by: java.util.concurrent.TimeoutException: Ping started on 1367743028681 hasn't completed at 1367743268682
          03:00:16.876 	... 2 more
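
          To see how long the ping in the message above was outstanding, the two numbers can be subtracted. The sketch below assumes they are epoch timestamps in milliseconds, which the log does not state explicitly; the class and method names are made up for illustration and are not part of Jenkins. For the values quoted in the log, the gap works out to 240001 ms, i.e. roughly four minutes.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (not a Jenkins class): extracts the two timestamps from a
// "Ping started on X hasn't completed at Y" message and returns Y - X.
class PingElapsed {
    private static final Pattern PING_MESSAGE =
            Pattern.compile("Ping started on (\\d+) hasn't completed at (\\d+)");

    // Assumption: both numbers are epoch milliseconds.
    static Duration elapsed(String message) {
        Matcher m = PING_MESSAGE.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a ping timeout message: " + message);
        }
        long startedAt = Long.parseLong(m.group(1));
        long observedAt = Long.parseLong(m.group(2));
        return Duration.ofMillis(observedAt - startedAt);
    }

    public static void main(String[] args) {
        Duration d = elapsed("java.util.concurrent.TimeoutException: "
                + "Ping started on 1367743028681 hasn't completed at 1367743268682");
        // Prints: 240001 ms (240 s)
        System.out.println(d.toMillis() + " ms (" + d.getSeconds() + " s)");
    }
}
```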

          Oleg Nenashev added a comment - - edited

          The ping monitoring timeout should at least be configurable...
          The issue is reproducible in 1.509.2
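
          As a rough illustration of what a configurable ping timeout could look like, here is a minimal, self-contained sketch. It is not the Jenkins PingThread or ChannelPinger implementation: the class, the property name `example.ping.timeoutMillis`, and the 240000 ms default (chosen only because it matches the gap seen in the log) are all assumptions made for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only; none of these names exist in Jenkins.
class ConfigurablePing {
    // Hypothetical system property, e.g. -Dexample.ping.timeoutMillis=600000
    static final String TIMEOUT_PROPERTY = "example.ping.timeoutMillis";
    // Assumed default, mirroring the ~240 s gap observed in the log.
    static final long DEFAULT_TIMEOUT_MILLIS = 240_000L;

    // Reads the timeout from the system property, falling back to the default.
    static long timeoutMillis() {
        return Long.getLong(TIMEOUT_PROPERTY, DEFAULT_TIMEOUT_MILLIS);
    }

    // Runs one ping; returns true if it completed within the timeout,
    // false if it timed out or failed.
    static boolean pingOnce(Callable<Void> ping, long timeoutMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> result = executor.submit(ping);
            try {
                result.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                result.cancel(true); // give up on the outstanding ping
                return false;
            } catch (ExecutionException e) {
                return false; // the ping itself threw
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        long timeout = timeoutMillis();
        boolean alive = pingOnce(() -> null, timeout); // trivial ping that returns at once
        System.out.println("timeout=" + timeout + " ms, alive=" + alive);
    }
}
```

          With a setting like this, an overloaded master or slave could be given a longer grace period before the channel is declared dead, at the cost of detecting genuinely dead channels later.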


          Nickolay Rumyantsev added a comment -

          Having a similar issue. The slave is reconnected while artifacts are being archived, when Jenkins is under load.
          Jenkins 1.543.

            ifernandezcalvo Ivan Fernandez Calvo
            olamy Olivier Lamy
            Votes: 33
            Watchers: 51
