    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: core, remoting
    • Environment: Jenkins 1.558 / CentOS 5 slaves

      I observe that jobs hang when they start while the slave is performing some sort of cleanup action:

      Build Log:

      17:04:21 Started by upstream project "pipeline-test-2" build number 422
      17:04:21 originally caused by:
      17:04:21  Started by upstream project "master-test" build number 867
      17:04:21  originally caused by:
      17:04:21   Started by upstream project "master-build" build number 5131
      17:04:21   originally caused by:
      17:04:21    Started by an SCM change
      17:04:21 [EnvInject] - Loading node environment variables.
      ... hang for 14h ...
      

      Slave Log:

      ... lots of output ...
      INFO: Deleting /tmp/1408400946520-0/jboss51x_centos5-x64-bld-slave-09.corp.XXX.com_10002/lib/jboss-j2se.jar (atime=1408400947, diff=-5424)
      Aug 25, 2014 5:01:12 PM hudson.plugins.tmpcleaner.TmpCleanTask visit
      INFO: Deleting /tmp/1408400946520-0/jboss51x_centos5-x64-bld-slave-09.corp.XXX.com_10002/lib/javassist.jar (atime=1408400947, diff=-5424)
      Aug 25, 2014 5:01:12 PM hudson.plugins.tmpcleaner.TmpCleanTask visit
      INFO: Deleting /tmp/1408400946520-0/jboss51x_centos5-x64-bld-slave-09.corp.XXX.com_10002/lib/jboss-system-jmx.jar (atime=1408400947, diff=-5424)
      Aug 25, 2014 5:01:12 PM hudson.plugins.tmpcleaner.TmpCleanTask visit
      INFO: Deleting /tmp/1408400946520-0/jboss51x_centos5-x64-bld-slave-09.corp.XXX.com_10002/lib/jboss-system.jar (atime=1408400947, diff=-5424)
      Aug 25, 2014 5:01:12 PM hudson.plugins.tmpcleaner.TmpCleanTask visit
      INFO: Deleting /tmp/1408400946520-0/jboss51x_centos5-x64-bld-slave-09.corp.XXX.com_10002/lib/jboss-mdr.jar (atime=1408400947, diff=-5424)
      Aug 25, 2014 5:01:12 PM hudson.plugins.tmpcleaner.TmpCleanTask visit
      INFO: Deleting /tmp/1408400946520-0/jboss51x_centos5-x64-bld-slave-09.corp.XXX.com_10002/lib/jboss-logging-spi.jar (atime=1408400947, diff=-5424)
      Aug 25, 2014 5:01:12 PM hudson.plugins.tmpcleaner.TmpCleanTask visit
      

      After interrupting the job, I get this:

      10:30:26 ERROR: SEVERE ERROR occurs
      10:30:26 org.jenkinsci.lib.envinject.EnvInjectException: java.lang.InterruptedException
      10:30:26 	at org.jenkinsci.plugins.envinject.service.EnvironmentVariablesNodeLoader.gatherEnvironmentVariablesNode(EnvironmentVariablesNodeLoader.java:77)
      10:30:26 	at org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:81)
      10:30:26 	at org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:39)
      10:30:26 	at hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:575)
      10:30:26 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:481)
      10:30:26 	at hudson.model.Run.execute(Run.java:1689)
      10:30:26 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      10:30:26 	at hudson.model.ResourceController.execute(ResourceController.java:88)
      10:30:26 	at hudson.model.Executor.run(Executor.java:231)
      10:30:26 Caused by: java.lang.InterruptedException
      10:30:26 	at java.lang.Object.wait(Native Method)
      10:30:26 	at hudson.remoting.Request.call(Request.java:146)
      10:30:26 	at hudson.remoting.Channel.call(Channel.java:722)
      10:30:26 	at hudson.FilePath.act(FilePath.java:1003)
      10:30:26 	at org.jenkinsci.plugins.envinject.service.EnvironmentVariablesNodeLoader.gatherEnvironmentVariablesNode(EnvironmentVariablesNodeLoader.java:44)
      10:30:26 	... 8 more
      10:30:26 Archiving artifacts
      10:30:26 ERROR: Publisher hudson.tasks.Mailer aborted due to exception
      10:30:26 hudson.remoting.ChannelClosedException: channel is already closed
      10:30:26 	at hudson.remoting.Channel.send(Channel.java:524)
      10:30:26 	at hudson.remoting.Request.call(Request.java:129)
      10:30:26 	at hudson.remoting.Channel.call(Channel.java:722)
      10:30:26 	at hudson.EnvVars.getRemote(EnvVars.java:404)
      10:30:26 	at hudson.model.Computer.getEnvironment(Computer.java:911)
      10:30:26 	at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
      10:30:26 	at hudson.model.Run.getEnvironment(Run.java:2202)
      10:30:26 	at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:873)
      10:30:26 	at hudson.tasks.Mailer.perform(Mailer.java:134)
      10:30:26 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
      10:30:26 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
      10:30:26 	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:714)
      10:30:26 	at hudson.model.Build$BuildExecution.post2(Build.java:182)
      10:30:26 	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:663)
      10:30:26 	at hudson.model.Run.execute(Run.java:1714)
      10:30:26 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      10:30:26 	at hudson.model.ResourceController.execute(ResourceController.java:88)
      10:30:26 	at hudson.model.Executor.run(Executor.java:231)
      10:30:26 Caused by: java.io.IOException
      10:30:26 	at hudson.remoting.Channel.close(Channel.java:1007)
      10:30:26 	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
      10:30:26 	at hudson.remoting.PingThread.ping(PingThread.java:120)
      10:30:26 	at hudson.remoting.PingThread.run(PingThread.java:81)
      10:30:26 Caused by: java.util.concurrent.TimeoutException: Ping started on 1409011309670 hasn't completed at 1409011549671
      10:30:26 	... 2 more
      10:30:26 [BFA] Scanning build for known causes...
      10:30:26 
      10:30:26 [BFA] Done. 0s
      10:30:26 [EnvInject] - [ERROR] - SEVERE ERROR occurs: channel is already closed
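
The two numbers in the `TimeoutException` message above ("Ping started on 1409011309670 hasn't completed at 1409011549671") look like epoch-millisecond timestamps. Assuming that reading is right, the small sketch below works out how long the ping was outstanding before the channel was closed. The class and method names are made up for illustration and are not part of Jenkins.

```java
import java.time.Duration;
import java.time.Instant;

public class PingGap {
    // Elapsed time between two epoch-millisecond timestamps.
    static Duration elapsed(long startMillis, long endMillis) {
        return Duration.between(Instant.ofEpochMilli(startMillis),
                                Instant.ofEpochMilli(endMillis));
    }

    public static void main(String[] args) {
        long started = 1409011309670L; // "Ping started on ..."
        long gaveUp  = 1409011549671L; // "... hasn't completed at ..."
        Duration gap = elapsed(started, gaveUp);
        // 1409011549671 - 1409011309670 = 240001 ms, i.e. about four minutes.
        System.out.println(gap.toMillis() + " ms");
    }
}
```

A gap of 240001 ms suggests the ping thread waited about four minutes for a reply from the slave before declaring the channel dead, which would explain the later "channel is already closed" errors.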
      

          [JENKINS-24449] slave cleanup causes jobs to hang

          Christian Goetze added a comment -

          It appears that the slave crashes on the abort, and the only solution is to disconnect and reconnect the slave.

          Daniel Beck added a comment -

          This isn't actually related to the slave squatter plugin, is it?


          Christian Goetze added a comment -

          Probably not, it's "just" a rather nasty slave.jar bug. I can never figure out what tag to use...
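
For readers trying to picture the failure mode: the first stack trace shows the executor thread parked in `Object.wait` underneath `hudson.remoting.Request.call`, and it only came back when the build was aborted. The sketch below is not Jenkins remoting code. It is a plain `java.util.concurrent` illustration, with hypothetical names, of the difference between waiting indefinitely for a reply and waiting with a deadline.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedRemoteCall {
    // Runs the call on a helper thread and waits at most `timeout` for its result.
    // The Callable is a hypothetical stand-in for a remote request.
    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "remote-call");
            t.setDaemon(true); // do not keep the JVM alive for a stuck call
            return t;
        });
        Future<T> reply = pool.submit(call);
        try {
            return reply.get(timeout, unit); // bounded wait instead of waiting forever
        } catch (TimeoutException e) {
            reply.cancel(true);              // interrupt the stuck helper thread
            throw e;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates a reply that never arrives.
        Callable<String> stuck = () -> {
            new CountDownLatch(1).await();
            return "never";
        };
        try {
            callWithTimeout(stuck, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("gave up waiting for the reply");
        }
    }
}
```

With an unbounded wait, the caller depends entirely on the other side answering or on someone interrupting the thread, which matches the 14-hour hang followed by an `InterruptedException` on abort. Whether a deadline is the right fix for this particular issue is a separate question that this sketch does not answer.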

          John Rodenburg added a comment -

          I can confirm that this error occurred on a stale job (one that hadn't been run in over a month).

          Disconnecting and re-connecting the slave worked as a solution!

          ERROR: SEVERE ERROR occurs
          org.jenkinsci.lib.envinject.EnvInjectException: java.lang.InterruptedException
          at org.jenkinsci.plugins.envinject.service.EnvironmentVariablesNodeLoader.gatherEnvironmentVariablesNode(EnvironmentVariablesNodeLoader.java:77)
          at org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:81)
          at org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:39)
          at hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:572)
          at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:492)
          at hudson.model.Run.execute(Run.java:1741)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
          at hudson.model.ResourceController.execute(ResourceController.java:98)
          at hudson.model.Executor.run(Executor.java:410)
          Caused by: java.lang.InterruptedException
          at java.lang.Object.wait(Native Method)
          at hudson.remoting.Request.call(Request.java:147)
          at hudson.remoting.Channel.call(Channel.java:780)
          at hudson.FilePath.act(FilePath.java:1102)
          at org.jenkinsci.plugins.envinject.service.EnvironmentVariablesNodeLoader.gatherEnvironmentVariablesNode(EnvironmentVariablesNodeLoader.java:44)
          ... 8 more
          ERROR: Step ‘Archive the artifacts’ failed: no workspace for XXXXX-XXXXXX-XXXX#17
          SSH: Current build result is [FAILURE], not going to run.
          [EnvInject] - [ERROR] - SEVERE ERROR occurs: null
          Finished: FAILURE

          Oleg Nenashev added a comment -

          Conditionally assigned it to myself. No ETA


          Jesse Glick added a comment (edited) -

          Does this happen without EnvInject enabled?

          Oleg Nenashev added a comment -

          Unfortunately, I have no capacity to work on Remoting in the medium term, so I will unassign it and let others take it. If somebody is interested in submitting a pull request, I will be happy to help get it reviewed and released.

            Assignee: Unassigned
            Reporter: Christian Goetze
            Votes: 0
            Watchers: 7