After upgrading to 1.600 I am getting build errors on all jobs that have an "Execute shell" build step.

      I tried downgrading to 1.599 and everything works there.
      I tried with both sh and bash
      It looks like the shell commands are terminated randomly in the middle of the run

      I tested with a very simple script like :

      sleep 59
      echo "sleep done"

      and got :

      Started by timer
      [EnvInject] - Loading node environment variables.
      Building remotely on cis01.coolsmsc.dk (swarm) in workspace /home/jenkinsslave/workspace/cron - test - test-no-php
      [cron - test - test-no-php] $ /bin/bash -xe /tmp/hudson4904788725429931154.sh
      + sleep 59
      /tmp/hudson4904788725429931154.sh: line 2: 19627 Terminated sleep 59
      Build step 'Execute shell' marked build as failure
      Finished: FAILURE

          [JENKINS-27178] Execute shell is terminated randomly

          Daniel Beck added a comment - - edited

          Started by user anonymous
          Building in workspace /Users/danielbeck/JENKINS-27178-Home/workspace/JENKINS-27178
          JENKINS-27178 $ /bin/sh -xe /var/folders/39/ggldtdps6034ct7d_y6x4_v80000gn/T/hudson3952789379184431267.sh
          + sleep 59

          + echo 'sleep done'
          sleep done
          Finished: SUCCESS

          Works for me with a pristine Jenkins 1.600 and new home directory, freestyle job run on master.

          Could you try to reproduce this issue on a (mostly) new Jenkins instance, and/or try to figure out what part of your environment breaks this? Maybe there's something interesting in the slave log? jenkins.log? The system logs of the slave node?


          Henrik Nicolaisen added a comment -

          Sometimes it works for me as well. The failures seem to happen when I run with multiple executors, and there are problems even without the slaves.

          So recreating the problem is not that easy.

          We have quite a lot of plugins installed, so it's hard to recreate the setup and the problem, and it looks like the changes in the 1.600 release are to blame.

          If none of that gives you any clues, I'll try setting up a Vagrant box with a copy of my setup so I have somewhere I can try to recreate the errors, and get back to you.

          Viktor Szathmary added a comment - - edited

          Having the same problem since upgrading to 1.600 from 1.599. A certain shell invocation completes normally, but Jenkins occasionally reports a FAILURE for no good reason.


          Will Saxon added a comment -

          I upgraded to 1.600 this afternoon and immediately began experiencing this issue. Downgrading works, but all our job history disappears when I do that. Since we make heavy use of 'execute shell' I am not sure what to do next except restore from backup.


          Daniel Beck added a comment -

          You could try to isolate the cause. As I wrote above, I was unable to reproduce this issue on a test instance (even when running two builds of different jobs in parallel on the master node).


          Daniel Beck added a comment -

          I've been able to reproduce this issue on a test instance of 1.600 only when env-inject is installed and enabled. Disable it, restart, issue is gone. Enable it, restart, issue is back.

          It seems to occur when two builds (different projects but same config, only the sleep + echo shell step) run in parallel (on the master node for me). There's a solid chance that one of the two (never both so far) fails.

          Presumably related to JENKINS-26755 so tentatively assigning to ndeloof. Given how popular this plugin is this can easily blow up a lot of instances, especially assuming that users who care about Jenkins security upgrade earlier than usual due to the security advisory.


          If anyone's interested and capable, it should be easy enough to revert the changes for JENKINS-26755 and create a build of 1.600 without it to see whether it resolves the issue for you. I could also provide such a build if you're willing to trust some random guy on the internet. Just respond here.


          Daniel Beck added a comment -

          Possible fix in core:

          $ git diff
          diff --git a/core/src/main/java/hudson/model/Computer.java b/core/src/main/java/hudson/model/Computer.java
          index 2d31b41..058a166 100644
          --- a/core/src/main/java/hudson/model/Computer.java
          +++ b/core/src/main/java/hudson/model/Computer.java
          @@ -948,12 +948,12 @@ public /*transient*/ abstract class Computer extends Actionable implements Acces
               public EnvVars getEnvironment() throws IOException, InterruptedException {
                   EnvVars cachedEnvironment = this.cachedEnvironment;
                   if (cachedEnvironment != null) {
          -            return cachedEnvironment;
          +            return new EnvVars(cachedEnvironment);
                   }
           
                   cachedEnvironment = EnvVars.getRemote(getChannel());
                   this.cachedEnvironment = cachedEnvironment;
          -        return cachedEnvironment;
          +        return new EnvVars(cachedEnvironment);
               }
           
               /**

          I only tested this issue with this change; I'm not entirely sure there are no further side effects.

          Someone else can write the tests for this.
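
          For readers who want to see why returning a copy matters, here is a minimal, self-contained sketch. It is not Jenkins code: `EnvCacheSketch`, `getEnvironmentShared`, `getEnvironmentCopied` and the `BUILD_TAG` value are hypothetical names made up for illustration, and a plain `TreeMap` stands in for `EnvVars`. It only shows the general effect the diff above guards against: when a getter hands out its cached map by reference, a change made by one caller is visible to every later caller.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a node that caches its environment; not Jenkins code. */
class EnvCacheSketch {
    // Stand-in for the cached environment (a map of variable names to values).
    private final Map<String, String> cachedEnvironment = new TreeMap<>();

    EnvCacheSketch() {
        cachedEnvironment.put("PATH", "/usr/bin:/bin");
    }

    /** Hands out the cached map itself, so callers share one mutable object. */
    public Map<String, String> getEnvironmentShared() {
        return cachedEnvironment;
    }

    /** Hands out a fresh copy, so each caller can modify its own map freely. */
    public Map<String, String> getEnvironmentCopied() {
        return new TreeMap<>(cachedEnvironment);
    }

    public static void main(String[] args) {
        // Caller A adds a per-build variable to the map it was given.
        EnvCacheSketch shared = new EnvCacheSketch();
        shared.getEnvironmentShared().put("BUILD_TAG", "job-a-1");
        // Caller B asks for the environment afterwards and sees A's variable.
        System.out.println("shared: " + shared.getEnvironmentShared().containsKey("BUILD_TAG"));

        // With a copy per call, A's change stays in A's own map.
        EnvCacheSketch copied = new EnvCacheSketch();
        copied.getEnvironmentCopied().put("BUILD_TAG", "job-a-1");
        System.out.println("copied: " + copied.getEnvironmentCopied().containsKey("BUILD_TAG"));
    }
}
```

          The copy costs one map allocation per call, which seems a small price next to builds interfering with each other.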


          Will Saxon added a comment -

          I can confirm that changing our master from 2 executors to 1 mitigates this issue for us. We are using EnvInject.

          Thanks Daniel.


          Daniel Beck added a comment -

          For those of you who are more adventurous, here's a build that may or may not work, and may even make everything ten times worse:
          https://dl.dropboxusercontent.com/u/29853/jenkins-1.600.JENKINS-27178.db.war

          Based on this branch of 1.600 with fix: https://github.com/daniel-beck/jenkins/commits/JENKINS-27178, so you could read the diff to 1.600, know what was changed, and build it yourself.

          This is completely untested other than making sure the builds no longer fail this particular way with env-inject installed. I did not roll back JENKINS-26755, so this is a new state.

          If this issue turns out to be as serious as I fear, I'd prefer if one of the more regular core contributors would build a special build and upload to the Jenkins infra, but they're all offline right now. (Just to be sure I added the .db suffix to not have a naming collision with any other builds fixing this issue.)


          Daniel Beck added a comment -

          wsaxon There may be other issues as well related to this, see JENKINS-27178. I don't know whether going to 1 executor prevents that one.


          Andrei Burd added a comment -

          I want to add that I ran into the same issue immediately after upgrading to 1.600.


          Johno Crawford added a comment -

          Pull request here: https://github.com/jenkinsci/jenkins/pull/1590

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          core/src/main/java/hudson/model/Computer.java
          test/src/test/java/hudson/model/NodeTest.java
          http://jenkins-ci.org/commit/jenkins/ef4d1553bf12f5ba8d584491f1c118c8797a3846
          Log:
          JENKINS-27178 Amended merge of #1590.
          (cherry picked from commit 87ab95e7fc30af0d20a2bfe6a977fa2da7d15dba)

          Jesse Glick added a comment -

          Merged, and backported to rc for 1.601.


          francisdb added a comment - - edited

          Just adding this so people end up here:

          (restarting Jenkins is a temporary fix as the argument list grows over time)

          java.io.IOException: Cannot run program "git" (in directory "/var/lib/jenkins/jobs/xxx/workspace"): error=7, Argument list too long
          java.io.IOException: Cannot run program "/bin/sh" (in directory "/var/lib/jenkins/jobs/xxx/workspace"): error=7, Argument list too long
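
          A rough sketch of how an environment could grow over time until the OS refuses to start a process. This is an illustration under assumptions, not a trace of what Jenkins does: `EnvGrowthSketch`, `envSize`, `sizeAfterBuilds` and the `/opt/tool/bin` path are hypothetical, and whether builds in the affected setups really append to variables this way is a guess. It compares handing each simulated build the same cached map with handing each build a copy.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of a shared environment map growing across builds. */
class EnvGrowthSketch {
    /** Total characters of all NAME=value pairs, a rough proxy for the environment passed to a child process. */
    public static int envSize(Map<String, String> env) {
        int size = 0;
        for (Map.Entry<String, String> e : env.entrySet()) {
            size += e.getKey().length() + 1 + e.getValue().length();
        }
        return size;
    }

    /** Runs the given number of simulated builds, each appending to PATH, and returns the cached map's size. */
    public static int sizeAfterBuilds(int builds, boolean copyPerBuild) {
        Map<String, String> cached = new TreeMap<>();
        cached.put("PATH", "/usr/bin");
        for (int i = 0; i < builds; i++) {
            // Either the build mutates the cache directly, or it works on its own copy.
            Map<String, String> env = copyPerBuild ? new TreeMap<>(cached) : cached;
            env.put("PATH", env.get("PATH") + ":/opt/tool/bin");
        }
        return envSize(cached);
    }

    public static void main(String[] args) {
        System.out.println("shared cache after 1000 builds: " + sizeAfterBuilds(1000, false));
        System.out.println("copied cache after 1000 builds: " + sizeAfterBuilds(1000, true));
    }
}
```

          In the shared case the cached map only resets when the process holding it restarts, which would fit the observation that a restart helps for a while.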
          


          dogfood added a comment -

          Integrated in jenkins_main_trunk #3997
          JENKINS-27178 Amended merge of #1590. (Revision ef4d1553bf12f5ba8d584491f1c118c8797a3846)

          Result = SUCCESS
          jesse glick : ef4d1553bf12f5ba8d584491f1c118c8797a3846
          Files :

          • core/src/main/java/hudson/model/Computer.java
          • test/src/test/java/hudson/model/NodeTest.java
          • changelog.html


          Viktor Szathmary added a comment -

          The release notes for 1.601 (see http://jenkins-ci.org/changelog) do not mention this ticket. Was the fix released nevertheless?

          Jesse Glick added a comment -

          changelog.html probably got messed up by rc branch madness.


          Daniel Beck added a comment -

          This was logged as JENKINS-21788, a duplicate of this issue.


          Allan Carter added a comment -

          I'm still seeing a similar issue in release 1.637. If I run multiple executors, then Jenkins randomly sends SIGTERM to all of the jobs' processes. Am I seeing something new, or did this not get integrated?


          Daniel Beck added a comment -

          allancarter Probably a different issue. Please file a new bug.


            johno Johno Crawford
            hmn Henrik Nicolaisen