After upgrading to 1.600, I am getting build errors on all jobs that have an "Execute shell" build step.

      I tried downgrading to 1.599, and everything works there.
      I tried with both sh and bash.
      It looks like the shell commands are terminated randomly in the middle of the run.

      I tested with a very simple script:

      sleep 59
      echo "sleep done"

      and got:

      Started by timer
      [EnvInject] - Loading node environment variables.
      Building remotely on cis01.coolsmsc.dk (swarm) in workspace /home/jenkinsslave/workspace/cron - test - test-no-php
      [cron - test - test-no-php] $ /bin/bash -xe /tmp/hudson4904788725429931154.sh
      + sleep 59
      /tmp/hudson4904788725429931154.sh: line 2: 19627 Terminated sleep 59
      Build step 'Execute shell' marked build as failure
      Finished: FAILURE
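The log shows bash reporting that `sleep 59` was `Terminated`. As general POSIX background (not something established in this thread), a child killed by signal N is reported with exit status 128 + N, so SIGTERM (15) gives 143. Because the script runs with `-e`, that non-zero status ends the script with a failure, and the "Execute shell" step treats a non-zero exit as a build failure. Whether the signal hit only `sleep` or the whole shell cannot be told from the log. The Java sketch below decodes such a status and reproduces it by terminating a `sleep` process. It assumes a Unix-like system, where `Process.destroy()` sends SIGTERM and the JDK reports a signalled process as 128 + N; the class `ExitStatus` and its `describe` method are hypothetical.

```java
import java.io.IOException;

public class ExitStatus {

    // By shell convention, a child killed by signal N reports exit status 128 + N.
    static final int SIGNAL_OFFSET = 128;

    /** Describes an exit status the way a POSIX shell would usually interpret it. */
    static String describe(int status) {
        if (status == 0) {
            return "success";
        }
        if (status > SIGNAL_OFFSET && status <= 255) {
            return "killed by signal " + (status - SIGNAL_OFFSET);
        }
        return "failed with status " + status;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Start the same long sleep as the test script, then terminate it.
        // On Unix-like systems Process.destroy() sends SIGTERM.
        Process p = new ProcessBuilder("sleep", "59").start();
        p.destroy();
        int status = p.waitFor();
        System.out.println(status + ": " + describe(status));
    }
}
```

On Linux, running `main` should print `143: killed by signal 15`, which matches the kind of early termination seen in the failing build.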

          [JENKINS-27178] Execute shell is terminated randomly

          Henrik Nicolaisen created issue -

          Daniel Beck added a comment - edited

          Started by user anonymous
          Building in workspace /Users/danielbeck/JENKINS-27178-Home/workspace/JENKINS-27178
          JENKINS-27178 $ /bin/sh -xe /var/folders/39/ggldtdps6034ct7d_y6x4_v80000gn/T/hudson3952789379184431267.sh
          + sleep 59

          + echo 'sleep done'
          sleep done
          Finished: SUCCESS

          Works for me with a pristine Jenkins 1.600 and a new home directory, using a freestyle job run on master.

          Could you try to reproduce this issue on a (mostly) new Jenkins instance, and/or try to figure out what part of your environment breaks this? Maybe there's something interesting in the slave log? jenkins.log? The system logs of the slave node?

          Sometimes it works for me as well. It seems to happen when I run with multiple executors, and there are problems even without the slaves.

          So recreating the problem is not that easy.

          We have quite a lot of plugins installed, so it's not easy to recreate the setup and the problem, but it looks like the changes in the 1.600 release are to blame.

          If none of that gives you any clues, I'll try setting up a Vagrant box with a copy of my setup, so I have somewhere to try to recreate the errors, and get back to you.

          Viktor Szathmary added a comment - edited

          Having the same problem since upgrading to 1.600 from 1.599. A certain shell invocation completes normally, but Jenkins occasionally reports a FAILURE for no good reason.

          Will Saxon added a comment -

          I upgraded to 1.600 this afternoon and immediately began experiencing this issue. Downgrading works, but all our job history disappears when I do that. Since we make heavy use of 'Execute shell', I am not sure what to do next other than restore from backup.

          Daniel Beck added a comment -

          You could try to isolate the cause. As I wrote above, I was unable to reproduce this issue on a test instance (even when running two builds of different jobs in parallel on the master node).

          Daniel Beck added a comment -

          I've been able to reproduce this issue on a test instance of 1.600 only when env-inject is installed and enabled. Disable it, restart, issue is gone. Enable it, restart, issue is back.

          It seems to occur when two builds (different projects but same config, only the sleep + echo shell step) run in parallel (on the master node for me). There's a solid chance that one of the two (never both so far) fails.
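A minimal, single-threaded model of how two parallel builds on one node could interfere. It assumes, as the core fix proposed further down this thread suggests but does not confirm, that the node hands every build the same cached environment object and that a plugin such as EnvInject writes into it. `Node`, its `getEnvironment` stand-in and the variable `INJECTED_VAR` are hypothetical, and a plain `TreeMap` replaces Jenkins's `EnvVars`. The interleaving is fixed so that the outcome is deterministic.

```java
import java.util.Map;
import java.util.TreeMap;

public class SharedEnvironmentDemo {

    /** Hypothetical stand-in for a node that caches its environment (not the Jenkins class). */
    static class Node {
        private Map<String, String> cachedEnvironment;

        // Models the pre-fix behavior: every caller receives the same cached instance.
        Map<String, String> getEnvironment() {
            if (cachedEnvironment == null) {
                cachedEnvironment = new TreeMap<>();
                cachedEnvironment.put("PATH", "/usr/bin:/bin");
            }
            return cachedEnvironment;
        }
    }

    public static void main(String[] args) {
        Node node = new Node();

        // Two builds running on the same node each fetch "their" environment.
        Map<String, String> buildA = node.getEnvironment();
        Map<String, String> buildB = node.getEnvironment();

        // Each build injects its own value for a (hypothetical) per-build variable.
        buildA.put("INJECTED_VAR", "job-a-1");
        buildB.put("INJECTED_VAR", "job-b-1");

        // Build A now sees build B's value, because both hold the same map.
        System.out.println("build A sees INJECTED_VAR=" + buildA.get("INJECTED_VAR")); // job-b-1
        System.out.println("same instance: " + (buildA == buildB));                    // true
    }
}
```

The model only shows why a shared mutable cache makes parallel builds order-dependent; it does not show how that leads to the other build's shell being terminated.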

          Presumably related to JENKINS-26755, so tentatively assigning to ndeloof. Given how popular this plugin is, this can easily blow up a lot of instances, especially assuming that users who care about Jenkins security upgrade earlier than usual due to the security advisory.


          If anyone's interested and capable, it should be easy enough to revert the changes for JENKINS-26755 and create a build of 1.600 without it to see whether it resolves the issue for you. I could also provide such a build if you're willing to trust some random guy on the internet. Just respond here.

          Daniel Beck made changes -
          Assignee New: Nicolas De Loof [ ndeloof ]
          Daniel Beck made changes -
          Component/s New: envinject-plugin [ 15893 ]

          Daniel Beck added a comment -

          Possible fix in core:

          $ git diff
          diff --git a/core/src/main/java/hudson/model/Computer.java b/core/src/main/java/hudson/model/Computer.java
          index 2d31b41..058a166 100644
          --- a/core/src/main/java/hudson/model/Computer.java
          +++ b/core/src/main/java/hudson/model/Computer.java
          @@ -948,12 +948,12 @@ public /*transient*/ abstract class Computer extends Actionable implements Acces
               public EnvVars getEnvironment() throws IOException, InterruptedException {
                   EnvVars cachedEnvironment = this.cachedEnvironment;
                   if (cachedEnvironment != null) {
          -            return cachedEnvironment;
          +            return new EnvVars(cachedEnvironment);
                   }
           
                   cachedEnvironment = EnvVars.getRemote(getChannel());
                   this.cachedEnvironment = cachedEnvironment;
          -        return cachedEnvironment;
          +        return new EnvVars(cachedEnvironment);
               }
           
               /**

          I only tested this issue with this change; I'm not entirely sure there are no further side effects.

          Someone else can write the tests for this.
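In the spirit of the missing tests, here is a minimal sketch of the behavior the diff aims for: the environment is fetched once and cached, but every caller receives its own copy, so one build's changes cannot leak into the cache or into another build. It is a model, not the real `Computer` code. `Node`, `fetchRemote` and `remoteFetches` are hypothetical, `fetchRemote` stands in for `EnvVars.getRemote(getChannel())`, and a `TreeMap` stands in for `EnvVars`.

```java
import java.util.Map;
import java.util.TreeMap;

public class CopyOnReturnDemo {

    /** Hypothetical model of Computer's environment cache, using a plain map instead of EnvVars. */
    static class Node {
        private volatile Map<String, String> cachedEnvironment;
        int remoteFetches = 0;

        // Stand-in for fetching the environment over the remoting channel.
        private Map<String, String> fetchRemote() {
            remoteFetches++;
            Map<String, String> env = new TreeMap<>();
            env.put("PATH", "/usr/bin:/bin");
            return env;
        }

        // Mirrors the patched method: fill the cache once, but always return a fresh copy.
        Map<String, String> getEnvironment() {
            Map<String, String> cached = this.cachedEnvironment;
            if (cached != null) {
                return new TreeMap<>(cached);
            }
            cached = fetchRemote();
            this.cachedEnvironment = cached;
            return new TreeMap<>(cached);
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        Map<String, String> buildA = node.getEnvironment(); // cache miss: fetches once
        Map<String, String> buildB = node.getEnvironment(); // cache hit: copy of the cache

        buildA.put("INJECTED_VAR", "job-a-1");
        buildB.put("INJECTED_VAR", "job-b-1");

        System.out.println(buildA.get("INJECTED_VAR"));                          // job-a-1
        System.out.println(node.getEnvironment().containsKey("INJECTED_VAR"));   // false
        System.out.println(node.remoteFetches);                                  // 1
    }
}
```

This checks isolation only; it does not demonstrate that the change removes the `Terminated` failures. The trade-off is one map copy per call in exchange for a cache that callers can no longer modify.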

            johno Johno Crawford
            hmn Henrik Nicolaisen
            Votes: 6
            Watchers: 20
