
Job hangs if one of multiple triggered builds was aborted

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • core
    • None

      I have two cases in which a job hangs:
      1) The parent job triggers multiple different jobs and waits for their completion. I abort one of the triggered jobs while it is still in the queue. After that, the parent job never finishes; it keeps waiting for completion.
      2) The parent job triggers multiple instances (via label factory) of one job on different slaves and waits for completion. I abort one of the jobs while it is executing on a slave. The parent job hangs forever, saying "Waiting for completion".

      I believe this worked correctly in some previous version (in both cases the parent job failed).
      I recently updated Jenkins from 1.48 to 1.532,
      the Parameterized Trigger Plugin from 2.16 to 2.22,
      and the NodeLabel Parameter Plugin from 1.2.1 to 1.4.

          [JENKINS-21932] Job hangs if one of multiple triggered builds was aborted

          ikedam added a comment -

          This sounds like an issue in the parameterized-trigger plugin.

          Can you always reproduce that behavior?
          Please report an example project configuration to reproduce the problem.


          Sergey Irisov added a comment - - edited

          I can now reproduce only the first case.

          Parent job: Launcher_trigger_multiple_job.
          It triggers two jobs: Slave_job and Slave_job2.
          Slave_job was aborted while it was still in the queue, and Slave_job2 finished successfully.
          The launcher job hung waiting for Slave_job.

          The jobs' configs are attached.


          Oleg Nenashev added a comment - - edited

          I suppose this issue and JENKINS-16679 have the same origin.
          After the initial analysis of JENKINS-16679 (I did it several months ago and hope to find my notes), I suspect that it may require a fix inside the Jenkins core.

          As a workaround, it is possible to check the statuses of the future tasks instead of making sequential "wait" calls against the list of submitted projects.
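
The workaround described above can be sketched with plain `java.util.concurrent` types. This is only an illustration of the polling pattern: `FuturePoller`, `awaitAll`, and the state labels are hypothetical names, not part of Jenkins or the parameterized-trigger plugin. The sketch assumes that a cancelled task reports itself as cancelled through its `Future`; where that does not hold, only the deadline keeps the caller from waiting forever.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; not part of Jenkins or the parameterized-trigger plugin. */
final class FuturePoller {

    /**
     * Polls every future until each one is done or cancelled, or the deadline passes.
     * Returns one label per future: "DONE", "CANCELLED" or "TIMED_OUT".
     */
    static Map<String, String> awaitAll(Map<String, Future<?>> futures,
                                        long timeoutMillis,
                                        long pollMillis) throws InterruptedException {
        Map<String, String> states = new LinkedHashMap<>();
        List<String> pending = new ArrayList<>(futures.keySet());
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);

        while (!pending.isEmpty() && System.nanoTime() - deadline < 0) {
            pending.removeIf(name -> {
                Future<?> f = futures.get(name);
                // isDone() is also true for cancelled futures, so check isCancelled() first.
                if (f.isCancelled()) { states.put(name, "CANCELLED"); return true; }
                if (f.isDone()) { states.put(name, "DONE"); return true; }
                return false;
            });
            if (!pending.isEmpty()) {
                Thread.sleep(pollMillis);
            }
        }
        // Anything still pending at the deadline is reported instead of blocking forever.
        for (String name : pending) {
            states.put(name, "TIMED_OUT");
        }
        return states;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Future<?>> futures = new LinkedHashMap<>();
        CompletableFuture<String> cancelled = new CompletableFuture<>();
        cancelled.cancel(true);
        futures.put("downstream1", cancelled);
        futures.put("downstream2", CompletableFuture.completedFuture("SUCCESS"));
        futures.put("stuck", new CompletableFuture<String>());
        System.out.println(awaitAll(futures, 200, 10));
    }
}
```

Compared with calling `get()` on each future in turn, no single task can block the check of the others, and the caller decides what to do with cancelled or timed-out entries.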


          ikedam added a comment -

          It also reproduced easily in my environment. Amazing.
          It would require a fix in Jenkins core, as @oleg_nenashev points out.

          How I reproduced it:

          1. Install parameterized-trigger plugin
          2. Create a node "slave1" from Manage Jenkins > Manage Nodes > New Node
            • No need to launch that node.
          3. Create a free style project "downstream1".
            • Check "Restrict where this project can be run" (this is displayed only when there are slaves) and enter "slave1" in "Label Expression"
          4. Create a free style project "downstream2".
          5. Create a free style project "upstream"
            • Add "Trigger/call builds on other projects"
              • "downstream1,downstream2" for "Projects to build"
              • Check Block until the triggered projects finish their builds
          6. Click "Build Now" of "upstream".
          7. Cancel the build of "downstream1" that should be pending.
          8. Result: "upstream" should finish, but actually does not finish.

          Environment:
          Windows 8
          Jenkins 1.532.1
          JDK 1.7.0_45
          Parameterized Trigger plugin 2.22


          ikedam added a comment -

          I found it reproduces even with only one downstream project:

          1. Install parameterized-trigger plugin
          2. Create a node "slave1" from Manage Jenkins > Manage Nodes > New Node
            • No need to launch that node.
          3. Create a free style project "downstream".
            • Check "Restrict where this project can be run" (this is displayed only when there are slaves) and enter "slave1" in "Label Expression"
          4. Create a free style project "upstream"
            • Add "Trigger/call builds on other projects"
              • "downstream" for "Projects to build"
              • Check Block until the triggered projects finish their builds
          5. Click "Build Now" of "upstream".
          6. Cancel the build of "downstream" that should be pending.
          7. Result: "upstream" should finish, but actually does not finish.

          As far as I have tested, it does not reproduce with Jenkins <= 1.509.

          Jenkins   Result
          1.480.3   does not reproduce
          1.509.2   does not reproduce
          1.532.1   reproduces

          I should find the Jenkins version in which this problem first appears.


          ikedam added a comment -

          This seems to be a behavior introduced in Jenkins 1.520, and it is not fixed even in the latest release (1.553).

          Jenkins   Result
          1.480.3   does not reproduce
          1.509.2   does not reproduce
          1.515     does not reproduce
          1.517     does not reproduce
          1.518     does not reproduce
          1.519     does not reproduce
          1.520     reproduces
          1.532.1   reproduces
          1.553     reproduces


          ikedam added a comment -

          This seems to have been introduced in 513a45b.
          The problem reproduces with that commit, but not with 7b2541d, which is the parent of 513a45b.


          ikedam added a comment -

          Jenkins <= 1.519 (7b2541d)

          • Queue#doCancelItem
            • Queue#cancel(Item)
              • Item#onCancelled
                • FutureImpl#setAsCancelled

          Jenkins >= 1.520 (513a45b)

          • Queue#doCancelItem
            • Queue#cancel(Item)
              • Item#leave
          • Item#onCancelled was removed in 513a45b.
          • I think it should work in the following way:
            • Queue#doCancelItem
              • Queue#cancel(Item)
                • Item#cancel
                  • Item#leave
                  • FutureImpl#setAsCancelled
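
The difference between the two call chains above can be illustrated with a toy model. This is a sketch under stated assumptions, not the Jenkins implementation: `ToyQueue`, `ToyItem`, `cancelLeaveOnly`, `cancelAndNotify`, and `observe` are hypothetical names, and a `CompletableFuture` stands in for the item's future. It only shows why a caller blocked on the future keeps waiting when cancellation removes the item from the queue without marking the future as cancelled.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of the call chains above; these are NOT the real Jenkins classes. */
final class ToyQueue {
    private final Set<ToyItem> waiting = new HashSet<>();

    static final class ToyItem {
        // Stands in for the queue item's future (FutureImpl in the thread).
        final CompletableFuture<String> future = new CompletableFuture<>();
    }

    ToyItem schedule() {
        ToyItem item = new ToyItem();
        waiting.add(item);
        return item;
    }

    /** Mirrors the ">= 1.520" chain: the item leaves the queue, the future is untouched. */
    boolean cancelLeaveOnly(ToyItem item) {
        return waiting.remove(item);
    }

    /** Mirrors the proposed chain: leave the queue, then mark the future as cancelled. */
    boolean cancelAndNotify(ToyItem item) {
        boolean removed = waiting.remove(item);
        if (removed) {
            item.future.cancel(false);
        }
        return removed;
    }

    /** Waits briefly on the future and reports what a blocked caller would observe. */
    static String observe(ToyItem item) throws InterruptedException {
        try {
            item.future.get(100, TimeUnit.MILLISECONDS);
            return "COMPLETED";
        } catch (CancellationException e) {
            return "CANCELLED";
        } catch (TimeoutException e) {
            return "STILL_WAITING";
        } catch (ExecutionException e) {
            return "FAILED";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyQueue queue = new ToyQueue();

        ToyItem a = queue.schedule();
        queue.cancelLeaveOnly(a);
        System.out.println(observe(a)); // prints STILL_WAITING

        ToyItem b = queue.schedule();
        queue.cancelAndNotify(b);
        System.out.println(observe(b)); // prints CANCELLED
    }
}
```

In the first case a caller without a timeout would wait indefinitely, which matches the "upstream does not finish" symptom in the reproduction steps.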


          Oleg Nenashev added a comment -

          I've discovered a problem in the remoting library that may be causing this issue.
          https://github.com/jenkinsci/remoting/pull/22


          ikedam added a comment -

          Posted pull request:
          https://github.com/jenkinsci/jenkins/pull/1160

          I also changed the component to core, as this is a fix for Jenkins core.


          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          src/main/java/hudson/remoting/AsyncFutureImpl.java
          http://jenkins-ci.org/commit/remoting/f32e434cb195e5d8d6d160f78116a053b948be62
          Log:
          Fixed the synchronization issue for cancel() operations

          Locking operations may affect hudson.remoting.Request::get() handlers
          The issue may cause the JENKINS-21932, which seems to be caused by unhandled locks().

          Signed-off-by: Oleg Nenashev <o.v.nenashev@gmail.com>


          Code changed in jenkins
          User: ikedam
          Path:
          test/src/test/java/hudson/model/QueueTest.java
          http://jenkins-ci.org/commit/jenkins/35dfc75c682e1c7dbc1308426e1d75a0f18a2ab9
          Log:
          JENKINS-21932 Added tests to reproduce JENKINS-21932, Future#get does not abort even when a task in the queue is canceled.


          Code changed in jenkins
          User: ikedam
          Path:
          test/src/test/java/hudson/model/QueueTest.java
          http://jenkins-ci.org/commit/jenkins/5c3672270bcc97d5d05541f284c35b44067693ea
          Log:
          JENKINS-21932 Make the slave used in the test offline explicitly.


          Code changed in jenkins
          User: ikedam
          Path:
          core/src/main/java/hudson/model/Queue.java
          http://jenkins-ci.org/commit/jenkins/0c3d67097a3394fee7f0eb895c4350ea96887a02
          Log:
          [FIXED JENKINS-21932] Call Item#cancel when a task in a queue is cancelled.


          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          core/src/main/java/hudson/model/Queue.java
          test/src/test/java/hudson/model/QueueTest.java
          http://jenkins-ci.org/commit/jenkins/fce4fed4e8785ebf2618e5532d1416488bb9fa6d
          Log:
          JENKINS-21932 Merge pull request #1160

          Compare: https://github.com/jenkinsci/jenkins/compare/eb0bfa5ece8e...fce4fed4e878


          dogfood added a comment -

          Integrated in jenkins_main_trunk #3303
          JENKINS-21932 Added tests to reproduce JENKINS-21932, Future#get does not abort even when a task in the queue is canceled. (Revision 35dfc75c682e1c7dbc1308426e1d75a0f18a2ab9)
          JENKINS-21932 Make the slave used in the test offline explicitly. (Revision 5c3672270bcc97d5d05541f284c35b44067693ea)
          [FIXED JENKINS-21932] Call Item#cancel when a task in a queue is cancelled. (Revision 0c3d67097a3394fee7f0eb895c4350ea96887a02)

          Result = SUCCESS
          devld : 35dfc75c682e1c7dbc1308426e1d75a0f18a2ab9
          Files :

          • test/src/test/java/hudson/model/QueueTest.java

          devld : 5c3672270bcc97d5d05541f284c35b44067693ea
          Files :

          • test/src/test/java/hudson/model/QueueTest.java

          devld : 0c3d67097a3394fee7f0eb895c4350ea96887a02
          Files :

          • core/src/main/java/hudson/model/Queue.java


            Assignee: Unassigned
            Reporter: Sergey Irisov (cerber)
            Votes: 1
            Watchers: 6
