  1. Jenkins
  2. JENKINS-19776

Deadlock of AsyncFutureImpl.get() during massive submission of distributed jobs

      Description

      1) I trigger jobs from the Parameterized Trigger Plugin. The master job submits about 64 parallel jobs, each printing "Hello, world!", and waits for their completion.
      2) At some point the monitoring of the jobs hangs. Even after all slave jobs have finished, the master job keeps waiting.
      3) According to the logs, hudson.remoting.AsyncFutureImpl.get() hangs because "completed" was initially false and the wait() loop never returns. It seems that AsyncFutureImpl.set() has not been called for one of the jobs.
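
To illustrate the mechanism described in 3), here is a minimal sketch of a future built on wait()/notifyAll(). It is an assumption-laden stand-in, not the actual hudson.remoting.AsyncFutureImpl source: the class name SimpleFuture and its fields are hypothetical. It only shows why an untimed get() cannot return if set() is never called, and how a bounded get() would turn the hang into a visible timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a wait()/notifyAll() based future.
// This is NOT the hudson.remoting.AsyncFutureImpl source.
class SimpleFuture<V> {
    private boolean completed; // guarded by "this"
    private V value;           // guarded by "this"

    // Blocks until set() is called; if set() is never called, this never returns.
    public synchronized V get() throws InterruptedException {
        while (!completed) {
            wait();
        }
        return value;
    }

    // Bounded variant: gives up with TimeoutException instead of hanging forever.
    public synchronized V get(long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!completed) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                throw new TimeoutException("set() was not called in time");
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return value;
    }

    // Publishes the result and wakes every thread blocked in get().
    public synchronized void set(V result) {
        value = result;
        completed = true;
        notifyAll();
    }
}
```

In this sketch a caller stuck in get() holds no lock while waiting, which matches a thread dump where the monitor is not owned by any other thread: nothing is blocking the waiter except the missing set() call.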

      Additional analysis:

      • Submission works well on the local host without an additional remote node
      • The log contains only log rotation errors (see below)
      • All executor threads have finished their jobs

      Call stack of the job (0x00000007866656f0 is not used by other threads):

      "Executor #7 for master : executing Test_MassiveSubmission #8" prio=6 tid=0x000000001148e000 nid=0x16bfc in Object.wait() [0x000000000d12e000]
         java.lang.Thread.State: WAITING (on object monitor)
          at java.lang.Object.wait(Native Method)
          - waiting on <0x00000007866656f0> (a hudson.model.queue.FutureImpl)
          at java.lang.Object.wait(Object.java:503)
          at hudson.remoting.AsyncFutureImpl.get(AsyncFutureImpl.java:73)
          - locked <0x00000007866656f0> (a hudson.model.queue.FutureImpl)
          at hudson.plugins.parameterizedtrigger.TriggerBuilder.perform(TriggerBuilder.java:135)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
          at hudson.model.Build$BuildExecution.build(Build.java:199)
          at hudson.model.Build$BuildExecution.doRun(Build.java:160)
          at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
          at hudson.model.Run.execute(Run.java:1592)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:237)
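
The check that the waited-on monitor is not held by another thread can also be done from inside the JVM. Below is a rough sketch using the standard java.lang.management API; the class name WaitingThreadReport and the report format are made up for illustration and are not part of Jenkins.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jenkins.
class WaitingThreadReport {
    // Returns one line per thread that is in WAITING state on a known lock
    // object, together with the current owner of that lock (if any).
    static List<String> waitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.WAITING) {
                continue;
            }
            LockInfo lock = info.getLockInfo();
            if (lock == null) {
                continue; // waiting, but not on an object the JVM reports
            }
            String owner = info.getLockOwnerName();
            report.add(String.format("\"%s\" waiting on %s@%x, owner: %s",
                    info.getThreadName(),
                    lock.getClassName(),
                    lock.getIdentityHashCode(),
                    owner == null ? "none" : owner));
        }
        return report;
    }
}
```

A waiter reported with owner "none" is in the same situation as the executor thread above: it is not blocked by a competing thread, it is simply never notified.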

      The error log contains only the following errors:

      SEVERE: Failed to rotate log
      java.io.IOException: C:\Users\nenashev\Documents\Work\Jenkins\contrib\parameterized-trigger-plugin\.\work\jobs\Test_MassiveSubmissionSlave\builds\2013-09-26_15-36-16 is in use
      at hudson.model.Run.delete(Run.java:1380)
      at hudson.tasks.LogRotator.perform(LogRotator.java:133)
      at hudson.model.Job.logRotate(Job.java:404)
      at hudson.model.Run.execute(Run.java:1655)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:237)

            Activity

            jglick Jesse Glick added a comment -

            The "Failed to rotate log" error is a known (and fixed) bug for external jobs; it has also been observed for other job types, where the cause is unknown. Dev versions have somewhat better diagnostics. It is probably unrelated to the hang.

            oleg_nenashev Oleg Nenashev added a comment -

            I'm going to try the custom core with remoting-2.32 tomorrow.
            Probably the issue has been fixed by other changes. Probably...

            oleg_nenashev Oleg Nenashev added a comment - - edited

            BTW, I have not managed to reproduce the issue on 1.530. Could it have been fixed by https://github.com/jenkinsci/jenkins/commit/bf444887ac16cc802695827da0a0f30949aa0f1f ?

            jglick Jesse Glick added a comment -

            JENKINS-19377 was about a problem with the external job plugin, so its fix should not have had any effect on other job types.

            If you are able to reproduce the issue in some earlier version of Jenkins, but not now, then bisection can be used to pinpoint the fix, which might be useful to know (for example for backporting).

            oleg_nenashev Oleg Nenashev added a comment -

            The issue has gone away after migrating to a custom core with remoting-2.32 (and several other patches from 1.509.4). Because it is random and occurs with low probability, there is no absolute guarantee that it is fixed.

            I'll try 1.509.4-RC on my installation. If I fail to reproduce the issue within 1-2 weeks, I'll just close it.

            P.S.: I still experience hangs when jobs are triggered from parallel builds, but that seems to be a plugin issue (https://issues.jenkins-ci.org/browse/JENKINS-16679).

            hiteswar_kumar hiteswar kumar added a comment -

            Could anyone share whether they are still seeing this issue,
            and in which Jenkins LTS it is fixed?

            I am getting this issue on Jenkins LTS ver. 1.480.3 with Parameterized Trigger 2.19.

            regards
            Hiteswar

            oleg_nenashev Oleg Nenashev added a comment -

            AFAIK, there's no direct fix for the issue.
            However, I have not seen it since migrating to 1.509.4.

            oleg_nenashev Oleg Nenashev added a comment -

            I have not been able to reproduce the issue since updating to 1.509.4.
            If anybody experiences it, please reopen the issue.


              People

              Assignee:
              oleg_nenashev Oleg Nenashev
              Reporter:
              oleg_nenashev Oleg Nenashev
              Votes:
              0
              Watchers:
              6
