Deadlock of AsyncFutureImpl.get() during massive submission of distributed jobs


      1) I trigger jobs with the Parameterized Trigger Plugin. The master job submits about 64 parallel jobs, each of which prints "Hello, world!", and waits for them to complete.
      2) At some point the monitoring of the jobs hangs. Even after all slave jobs have finished, the master job is still waiting.
      3) According to the logs, hudson.remoting.AsyncFutureImpl.get() hangs: "completed" was false when get() was entered, and the wait() loop never returns. It seems that AsyncFutureImpl.set() was never called for one of the jobs.
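
      The failure mode described in step 3 can be illustrated with a minimal sketch. This is not the actual hudson.remoting.AsyncFutureImpl source; SimpleFuture and its members are hypothetical names, and the code only assumes the wait()/notifyAll() pattern implied by the stack trace. If set() is never called, the untimed get() blocks forever, while a timed get() at least surfaces the problem as an exception.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical, simplified stand-in for a wait/notify-based future.
// It is NOT the real AsyncFutureImpl; it only mirrors the pattern
// "loop on wait() until a completed flag is set".
class SimpleFuture<V> {
    private boolean completed; // guarded by the monitor of "this"
    private V value;

    // Blocks until set() is called. If set() is never called for this
    // instance, the loop never exits, which matches the reported hang.
    public synchronized V get() throws InterruptedException {
        while (!completed) {
            wait();
        }
        return value;
    }

    // Timed variant: gives up once the deadline has passed.
    public synchronized V get(long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!completed) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                // wait(0) would mean "wait forever", so bail out explicitly.
                throw new TimeoutException("future was not completed in time");
            }
            wait(remainingMs);
        }
        return value;
    }

    // Marks the future as completed and wakes up every waiting thread.
    public synchronized void set(V v) {
        completed = true;
        value = v;
        notifyAll();
    }
}
```

      In this sketch, a caller that suspects a lost completion could use get(timeout, unit) and log the TimeoutException instead of blocking an executor indefinitely. Whether the plugin can use a timed wait here is an open question, not something the logs confirm.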

      Additional analysis:

      • Submission works well on the local host without an additional remote node
      • The log contains only log rotation errors (see below)
      • All executor threads have finished their jobs

      Call stack of the job (0x00000007866656f0 is not used by other threads):

      "Executor #7 for master : executing Test_MassiveSubmission #8" prio=6 tid=0x000000001148e000 nid=0x16bfc in Object.wait() [0x000000000d12e000]
      java.lang.Thread.State: WAITING (on object monitor)
      at java.lang.Object.wait(Native Method)
      - waiting on <0x00000007866656f0> (a hudson.model.queue.FutureImpl)
      at java.lang.Object.wait(Object.java:503)
      at hudson.remoting.AsyncFutureImpl.get(AsyncFutureImpl.java:73)
      - locked <0x00000007866656f0> (a hudson.model.queue.FutureImpl)
      at hudson.plugins.parameterizedtrigger.TriggerBuilder.perform(TriggerBuilder.java:135)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
      at hudson.model.Build$BuildExecution.build(Build.java:199)
      at hudson.model.Build$BuildExecution.doRun(Build.java:160)
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
      at hudson.model.Run.execute(Run.java:1592)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:237)
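
      A check like "which threads are parked in Object.wait(), and on what" can also be done from inside the JVM. The following is a rough sketch using the standard java.lang.management API; the class name WaitingThreads and its method are hypothetical and not part of Jenkins. Note that getLockName() reports a class name plus identity hash code, not the 0x... address printed in a thread dump, so the two cannot be matched directly.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Jenkins or its plugins.
class WaitingThreads {
    // Returns one line per thread that is currently in state WAITING,
    // including the object it waits on and the owner of that lock (if any).
    static List<String> describeWaiting() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        // false, false: skip the more expensive locked-monitor and
        // locked-synchronizer details, which are not needed here.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.WAITING) {
                out.add(info.getThreadName()
                        + " waiting on " + info.getLockName()
                        + " owner=" + info.getLockOwnerName());
            }
        }
        return out;
    }
}
```

      For a thread blocked in Object.wait(), the owner is expected to be null, because wait() releases the monitor. That fits the trace above, where the executor both "locked" and is "waiting on" the same FutureImpl.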

      The error log contains only the following errors:

      SEVERE: Failed to rotate log
      java.io.IOException: C:\Users\nenashev\Documents\Work\Jenkins\contrib\parameterized-trigger-plugin\.\work\jobs\Test_MassiveSubmissionSlave\builds\2013-09-26_15-36-16 is in use
      at hudson.model.Run.delete(Run.java:1380)
      at hudson.tasks.LogRotator.perform(LogRotator.java:133)
      at hudson.model.Job.logRotate(Job.java:404)
      at hudson.model.Run.execute(Run.java:1655)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:237)

      Assignee: Oleg Nenashev
      Reporter: Oleg Nenashev
      Archiver: Jenkins Service Account