Jenkins / JENKINS-30226

MultiJob getting "stuck" after all child jobs complete


    • Type: Bug
    • Resolution: Postponed
    • Priority: Major
    • Component: multijob-plugin
    • None

      We have recently been hitting an issue where a MultiJob runs, all of its subjobs complete (and pass), and then the MultiJob itself gets "stuck". It never completes, and it does not respond to manual halt requests. The only way we've found to unstick it is to restart Jenkins.

      Reproduction
      I've set up a simplified, generic example that reproduces it:

      • One multijob named "parent_job"
      • parent_job has one build phase
      • build phase has 10 identical subjobs: subjob_1 - subjob_10
      • Each subjob merely runs this simple script:
        echo "start"
        sleep 3
        echo "done"
        
      • set the parent job to run every minute (check "Build periodically" and set it to "* * * * *")

      Generally this job runs just fine. Roughly one in every 500 to 1000 runs, though, it hangs.
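For planning a reproduction attempt, the wait implied by those numbers can be worked out with a little arithmetic. This is only a rough sketch: it assumes the timer really does start one run per minute and that the hang rate stays in the reported 500 to 1000 run range. The function name `runs_to_wait` is made up for illustration.

```shell
#!/bin/sh
# Convert a count of once-per-minute runs into hours and minutes.
# Assumption: exactly one parent_job run starts every minute.
runs_to_wait() {
    runs=$1
    printf '%dh %dm\n' "$((runs / 60))" "$((runs % 60))"
}

runs_to_wait 500    # prints "8h 20m"
runs_to_wait 1000   # prints "16h 40m"
```

So a single hang would be expected to take somewhere between about 8 and 17 hours of continuous running to show up, which is worth keeping in mind before concluding that a change fixed the problem.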

      When it hangs, this is the MultiJob console log:

      Started by timer
      [EnvInject] - Loading node environment variables.
      Building in workspace /var/lib/jenkins/jobs/b_parentjob/workspace
      Starting build job b_subjob4.
      Starting build job b_subjob9.
      Starting build job b_subjob8.
      Starting build job b_subjob3.
      Starting build job b_subjob10.
      Starting build job b_subjob6.
      Starting build job b_subjob2.
      Starting build job b_subjob5.
      Starting build job b_subjob1.
      Finished Build : #5849 of Job : b_subjob4 with status : SUCCESS
      Finished Build : #5849 of Job : b_subjob9 with status : SUCCESS
      Finished Build : #5849 of Job : b_subjob8 with status : SUCCESS
      Finished Build : #5849 of Job : b_subjob3 with status : SUCCESS
      Finished Build : #5849 of Job : b_subjob10 with status : SUCCESS
      Finished Build : #5849 of Job : b_subjob6 with status : SUCCESS
      Finished Build : #5849 of Job : b_subjob2 with status : SUCCESS
      Finished Build : #5849 of Job : b_subjob1 with status : SUCCESS
      Finished Build : #5849 of Job : b_subjob5 with status : SUCCESS
      

      This looks essentially identical to the log from a run that does not hang.
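To check a saved console log mechanically rather than by eye, one option is to list every job that has a "Starting build job" line but no matching "Finished Build" line. The sketch below assumes the log has been saved to a file and that the two line formats are exactly as shown above; the function name `unfinished_jobs` is hypothetical.

```shell
#!/bin/sh
# Print the name of every job that was started but never reported as
# finished in a saved MultiJob console log (one name per line).
# Assumption: lines look like the "Starting build job X." and
# "Finished Build : #N of Job : X with status : S" lines in the log above.
unfinished_jobs() {
    log_file=$1

    # Names of all jobs that reported a finish.
    finished=$(sed -n \
        's/^[[:space:]]*Finished Build : #[0-9]* of Job : \(.*\) with status : .*$/\1/p' \
        "$log_file")

    # For each started job, print it unless it appears in the finished list.
    sed -n 's/^[[:space:]]*Starting build job \(.*\)\.[[:space:]]*$/\1/p' "$log_file" |
    while IFS= read -r job; do
        printf '%s\n' "$finished" | grep -Fxq -- "$job" || printf '%s\n' "$job"
    done
}
```

Run as `unfinished_jobs console.log` (the file name is only an example). On the log shown above it prints nothing, since every job with a "Starting" line also has a "Finished" line, which matches the observation that the subjobs all complete before the hang.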

      Other notes
      The hangs seem to coincide with periods of high CPU and/or IO load on the EC2 instance we are using. I've tried to induce CPU and IO load with the 'stress' tool, but haven't been able to find a reliable set of reproduction steps.
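For anyone trying to repeat the load experiment, here is one possible shape for a `stress` invocation. The report does not say which options were used, so the worker counts and duration below are purely illustrative assumptions; `--cpu`, `--io`, and `--timeout` are standard `stress` options. The helper `stress_cmd` is hypothetical and only prints the command line so it can be reviewed before being run.

```shell
#!/bin/sh
# Print (do not run) a stress command line with the given number of CPU
# workers, IO workers, and duration in seconds.
# Assumption: these values are examples, not the reporter's settings.
stress_cmd() {
    cpu_workers=$1
    io_workers=$2
    seconds=$3
    printf 'stress --cpu %d --io %d --timeout %ds\n' \
        "$cpu_workers" "$io_workers" "$seconds"
}

stress_cmd 4 2 300   # prints "stress --cpu 4 --io 2 --timeout 300s"
```

Running the printed command on the Jenkins host while parent_job is on its one-minute schedule would approximate the kind of load described, though as noted above this did not reliably trigger the hang.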

      Versions
      We've reproduced this using the latest Jenkins and plugins: Jenkins 1.626, conditional-build-step 1.3.3, multijob 1.16.

      I've also rolled back to Jenkins 1.613, where we had previously seen stable behavior; unfortunately, the problem returned.

      Any ideas?

      Thanks,
      Rob

            Assignee: Unassigned
            Reporter: Robert Schultheis (montanarob)
            Votes: 1
            Watchers: 5
