JENKINS-69061

Pipelines do not resume properly after Jenkins restart

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Jenkins 2.360, Durable Task Plugin 496.va67c6f9eefa7, Pipeline Version 590.v6a_d052e5a_a_b_5

       Problem: 

      Since recent upgrades (2.332 and now 2.360), if the Jenkins service is restarted while a pipeline job is running, the job does not resume properly once Jenkins is back up. It attempts to resume for several minutes and then fails.

      The behavior is the same whether the restart is a normal Linux service restart or is triggered through jenkins-cli.

      Steps to reproduce the issue: 

      1. Start a pipeline job on a single-node Jenkins host. Simple example Jenkinsfile below.
      2. While the job is running, restart the Jenkins service with "service jenkins restart" (or restart it through jenkins-cli.jar).
      3. After Jenkins starts, the task attempts to resume, but instead eventually fails (log below). This used to work fine. 

      More details and notes: 

      This used to work perfectly fine on an older version of Jenkins (2.2x). We recently upgraded the Jenkins hosts (through apt on Ubuntu 20.04) to v2.332 and also upgraded the plugins, and the issue started happening. The pipelines have not changed. I have since upgraded to 2.360 and it still does not work. All plugins are up to date as well. 

      The script output is similar to many other open/closed issues related to the durable task plugin, however this scenario doesn't match those other issues.

      The log below is what the build shows when Jenkins comes back online after the restart and the job attempts to resume. 

      Resuming build at Tue Jul 19 23:26:56 UTC 2022 after Jenkins restart
      Waiting to resume part of test-job #5: Waiting for next available executor
      Ready to run at Tue Jul 19 23:27:01 UTC 2022
      wrapper script does not seem to be touching the log file in /data/jenkins_home/workspace/test-job@tmp/durable-b0167617
      (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400) 
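
The message above comes from a liveness check: the durable task wrapper is expected to keep updating the log file, and the step is treated as dead once the file stops changing. One rough way to watch this from the outside is to compare the log file's modification time against a threshold. The sketch below is not the plugin's implementation; the function name log_is_stale is hypothetical, and GNU stat (as shipped on Ubuntu) is assumed.

```shell
#!/bin/sh
# Hypothetical helper, NOT the plugin's code: succeed (exit 0) when the
# given file was last modified more than max_age seconds ago.
# Assumes GNU stat ("stat -c %Y" prints the mtime as epoch seconds).
log_is_stale() {
    log_file=$1
    max_age=$2
    [ -f "$log_file" ] || return 2              # missing file: cannot judge
    now=$(date +%s)
    mtime=$(stat -c %Y "$log_file") || return 2
    [ $((now - mtime)) -gt "$max_age" ]         # 0 = stale, 1 = fresh
}
```

For example, while a build is waiting to resume, "log_is_stale <control dir>/jenkins-log.txt 300" could be run against the durable-* directory named in the message. The file name jenkins-log.txt and the 300-second threshold are assumptions here; check the directory contents and the configured heartbeat interval on your own host.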
      • The log file mentioned in this failure message does exist during job execution.
      • Manually touching/writing to the log file does not resolve the problem.
      • After that message is logged, the job goes into a "failed" state, but this takes a while.
      • The cause is neither the filesystem nor available memory, which other related tickets/posts have suggested. 
      • There are no available plugin updates (fully up to date).
      • This seems to have started when we moved to version 2.332, which also included the migration to systemd. Is it possible that restarting the service through systemd (versus the old init system used before 2.332) is breaking the durable task?
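
One way to probe the systemd question: with systemd's default KillMode=control-group, stopping a unit kills every process left in that unit's control group, which would include a shell step started by Jenkins. Whether this applies depends on how the unit is configured, so the following is only a diagnostic sketch, not a confirmed cause. The function name pid_in_unit_cgroup and the unit name jenkins.service are assumptions; the third argument exists only so the check can be pointed at a fake /proc tree.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the process with the given PID belongs to the
# control group of the named systemd unit, judged from /proc/<pid>/cgroup.
# proc_root is an optional override of /proc, intended for testing.
pid_in_unit_cgroup() {
    pid=$1
    unit=$2
    proc_root=${3:-/proc}
    cgroup_file="$proc_root/$pid/cgroup"
    [ -r "$cgroup_file" ] || return 2           # no such process or unreadable
    # Each line is "hierarchy:controllers:path"; look for the unit name
    # as one exact component of the path.
    awk -F: -v unit="$unit" '
        { n = split($3, part, "/")
          for (i = 1; i <= n; i++) if (part[i] == unit) found = 1 }
        END { exit !found }' "$cgroup_file"
}
```

While the example job's "sleep 60" step is running, look up its PID (for instance with pgrep -f 'sleep 60') and run "pid_in_unit_cgroup <pid> jenkins.service". If it succeeds, "systemctl show jenkins -p KillMode" shows which kill mode the unit uses on restart.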

       

      Example Pipeline file:

      pipeline {
        agent any

        stages {
          stage("Sleep for 60 seconds") {
            steps {
              echo "Go restart jenkins service now and see that this job won't succeed"
              // Long-running shell step; restart Jenkins while this is in progress
              sh "sleep 60"
              echo "The job will never get this far"
            }
          }
        }
      }

      Use Case / Impact:

      Major impact because this is a primary component that is not working:

      • This is a regression
      • Resuming after a fault or unexpected crash of the Jenkins process should be expected of durable tasks; it is the primary reason durable pipelines exist at all. This is especially true for deployment automation. 
      • Restarting the Jenkins service is a normal part of workflows that use the Jenkins init.groovy.d hook scripts to configure Jenkins itself (aka Jenkins configures Jenkins). "service jenkins restart" has been part of our CI/CD workflow for a long time and only recently stopped working properly.

          [JENKINS-69061] Pipelines do not resume properly after Jenkins restart

          lee benhart added a comment - - edited

          This is also an issue on Jenkins LTS 2.346.3 with Durable Task Plugin Version 500.v8927d9fd99d8

          Matt Dee added a comment -

          This is still an issue with Jenkins 2.361.1 with Durable Task Plugin Version 500.v8927d9fd99d8

          Donat added a comment - - edited

          This is still an issue with Jenkins 2.319 with Durable Task Plugin Version 503.v57154d18d478 

          Donat added a comment -

          This issue has been open for a very long time. Are there any plans to tackle this problem?

          Markus Winter added a comment -

          I can only recommend using the binary wrapper. Our Jenkins runs around 1500 builds per day. On average, about 4 builds per day were failing with the problem
          wrapper script does not seem to be touching the log file
          Since we enabled the binary wrapper 2 weeks ago, we have had only one occurrence of that problem.

          I could imagine that this also helps with the restarts.

          Brian J Murrell added a comment -

          What is the binary wrapper?

          Jesse Glick added a comment -

          -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.USE_BINARY_WRAPPER=true is apparently not the default as of https://github.com/jenkinsci/durable-task-plugin/pull/115 despite JENKINS-59903 & JENKINS-59907 being resolved; carroll was this forgotten?
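
For readers who want to try the property named above: it is a JVM system property, so it has to end up on the Java command line that starts the Jenkins controller. A minimal sketch of adding it to an existing options string without duplicating it follows. The function name add_java_opt is hypothetical, and where the resulting string is configured depends on how Jenkins is launched on your host.

```shell
#!/bin/sh
# Sketch: print the options string with the given JVM option appended,
# unless that exact option is already present as a whole word.
add_java_opt() {
    opts=$1
    opt=$2
    case " $opts " in
        *" $opt "*) printf '%s\n' "$opts" ;;                # already there
        *)          printf '%s\n' "${opts:+$opts }$opt" ;;  # append
    esac
}
```

For example, add_java_opt "$JAVA_OPTS" "-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.USE_BINARY_WRAPPER=true" prints the combined string. On systemd-based package installs the options are commonly set through an Environment="JAVA_OPTS=..." line in a drop-in created with "systemctl edit jenkins"; treat that location as an assumption and confirm it against your own unit file.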

            Assignee: Unassigned
            Reporter: Matt Dee (mdebord1)
            Votes: 1
            Watchers: 8