Pipelines do not resume properly after Jenkins restart

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Jenkins 2.360, Durable Task Plugin 496.va67c6f9eefa7, Pipeline Version 590.v6a_d052e5a_a_b_5

       Problem: 

      Since recent upgrades (2.332 and now 2.360), if the Jenkins service is restarted while a pipeline project is running, the pipeline does not resume properly once the Jenkins process is back up. The job eventually fails after attempting to resume for several minutes.

      The problem occurs whether the service is restarted through a normal Linux service restart or through jenkins-cli. The behavior is the same either way.

      Steps to reproduce the issue: 

      1. Start a pipeline job on a single-node Jenkins host. A simple example Jenkinsfile is given below.
      2. While the job is running, restart the Jenkins service with "service jenkins restart" (or use jenkins-cli.jar to restart).
      3. After Jenkins starts, the task attempts to resume but eventually fails (log below). This used to work fine.
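
      The steps above can be scripted roughly as follows. This is a minimal sketch and not part of the original report: the job name `test-job` is taken from the log further down, while `JENKINS_URL`, the location of `jenkins-cli.jar`, the 10-second delay, and the `DRY_RUN` switch are assumptions for illustration. CLI authentication options are omitted, and the restart needs sufficient privileges.

```shell
#!/bin/sh
# Reproduction sketch. JENKINS_URL, CLI_JAR, JOB and DRY_RUN are illustrative
# assumptions; adjust them for the host under test.
JENKINS_URL="${JENKINS_URL:-http://localhost:8080}"
CLI_JAR="${CLI_JAR:-jenkins-cli.jar}"
JOB="${JOB:-test-job}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reproduce() {
  # 1. Start the pipeline job (the CLI "build" command queues it and returns).
  run java -jar "$CLI_JAR" -s "$JENKINS_URL" build "$JOB"
  # 2. Give the job time to reach the "sleep 60" step, then restart Jenkins.
  run sleep 10
  run service jenkins restart
  # 3. Once Jenkins is back, check the build's console log for the
  #    "wrapper script does not seem to be touching the log file" message.
}
```

      Calling `DRY_RUN=1 reproduce` only prints the three commands; calling `reproduce` runs them.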

      More details and notes: 

      This used to work perfectly fine on an older version of Jenkins (2.2x), but we recently upgraded the Jenkins hosts (through apt on Ubuntu 20.04) to v2.332 and also upgraded the plugins, and the issue started happening. The pipelines have not changed. I've since tried upgrading to 2.360 and it still does not work. All plugins are up to date as well.

      The script output is similar to that in many other open/closed issues related to the durable task plugin; however, this scenario doesn't match those issues.

      The log below is what appears when Jenkins comes back online after the restart and the job attempts to resume.

      Resuming build at Tue Jul 19 23:26:56 UTC 2022 after Jenkins restart
      Waiting to resume part of test-job #5: Waiting for next available executor
      Ready to run at Tue Jul 19 23:27:01 UTC 2022
      wrapper script does not seem to be touching the log file in /data/jenkins_home/workspace/test-job@tmp/durable-b0167617
      (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400) 
      • The log file mentioned in this failure message does exist during job execution.
      • Manually touching/writing to the log file does not resolve the problem.
      • After the above message appears, the job goes into a "failed" state, but it takes a while.
      • The issue is neither the filesystem nor available memory, which are the causes suggested in related tickets/posts.
      • There are no available plugin updates (fully up to date).
      • This seemed to start when we moved to version 2.332, which also included the migration to systemd. So, is it possible that restarting the service through systemd (versus the old init system used prior to 2.332) is breaking the durable task?
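
      One way to test the systemd hypothesis in the last bullet is to check whether the durable-task wrapper process survives the restart. The following is a hedged sketch, not a confirmed diagnosis. It assumes the unit is named `jenkins`, that the wrapper's command line contains the control directory name shown in the log above, and it relies on the documented systemd default `KillMode=control-group`, under which stopping a unit kills every process remaining in its control group. The function names are made up for illustration.

```shell
#!/bin/sh
# Diagnostic sketch. UNIT is an assumption; change it for your host.
UNIT="${UNIT:-jenkins}"

# Show how systemd terminates the unit's processes on stop/restart.
show_kill_mode() {
  systemctl show "$UNIT" -p KillMode
}

# List processes whose command line mentions the given durable-task control
# directory name (for example "durable-b0167617"). No output means none found.
find_wrapper() {
  pgrep -af "$1"
}

# Print how many seconds ago a file was last modified.
# Tries GNU stat first and falls back to the BSD form.
log_age_seconds() {
  now=$(date +%s)
  mtime=$(stat -c %Y "$1" 2>/dev/null || stat -f %m "$1")
  echo $((now - mtime))
}
```

      If `show_kill_mode` prints `KillMode=control-group`, and `find_wrapper` with the current control directory name finds a process before the restart but none afterwards, that would be consistent with systemd killing the wrapper together with Jenkins. The directory name differs for every step, so take it from the running build. `log_age_seconds` applied to the log file in that directory shows whether it is still being touched.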

       

      Example Pipeline file:

      pipeline {
        agent any

        stages {
          stage("Sleep for 60 seconds") {
            steps {
              echo "Go restart jenkins service now and see that this job wont succeed"
              sh "sleep 60"
              echo "The job will never get this far"
            }
          }
        }
      }

      Use Case / Impact:

      Major impact because this is a primary component that is not working:

      • This is a regression.
      • Being able to resume after a fault or unexpected crash of the Jenkins process is something that should be expected of durable tasks (and is the primary reason for durable pipelines to exist at all). This is especially true for deployment automation.
      • Restarting the Jenkins service is a normal part of workflows that use the Jenkins init.groovy.d hook scripts to configure Jenkins itself (aka Jenkins configures Jenkins). "service jenkins restart" has been part of our CI/CD workflow for a long time and only recently stopped working properly.

          [JENKINS-69061] Pipelines do not resume properly after Jenkins restart

          Matt Dee created issue -

          lee benhart added a comment - - edited

          This is also an issue on Jenkins LTS 2.346.3 with Durable Task Plugin version 500.v8927d9fd99d8.

          Matt Dee made changes -
          Summary Original: After recent Jenkins upgrade, pipelines no longer resume properly after restart
          New: Pipelines do not resume properly after Jenkins restart
          Matt Dee made changes -
          Description
          New: h2.  {color:#172b4d}Problem:{color} 

          Since recent upgrades (2.332 and now 2.360), when running a pipeline project, *if the jenkins service is restarted, the pipeline process does not resume properly* once the Jenkins process comes back up.

          This problem occurs whether you restart the service via a normal Linux service restart or use the jenkins-cli to trigger the restart. The behavior is the same either way.

          It worked fine in v2.2x. 
          h3. Steps to reproduce the issue: 
           # Start a pipeline job on a single-node Jenkins host.
           # While it is running, restart the Jenkins service with "service jenkins restart" (or use jenkins-cli.jar to restart).
           # After Jenkins starts, the task attempts to resume but eventually fails instead (log below). This used to work fine.

          h2. {color:#172b4d}More details and notes: {color}

          This used to work perfectly fine on an older version of Jenkins (2.2x), but we recently upgraded the Jenkins hosts (through apt) to v2.332, upgraded the plugins as well, and the issue started happening. The pipelines have not changed. I've since tried upgrading to 2.360 and it is still not working. All plugins are up to date as well.

          The script output is similar to many other open/closed issues related to the durable task plugin; however, this scenario doesn't match those other issues.

          The log below is what appears once Jenkins comes back online after the restart and the job attempts to resume.

           
          {code:java}
          Resuming build at Tue Jul 19 23:26:56 UTC 2022 after Jenkins restart
          Waiting to resume part of test-job #5: Waiting for next available executor
          Ready to run at Tue Jul 19 23:27:01 UTC 2022
          wrapper script does not seem to be touching the log file in /data/jenkins_home/workspace/test-job@tmp/durable-b0167617
          (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400) {code}
           
           * After the above message is logged, the job goes into a "failed" state.
           * Manually touching/writing to the log file does not resolve the problem.
           * The issue is neither the filesystem nor available memory, which other solutions in related tickets/posts have pointed to.
           * There are no available plugin updates (fully up to date).
           * This seemed to start when we moved to the 2.332 version, {*}which also included the migration to systemd{*}. So, *is there a possibility that the service restart using systemd (versus the old init system used prior to 2.332) is breaking the durable task?*

          h2. {color:#172b4d}Use Case / Impact{color}

          {color:#172b4d}Major impact because this is a primary component that is not working:{color}
           * Being able to resume after a fault/unexpected crash of the Jenkins process is something that should be expected of the durable tasks (and the primary reason for durable pipelines to exist at all). 
           * Restarting the jenkins service is a normal part of workflows because we use the Jenkins init.groovy.d scripts to configure Jenkins itself (aka Jenkins configures Jenkins). "service jenkins restart" has been a part of our CI/CD workflow for a long time and only recently stopped working properly (after upgrading). 

          Matt Dee added a comment -

          This is still an issue with Jenkins 2.361.1 and Durable Task Plugin version 500.v8927d9fd99d8.

          Matt Dee made changes -
          New: h2.  {color:#172b4d}Problem:{color} 

          Since recent upgrades (2.332 and now 2.360), when running a pipeline project, *if the jenkins service is restarted, the pipeline process does not resume properly* once the Jenkins process comes back up.

          This problem occurs whether you restart the service via a normal Linux service restart or use the jenkins-cli to trigger the restart. The behavior is the same either way.

          It worked fine in v2.2x. 
          h3. Steps to reproduce the issue: 
           # Start a pipeline job on a single-node Jenkins host.
           # While it is running, restart the Jenkins service with "service jenkins restart" (or use jenkins-cli.jar to restart).
           # After Jenkins starts, the task attempts to resume but eventually fails instead (log below). This used to work fine.

          h2. {color:#172b4d}More details and notes: {color}

          This used to work perfectly fine on an older version of Jenkins (2.2x), but we recently upgraded the Jenkins hosts (through apt) to v2.332, upgraded the plugins as well, and the issue started happening. The pipelines have not changed. I've since tried upgrading to 2.360 and it is still not working. All plugins are up to date as well.

          The script output is similar to many other open/closed issues related to the durable task plugin; however, this scenario doesn't match those other issues.

          The log below is what appears once Jenkins comes back online after the restart and the job attempts to resume.

           
          {code:java}
          Resuming build at Tue Jul 19 23:26:56 UTC 2022 after Jenkins restart
          Waiting to resume part of test-job #5: Waiting for next available executor
          Ready to run at Tue Jul 19 23:27:01 UTC 2022
          wrapper script does not seem to be touching the log file in /data/jenkins_home/workspace/test-job@tmp/durable-b0167617
          (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400) {code}
           
           * After the above message is logged, the job goes into a "failed" state.
           * Manually touching/writing to the log file does not resolve the problem.
           * The issue is neither the filesystem nor available memory, which other solutions in related tickets/posts have pointed to.
           * There are no available plugin updates (fully up to date).
           * This seemed to start when we moved to the 2.332 version, {*}which also included the migration to systemd{*}. So, *is there a possibility that the service restart using systemd (versus the old init system used prior to 2.332) is breaking the durable task?*
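
          For anyone trying to reproduce this, here is a minimal sketch of a staleness check that can be run against the log file while the job is "resuming". It assumes the message in the log above is based on the log file's modification time compared with the heartbeat interval; that is my reading of the message, not something confirmed from the plugin source. The function names and the 300-second default are hypothetical. It only tells you whether anything is still updating the file; it does not explain why the job fails.

```python
import os
import time


def seconds_since_touched(path, now=None):
    """Return how many seconds ago `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def heartbeat_is_stale(path, interval_seconds=300, now=None):
    """Return True if `path` is missing or was not modified within the interval.

    interval_seconds=300 is an assumed value for illustration only; it is
    not taken from this report or from the plugin.
    """
    if not os.path.exists(path):
        # A missing file can never count as a fresh heartbeat.
        return True
    return seconds_since_touched(path, now) > interval_seconds
```

          Calling heartbeat_is_stale() on the log file inside the durable-* directory a few times during the resume window should show whether the file's timestamp is still moving after the restart.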

          h2. {color:#172b4d}Use Case / Impact{color}

          {color:#172b4d}Major impact because this is a primary component that is not working:{color}
           * This is a regression
           * Being able to resume after a fault/unexpected crash of the Jenkins process is something that should be expected of the durable tasks (and the primary reason for durable pipelines to exist at all). 
           * Restarting the jenkins service is a normal part of workflows because we use the Jenkins init.groovy.d scripts to configure Jenkins itself (aka Jenkins configures Jenkins). "service jenkins restart" has been a part of our CI/CD workflow for a long time and only recently stopped working properly (after upgrading). 

            Assignee: Unassigned
            Reporter: Matt Dee (mdebord1)
            Votes: 1
            Watchers: 8