Since recent upgrades (2.332 and now 2.360), a running pipeline job no longer resumes properly after the Jenkins service is restarted. Once Jenkins is back up, the job attempts to resume for several minutes and then fails.
The problem occurs whether the service is restarted via a normal Linux service restart or via jenkins-cli. The behavior is the same either way.
- Start a pipeline job on a single-node Jenkins host. Simple example Jenkinsfile below.
- While the job is running, restart the Jenkins service with "service jenkins restart" (or use jenkins-cli.jar to restart).
- After Jenkins starts, the job attempts to resume but eventually fails (log below). This used to work fine.
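For reference, the steps above can be scripted roughly as follows. This is only a sketch: the URL, the job name "restart-test", the 30-second wait, and the location of jenkins-cli.jar are placeholders, not values from our setup, and any authentication options jenkins-cli needs are omitted. By default it only prints the commands (dry run).

```shell
#!/bin/sh
# Reproduction sketch. JENKINS_URL and JOB_NAME are placeholder values.
JENKINS_URL="${JENKINS_URL:-http://localhost:8080}"
JOB_NAME="${JOB_NAME:-restart-test}"
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just echo it when in dry-run mode.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reproduce() {
  # Step 1: trigger the pipeline job (returns once the build is scheduled).
  run java -jar jenkins-cli.jar -s "$JENKINS_URL" build "$JOB_NAME"
  # Give the build time to reach a long-running step (arbitrary wait).
  run sleep 30
  # Step 2: restart the service while the build is still running.
  run service jenkins restart
}

reproduce
```

Setting DRY_RUN=0 executes the commands for real. Step 3 (watching the job try to resume and fail) is still done by hand in the build's console output.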
This worked perfectly on an older version of Jenkins (2.2x). The issue started after we upgraded the Jenkins hosts (through apt on Ubuntu 20.04) to v2.332 and upgraded the plugins at the same time. The pipelines have not changed. I have since upgraded to 2.360 and it still does not work. All plugins are up to date as well.
The script output is similar to many other open/closed issues related to the durable task plugin, but this scenario does not match those other issues.
The log below is what appears after Jenkins comes back online following the restart, when the job attempts to resume.
- The log file mentioned in this failure message does exist during job execution.
- Manually touching/writing to the log file does not resolve the problem.
- After the above message is logged, the job goes into a "failed" state, but it takes a while.
- The issue is neither the filesystem nor available memory, which other solutions in related tickets/posts point to.
- There are no available plugin updates (fully up to date).
- This seems to have started when we moved to 2.332, which also included the migration to systemd. Is it possible that restarting the service through systemd (versus the old init system used before 2.332) is breaking the durable task?
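One way to check the systemd hypothesis would be to look at the unit's KillMode setting, since systemd's default (control-group) signals every process in the service's cgroup on stop, not just the main one. The sketch below assumes the unit is named "jenkins"; the helper function names are made up for illustration. It only reports the setting and does not prove that this is the cause.

```shell
#!/bin/sh
# Sketch: report whether systemd would signal child processes of the
# Jenkins service on stop/restart. Assumes the unit is named "jenkins".

# Return 0 if the given KillMode value affects processes other than the
# main service process (see systemd.kill(5)), 1 otherwise.
killmode_kills_children() {
  case "$1" in
    control-group|mixed) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a one-line verdict for a "KillMode=<value>" line.
describe_killmode() {
  mode="${1#KillMode=}"
  if killmode_kills_children "$mode"; then
    echo "KillMode=$mode: child processes of the service are also killed on stop"
  else
    echo "KillMode=$mode: child processes are left running on stop"
  fi
}

# Query the live unit only when systemctl is available on this host.
if command -v systemctl >/dev/null 2>&1; then
  line="$(systemctl show jenkins -p KillMode 2>/dev/null || true)"
  if [ -n "$line" ]; then
    describe_killmode "$line"
  fi
fi
```

If this reports control-group or mixed, then shell steps running on the built-in node would be terminated together with Jenkins on restart, which would be consistent with the job being unable to resume.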
Major impact because this is a primary component that is not working:
- This is a regression
- Being able to resume after a fault/unexpected crash of the Jenkins process is something that should be expected of the durable tasks (and the primary reason for durable pipelines to exist at all). This is especially true for deployment automations.
- Restarting the jenkins service is a normal part of workflows when using the Jenkins init.groovy.d hook scripts to configure Jenkins itself (aka Jenkins configures Jenkins). "service jenkins restart" has been a part of our CI/CD workflow for a long time and only recently stopped working properly.