Bug
Resolution: Unresolved
Major
Jenkins 2.138.1
Durable Task plugin 1.22
Try to run the following pipeline on a Windows node (you'll need a Unix shell port; I am using bash from Git for Windows):
```groovy
node('mynode') {
    stage('Test') {
        dir('foo/bar') {
            sh 'sleep 1'
        }
        bat 'rename foo baz'
    }
}
```
The bat step will fail with an "Access is denied" message.
The cause appears to be that when the Durable Task plugin runs its wrapper script (in BourneShellScript.launchWithCookie), it does not wait for the wrapper process to finish before letting pipeline execution proceed to the next step; it only waits until the result file is created. As a result, for a short window the bat step runs concurrently with these three processes created for the sh step:
- The wrapper script.
- The nohup process that is its parent.
- The background job that the wrapper script creates to periodically touch the log file.
That window can be significant, because the background job sleeps in three-second intervals.
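To make the timing concrete, here is a simplified shell model of the launch sequence described above. It is a hypothetical sketch, not the plugin's actual wrapper script: nohup and the server cookie are omitted, and the temporary directories, the `wrapper-exited` marker and the `still_running` variable are illustrative names. It assumes the heartbeat loop stops once the result file exists and that the wrapper waits for the heartbeat before exiting, which is one plausible reading of the three-second behavior mentioned above.

```shell
#!/bin/sh
# Hypothetical, simplified model of the launch sequence (NOT the real
# Durable Task wrapper; nohup and the server cookie are omitted).
control=$(mktemp -d)   # stands in for the plugin's control directory
work=$(mktemp -d)      # stands in for the sh step's dir('foo/bar')
result="$control/result"
log="$control/log"

# Wrapper: runs entirely with $work as its working directory.
(
  cd "$work" || exit 1
  # Heartbeat: touch the log every 3 s until the result file exists.
  # It inherits $work as its cwd, which on Windows locks that directory.
  while [ ! -f "$result" ]; do touch "$log"; sleep 3; done &
  sleep 1 > "$log" 2>&1                    # the user's `sh 'sleep 1'`
  echo $? > "$result.tmp" && mv "$result.tmp" "$result"
  wait                                     # wait for the heartbeat to notice
  touch "$control/wrapper-exited"          # marker: wrapper is about to exit
) &
wrapper=$!

# Agent side: poll only for the result file, as the issue describes.
# (Fractional sleep works with GNU, BusyBox and BSD sleep; it is not POSIX.)
while [ ! -f "$result" ]; do sleep 0.1; done
if [ -f "$control/wrapper-exited" ]; then still_running=no; else still_running=yes; fi
echo "wrapper still running after result file appeared: $still_running"

wait "$wrapper"
rm -rf "$control" "$work"
```

On a typical Unix system this should report `yes`: the result file appears after about one second, while the wrapper, and the heartbeat holding `$work` as its working directory, stay alive until the heartbeat wakes from its three-second sleep.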
Those processes all have the sh step's directory as their working directory, and on Windows this means the directory is locked: it cannot be deleted, and its parent cannot be moved. Because the bat step starts executing before those processes terminate, it cannot rename the foo directory, even though it should be able to.
Ideally, the Durable Task plugin should wait for the nohup process to terminate before proceeding to the next step. Failing that, it should at least ensure that the auxiliary processes it spawns do not have working directories inside the workspace.
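The fallback fix could look roughly like this variant of the same hypothetical model: the wrapper and its heartbeat start from the control directory, and only a short-lived subshell running the user's script enters the workspace. This assumes the control directory lives outside the workspace; `heartbeat-cwd` and `holds_workspace` are illustrative names, and this is not a patch against the real BourneShellScript.

```shell
#!/bin/sh
# Hypothetical sketch of the fallback fix (NOT the real Durable Task wrapper):
# only the user's script runs with a working directory inside the workspace.
control=$(mktemp -d)   # assumed to live outside the workspace
work=$(mktemp -d)      # stands in for the sh step's dir('foo/bar')
result="$control/result"
log="$control/log"

(
  cd "$control" || exit 1   # the wrapper itself never enters the workspace
  # Heartbeat inherits the wrapper's cwd ($control) and records it.
  { pwd > "$control/heartbeat-cwd"
    while [ ! -f "$result" ]; do touch "$log"; sleep 3; done; } &
  # Only this subshell enters the workspace, and it exits with the script.
  (cd "$work" && sleep 1) > "$log" 2>&1
  echo $? > "$result.tmp" && mv "$result.tmp" "$result"
  wait
) &
wrapper=$!

# Agent side: the next step could start as soon as the result file exists.
while [ ! -f "$result" ]; do sleep 0.1; done
wait "$wrapper"   # only so the recorded cwd can be inspected deterministically
case "$(cat "$control/heartbeat-cwd")" in
  "$work"|"$work"/*) holds_workspace=yes ;;
  *) holds_workspace=no ;;
esac
echo "heartbeat holds the workspace: $holds_workspace"
rm -rf "$control" "$work"
```

In this model the heartbeat may still outlive the result file by up to three seconds, but since its working directory is not inside the workspace, it should no longer stop a following bat step from renaming foo. Having the agent also wait for the wrapper (the ideal fix) would remove the overlap entirely, at the cost in this model of up to one extra heartbeat interval per step.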
(I'm unable to easily test this using the most recent version of the plugin, but I reviewed the changes made since 1.22 and I'm fairly confident that the latest version still exhibits this issue.)
While it would be nice to improve the behavior here, it seems like there are some straightforward workarounds if I understand correctly:
We do not explicitly test sh on Windows, so although you may be able to get it working with Cygwin, MSys, etc., and we will do our best to fix any regressions we introduce, it is not recommended. An issue like this, which as far as I can tell has never worked, is not something we would prioritize fixing.
That said, the code is open source, and if you want to work on a fix, I am happy to help review your changes. I would start by creating a reproduction test in workflow-durable-task-step along the lines of the following to demonstrate the issue:
Once you have that test failing in the way you intend, you could make changes to BourneShellScript/FileMonitoringTask in durable-task to see how they affect the test. I don't have an environment to be able to reproduce the issue myself, but maybe something like switching these two lines so the workspace is cleaned up before the script exits would help?