Type: Bug
Resolution: Fixed
Priority: Blocker
Environment: CentOS 7.7, Jenkins ver. 2.190.1 (installed by yum, not in a container), Durable Task Plugin v. 1.31
Fix Version: 1.33
A pipeline like this:
pipeline {
    agent {
        docker {
            label 'docker'
            image 'busybox'
        }
    }
    stages {
        stage("Test sh script in container") {
            steps {
                sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
            }
        }
    }
}
Fails with this log:
Running in Durability level: PERFORMANCE_OPTIMIZED
[Pipeline] Start of Pipeline
[Pipeline] node
Running on docker-node in /...
[Pipeline] {
[Pipeline] isUnix
[Pipeline] sh
+ docker inspect -f . busybox
.
[Pipeline] withDockerContainer
got-legaci-3 does not seem to be running inside a container
$ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
$ docker top 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f -eo pid,comm
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Test sh script in container)
[Pipeline] sh (Echo "Hello World...)
process apparently never started in /...
(running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
$ docker stop --time=1 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
$ docker rm -f 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: script returned exit code -2
Finished: FAILURE
Adding the -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true parameter gives this log:
Running in Durability level: PERFORMANCE_OPTIMIZED
[Pipeline] Start of Pipeline
[Pipeline] node
Running on docker-node in /...
[Pipeline] {
[Pipeline] isUnix
[Pipeline] sh
+ docker inspect -f . busybox
.
[Pipeline] withDockerContainer
got-legaci-3 does not seem to be running inside a container
$ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
$ docker top 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e -eo pid,comm
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Test sh script in container)
[Pipeline] sh (Echo "Hello World...)
OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"/var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64\": stat /var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64: no such file or directory": unknown
process apparently never started in /...
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
$ docker stop --time=1 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
$ docker rm -f 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: script returned exit code -2
Finished: FAILURE
Tested on three different Jenkins masters with similar, but not identical, configurations.
Reverting to Durable Task Plugin v. 1.30 "solves" the problem.
Is duplicated by:
- JENKINS-59906 durable-task plugin v1.31 fails to execute shell commands on swarm client (Closed)
- JENKINS-59939 durable_task_monitor_1.31_unix_64: no such file or directory (Closed)
[JENKINS-59903] durable-task v1.31 breaks sh steps in pipeline when running in a Docker container
I confirm it affects me as well:
Jenkins ver. 2.190.1, docker, kubernetes and other plugins: latest stable version.
Jenkins runs on linux (inside a docker container)
Also a problem when running Jenkins outside Docker and executing the pipeline on the Jenkins master (=> psst!); the problem is definitely caused by the new "Durable Task" plugin v1.31 bug:
- Latest version of Jenkins core (v2.201) running on Ubuntu 16.04
- The logs seem to indicate the problem already appears when trying to start the Docker container in the pipeline, but maybe the logs are just mangled?
- See "// !!!" comment in belows build log...
- Build log (proprietary):
...
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (linkchecker)
[Pipeline] script
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] withDockerRegistry
[Pipeline] {
[Pipeline] isUnix
[Pipeline] sh
06:37:20 + docker inspect -f . ACME/linkchecker:5
06:37:20
06:37:20 Error: No such object: ACME/linkchecker:5
06:37:20 06:37:20.408218 durable_task_monitor.go:63: exit status 1 // !!!
[Pipeline] isUnix
[Pipeline] sh
06:37:20 + docker inspect -f . dockerregistry.ACME.com/ACME/linkchecker:5
06:37:20 .
[Pipeline] withDockerContainer
06:37:20 Jenkins does not seem to be running inside a container
06:37:20 $ docker run -t -d -u 10112:10005 -w /var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline -v /var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline:/var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline:rw,z -v /var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline@tmp:/var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** dockerregistry.ACME.com/ACME/linkchecker:5 cat
06:37:21 $ docker top 39f784ea27cbf6593fd40c1faaf04948daae94e97eb8ba42517f7c2f5e40c21e -eo pid,comm
[Pipeline] {
[Pipeline] sh
06:42:27 process apparently never started in /var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline@tmp/durable-aed939a9 // !!!
06:42:27 (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
[Pipeline] }
...
ERROR: script returned exit code -2
Finished: FAILURE
- Pipeline code (in shared Jenkins pipeline library):
...
void execute(Closure configBody) {
    LinkCheckerRunDSL config = calcConfiguration(configBody)
    // Then build, based on the configuration provided:
    script.docker.withRegistry(Constants.ACME_DOCKER_REGISTRY_URL) {
        script.docker.image(config.dockerImage).inside() { c ->
            script.sh 'linkchecker --version'
...
Having the same issue with the 1.31 version of the durable-task plugin. Had to roll back to 1.30.
Also having the same issue with the 1.31 version of the durable-task plugin. Also fixed by rolling back to 1.30.
We are also affected here (rolled back to 1.30) for docker.inside steps.
The problem is that the new wrapper binary is at a location that is (on purpose) not exposed inside the Docker container.
Only the workspace and auxiliary workspace (workspace@tmp) are currently mapped into the container by default.
Faced the same issue on Jenkins 2.201. Downgrading to durable-task plugin v1.30 resolved the issue.
Same here; we face it with docker.inside after upgrading to 2.201. The worst thing is that you do not see the error until you enable `org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS`:
pipeline {
    agent { label 'linux && immutable' }
    stages {
        stage("Test sh script in container") {
            steps {
                script {
                    docker.image('node:12').inside() {
                        echo "Docker inside"
                        sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
                    }
                }
            }
        }
    }
}
I'll share the script to change the property from the Jenkins script console:
import static org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS
println("LAUNCH_DIAGNOSTICS=" + LAUNCH_DIAGNOSTICS)
LAUNCH_DIAGNOSTICS = true
println("LAUNCH_DIAGNOSTICS=" + LAUNCH_DIAGNOSTICS)
We hit the same issue after upgrading this morning. Downgrading to 1.30 resolved the issue for us too.
In our case it seems related to docker.inside; if we run a similar docker command directly, it works:
pipeline {
    agent { label 'linux && immutable' }
    stages {
        stage("Test sh script in container") {
            steps {
                sh label: 'This works', script: """
                    docker run -t -v ${env.WORKSPACE}:${env.WORKSPACE} -u \$(id -u):\$(id -g) -w ${env.WORKSPACE} -e HOME=${env.WORKSPACE} node:12 echo 'Hello World!'
                """
                script {
                    docker.image('node:12').inside() {
                        echo "Docker inside"
                        sh label: 'Im gonna fail', script: 'echo "Hello World!"'
                    }
                }
            }
        }
    }
}
I found a workaround, but it is horrible. For some reason the durable task is looking for the Jenkins cache inside the Docker container, where it obviously is not present, so if you mount the cache folder you resolve the issue. But this means I have to change every docker.inside. I think we should go back to 1.29, before the latest changes to the way the sh step is managed.
pipeline {
    agent { label 'linux && immutable' }
    stages {
        stage("Test sh script in container") {
            steps {
                script {
                    docker.image('node:12').inside("-v /var/lib/jenkins/caches/durable-task:/var/lib/jenkins/caches/durable-task") {
                        echo "Docker inside"
                        sh label: 'Im gonna fail', script: 'echo "Hello World!"'
                    }
                }
            }
        }
    }
}
I have found the cause: the changes in this commit https://github.com/jenkinsci/durable-task-plugin/commit/1f59c5229b9ff83709add3e202f8e49ff463106c. It is related to a new binary launcher.
I can confirm it: if you disable the new binary launcher, it works. You can set the property at runtime by executing this script in the Jenkins script console:
import static org.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_SHELL_WRAPPER
println("FORCE_SHELL_WRAPPER=" + FORCE_SHELL_WRAPPER)
FORCE_SHELL_WRAPPER = true
println("FORCE_SHELL_WRAPPER=" + FORCE_SHELL_WRAPPER)
I can confirm downgrading "Durable Task Plugin" to v1.30 fixed the issue for us as well. We're running Jenkins 2.176.2.
We're running Alpine Linux builds. I briefly saw an error about not being able to run `ps` – it could be related to this block: https://github.com/jenkinsci/durable-task-plugin/commit/1f59c5229b9ff83709add3e202f8e49ff463106c#diff-b7cdd655e1fb1fd95154b2fbcb20e8e3R525
switch (platform) {
    case SLIM:
        // (See JENKINS-58656) Running in a container with no init process is guaranteed to leave a zombie. Just let this test pass.
        // Debian slim does not have ps
        // [...]
}

do {
    // [...]
} while (psString.contains(exitString));
I feel that I should mention that I also have the issue with 1.31 (downgrade to 1.30 fixes it) without using Docker.
Borrowing from a comment above:
pipeline {
    agent { label 'raspberry-build' }
    stages {
        stage("Test sh script in container") {
            steps {
                sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
            }
        }
    }
}
I just ran into this issue today too. Anyone know where I can get the .hpi file in order to downgrade to v 1.30?
Usually you can just downgrade from Manage Plugins > Installed.
If that doesn't work for you, try this:
http://repo.jenkins-ci.org/releases/org/jenkins-ci/plugins/durable-task/1.30/durable-task-1.30.hpi
Via: https://javalibs.com/artifact/org.jenkins-ci.plugins/durable-task
aaaustin10 From the looks of the label in your example, you might be having the JENKINS-59907 problem, where the new wrapper doesn't work on all platforms.
pzozobrado I suspect you might be having some similar problem, if you are not using the agent{docker{...}} or docker.inside() pipeline instructions.
chizcw this was a fresh install today, so I didn't have that downgrade option. Thanks for the link; that worked.
Also confirming that
org.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_SHELL_WRAPPER = true
is a working workaround, and that it can be set when Jenkins starts, just like LAUNCH_DIAGNOSTICS.
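For reference, setting it at startup can also be done with a Groovy init hook instead of a JVM flag. This is only a minimal sketch, assuming the field is assignable the same way as in the script-console snippet above (the file name is a hypothetical example):

// $JENKINS_HOME/init.groovy.d/force-shell-wrapper.groovy (hypothetical file name)
// Runs on every Jenkins startup, before any jobs execute.
import org.jenkinsci.plugins.durabletask.BourneShellScript

// Fall back to the classic shell wrapper instead of the new binary wrapper.
BourneShellScript.FORCE_SHELL_WRAPPER = true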
It would be great if the durable-task plugin could detect that it's running inside a container started by Jenkins, and only disable the wrapper in those steps, if that is to be the solution. "durable_task_monitor_1.31_unix_64" probably contains something of value, so disabling it system-wide doesn't feel like a solution.
I also have a use case where no docker.inside is involved.
- Jenkins master is the official docker image jenkins/jenkins:2.190.1-alpine
- Agent is based on the adoptopenjdk/openjdk11:x86_64-ubuntu-jdk-11.0.4_11 image and connects to the master via the swarm plugin
- Master and agent are running on Docker 19.03.4 in swarm mode. The hosts are Ubuntu 18.04 LTS on VMware.
This pipeline code:
node('jdk11') {
    stage('test') {
        sh 'echo hi.'
    }
}
- works on both master and agent with durable-task-plugin 1.30.
- works on the master with durable-task-plugin 1.31.
- fails on the agent when durable-task-plugin 1.31 is installed:
[Pipeline] Start of Pipeline
[Pipeline] node
Running on build_agent-java11-docker4 in /workspace/hugo
[Pipeline] {
[Pipeline] stage
[Pipeline] { (test)
[Pipeline] sh
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from 192.168.0.6/192.168.0.6:35540
    at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1743)
    at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
    at hudson.remoting.Channel.call(Channel.java:957)
    at hudson.FilePath.act(FilePath.java:1072)
    at hudson.FilePath.act(FilePath.java:1061)
    at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:169)
    at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:99)
    at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:317)
    at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:286)
    at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:179)
    at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:160)
    at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
    at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:157)
    at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:158)
    at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:162)
    at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:132)
    at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:132)
    at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
    at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:84)
    at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
    at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
    at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
    at com.cloudbees.groovy.cps.Next.step(Next.java:83)
    at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
    at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
    at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
    at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
    at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
    at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
    at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
    at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:186)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:370)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:282)
    at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:270)
    at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:66)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
java.nio.file.AccessDeniedException: /caches
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:385)
    at java.nio.file.Files.createDirectory(Files.java:689)
    at java.nio.file.Files.createAndCheckIsDirectory(Files.java:796)
    at java.nio.file.Files.createDirectories(Files.java:782)
    at org.jenkinsci.plugins.durabletask.BourneShellScript$GetAgentInfo.invoke(BourneShellScript.java:473)
    at org.jenkinsci.plugins.durabletask.BourneShellScript$GetAgentInfo.invoke(BourneShellScript.java:440)
    at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3052)
    at hudson.remoting.UserRequest.perform(UserRequest.java:212)
    at hudson.remoting.UserRequest.perform(UserRequest.java:54)
    at hudson.remoting.Request$2.run(Request.java:369)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
    at java.lang.Thread.run(Thread.java:834)
Finished: FAILURE
Looks like a problem accessing some cache: java.nio.file.AccessDeniedException: /caches.
Philip Zozobrado I suspect you might be having some similar problem, if you are not using the agent{docker{...}} or docker.inside() pipeline instructions.
Yes. This is what we're using:
withDockerContainer([image: "php:latest", args: "-v ${WORKSPACE}:/project"]) {
    sh "echo 'started a container'"
}
I don't want to sound awful in the comments section, and I highly appreciate the endless efforts of (mostly?) volunteer developers. However, it is a really bad experience when a minor version introduces breaking changes.
Why not just bump the version in such cases?
BTW, a more portable fix piggybacking on njesper's solution:
args '--user=root --privileged -v ${HOME}/caches:${WORKSPACE}/../../caches'
This should fix it no matter what your configuration is, because it uses the same logic as implemented in the plugin: https://github.com/jenkinsci/durable-task-plugin/pull/106/files#diff-b7cdd655e1fb1fd95154b2fbcb20e8e3R485
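For illustration, here is one way that args line might be applied from a scripted docker.inside block. This is only a sketch: the image name is a placeholder, and env.HOME/env.WORKSPACE are interpolated by Groovy rather than relying on shell expansion of ${HOME}/${WORKSPACE}:

script {
    // Mount the agent's caches directory next to the workspace inside the container,
    // mirroring the path the plugin reportedly computes (see the linked diff).
    docker.image('node:12').inside("--user=root --privileged -v ${env.HOME}/caches:${env.WORKSPACE}/../../caches") {
        sh 'echo "Hello World!"'
    }
}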
I noted on the merge request for this now that the approach isn't that great either - they are shipping a statically compiled Go binary to use as an execution wrapper. This is bad, as it breaks Jenkins on architectures other than x86, such as ARM and PPC.
> Reverting to Durable Task Plugin v. 1.30 "solves" the problem.
This + restart (Jenkins) works for me (for now).
So, apologies for this taking so long to address. There is currently a fix in the works for this issue and for JENKINS-59907 as well. I will also update the changelog that is currently being migrated to GitHub. Caching will be disabled when the cache directory is unavailable to the agent.
The PR can be found here: https://github.com/jenkinsci/durable-task-plugin/pull/114
ci.jenkins.io is quite unstable right now. Hopefully things will get better soon.
IMHO the introduction of the new binary is a mistake; I do not see a reason not to implement the same behavior in Java. It also copies a binary into the workspace that is not mentioned anywhere and that comes from outside the workspace; I see potential security issues in that behavior. wfollonier WDYT?
The compatibility of the libc against which the binary was compiled could be a potential issue on different Linux distributions.
If the stream is not open, launcherCmd would be "". What happens with the script in that case? I think it would not execute anything and would not show any error.
https://github.com/jenkinsci/durable-task-plugin/blob/b122d6b0f924c533b0a26d99a71779bbafc3c543/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L207-L222
The interpreter would be launched with the '-xe' options; this could leak commands that you don't want to show.
https://github.com/jenkinsci/durable-task-plugin/blob/b122d6b0f924c533b0a26d99a71779bbafc3c543/src/main/go/org/jenkinsci/plugins/durabletask/durable_task_monitor.go#L86
I don't know how this command behaves in Cygwin, for example.
https://github.com/jenkinsci/durable-task-plugin/blob/b122d6b0f924c533b0a26d99a71779bbafc3c543/src/main/go/org/jenkinsci/plugins/durabletask/durable_task_monitor.go#L109
Great news, carroll!
Apart from this issue ("cache folder not available") and JENKINS-59907 ("binary can't run"), I've seen some comments indicating that there might be a third kind of problem (which might not have its own issue yet): odd/rare distros or special configurations/installations, where the new binary wrapper can't find all the libs/tools it needs. Judging from the description of PR 114, this third type of problem might not be addressed.
If this use of a cache folder on the node follows Jenkins design guidelines, I think it would be a good idea to raise the question with the relevant Docker integration plugin(s) of why the cache folder isn't mounted when Jenkins starts the container. I guess the new wrapper brings some kind of value, so it should be made to work in containers as well, if relevant.
We've also seen essentially the same issue on Solaris, AIX, and IBM i (AS/400). It was OK on Linux AMD64 and Windows x64. Reverting to Durable Task Plugin 1.30 resolved the issues.
I got this issue resolved by reverting to the previous version, but I am wondering why the console or Jenkins log had no information on what the underlying issue is. Isn't there a lack of sufficient logging and perhaps some sort of error handling here?
Since upgrading durable-task 1.30 to 1.31, we're seeing a lot of intermittent
Cannot run program "/home/ubuntu/caches/durable-task/durable_task_monitor_1.31_unix_64" (in directory "/home/ubuntu/workspace/path/to/job"): error=26, Text file busy
and, less often,
Cannot run program "/home/ubuntu/caches/durable-task/durable_task_monitor_1.31_unix_64" (in directory "/home/ubuntu/workspace/path/to/job"): error=13, Permission denied
This is all running on standard amd64 Ubuntu (no exotic OS or architecture) and not in Docker agents. Should I file a separate issue?
Hi everyone, sorry for the issues. I filed https://github.com/jenkins-infra/update-center2/pull/305 to suspend durable-task 1.31 from distribution for now. As a workaround, you can roll back to 1.30 or add org.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_SHELL_WRAPPER=true as a system property to the JVM running Jenkins (or set the same Groovy variable to true dynamically via the script console, though that will be unset if you restart Jenkins).
For some context, the new binary was intended to improve some long-standing robustness issues with the existing shell wrapper by being able to use utilities like setsid, and to make it more maintainable going forward. The code that detects whether to use the binary or the existing shell wrapper obviously needs to handle additional cases, and we need to add some more testing for other platforms where possible, in particular the Docker-based workflows that were broken by the change. Ideally, changes to implementation details like this would be transparent to users and wouldn't cause breaking changes, but this plugin handles a lot of subtly different platforms at the same time and can only test on some of them, so changes always seem to cause problems.
jonathanb1 I ran into this issue when I was testing out the fix and running tests on it for the very first time. It was solved immediately by running a mvn clean install, so unfortunately I was not able to investigate the issue more deeply, as I still can't reproduce it. I think this issue will be solved by reverting to 1.30 and installing again. If that does not solve it, I think it warrants a separate issue.
njesper I don't think there has been anything official on using caches on the agent. Not many plugins use caching, but I think it is something that we should explore further, since I think most people want to reduce the workload of the masters.
haridsv Yes, there should be more error handling involved. I am looking to add that in. The tricky part is that the script is supposed to be launched as fire-and-forget; this includes the original shell wrapper as well. But of course, it's one thing if your shell fails to launch versus this binary.
Ubuntu 18.04, Jenkins 2.202 - I can confirm downgrading durable-task-plugin resolved this issue.
carroll if the whole thing is about reducing the load on the master, I think there are simpler and better ways to do it. First, the basics with console output: avoid insanely verbose console output. Having 5-10 GB of console logs is stupid; if you need to review such a file you will go crazy trying to open it, so reducing console output is key. If you need verbose output for some command, redirect the output to files and archive them on Jenkins at the end of the job. If after all that you still think this cache is needed, build it on something standard and do not reinvent the wheel: named pipes work on every Unix implementation, there is also an implementation for Windows, and they are plain files that are easy to manage from Java, so you would avoid a ton of platform-related problems.
Because durable-task-plugin is something you cannot get rid of if you use pipelines, it is a critical component. Maybe this cache should go in another plugin and durable-task should be kept as it is, or it should be possible to get rid of durable-task completely; it causes more issues than benefits if you do not want to restart pipelines from an arbitrary point after a failure, which is an antipattern IMHO - a pipeline should pass in one round, and if it does not, it is not well designed and you should split it.
So a new release 1.32 is out. Until we have a fix out resolving this ticket and, at least, JENKINS-59907, the binary will be disabled by default.
ifernandezcalvo Hi Ivan, actually the caching was added as a way to reduce the number of times the master transmits the binary over to the agent. What was not taken into account was that the chosen cache directory may not be accessible to the job. A fix is in the works.
The binary wrapper itself was added to make the original shell wrapper script more maintainable rather than mystical. There was also an attempt to reduce the issues where the script itself was being terminated for unknown reasons. One of the ways to do this was to use setsid instead of nohup (See JENKINS-25503). The reason the launched script's output is being redirected to a file is so that the output can be transmitted to master in order to display the script's output.
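To make the fire-and-forget/file-monitoring idea concrete, here is a rough Groovy sketch of the general pattern; this is not the plugin's actual code, and the paths and file names merely echo what appears in the logs above:

def controlDir = new File('/tmp/durable-example')   // stand-in for the @tmp/durable-xxxxxxxx dir
controlDir.mkdirs()

// Launch the user script detached, capturing its output and exit code in files.
['sh', '-c',
 "{ echo 'Hello World!'; } > ${controlDir}/jenkins-log.txt 2>&1; " +
 "echo \$? > ${controlDir}/jenkins-result.txt"].execute()   // fire and forget

// The launcher then polls the control files instead of holding on to the process,
// which is what lets the step survive agent reconnects and controller restarts.
def result = new File(controlDir, 'jenkins-result.txt')
while (!result.exists()) {
    Thread.sleep(100)
}
println new File(controlDir, 'jenkins-log.txt').text
println 'exit code: ' + result.text.trim()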
carroll 1.32 does not resolve the issue for me.
When running a sh step remotely on a dockerized agent as described above, I still get java.nio.file.AccessDeniedException: /caches, see details above.
albers How are you running your container?
I'm guessing wildly here, but to me it looks like your node config is setting "Remote root directory" to /. And I'm also guessing that you are running the container as a specific user, e.g. '-u jenkins:jenkins' and probably mount the workspace like e.g. '-v /home/jenkins/workspace:/workspace'. And then start the agent inside the container.
With such a setup the Jenkins agent will probably not have enough permissions to create '/cache', which the plugin is perhaps still trying to do even if it's set not to use the new wrapper.
Try adding e.g. '-v /home/jenkins/cache:/cache' (modified to your config) or pre-creating a /cache folder in your image that is owned by 'jenkins:jenkins' (the user you run the container as).
njesper Your questions pointed me to a solution, thanks a lot.
But first the answers:
The Docker image of the agent runs as the user jenkins. The swarm client plugin sets the "Remote root directory" to "/" when connecting to the master and dynamically creating an agent. The image has an existing /workspace directory that is writable for the user jenkins. The user jenkins obviously does not have sufficient permissions to create a directory in /.
The swarm client can be configured to use a specific root directory. If I set it to a directory where the user jenkins has write permission, the build will successfully create a directory caches alongside the workspace directory.
Another solution would be to pre-create the /caches directory in the image as well.
I'm fine with this solution.
But the bottom line is that we need documentation that the user who performs the build must have sufficient permissions to create directories in the build root, or that specific directories need to exist with appropriate permissions.
I apologize - what 1.32 did was disable the binary wrapper by default, but it did not resolve the caching issue because the plugin still tries to create the cache dir. I am in the process of merging my current fix (https://github.com/jenkinsci/durable-task-plugin/pull/114) into master.
albers once the fix gets through, those users who do not have permissions to create directories in the build root will have caching disabled.
So version 1.33 has now been released. This includes the fix for disabling cache when there are insufficient permissions to access the cache dir. The binary is still disabled by default.
carroll 1.33 works for my use case (build root in /, user not having permissions to create the /caches directory).
I've found this also reproduces when using build agents in Kubernetes, not just Docker. The problem here is that Kubernetes launches two containers into a pod with a shared mount: a JNLP slave container, which Jenkins does have permission to write the cache directory in, and a build container (in my case kubectl, but could be any container without a Jenkins user) where it does not necessarily have the same permission, in which code actually runs. The plugin runs its test inside the JNLP container, enables the wrapper, and then exhibits the same hanging behavior when commands are run in the kubectl container.
Tests run on the latest (v1.33) of durable-task.
Logs with LAUNCH_DIAGNOSTICS set:
sh: 1: cannot create /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt: Permission denied
sh: 1: cannot create /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-result.txt.tmp: Permission denied
touch: cannot touch '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt': Permission denied
mv: cannot stat '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-result.txt.tmp': No such file or directory
touch: cannot touch '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt': Permission denied
[ last line repeated ~100 times ]
process apparently never started in /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47
In the JNLP container:
bash-4.4$ cd /home/jenkins/agent/caches
bash-4.4$ ls -l
total 0
drwxr-xr-x 2 jenkins jenkins 6 Mar 6 15:47 durable-task
In the kubectl container:
I have no name!@<REDACTED>:/home/jenkins/agent/caches$ ls -l
total 0
drwxr-xr-x 2 1000 1000 6 Mar 6 15:47 durable-task
I have no name!@<REDACTED>:/home/jenkins/agent/caches$ id
uid=1001 gid=0(root) groups=0(root)
I've had some success today working around this by adding a security context to my pods, forcing a run as Jenkins's UID (which for me is 1000 - YMMV depending on how Jenkins is running), e.g.:
kind: Pod
metadata:
  name: kubectl
spec:
  containers:
  - command:
    - cat
    image: bitnami/kubectl:1.14
    imagePullPolicy: Always
    name: kubectl
    tty: true
    securityContext:
      runAsUser: 1000
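For context, with the Kubernetes plugin a pod spec like the one above is typically supplied to a job roughly as follows. This is only a sketch under the assumption of a recent kubernetes plugin (the container name and the kubectl command are placeholders):

podTemplate(yaml: '''
kind: Pod
spec:
  containers:
  - name: kubectl
    image: bitnami/kubectl:1.14
    command: ['cat']
    tty: true
    securityContext:
      runAsUser: 1000
''') {
    node(POD_LABEL) {
        container('kubectl') {
            // sh steps in this container now run as UID 1000, matching the JNLP container
            sh 'id && kubectl version --client'
        }
    }
}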
don_code Please open a new issue instead of commenting here. In durable-task 1.33, the caches directory is not actually used by default, so I think you can ignore it. The problem in your case looks like permissions on the control directory for the script, and I think that you would run into the same problems on durable-task 1.30 or older, so I would check for similar bugs reported against Durable Task Plugin and/or Kubernetes Plugin.
After upgrading to version 1.33, all my Jenkins jobs failed with this error. Please suggest a solution. Before the upgrade there were no issues.
[Pipeline] sh
process apparently never started in XYZ/durable-xyz
(running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
[Pipeline] }
komalb08 The code that caused the issue described in this ticket is now disabled by default. Please open a new ticket and fully describe the issue you are seeing, and run Jenkins with the system property mentioned in the error message to get more details on the specific problem.
A different workaround is adding args '-v /var/jenkins-legaci-lab/caches:/var/jenkins-legaci-lab/caches' to the docker{...} declaration in the pipeline.
Like this:
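(A sketch based on the pipeline from the description, with the args line from this workaround added; the mounted path must match the caches location on your Jenkins controller/agent.)

pipeline {
    agent {
        docker {
            label 'docker'
            image 'busybox'
            // Expose the durable-task caches directory inside the container at the same path
            args '-v /var/jenkins-legaci-lab/caches:/var/jenkins-legaci-lab/caches'
        }
    }
    stages {
        stage("Test sh script in container") {
            steps {
                sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
            }
        }
    }
}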