JENKINS-65602: Durable task pipeline failed at sh initialisation


      Description

I am using Jenkins version 2.249.1.

Durable Task plugin: 1.35

We run builds in a Kubernetes farm and our builds are dockerised. After upgrading Jenkins and the Durable Task plugin we saw multiple issues raised where "sh" initialisation inside the container breaks, so we applied the suggested solution and set workingDir: "/home/jenkins/agent", and builds succeeded after making that change.

       

However, some of the builds are still failing randomly with the same error:

[2021-05-10T15:16:33.046Z] [Pipeline] sh
[2021-05-10T15:22:08.073Z] process apparently never started in /home/jenkins/agent/workspace/CORE-CommitStage@tmp/durable-f6a728e7
[2021-05-10T15:22:08.087Z] [Pipeline] }

We had also already enabled launch diagnostics as per the suggestions:
-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true

The issue is not consistent, but jobs still fail randomly. We are looking for a permanent fix for this.
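For reference, this is roughly how we pass that property to the controller JVM; the docker run below is only an illustration based on the official jenkins/jenkins image, not our exact setup:

# Extra diagnostics for durable-task "sh" steps: add the system property to
# the Jenkins controller's Java options, e.g. via JAVA_OPTS when the
# controller is started from the official jenkins/jenkins Docker image.
docker run -d --name jenkins \
  -p 8080:8080 -p 50000:50000 \
  -e JAVA_OPTS="-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true" \
  jenkins/jenkins:lts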


          Activity

Carroll Chiou added a comment:

Thanks for the tip, Jonathon Lamon! That's very interesting behavior, to say the least... I'll take a look.

Olle added a comment (edited):

Carroll Chiou, have there been any changes made related to this problem? For us, the problem seems to have disappeared, at least for the affected pipeline.

m t added a comment (edited):

We are running a Jenkins agent in a Docker container on Debian and we just hit this after upgrading to Debian 11.

          After enabling LAUNCH_DIAGNOSTICS as described, the following errors appeared:

          09:50:30 sh: 1: cannot create /home/jenkins/agent/workspace/<job-name>@tmp/durable-fe1d0f21/jenkins-log.txt: Directory nonexistent
          09:50:30 sh: 1: cannot create /home/jenkins/agent/workspace/<job-name>@tmp/durable-fe1d0f21/jenkins-result.txt.tmp: Directory nonexistent
          09:50:30 mv: cannot stat '/home/jenkins/agent/workspace/<job-name>@tmp/durable-fe1d0f21/jenkins-result.txt.tmp': No such file or directory

          Then I noticed that these directories were created on the Docker host instead of inside the container...

          The actual issue was the following:

          <docker-hostname> does not seem to be running inside a container

          Jenkins failed to detect that the Docker agent was running in a container. I guess that is why it created the directories on the Docker host instead of inside the container.

This is caused by Debian 11 changing to cgroup v2 by default, which breaks the container detection. Looking at the code in docker-workflow-plugin, it tries to get the container id from /proc/self/cgroup, but on cgroup v2 this just returns "0::/".
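For illustration, this is roughly what the difference looks like (output abbreviated, and the container id below is a made-up example):

# cgroup v1: lines carry a path the plugin can parse the container id from
$ cat /proc/self/cgroup
12:devices:/docker/3f1c6e8a9d2b...

# cgroup v2 (Debian 11 default): a single entry with no container id at all
$ cat /proc/self/cgroup
0::/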

I worked around this by booting Debian with "systemd.unified_cgroup_hierarchy=false". Weirdly enough, it was also necessary to rebuild the Jenkins agent's container to fix it completely (the issue also didn't appear immediately after upgrading to Debian 11, but only after re-creating the agent container).
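Roughly what that workaround looks like on a standard Debian/GRUB setup (the exact boot configuration is an assumption; adjust to your system):

# /etc/default/grub: add the switch to the kernel command line
# (keep whatever options you already have in this variable)
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=false"

# regenerate the GRUB configuration and reboot into cgroup v1
sudo update-grub
sudo reboot

# then, as noted above, re-create the Jenkins agent container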

          See also this related issue: Container detection fails on cgroup v2 devices · Issue #1592 · GoogleContainerTools/kaniko (github.com)

          In this case, they seem to have fixed this by detecting whether /.dockerenv exists. But as far as I can see, this doesn't allow access to the container id (I don't know if Jenkins actually needs it though, currently it's being logged at least).
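For illustration, that kaniko-style check boils down to a simple file-existence test, which tells you that you are inside a container but not which one:

# detect "running inside a Docker container" without reading /proc/self/cgroup
[ -f /.dockerenv ] && echo "inside a container"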

           

Edit: there already is a bug (and another workaround) for this specific issue: JENKINS-64608 "Detection 'running inside container' fails with cgroup namespace 'private' for docker daemon".

Carroll Chiou added a comment:

Thanks for the data, m t! One of the many reasons the docker-workflow-plugin is so challenging. A lot of the time a durable-task-plugin failure is a symptom of the underlying issue, but with limited error output, it's really hard to tell. I wonder if this issue is present when you don't use the docker-workflow-plugin, i.e. just running the docker commands through the shell?

m t added a comment:

Carroll Chiou, I would expect docker commands to fail as well for most use cases. I had a closer look at why exactly this fails:

          When the container for the pipeline is started with "docker run", the docker workflow plugin usually passes the volume of the agent with "--volumes-from=<agent container id>", so the pipeline container has access to the workspace (which was checked out inside the agent container). But it only does this when it detects that the agent itself is running in a container. Since the container detection fails, the volume with the workspace is not passed to the pipeline container and the aforementioned issues happen because the workspace doesn't exist.
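For illustration, the working case is roughly equivalent to a docker run like the following (simplified; the image name is a placeholder and the plugin's real invocation includes more flags):

# pipeline container started with the agent's volumes, so the workspace and
# the @tmp/durable-* control directory are visible inside it
docker run -t -d --volumes-from=<agent container id> \
  -w /home/jenkins/agent/workspace/<job-name> \
  <build-image> cat

# when the container detection fails, --volumes-from is omitted, the
# @tmp/durable-* files end up on the Docker host, and the sh wrapper inside
# the pipeline container cannot create or read them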

          So in theory, using docker commands on the shell only works if you either use "skipDefaultCheckout()" and do the git clone inside the stage yourself or you pass "--volumes-from=<agent container id>" yourself with the "args" parameter in the declarative pipeline. 

By the way: the workaround from JENKINS-64608 of running the agent container with "--cgroupns host" also works fine for me and is much better than reverting to cgroup v1 for the whole host system.
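Roughly, i.e. (the image name and the socket mount below are placeholders for however the agent container is normally started):

# start the Jenkins agent container in the host cgroup namespace so the
# /proc/self/cgroup based container detection keeps working
docker run -d --cgroupns host \
  -v /var/run/docker.sock:/var/run/docker.sock \
  <jenkins-agent-image>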


            People

Assignee: Unassigned
Reporter: Hitesh kumar
Votes: 3
Watchers: 7
