
Builds fail randomly when running sh in container

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Critical
    • Component: kubernetes-plugin
    • Environment: Running Jenkins in a Kubernetes cluster on GCP

      My developers report builds failing randomly when a stage starts. The failure occurs when the pipeline attempts to run "sh" in a container in the pod running the job.
      Here is the error message I see:

      [Pipeline] sh
      rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:87: adding pid 3786794 to cgroups caused \"failed to write 3786794 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods/besteffort/pod70971cd7-153a-11e9-9fe5-42010a567404/6b66fd31d9718f168c34810477e328045af5caead06e9e7f48ed3b9431eb3d37/cgroup.procs: invalid argument\""
      [Pipeline] echo
      Error: java.io.IOException: Pipe closed...
      ...
      ...
      ERROR: script returned exit code 1
      Finished: FAILURE
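
      For anyone trying to tell this failure apart from ordinary script failures in console logs, here is a minimal sketch in Python. It assumes only the message format quoted above; the names `parse_cgroup_failure`, `CGROUP_WRITE_FAILURE`, and `POD_UID` are hypothetical and not part of Jenkins, the kubernetes-plugin, or runc.

```python
import re
from typing import Optional

# Assumption: the failure always carries the runc message format quoted in
# this issue ("failed to write <pid> to cgroup.procs: write <path>: invalid argument").
CGROUP_WRITE_FAILURE = re.compile(
    r"failed to write (?P<pid>\d+) to cgroup\.procs: "
    r"write (?P<path>\S+/cgroup\.procs): invalid argument"
)

# Assumption: the pod UID appears as a ".../pod<uid>/..." path component,
# as it does in the cgroup path quoted in this issue.
POD_UID = re.compile(r"/pod(?P<uid>[0-9a-f-]+)/")


def parse_cgroup_failure(log_text: str) -> Optional[dict]:
    """Return pid, cgroup path and pod UID if the log contains the failure, else None."""
    match = CGROUP_WRITE_FAILURE.search(log_text)
    if match is None:
        return None
    pod = POD_UID.search(match.group("path"))
    return {
        "pid": int(match.group("pid")),
        "cgroup_path": match.group("path"),
        "pod_uid": pod.group("uid") if pod else None,
    }
```

      Running it over saved console logs would give a rough count of how often builds hit this specific error, as opposed to a generic "script returned exit code 1".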

          [JENKINS-55527] Builds fail randomly when running sh in container

          Carlos Sanchez added a comment -

          This looks like https://github.com/moby/moby/issues/31230, and the fix could be in runc v1.0.0-rc6: https://github.com/opencontainers/runc/pull/1916

          Ahmed Kamel added a comment -

          Got it. Thanks for posting these. I'll watch the issue over on GitHub.


          Andy Powell added a comment -

          Update: We were able to isolate this to a security scanner within our GKE cluster. Turning it off made the problems go away.


          Ahmed Kamel added a comment -

          apowell, are you talking about a GCP service, or was it a 3rd party security scanner that was causing the issue?


          Andy Powell added a comment -

          akamel1001, it was not GCP but the 3rd party product that was causing the issue. GKE has been running Jenkins without the problem since we turned off the 3rd party service.


          Ahmed Kamel added a comment -

          Got it. We have a very similar setup here.

          Feel free not to comment, but was this 3rd party tool Twistlock by any chance?


          Andy Powell added a comment -

          Yes, it was.


          Carlos Sanchez added a comment -

          Thanks for figuring it out.

          Ahmed Kamel added a comment -

          Awesome, thank you apowell for tracking this down. We disabled it and saw the error count drop significantly.


          Ahmed Kamel added a comment -

          For whoever stumbles onto this thread:

          Our security team reached out to Twistlock to try to figure out the root cause of this issue. They told us they are aware of the issue and are working on updates. In the meantime, here is a nice blog post that explains the issue and how it was found:

          https://www.twistlock.com/2018/12/04/advanced-runc-debugging-fun-profit/


            csanchez Carlos Sanchez
            akamel1001 Ahmed Kamel