
ssh-agent in pipeline leaves defunct processes on swarm client

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: ssh-agent-plugin
    • Labels: None
    • Environment: Jenkins ver. 2.150.1, SSH Agent Plugin 1.17, Self-Organizing Swarm Plug-in Modules 3.15

      We run build nodes via Docker and the Swarm Plugin. After a while defunct processes start to pile up on the nodes:

       

      1000 9386 9371 1 Dec18 ? 00:10:49 java -jar /usr/share/jenkins/swarm-client-3.15-jar-with-dependencies.jar -fsroot /var/jenkins-node_home -master https://jenkins.cosmos.local -username XXX -password XXX -executors 10 -mode exclusive -labels linux basic build -name basic-node-dc -disableSslVerification -description Basic node

      [root@XXX:/root]# ps -ef | grep defu 
      1000 2489 9386 0 Dec18 ? 00:00:00 [ssh-agent] <defunct> 
      1000 2514 9386 0 Dec18 ? 00:00:00 [ssh-agent] <defunct> 
      1000 2544 9386 0 Dec18 ? 00:00:00 [ssh-agent] <defunct> 
      1000 2618 9386 0 Dec18 ? 00:00:00 [ssh-agent] <defunct> 

      ...

       

      We run ssh-agent frequently across many scripted pipelines, so it is hard to trace this to a specific pipeline. Regardless, this behavior shouldn't occur in the first place.
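When zombies come from many pipelines, grouping them by parent PID and process name can at least show which process is failing to reap them. The following is a minimal sketch, not part of the original report: the function name `summarize_zombies` is hypothetical, and it assumes input in the `pid stat ppid command` layout produced by a procps-style `ps axo pid=,stat=,ppid=,command=`.

```shell
#!/bin/sh
# Summarize zombie processes read from stdin.
# Expected input columns: pid stat ppid command (one process per line).
# Prints "<count> <ppid> <name>" per group, largest group first.
summarize_zombies() {
    awk '$2 ~ /^Z/ {
             key = $3 " " $4      # parent PID plus process name, e.g. "9386 [ssh-agent]"
             count[key]++
         }
         END {
             for (key in count) print count[key], key
         }' | sort -k1,1nr -k2
}

# Live usage (assumes a procps-style ps):
#   ps axo pid=,stat=,ppid=,command= | summarize_zombies
```

On the node described above, every defunct `ssh-agent` would be expected to appear under the PID of the swarm client's `java` process (9386 in the listing).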

       

          [JENKINS-55256] ssh-agent in pipeline leaves defunct processes on swarm client

          Philipp Moeller created issue -

          Philipp Moeller added a comment - - edited

          I just confirmed that this happens during normal execution of code like:

          node('basic') {
            sshagent(['XXXXX']) {
              sh "echo foo"
            }
          }
          

          Oleg Nenashev made changes -
          Assignee Original: Oleg Nenashev [ oleg_nenashev ]
          Andrew Bayer made changes -
          Component/s Original: pipeline [ 21692 ]

          Johannes Meixner added a comment -

          Hi,

          We're seeing these defunct processes independently of Swarm, with Docker 1.13.1 on RHEL7.

          Basil Crow added a comment -

          This bug doesn't seem to be specific to the SSH slaves plugin or the Swarm Plugin. I'm reassigning this to the SSH Agent plugin component.

          Basil Crow made changes -
          Component/s Original: swarm-plugin [ 15741 ]

          Christian Wehrli added a comment -

          We're also seeing this in an OpenShift environment:
          OpenShift 3.10
          Jenkins 2.164.3
          SSH Agent Plugin 1.17

          The job runs without error, but each time it finishes, a new zombie process appears on the underlying host.

          Jesse Glick added a comment -

          If true, could perhaps be reproduced using something like https://github.com/jenkinsci/durable-task-plugin/blob/28ad9826c25f57d58f8ded28b727f357c838d12a/src/test/java/org/jenkinsci/plugins/durabletask/BourneShellScriptTest.java#L526-L565

          Yacine added a comment - - edited

          This is on a Jenkins agent running in a k8s pod.

          At some point I can no longer start new processes in a pipeline because of this issue, so I listed the zombie processes periodically in a test pipeline that makes many sh and ssh-agent calls:

          [2021-10-06T12:21:49.842Z] + echo 'Number of Zombie Processes:'
          [2021-10-06T12:21:49.842Z] Number of Zombie Processes:
          [2021-10-06T12:21:49.842Z] + ps axo pid=,stat=
          [2021-10-06T12:21:49.842Z] + awk '$2~/^Z/ { print }'
          [2021-10-06T12:21:49.842Z] + wc -l
          [2021-10-06T12:21:49.842Z] 1173
          [2021-10-06T12:21:49.842Z] + ps axo pid=,stat=,ppid=,command=
          [2021-10-06T12:21:49.842Z] + awk '$2~/^Z/ { print }'
          [2021-10-06T12:21:49.842Z]   152 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   166 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   289 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   297 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   308 Zs       1 [ssh-agent] <defunct>
          [2021-10-06T12:21:49.842Z]   314 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   326 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   338 Zs       1 [ssh-agent] <defunct>
          [2021-10-06T12:21:49.842Z]   344 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   355 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   367 Zs       1 [ssh-agent] <defunct>
          [2021-10-06T12:21:49.842Z]   373 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   384 Z        1 [sh] <defunct>
          [2021-10-06T12:21:49.842Z]   396 Zs       1 [ssh-agent] <defunct>
          ... 

          sh and ssh-agent are leaving behind a lot of defunct processes (1173 zombies in the log above, for example). All of them have the same parent, PID 1, which I can't kill.

          Is there a way to solve this, other than starting a new (k8s pod) agent node?
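A zombie goes away only when its parent collects its exit status, and orphaned processes are re-parented to PID 1. In a container, PID 1 is whatever the entrypoint runs, so zombies with PPID 1 suggest that the entrypoint is not reaping its adopted children. The sketch below is a hedged diagnostic rather than a confirmed fix for this issue: it only reports what runs as PID 1. The function names and the list of init names are illustrative assumptions, and a procps-style `ps` is assumed.

```shell
#!/bin/sh
# Return 0 if the given command name looks like a zombie-reaping init.
# The list is an illustrative assumption, not an exhaustive registry.
is_known_init() {
    case "${1##*/}" in
        tini|dumb-init|docker-init|systemd|init) return 0 ;;
        *) return 1 ;;
    esac
}

# Report what runs as PID 1 in the current container or host.
check_pid1() {
    pid1=$(ps -o comm= -p 1 2>/dev/null)
    if is_known_init "$pid1"; then
        echo "PID 1 is '$pid1': orphaned children should be reaped"
    else
        echo "PID 1 is '$pid1': it may not reap orphans, so zombies can pile up"
    fi
}
```

If PID 1 turns out to be the agent process itself, one commonly used option is to start the container with a small init such as tini (for plain Docker, `docker run --init`). Whether that is workable for a given pod setup is not established in this thread. It would also not cover the original report, where the zombies are children of the swarm client's `java` process rather than of PID 1.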


            Assignee: Unassigned
            Reporter: Philipp Moeller (pmr)
            Votes: 5
            Watchers: 10