[JENKINS-59340] Pipeline hangs when Agent pod is Terminated - Jenkins Jira

Type: Bug
Resolution: Fixed
Priority: Major
Component/s: kubernetes-plugin, workflow-durable-task-step-plugin
Labels:
None
Environment:
kubernetes-plugin:1.17.2
workflow-durable-task-step-plugin:2.33
core:2.176.3.2

Released As:
kubernetes 1.25.5

When an agent pod is terminated (for example, OOMKilled by Kubernetes) during a shell step of a pipeline build:

the node remains in Jenkins, as disconnected
the pipeline hangs forever
the pod remains in Kubernetes, in Terminated state, with OOMKilled status

A manual intervention is necessary to fix this situation:

Aborting the pipeline manually causes the node to be removed and the pod to eventually be deleted as well
Deleting the pod manually causes the node to be removed (after about 5 minutes, for some reason) and the pipeline to eventually be aborted
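
The second workaround (deleting the pod by hand) could be scripted. The following is only a sketch, not a fix for the underlying bug. It assumes that agent pods carry the `jenkins=slave` label used in the reproduction YAML, and that a terminated agent pod reports the `Failed` phase, which is expected when the pod is not restarted. The function name and the namespace argument are hypothetical.

```shell
#!/bin/sh
# Delete agent pods that Kubernetes has already terminated.
# Assumptions: agent pods are labelled jenkins=slave, and a terminated
# agent pod is in phase Failed (for example after an OOM kill with no restart).
delete_failed_agent_pods() {
  namespace="${1:-default}"   # hypothetical argument: namespace of the agent pods
  kubectl delete pod \
    --namespace "$namespace" \
    --selector 'jenkins=slave' \
    --field-selector 'status.phase=Failed'
}
```

Calling `delete_failed_agent_pods my-namespace` would then trigger the same node removal described above, including the delay of about 5 minutes.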

Expected Behavior

The pipeline should abort automatically and the node should be removed automatically.

How to Reproduce

We need to simulate a pod failure when the agent is connected and building a pipeline. To reproduce this, I am using a jnlp agent with stress-ng: [dohbedoh/jnlp-stress-agent:alpine](https://hub.docker.com/r/dohbedoh/jnlp-stress-agent)

Create a pipeline that simulates a Kubernetes `OOMKilled` during the build:

pipeline {
  agent {
    kubernetes {
      yaml """
metadata:
  labels:
    cloudbees.com/master: "dse-team-apac"
    jenkins: "slave"
    jenkins/stress: "true"
spec:
  containers:
  - name: "jnlp"
    image: "dohbedoh/jnlp-stress-agent:alpine"
    imagePullPolicy: "Always"
    resources:
      limits:
        memory: "128Mi"
        cpu: "0.2"
      requests:
        memory: "100Mi"
        cpu: "0.2"
    securityContext:
      privileged: true
    tty: true
"""
    }
  }
  stages {
    stage('stress') {
      steps {
        sh "stress-ng --vm 2 --vm-bytes 1G  --timeout 30s -v"
      }
    }
  }
}

The pod should get OOMKilled by Kubernetes:

$ kubectl get pod dse-team-apac-aburdajewicz-testscenario-4-10xd4-558nc-5khzj
NAME                                                          READY   STATUS      RESTARTS   AGE
dse-team-apac-aburdajewicz-testscenario-4-10xd4-558nc-5khzj   0/1     OOMKilled   0          3m21s
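
The termination reason can also be read directly from the pod status instead of the table output. This is a minimal sketch. It assumes the agent pod has a single container that was not restarted (RESTARTS is 0 above), so the reason is under `.state.terminated` rather than `.lastState`. The function names are hypothetical.

```shell
#!/bin/sh
# Print the termination reason of the first container in the given pod.
# Assumption: the container was not restarted, so the reason is recorded
# under .state.terminated (a restarted container would use .lastState).
pod_termination_reason() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'
}

# Succeed (exit status 0) only when the first container was OOMKilled.
pod_was_oom_killed() {
  [ "$(pod_termination_reason "$1")" = "OOMKilled" ]
}
```

For example, `pod_was_oom_killed dse-team-apac-aburdajewicz-testscenario-4-10xd4-558nc-5khzj && echo "agent was OOMKilled"`.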

The pipeline build log shows the disconnection, and the build then hangs forever:

Running on dse-team-apac-aburdajewicz-testscenario-4-10xd4-558nc-5khzj in /home/jenkins/workspace/dse-team-apac/aburdajewicz/testScenario
[Pipeline] {
[Pipeline] stage
[Pipeline] { (stress)
[Pipeline] sh
+ stress-ng --vm 2 --vm-bytes 1G --timeout 30s -v
stress-ng: debug: [86] 2 processors online, 2 processors configured
stress-ng: info:  [86] dispatching hogs: 2 vm
stress-ng: debug: [86] cache allocate: default cache size: 46080K
stress-ng: debug: [86] starting stressors
stress-ng: debug: [86] 2 stressors spawned
stress-ng: debug: [89] stress-ng-vm: started [89] (instance 1)
stress-ng: debug: [89] stress-ng-vm using method 'all'
stress-ng: debug: [88] stress-ng-vm: started [88] (instance 0)
stress-ng: debug: [88] stress-ng-vm using method 'all'
Cannot contact dse-team-apac-aburdajewicz-testscenario-4-10xd4-558nc-5khzj: hudson.remoting.RequestAbortedException: java.nio.channels.ClosedChannelException
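
When triaging builds that may be stuck in this state, the last line of the log above is a usable signature. This is a rough sketch that checks a saved console log for it. The function name is hypothetical, and the pattern is taken only from the message shown in this report.

```shell
#!/bin/sh
# Succeed (exit status 0) when the given console log file contains the
# "Cannot contact <agent>: hudson.remoting.RequestAbortedException" message.
build_lost_agent() {
  grep -q 'Cannot contact .*: hudson\.remoting\.RequestAbortedException' "$1"
}
```

For example, `build_lost_agent build.log && echo "agent channel was closed"`. A match only shows that the agent channel closed; it does not prove that the build is still hanging.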


Attachments:
agent-oom-killed-description.txt (4 kB, 2019-09-13 01:04)
build.log (3 kB, 2019-09-13 01:04)
durabletask-and-workflowdurabletask-fine.log (27 kB, 2019-09-13 01:04)
kubernetes-plugin-fine.log (112 kB, 2019-09-13 01:04)
support-bundle_2019-09-13_00.50.40.zip (260 kB, 2019-09-13 01:04)

Relates to: JENKINS-49707 Auto retry for elastic agents after channel closure (Resolved)
Links to: CloudBees-internal issue, PR #772

Assignee: Vincent Latombe
Reporter: Allan BURDAJEWICZ
Votes: 2
Watchers: 4
Created: 2019-09-13 00:49
Updated: 2020-05-18 12:21
Resolved: 2020-05-18 12:21
