The pod does actually go away though outside of the node blocks.
Not in my experience, although I did see it get killed later for unknown reasons. Not sure if that is related to the bug.
Finally managed to reproduce. FTR:
- Run Microk8s.
- Run: java -Dhudson.Main.development=true -jar jenkins-war-2.164.1.war --prefix=/jenkins --httpPort=8090 --httpListenAddress=10.1.1.1
- Load Jenkins via http://10.1.1.1:8090/jenkins/ which should also set its default URL (needed by the Kubernetes jnlp container callback).
- Install blueocean, pipeline-stage-view, and kubernetes plugins.
- Add a Kubernetes cloud (no further configuration needed; a script console sketch of this step follows the list below).
- Create a Pipeline job based on yours but with that stuff uncommented, and with the resources section removed, to wit:
def label = "jenkins-input-repro-${UUID.randomUUID().toString()}"
def scmVars;
def endToEndTests(target) {
  stage('End to end tests') {
    container('ci') {
      parallel '1': {
        sh 'sleep 10'
      }, '2': {
        sh 'sleep 10'
      }, '3': {
        sh 'sleep 10'
      }
    }
  }
}
def deploy(target) {
  stage("Deploy to ${target}") {
    container('ci') {
      sh 'sleep 10'
    }
  }
}
podTemplate(label: label, podRetention: onFailure(), activeDeadlineSeconds: 600, yaml: """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: ci
    image: golang:latest
    tty: true
"""
) {
  node(label) {
    this.deploy('staging')
    this.endToEndTests('staging')
  }
  stage('Approve prod') {
    input message: 'Deploy to prod?'
  }
  node(label) {
    this.deploy('prod')
    this.endToEndTests('prod')
  }
}
- Start a build.
- Wait for it to get to approval.
- /safeRestart
- When Jenkins comes back up, try to Proceed.
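For the "Add a Kubernetes cloud" step above, a script console equivalent in case that is easier to automate; this is only a sketch, assuming the plugin's default in-cluster connection settings work against MicroK8s, and the cloud name is just one I picked:
import jenkins.model.Jenkins
import org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud

def j = Jenkins.get()
// Same as "Add a new cloud » Kubernetes" and saving without touching any fields;
// all connection details stay at their defaults.
j.clouds.add(new KubernetesCloud('kubernetes'))
j.save()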
Not sure exactly how endToEndTests plays into this, but compared to the passing case there is no "Ready to run at …" message, and http://10.1.1.1:8090/jenkins/job/p/1/threadDump/ displays
Program is not yet loaded
Looking for path named ‘/home/jenkins/workspace/p’ on computer named ‘jenkins-input-repro-15556b06-275f-494f-84d3-4ba9c006f0c1--8b27g’
and jstack shows various threads waiting for a monitor on InputAction.getExecutions, with the one holding the lock looking like
"Handling GET /jenkins/job/p/wfapi/runs from 10.1.1.1 : …" … waiting on condition […]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <…> (a com.google.common.util.concurrent.AbstractFuture$Sync)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:258)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:91)
at org.jenkinsci.plugins.workflow.support.steps.input.InputAction.loadExecutions(InputAction.java:71)
- locked <…> (a org.jenkinsci.plugins.workflow.support.steps.input.InputAction)
at org.jenkinsci.plugins.workflow.support.steps.input.InputAction.getExecutions(InputAction.java:146)
- locked <…> (a org.jenkinsci.plugins.workflow.support.steps.input.InputAction)
at com.cloudbees.workflow.rest.external.RunExt.isPendingInput(RunExt.java:347)
at com.cloudbees.workflow.rest.external.RunExt.initStatus(RunExt.java:377)
at com.cloudbees.workflow.rest.external.RunExt.createMinimal(RunExt.java:241)
at com.cloudbees.workflow.rest.external.RunExt.createNew(RunExt.java:317)
at com.cloudbees.workflow.rest.external.RunExt.create(RunExt.java:309)
at com.cloudbees.workflow.rest.external.JobExt.create(JobExt.java:131)
at com.cloudbees.workflow.rest.endpoints.JobAPI.doRuns(JobAPI.java:69)
at …
all of which does point to JENKINS-37998 as a root cause. The other contributing issue is a durability problem in the kubernetes plugin: the agent has either gone offline or restarted without a persistent workspace.
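As a possible mitigation for the workspace half of this while JENKINS-37998 is pending, one thing to try is backing the agent workspace with a PVC so a replacement pod finds the same files. A sketch only, assuming a pre-created claim (jenkins-workspace is a name I made up for illustration) and that I am remembering the kubernetes plugin's workspaceVolume symbol correctly:
// Variant of the repro pod template with the workspace on an existing PVC,
// so a restarted/replacement agent pod does not come up with an empty workspace.
podTemplate(label: label, podRetention: onFailure(), activeDeadlineSeconds: 600,
    // hypothetical pre-created claim in the agents' namespace
    workspaceVolume: persistentVolumeClaimWorkspaceVolume(claimName: 'jenkins-workspace', readOnly: false),
    yaml: """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: ci
    image: golang:latest
    tty: true
""") {
  node(label) {
    // same deploy/endToEndTests stages as above
  }
}
That would only address the missing workspace on the replacement agent; the blocked InputAction.loadExecutions still needs the JENKINS-37998 fix.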