-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Jenkins LTS 2.190.2
Kubernetes Plugin 1.20.2
When my pods are OOM-killed, the corresponding nodes aren't removed. This pollutes the interface and leaves the job running as a zombie.
If I click to abort the job it prints "Are you sure you want to abort null?"
This message comes from executors.jelly when executor.currentExecutable.fullDisplayName is null.
If I proceed, the node is deleted, as expected.
In the logs I found these entries:
INFO o.c.j.p.k.pod.retention.Reaper#eventReceived: default/infra-mf3jg was just deleted, so removing corresponding Jenkins agent
INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: IOHub#1: Worker[channel:java.nio.channels.SocketChannel[connected local=/172.17.0.2:50000 remote=ip-172-16-29-221.ec2.internal/172.16.29.221:39454]] / Computer.threadPoolForRemoting [#12347] for infra-mf3jg terminated: java.nio.channels.ClosedChannelException
I think it's related to the Reaper class: when a DELETED event is received, it calls Node#removeNode. There I found this comment: "If the node instance is not in the list of nodes, then this will be a no-op, even if there is another instance with the same".
I think that, for some reason, the instance passed by Reaper is different from the one registered in Jenkins, which causes the removal to be ignored.
The OfflineCause for the node is "Node is being removed"
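The suspected failure mode can be illustrated with a minimal, self-contained sketch. It is a toy model built only from the comment quoted above, not the actual Jenkins or Kubernetes Plugin code: the names NodeRegistrySketch, Agent, addNode, removeNode and isRegistered are hypothetical, and the identity check is an assumption about how such a no-op could arise.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not Jenkins code) of a node list whose removal
// only succeeds for the exact instance that was registered.
public class NodeRegistrySketch {

    /** Hypothetical stand-in for an agent node; only the name matters here. */
    public static final class Agent {
        public final String name;

        public Agent(String name) {
            this.name = name;
        }
    }

    private final Map<String, Agent> nodes = new HashMap<>();

    public void addNode(Agent agent) {
        nodes.put(agent.name, agent);
    }

    /**
     * Removes the node only if the given instance is the registered one.
     * A different instance with the same name makes this a no-op
     * (assumed behavior, mirroring the quoted comment).
     */
    public boolean removeNode(Agent agent) {
        if (nodes.get(agent.name) != agent) {
            return false; // same name, different instance: nothing happens
        }
        nodes.remove(agent.name);
        return true;
    }

    public boolean isRegistered(String name) {
        return nodes.containsKey(name);
    }

    public static void main(String[] args) {
        NodeRegistrySketch registry = new NodeRegistrySketch();
        Agent registered = new Agent("infra-mf3jg");
        registry.addNode(registered);

        // A second object with the same name, e.g. a stale copy held elsewhere.
        Agent stale = new Agent("infra-mf3jg");

        System.out.println(registry.removeNode(stale));           // false
        System.out.println(registry.isRegistered("infra-mf3jg")); // true
        System.out.println(registry.removeNode(registered));      // true
    }
}
```

If this is what happens, the removal triggered by the DELETED event would silently do nothing whenever Reaper holds a different instance than the one in the node list, which would match the node staying visible until it is deleted manually.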