-
Bug
-
Resolution: Fixed
-
Major
-
None
I've detected a situation when reserved node wasn't put into disposable queue and was reserved forever. I've tried to find out the possible path/place where an exception occurred and ForemanSharedNode.terminate() wasn't invoked, but I failed.
What I can see from the log is:
[ForemanAPI] Reserving host X [ForemanAPI] Retrieving ip="AA.BB.CC.DD" for X [ForemanSharedNode] Instancing a new ForemanSharedNode: name='X', label='LABEL' [hudson.slaves.NodeProvisioner] (jenkins.util.Timer [#10]) Started provisioning LABEL from Foreman-test with 1 executors. Remaining excess workload: 0
X, AA.BB.CC.DD and LABEL are obfuscated only. The third reported line is produced by not released branch yet (see here).
Nothing else was reported for node X. In that time 3 nodes were reserved while network to Foreman was dropped, the first detection was in:
2016-10-06 03:50:34,331 INFO [STDOUT] (jenkins.util.Timer [#10]) 03:50:34,330 ERROR [ForemanAPI] Unhandled exception getting JENKINS_LABEL for Y 2016-10-06 03:50:34,331 INFO [STDOUT] (jenkins.util.Timer [#10]) 03:50:34,331 ERROR [ForemanSharedNodeCloud] com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input at [Source: ; line: 1, column: 1] 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input at [Source: ; line: 1, column: 1] 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148) 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3781) 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3721) 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726) 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) at com.redhat.foreman.ForemanAPI.getHostParameterValue(ForemanAPI.java:200) 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) at com.redhat.foreman.ForemanAPI.getHostForQuery(ForemanAPI.java:305) 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) at com.redhat.foreman.ForemanAPI.getCompatibleHosts(ForemanAPI.java:322) 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) at com.redhat.foreman.ForemanSharedNodeCloud.canProvision(ForemanSharedNodeCloud.java:159) 2016-10-06 03:50:34,331 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.model.Label.getClouds(Label.java:227) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.model.Label.isEmpty(Label.java:435) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at jenkins.model.Jenkins.trimLabels(Jenkins.java:1725) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at jenkins.model.Nodes$2.run(Nodes.java:137) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.model.Queue._withLock(Queue.java:1289) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.model.Queue.withLock(Queue.java:1172) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at jenkins.model.Nodes.addNode(Nodes.java:132) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at jenkins.model.Jenkins.addNode(Jenkins.java:1693) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.slaves.NodeProvisioner$2.run(NodeProvisioner.java:226) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.model.Queue._withLock(Queue.java:1289) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.model.Queue.withLock(Queue.java:1172) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:208) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:57) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:775) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) 2016-10-06 03:50:34,332 ERROR [STDERR] (jenkins.util.Timer [#10]) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) 2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10]) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) 2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10]) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) 2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10]) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) 2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10]) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) 2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10]) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) 2016-10-06 03:50:34,333 ERROR [STDERR] (jenkins.util.Timer [#10]) at java.lang.Thread.run(Thread.java:682)
Y was another slave than X. Each following request to Foreman ended in:
ERROR [ForemanAPI] java.net.SocketException: Unexpected end of file from server
As I wrote, 3 nodes were reserved in that time (X, Y, Z) and the Y and Z were correctly released when I established network back. From the log above, there has to be a path during/after provisioning a slave which doesn't ended in ForemanSharedNode.terminate(), but I don't see it in the code.