This behaviour has surfaced frequently over the last two days. We have been aborting the jobs manually/forcefully (sometimes by restarting the Jenkins service).
Below is the console log of the job:
[Pipeline] withEnv
[Pipeline] {
[Pipeline] input
Deploy to UAT?
Proceed or Abort
 > git config core.sparsecheckout # timeout=10
 > git read-tree -mu HEAD # timeout=10
 > git checkout -f e841b11ad7aebdfcaab7ec6363a71xxxxxxxxx # timeout=10
Pausing
Approved by Anuj Jain
Resuming build at Fri Oct 23 09:18:57 UTC 2020 after Jenkins restart
Waiting to resume part of UAT-client-service » UAT_Branch #10: Waiting for next available executor on ‘slave-UAT’
Still paused
Click here to forcibly terminate running steps
Resuming build at Fri Oct 23 09:55:40 UTC 2020 after Jenkins restart
Waiting to resume part of UAT-client-service » UAT_Branch #10: Waiting for next available executor on ‘slave-UAT’
Still paused
Terminating withEnv
Click here to forcibly kill entire build
Hard kill!
[Checks API] No suitable checks publisher found.
GitHub has been notified of this commit’s build result
Finished: ABORTED
======================
The jobs had been running fine for the last 30 days, and we have only started observing this recently. No other job runs on the slave at the same time, and we increased the number of executors from 2 to 4. Only restarting the Jenkins service and forcefully killing the job helps; after that, if we rerun the job, it works fine.
Below is an excerpt from jenkins.log:
[id=31] INFO jenkins.InitReactorRunner$1#onAttained: Completed initialization
SSH Launch of slave-CTE on cte-master completed in 5,980 ms
SSH Launch of slave-UAT on uat-master completed in 5,985 ms
[id=21] INFO hudson.WebAppMain$3#run: Jenkins is fully up and running
[id=18] WARNING o.j.p.w.s.s.input.InputAction#loadExecutions: no flow execution found for UAT-client-service/UAT_Branch #5
[id=18] WARNING o.j.p.w.s.s.input.InputAction#loadExecutions: no flow execution found for UAT-client-service/UAT_Branch #4
[id=18] WARNING o.j.p.w.s.s.input.InputAction#loadExecutions: no flow execution found for UAT-client-service/UAT_Branch #2
[id=99] INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel slave-UAT
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2760)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3235)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:914)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:376)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:142)
at hudson.remoting.Command.readFrom(Command.java:128)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
[id=52] INFO hudson.slaves.SlaveComputer#tryReconnect: Attempting to reconnect slave-UAT
SSH Launch of slave-UAT on uat-master completed in 10,150 ms
SSH Launch of slave-UAT on uat-master completed in 7,577 ms
[id=250] INFO jenkins.model.Jenkins$22#run: Restarting VM as requested by xxx_admin
[id=250] INFO jenkins.model.Jenkins#cleanUp: Stopping Jenkins