Details
Type: Bug
Status: Open
Priority: Blocker
Resolution: Unresolved
Component/s: remoting
Labels: None
Description
The slave goes offline during job execution, and the build log shows the following error:
Slave went offline during the build
01:20:15 ERROR: Connection was broken: java.io.EOFException
01:20:15 at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:613)
01:20:15 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
01:20:15 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
01:20:15 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
01:20:15 at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
01:20:15 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
01:20:15 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
01:20:15 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
01:20:15 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
01:20:15 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
01:20:15 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
01:20:15 at java.lang.Thread.run(Thread.java:724)
01:20:15
Issue Links
- is duplicated by
  - JENKINS-36944 Agent goes offline during the build (Resolved)
- is related to
  - JENKINS-40491 Preliminary FifoBuffer termination can cause outage of all JNLP1/2 agents (Resolved)
- relates to
  - JENKINS-25858 java.io.IOException: Unexpected termination of the channel (Resolved)
  - JENKINS-23419 FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected EOF (Closed)
In our configuration on AWS, I found that the connection to the slave was being terminated after about 1 minute during one particular pipeline stage. The stage was a long-running git checkout that succeeded only intermittently.
The solution for me was to increase the idle timeout on the ELB that sits between the slave and the master (http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html). By default this timeout is 60 seconds, whereas the Jenkins default for 'hudson.remoting.Launcher.pingTimeoutSec' is 240.
While the slave was executing the long-running git checkout, it must have sent no data over the connection for more than 1 minute, so the ELB treated the connection as idle and dropped the TCP connection.
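For anyone wanting to check the same thing in their own setup, here is a rough Python sketch of the reasoning and of the fix. It is not taken from Jenkins or from my actual configuration: the helper names, the load balancer name "jenkins-agents-elb" and the 300 second value are made up for illustration, and the model simply assumes the ELB drops a connection once the gap between two bytes exceeds the idle timeout. The only real API used is boto3's Classic ELB call `modify_load_balancer_attributes`, and boto3 is imported only if you actually apply the change.

```python
def first_idle_drop(activity_times, idle_timeout_sec, end_time):
    """Return the time (seconds) at which an idle timeout would cut the connection.

    activity_times: sorted times, in seconds since the connection opened at t=0,
    at which any data crossed the connection. Returns None if no silent gap up
    to end_time exceeds idle_timeout_sec. Simplified model, assumption only.
    """
    last_activity = 0.0
    for t in list(activity_times) + [end_time]:
        if t - last_activity > idle_timeout_sec:
            return last_activity + idle_timeout_sec
        last_activity = t
    return None


def idle_timeout_request(load_balancer_name, idle_timeout_sec):
    """Build the arguments for boto3's Classic ELB modify_load_balancer_attributes."""
    if idle_timeout_sec <= 0:
        raise ValueError("idle timeout must be a positive number of seconds")
    return {
        "LoadBalancerName": load_balancer_name,
        "LoadBalancerAttributes": {
            "ConnectionSettings": {"IdleTimeout": idle_timeout_sec},
        },
    }


def apply_idle_timeout(load_balancer_name, idle_timeout_sec):
    """Apply the new idle timeout. Needs boto3 and AWS credentials."""
    import boto3  # third-party, imported lazily so the helpers above work without it

    client = boto3.client("elb")  # "elb" is the Classic Load Balancer API
    return client.modify_load_balancer_attributes(
        **idle_timeout_request(load_balancer_name, idle_timeout_sec)
    )


if __name__ == "__main__":
    # Hypothetical checkout: traffic at 10s and 20s, then silence until 200s.
    print(first_idle_drop([10, 20], idle_timeout_sec=60, end_time=200))
    print(first_idle_drop([10, 20], idle_timeout_sec=300, end_time=200))
    # Hypothetical load balancer name; 300 is just a value above 240.
    print(idle_timeout_request("jenkins-agents-elb", 300))
```

With the default 60 second timeout the model cuts the connection in the middle of the silent stretch, while a timeout longer than the longest silent gap leaves it alone. I picked a value above the 240 second ping timeout for that reason, but the right number for your setup depends on how long your own stages stay quiet.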