Type: Improvement
Resolution: Unresolved
Priority: Major
This issue is related to JENKINS-6817.
I am running Jenkins slaves inside virtual machines. Sometimes these machines are overloaded and I get the following exception:
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
    at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
    at hudson.remoting.Request.call(Request.java:174)
    at hudson.remoting.Channel.call(Channel.java:713)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:167)
    at $Proxy38.join(Unknown Source)
    at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:925)
    at hudson.Launcher$ProcStarter.join(Launcher.java:360)
    at hudson.tasks.Maven.perform(Maven.java:327)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
    at hudson.model.Build$BuildExecution.build(Build.java:199)
    at hudson.model.Build$BuildExecution.doRun(Build.java:160)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
    at hudson.model.Run.execute(Run.java:1593)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:247)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.Request.abort(Request.java:299)
    at hudson.remoting.Channel.terminate(Channel.java:773)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at hudson.remoting.Command.readFrom(Command.java:92)
    at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:72)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Is it possible to make the channel timeout configurable? I'd like to increase the value from, say, 5 seconds to 30 seconds or a minute.
depends on:
- JENKINS-44785 Add Built-in Request timeout support in Remoting (Open)
is duplicated by:
- JENKINS-22754 Configurable node timeout (Closed)
is related to:
- JENKINS-22853 SEVERE: Trying to unexport an object that's already unexported (Resolved)
- JENKINS-22722 Master doesn't show connected slave (Resolved)
- JENKINS-18879 Collecting finbugs analysis results randomly fails with exception (Resolved)
- JENKINS-6817 FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel (Closed)
[JENKINS-18781] Configurable channel timeout for slaves
I agree, we need more logging. On a side note, a five-second timeout is very low in my case. It is quite likely that overloaded VMs will fail to respond for longer (especially if swapping to disk occurs).
+1, I am getting these failures more and more often.
It would be great to have the slave connection and read timeouts configurable per node, so we could set them independently.
For example, remote slaves connected over a WAN could require more time to connect and respond.
I've recently upgraded the SSH plugin from 0.27 to 1.4 and I see this exception being thrown almost every day on our nightly builds. We use a multi-configuration project to essentially install software on our distributed cluster every night, and I see our builds fail randomly on one of the nodes with high regularity. All of our nodes are VMs that typically have no traffic at the time of the build, and I have them configured to be offline until the build is requested. Is there a way to revert back to 0.27 until this is fixed? I've tried without success.
Hi guys,
Are there any plans to fix this soon? It is crucial in order to have a stable environment.
Thanks
I hate to make a "me too" post, but this is getting rather annoying here. My Windows (VM) slaves tend to fail one in every 3 builds because of this, and as we have a matrix job it usually means 2/3 of the builds fail.
It is reproducible quite often in our setup, with an exception trace like:
07:25:16 hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
07:25:16 at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
07:25:16 at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
07:25:16 at hudson.remoting.Request.call(Request.java:174)
07:25:16 at hudson.remoting.Channel.call(Channel.java:722)
07:25:16 at hudson.EnvVars.getRemote(EnvVars.java:404)
07:25:16 at hudson.model.Computer.getEnvironment(Computer.java:911)
07:25:16 at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
07:25:16 at hudson.model.Run.getEnvironment(Run.java:2201)
07:25:16 at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:873)
07:25:16 at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:863)
07:25:16 at hudson.model.AbstractProject.checkout(AbstractProject.java:1320)
07:25:16 at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:609)
07:25:16 at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
07:25:16 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:518)
07:25:16 at hudson.model.Run.execute(Run.java:1688)
07:25:16 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
07:25:16 at hudson.model.ResourceController.execute(ResourceController.java:88)
07:25:16 at hudson.model.Executor.run(Executor.java:231)
07:25:16 Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
07:25:16 at hudson.remoting.Request.abort(Request.java:299)
07:25:16 at hudson.remoting.Channel.terminate(Channel.java:782)
07:25:16 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
07:25:16 Caused by: java.io.IOException: Unexpected termination of the channel
07:25:16 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
07:25:16 Caused by: java.io.EOFException
07:25:16 at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2598)
07:25:16 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
07:25:16 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
07:25:16 at hudson.remoting.Command.readFrom(Command.java:92)
07:25:16 at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
07:25:16 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
I added a test step which confirms that connectivity is fine: the ping command runs just fine, and a native ssh client is able to connect as well. Just after that, the Jenkins node is not able to work. Any suggestions?
This started happening in our environment after updating to Jenkins ver. 1.571
Server
gardner:~ buildmachine$ uname -a
Darwin buildmachine.company.com 12.5.0 Darwin Kernel Version 12.5.0: Sun Sep 29 13:33:47 PDT 2013; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64
gardner:~ buildmachine$ java -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
Client
gardner@build-node:~$ uname -a
Darwin build-node.corp.adobe.com 13.2.0 Darwin Kernel Version 13.2.0: Thu Apr 17 23:03:13 PDT 2014; root:xnu-2422.100.13~1/RELEASE_X86_64 x86_64
gardner@build-node:~$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)
I will update the build node to 1.7 JRE and report back if the problem persists.
Full Job Output
Started by an SCM change
Building remotely on buildnode in workspace /Users/gardner/jenkins/workspace/gardnerFunctionalTests
FATAL: hudson.remoting.RequestAbortedException: java.io.EOFException
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.EOFException
    at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
    at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
    at hudson.remoting.Request.call(Request.java:174)
    at hudson.remoting.Channel.call(Channel.java:739)
    at hudson.FilePath.act(FilePath.java:911)
    at hudson.FilePath.act(FilePath.java:895)
    at hudson.FilePath.mkdirs(FilePath.java:1081)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1245)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:624)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:530)
    at hudson.model.Run.execute(Run.java:1732)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:234)
Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
    at hudson.remoting.Request.abort(Request.java:299)
    at hudson.remoting.Channel.terminate(Channel.java:802)
    at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:566)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:111)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.EOFException
    ... 12 more
This is a problem for me too, please let me know if I can help out with debugging.
We are running Jenkins on a virtual linux machine, with a slave on the same machine.
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
...
+1, I am facing the same issue, and we too have set up Jenkins on VMs. (Jenkins master - ver. 1.547)
Any help would be highly appreciated!
Error Log:
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:722)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:167)
at $Proxy54.join(Unknown Source)
I'm experiencing this as well. It seems to only impact one job of many, and I can't think of anything special about that job except that it's the newest.
hey all...
some kind of Jenkins slave connection management settings would help.
I have desperately tried searching through all the config settings for slave and master, but cannot find a way of tuning the poll interval, length of timeout and number of retries to help diagnose or fix my issue.
Using Vagrant, I'm spinning up a number of Windows VMs on vSphere and experiencing "random" socket resets or timeouts.
Below is a snippet of the error from the Jenkins job trace. (Vagrant is at the point where it is polling the new VMs to discover when they are online, so the connection timeout from WinRM is expected.) However, during this polling cycle Jenkins falls over with the second part of the error trace: "java.net.SocketException: Connection reset".
Surely there is a way of setting the number of retries or increasing the length of the timeout?
if ($LASTEXITCODE) { exit $LASTEXITCODE } else { exit 0 }
Message: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - connect(2) (http://10.30.40.12:5985)>
INFO winrm: Checking whether WinRM is ready...
INFO winrmshell: Attempting to connect to WinRM...
INFO winrmshell: - Host: 10.30.40.12
INFO winrmshell: - Port: 5985
INFO winrmshell: - Username: Administrator
FATAL: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:722)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:167)
at com.sun.proxy.$Proxy45.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:950)
at hudson.Launcher$ProcStarter.join(Launcher.java:360)
at hudson.plugins.msbuild.MsBuildBuilder.perform(MsBuildBuilder.java:180)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:585)
at hudson.model.Run.execute(Run.java:1684)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:782)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:77)
at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Also facing the problem... no comments in 7 months on this ticket with around 30 votes, but no assignee yet.
This will happen if your slave goes to sleep. I ran into an issue where corporate policy enforcement caused the slave to go to sleep at night.
If you want any bug assigned, the best thing to do is to get a reproducible case. Correlate the /var/logs/system.log with the stack trace. Or find out when it is likely to happen, get some coffee, and watch the machine with your eyeballs.
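A minimal sketch of that correlation, assuming a macOS or Linux slave whose system log sits at /var/log/system.log, a master installed from the Linux packages logging to /var/log/jenkins/jenkins.log, and a failure timestamp copied from the build console (all three are assumptions; adjust the paths and the timestamp to your setup):
# on the slave: pull the system log entries around the failure time seen in the build console
grep 'Jul 15 07:25' /var/log/system.log
# on the master: look for the matching channel termination around the same moment
grep -n 'Unexpected termination of the channel' /var/log/jenkins/jenkins.log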
I am not a committer on this project.
I also have this issue on Jenkins 1.625.3 LTS. Using Java 8 on master and slave nodes. I have a Windows Server 2003/XP VM as my slave build node. The problem is intermittent and eventually the slave node regains connection, but the timeout is too short so it just fails the build.
In jenkins.err.log, I get
Jan 06, 2016 3:38:19 PM jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed
WARNING: Computer.threadPoolForRemoting [#3599] for cibuilder-8 terminated
java.io.EOFException
    at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:613)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
On the slave node's build log:
Fetching upstream changes from file:////path/to/foo.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress file:////path/to/foo.git +refs/heads/*:refs/remotes/origin/* # timeout=60
FATAL: java.io.EOFException
hudson.remoting.RequestAbortedException: java.io.EOFException
    at hudson.remoting.Request.abort(Request.java:297)
    at hudson.remoting.Channel.terminate(Channel.java:847)
    at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:613)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
    at ......remote call to cibuilder-8(Native Method)
    at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
    at hudson.remoting.Request.call(Request.java:172)
    at hudson.remoting.Channel.call(Channel.java:780)
    at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:145)
    at sun.reflect.GeneratedMethodAccessor250.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:131)
    at com.sun.proxy.$Proxy51.execute(Unknown Source)
    at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1003)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1043)
    at hudson.scm.SCM.checkout(SCM.java:485)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1275)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
    at hudson.model.Run.execute(Run.java:1741)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:408)
Caused by: java.io.EOFException
    at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:613)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
We also see this behavior periodically in our system. Unfortunately for us we lose a lot of time, and it is very disruptive to our release process when it occurs, since the Windows slave nodes that show this problem are used to execute long-running tests. It would not be such a problem if it were just a short-running module build job that could be readily retried. It seems like it would be really straightforward to add configurable values for this behavior, and it would increase the value we get from Jenkins quite a lot. Please consider addressing this issue.
Thanks,
John
Ubuntu 14.04 server 64-bit
oracle-java7: 1.7.0_80
Jenkins: 1.651.1 LTS
The build sometimes fails randomly with this kind of error.
This time it happened in the post-build actions:
FATAL: channel is already closed
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:578)
at hudson.remoting.Request.call(Request.java:130)
at hudson.remoting.Channel.call(Channel.java:780)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:953)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:540)
at hudson.model.Run.execute(Run.java:1738)
at hudson.matrix.MatrixBuild.run(MatrixBuild.java:313)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:410)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
totoroliu this issue has been solved in remoting-2.62 (JENKINS-22853)
There was also a fix for SocketTimeoutException in remoting-2.62 (JENKINS-22722), which makes remoting tolerant of socket timeouts.
So the remoting layer should be more stable now.
oleg_nenashev: The reference to JENKINS-22853 seems to be unrelated. Did you type the wrong number by any chance?
Also, JENKINS-22722 states it was fixed in remoting-2.60 (although we still have pretty bad problems with broken connections)
All: Are you running Jenkins on VMs? We noticed that VMware moving VMs between hosts can cause a brief packet loss, which can cause Jenkins to lose the connection.
We have slave disconnect issues and are running on VMware (both master and slave). From the recently available data, the 'Tasks & Events' history does NOT show a 'Migrate virtual machine' entry at the time of disconnect (for either the master or the slave involved).
We'll continue to monitor, though we've not had any disconnects since our upgrade to Jenkins 2.7.2 and we used to get 1 or 2 a week.
We had a similar problem when our master was on VMware. After migrating to Microsoft Hyper-V, the problem was solved. I think this is some problem with the VMware configuration or its network switch virtualization.
Hi,
We have recently also encountered disconnection issues. The slave is a Windows 7 (x64) PC with enough RAM and CPU to run heavy applications. The Jenkins master is an Enterprise Red Hat 7 machine (3.10.0-327.18.2.el7.x86_64) running Jenkins 2.23, also with enough memory and so on to run Jenkins. Both are running Java 8 update 102. The slave is connected through JNLP. The network can be a bit unstable at times.
The following intermittent error occurs very frequently during builds:
Agent went offline during the build
ERROR: Connection was broken: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@69c08f2a[name=Buildserver]
at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:629)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137)
at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
... 6 more
I have unticked "Response Time" from "Preventive Node Monitoring", and the slaves have -Dhudson.slaves.ChannelPinger.pingInterval=1 set.
Any other workaround available?
According to the documentation (https://wiki.jenkins-ci.org/display/JENKINS/Ping+Thread), -Dhudson.slaves.ChannelPinger.pingInterval=1 should be set on the Master. You should also try setting -Dhudson.remoting.Launcher.pingIntervalSec=-1 on the Slave.
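For anyone else trying this, here is a rough sketch of where those flags go, assuming a WAR-based master start and a JNLP slave launched with slave.jar (adjust to your own service wrapper or packaging; the values are just the ones mentioned above, and ChannelPinger interprets its interval in minutes if I read the wiki page correctly; master host, node name, and secret below are placeholders):
# master: shorten the master-side ping interval (ChannelPinger, value in minutes)
java -Dhudson.slaves.ChannelPinger.pingInterval=1 -jar jenkins.war
# slave: disable the slave-side ping thread entirely
java -Dhudson.remoting.Launcher.pingIntervalSec=-1 -jar slave.jar \
    -jnlpUrl http://<master>/computer/<node-name>/slave-agent.jnlp -secret <secret>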
I haven't experienced any issues since disabling pinging this way. Next is to start testing different timeout values.
Thanks for the tip!
Disabling the ping completely made it more stable. However, I still experience intermittent connectivity problems. During an execution, the slave computer went offline for a couple of seconds and then reconnected to the Jenkins master, as seen in the system log:
—
Accepted connection #7 from /10.31.43.49:52692
Sep 21, 2016 8:14:21 AM INFO jenkins.slaves.DefaultJnlpSlaveReceiver handle
Disconnecting Buildserver as we are reconnected from the current peer
Sep 21, 2016 8:29:49 AM WARNING org.jenkinsci.remoting.nio.NioChannelHub run
Communication problem
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137)
at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Sep 21, 2016 8:29:49 AM WARNING jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed
NioChannelHub keys=3 gen=842933: Computer.threadPoolForRemoting 2 for Buildserver terminated
java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@28b1969e[name=Buildserver]
at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:629)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137)
at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
... 6 more
Any ideas how I can prevent the master from disconnecting the slave (and use the reconnected session instead)?
JENKINS-44785 likely addresses this issue in general. There is a pull request to remoting: https://github.com/jenkinsci/remoting/pull/174 , but I have never finished it due to the review feedback.
I will remove the assignee from the ticket for now, see https://groups.google.com/d/msg/jenkinsci-dev/uc6NsMoCFQI/AIO4WG1UCwAJ for the context
I wonder if those new timeouts of the slaves might be related to this change https://github.com/jenkinsci/remoting/commit/28830e37b94387d0c6f9927ad897f4010e6c1bda
Maybe kohsuke knows and can add some logging in case of connection timeouts. Currently everything happens silently and there is no clue why the connections die and which timeout is actually responsible.
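Until something like that exists, one workaround sketch for getting more detail out of the remoting layer is to turn up java.util.logging for the packages that appear in the traces above (hudson.remoting and org.jenkinsci.remoting). On the master this can be done with a custom log recorder under Manage Jenkins » System Log; on the slave, a standard JUL configuration file can be passed at launch. The file name, master host, node name, and secret below are placeholders:
# remoting-logging.properties (example name), placed next to slave.jar
handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=FINE
hudson.remoting.level=FINE
org.jenkinsci.remoting.level=FINE
# start the slave JVM with that config so channel setup and teardown get logged
java -Djava.util.logging.config.file=remoting-logging.properties -jar slave.jar \
    -jnlpUrl http://<master>/computer/<node-name>/slave-agent.jnlp -secret <secret>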