Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Environment:
Jenkins 2.46.2
Slave.jar version: 3.7
SSH Agent plugin 1.15
Randomly getting agent disconnects with this output:
19:33:02 java.nio.channels.ClosedChannelException
19:33:02 at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
19:33:02 at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
19:33:02 at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:721)
19:33:02 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
19:33:02 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
19:33:02 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
19:33:02 at java.lang.Thread.run(Unknown Source)
19:33:02 Caused: java.io.IOException: Backing channel 'JNLP4-connect connection from wr2czc42446kf.jdnet.deere.com/172.23.213.39:59664' is disconnected.
19:33:02 at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
19:33:02 at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
19:33:02 at com.sun.proxy.$Proxy74.isAlive(Unknown Source)
19:33:02 at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
19:33:02 at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
19:33:02 at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
19:33:02 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
19:33:02 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
19:33:02 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
19:33:02 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
19:33:02 at hudson.model.Build$BuildExecution.build(Build.java:206)
19:33:02 at hudson.model.Build$BuildExecution.doRun(Build.java:163)
19:33:02 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
19:33:02 at com.tikal.jenkins.plugins.multijob.MultiJobBuild$MultiJobRunnerImpl.run(MultiJobBuild.java:136)
19:33:02 at hudson.model.Run.execute(Run.java:1728)
19:33:02 at com.tikal.jenkins.plugins.multijob.MultiJobBuild.run(MultiJobBuild.java:73)
19:33:02 at hudson.model.ResourceController.execute(ResourceController.java:98)
19:33:02 at hudson.model.Executor.run(Executor.java:405)
[JENKINS-44132] Random JNLP Agent disconnect
Further investigation of slave-wrapper.log shows that something is killing the Jenkins service on the slave:
2018-02-27 01:16:11,305 INFO - Starting C:\Program Files (x86)\Java\jre1.8.0_91\bin\java.exe -Xrs -jar "e:\Jenkins-aws\slave.jar" -jnlpUrl https://jenkins-scrm.aws.aspect.com/computer/viper/slave-agent.jnlp -secret 14297f83b25a5e7a6c9539f75a47629308ded2467bcc54c91eb368e3846e98b6
2018-02-27 01:16:11,636 INFO - Started process 9848
2018-02-27 01:16:11,645 DEBUG - Forwarding logs of the process System.Diagnostics.Process (java) to winsw.SizeBasedRollingLogAppender
2018-02-27 01:16:11,651 INFO - Recording PID of the started process:9848. PID file destination is e:\Jenkins-aws\jenkins_agent.pid
2018-02-27 01:16:29,998 DEBUG - Starting ServiceWrapper in the CLI mode
2018-02-27 01:16:30,406 DEBUG - User requested the status of the process with id 'jenkinsslave-e__Jenkins-aws'
2018-02-27 01:16:30,409 DEBUG - Completed. Exit code is 0
2018-02-27 21:29:01,440 DEBUG - Starting ServiceWrapper in the CLI mode
2018-02-27 21:29:03,025 INFO - Restarting the service with id 'jenkinsslave-e__Jenkins-aws'
2018-02-27 21:29:03,067 DEBUG - Completed. Exit code is 0
2018-02-27 21:29:03,436 DEBUG - Starting ServiceWrapper in the CLI mode
2018-02-27 21:29:04,009 INFO - Restarting the service with id 'jenkinsslave-e__Jenkins-aws'
2018-02-27 21:29:04,211 INFO - Stopping jenkinsslave-e__Jenkins-aws
2018-02-27 21:29:04,212 DEBUG - ProcessKill 9848
2018-02-27 21:29:05,292 INFO - Found child process: 5592 Name: cmd.exe
2018-02-27 21:29:06,236 INFO - Found child process: 25792 Name: timeout.exe
2018-02-27 21:29:07,057 INFO - Stopping process 25792
2018-02-27 21:29:07,222 INFO - Process 25792 is already stopped
System.ArgumentException: Process with an Id of 25792 is not running.
at System.Diagnostics.Process.GetProcessById(Int32 processId, String machineName)
at winsw.Util.ProcessHelper.StopProcess(Int32 pid, TimeSpan stopTimeout)
2018-02-27 21:29:07,285 INFO - Stopping process 5592
2018-02-27 21:29:07,416 INFO - Send SIGINT 5592
2018-02-27 21:29:07,556 INFO - SIGINT to5592 successful
2018-02-27 21:29:07,557 INFO - Stopping process 9848
2018-02-27 21:29:07,630 INFO - Send SIGINT 9848
2018-02-27 21:29:07,721 INFO - SIGINT to9848 successful
2018-02-27 21:29:07,726 INFO - Finished jenkinsslave-e__Jenkins-aws
2018-02-27 21:29:07,727 DEBUG - Completed. Exit code is 0
2018-02-27 21:29:08,578 INFO - Starting ServiceWrapper in the service mode
2018-02-27 21:29:08,628 DEBUG - Completed. Exit code is 0
2018-02-27 21:29:08,665 INFO - Downloading: https://jenkins-scrm.aws.aspect.com/jnlpJars/slave.jar to e:\Jenkins-aws\slave.jar. failOnError=False
2018-02-27 21:29:09,241 ERROR - Failed to download https://jenkins-scrm.aws.aspect.com/jnlpJars/slave.jar to e:\Jenkins-aws\slave.jar
System.Net.WebException: The request was aborted: Could not create SSL/TLS secure channel.
at System.Net.HttpWebRequest.GetResponse()
at winsw.Download.Perform()
at winsw.WrapperService.OnStart(String[] _)
2018-02-27 21:29:09,336 INFO - Starting C:\Program Files (x86)\Java\jre1.8.0_91\bin\java.exe -Xrs -jar "e:\Jenkins-aws\slave.jar" -jnlpUrl https://jenkins-scrm.aws.aspect.com/computer/viper/slave-agent.jnlp -secret 14297f83b25a5e7a6c9539f75a47629308ded2467bcc54c91eb368e3846e98b6
2018-02-27 21:29:09,351 INFO - Extension loaded: killOnStartup
2018-02-27 21:29:09,365 DEBUG - Checking the potentially runaway process with PID=9848
2018-02-27 21:29:09,457 DEBUG - No runaway process with PID=9848. The process has been already stopped.
2018-02-27 21:29:09,460 INFO - Starting C:\Program Files (x86)\Java\jre1.8.0_91\bin\java.exe -Xrs -jar "e:\Jenkins-aws\slave.jar" -jnlpUrl https://jenkins-scrm.aws.aspect.com/computer/viper/slave-agent.jnlp -secret 14297f83b25a5e7a6c9539f75a47629308ded2467bcc54c91eb368e3846e98b6
2018-02-27 21:29:09,867 INFO - Started process 33900
2018-02-27 21:29:09,946 DEBUG - Forwarding logs of the process System.Diagnostics.Process (java) to winsw.SizeBasedRollingLogAppender
2018-02-27 21:29:09,954 INFO - Recording PID of the started process:33900. PID file destination is e:\Jenkins-aws\jenkins_agent.pid
2018-02-27 21:29:42,669 DEBUG - Starting ServiceWrapper in the CLI mode
2018-02-27 21:29:43,099 DEBUG - User requested the status of the process with id 'jenkinsslave-e__Jenkins-aws'
2018-02-27 21:29:43,105 DEBUG - Completed. Exit code is 0
Any help finding what is doing this would be appreciated.
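One way to narrow this down is to line up the wrapper's lifecycle events against the agent log. In the wrapper log above, the restart arrives as a CLI-mode command ("Starting ServiceWrapper in the CLI mode" followed by "Restarting the service") rather than as a crash, and the slave log further down shows jenkins.slaves.restarter.WinswSlaveRestarter restarting the agent at 9:28:59 PM, about two seconds before the first CLI-mode restart at 21:29:01. That suggests, though it does not prove, that the agent requested the restart itself after its channel closed. The sketch below is a minimal, hypothetical helper for extracting those events; the line format and message prefixes are copied from the excerpt above and may differ in other winsw versions.

```python
import re
from typing import Iterable, List, Tuple

# Matches winsw log lines such as:
# "2018-02-27 21:29:03,025 INFO - Restarting the service with id '...'"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) - (.*)$")

# Message prefixes (taken from the excerpt above) that mark lifecycle events
# rather than routine output. str.startswith accepts a tuple of prefixes.
EVENT_PREFIXES = (
    "Starting ServiceWrapper in the CLI mode",
    "Starting ServiceWrapper in the service mode",
    "Restarting the service",
    "Stopping ",
    "ProcessKill",
)


def lifecycle_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, message) pairs for wrapper start/stop/restart events."""
    events: List[Tuple[str, str]] = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # stack-trace lines and other unstructured text
        timestamp, _level, message = match.groups()
        if message.startswith(EVENT_PREFIXES):
            events.append((timestamp, message))
    return events
```

Passing an open file handle for slave-wrapper.log yields the timestamps to compare with the agent's own log, for example to check whether each CLI-mode restart follows a "Restarting agent via ... WinswSlaveRestarter" entry.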
Hi, we have the same issue on Windows Server 2008 R2 SP1 with:
- Jenkins 2.117
- slave.jar version 3.19
At random times during the day, the slave server disconnects from the master with the following error:
INFO: Failed to synchronize IO streams on the channel hudson.remoting.Channel@40caf784:JNLP4-connect connection to serverMasterName/serverMasterIP:port
hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection to serverMasterName/serverMasterIP:port failed. The channel is closing down or has closed down
at hudson.remoting.Channel.call(Channel.java:948)
at hudson.remoting.Channel.syncIO(Channel.java:1683)
at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1315)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:929)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:903)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:855)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.ClosedChannelException
at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1800(BIONetworkLayer.java:48)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:264)
... 4 more
apr 20, 2018 3:36:20 AM hudson.remoting.UserRequest perform
WARNING: LinkageError while performing UserRequest:UserRPCRequest:hudson.Launcher$RemoteProcess.join[](58)
java.lang.NoClassDefFoundError: hudson/util/ProcessKillingVeto
at hudson.util.ProcessTree$OSProcess.getVeto(ProcessTree.java:244)
at hudson.util.ProcessTree$WindowsOSProcess.killRecursively(ProcessTree.java:431)
at hudson.util.ProcessTree.killAll(ProcessTree.java:146)
at hudson.Proc$LocalProc.destroy(Proc.java:384)
at hudson.Proc$LocalProc.join(Proc.java:357)
at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1304)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:929)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:903)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:855)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: hudson.util.ProcessKillingVeto
at java.net.URLClassLoader.findClass(Unknown Source)
at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 21 more
I ran into this with a Windows build agent connecting via JNLP, where the connection went through an AWS ELB in front of the Kubernetes cluster that runs our Jenkins master. (We use the ELB not for load balancing, since there is only one master after all, but just to expose the private remoting port outside the Kubernetes cluster.)
The idle connection timeout on the ELB was set to 60 seconds, and we had repository clones that took longer than a minute without producing any output during that time.
This seems to be fixed by raising the ELB idle connection timeout (300 seconds or more seems okay; a very long build step that produces no output may need more).
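The failure mode described above can be modeled as "the longest silent gap on the connection reaches the idle timeout". The following is a toy sketch under that assumption (the balancer closes a connection once it has carried no traffic for the configured idle timeout); the traffic timestamps are hypothetical, standing in for a clone step that is silent for 85 seconds.

```python
from typing import Sequence


def longest_silence(traffic_times_s: Sequence[float]) -> float:
    """Largest gap, in seconds, between consecutive packets on the connection."""
    times = sorted(traffic_times_s)
    return max((later - earlier for earlier, later in zip(times, times[1:])), default=0.0)


def survives_idle_timeout(traffic_times_s: Sequence[float], idle_timeout_s: float) -> bool:
    """True if no silent gap reaches the idle timeout.

    Assumption: a gap exactly equal to the timeout is treated as fatal,
    which is a conservative simplification of real load-balancer behavior.
    """
    return longest_silence(traffic_times_s) < idle_timeout_s


# Hypothetical step: output at 0-10 s, then silence until 95 s.
clone_step = [0, 5, 10, 95, 100]
```

With these numbers the 85-second gap breaks a 60-second idle timeout but fits within 300 seconds, which matches the behavior reported above. Any traffic on the channel, including keepalive pings, shortens the gap.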
Also, funeeldy, I don't know whether you realize it, but you posted your connection secret; you may want to change it.
I'm getting this on Jenkins 2.121.2, on one of our several build agents. The remaining agents are functioning correctly. The failing agent was working fine until 4 days ago, but now it gets this error 100% of the time. I can't think of anything that changed between then and now. The master is running on a Win10 VM, as is the build agent, on the same local subnet.
I've tried restarting both the master and the slave systems, but that did not resolve the problem.
I have Agent -> Master Access Control disabled since the entire system is under my control behind our company's network firewall.
Any assistance would be appreciated.
I have the exact same problem: Jenkins master on Linux, slaves running in Windows 1803 containers. Any ideas, please?
I have exactly the same problem: Jenkins 2.109 on Windows 10 with Linux slaves running RHEL5 and RHEL7. It strikes every few days and seems to me to be related to transient network errors. I think Jenkins needs to be more robust in the face of transient network errors.
I have seen similar random build problems caused by Subversion failing on transient network errors. Subversion was recently changed to be more robust in this area, and things are noticeably better. I think something similar is needed for Jenkins.
I managed to resolve this locally. I looked at the server logs and saw an entry about an out-of-memory exception when the new agent was trying to connect. Further research indicated that Java was not able to allocate a new thread because I was using too much heap memory. My Jenkins startup parameters had this:
-Xrs -Xmx1536m
A comment in the jenkins.xml file indicated that the parameters were set that way because they had seen a lot of out-of-memory exceptions. But those parameters were the cause of the problem.
I replaced those two parameters with:
-Xms256m -Xmx512m
And now I have a master with 7 build agents connected and the system is stable.
Of course this specific solution may not solve other people's problems because in our case it was being caused by a previous maintainer not really understanding the implications of giving Java so much heap space. But the advice about examining the Jenkins server logs stands for everyone.
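A back-of-envelope sketch of why a larger heap can mean fewer threads, assuming a 32-bit JVM (the comment does not say which JVM was used). In a 32-bit process the heap reservation (-Xmx), everything else the JVM maps, and every thread stack must share one fixed address space, so raising -Xmx leaves less room for new threads. The 2 GiB address space, 300 MiB non-heap footprint, and 1 MiB stack size below are illustrative assumptions, not measurements.

```python
def estimated_max_threads(address_space_mib: int, heap_mib: int,
                          other_mib: int, stack_mib: int) -> int:
    """Rough upper bound on thread count when address space is fixed.

    address_space_mib: usable address space of the process (assumed)
    heap_mib: heap reservation, i.e. the -Xmx value
    other_mib: hypothetical non-heap footprint (code cache, metadata, DLLs)
    stack_mib: hypothetical per-thread stack reservation
    """
    free_mib = address_space_mib - heap_mib - other_mib
    return max(0, free_mib // stack_mib)


ADDRESS_SPACE_MIB = 2048  # assumed 2 GiB user space of a 32-bit Windows process
OTHER_MIB = 300           # hypothetical non-heap footprint
STACK_MIB = 1             # hypothetical per-thread stack size

before = estimated_max_threads(ADDRESS_SPACE_MIB, 1536, OTHER_MIB, STACK_MIB)  # -Xmx1536m
after = estimated_max_threads(ADDRESS_SPACE_MIB, 512, OTHER_MIB, STACK_MIB)    # -Xmx512m
```

Under these assumptions the thread headroom grows from roughly 212 to roughly 1236 when -Xmx drops from 1536m to 512m. The absolute numbers are not meaningful; the direction of the change is the point.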
P.S. We also got random disconnects from previously connected agents, but that turned out to be a problem with the power management scheme on the agent. Be sure to disable all power-saving options so the agent is always running.
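A minimal sketch for disabling the sleep-related timeouts on a Windows agent, assuming the machine runs on AC power. `powercfg /change <setting> 0` sets a timeout to "never" for the active power scheme, and `powercfg /hibernate off` disables hibernation. The helper names are hypothetical, and the dry-run default only prints the commands. Check `powercfg /?` on your Windows version and run from an elevated prompt, since the hibernate switch typically requires administrator rights.

```python
import subprocess
from typing import List

# AC-power timeouts to set to 0 ("never") on the active power scheme.
SETTINGS = [
    "standby-timeout-ac",
    "hibernate-timeout-ac",
    "disk-timeout-ac",
]


def power_commands() -> List[List[str]]:
    """Build the powercfg invocations without running them."""
    commands = [["powercfg", "/change", setting, "0"] for setting in SETTINGS]
    commands.append(["powercfg", "/hibernate", "off"])
    return commands


def apply_power_settings(dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False (Windows only)."""
    for command in power_commands():
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
```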
Several of these issues involve similar reports but possibly very different causes. Frequently the error indicates that the channel is closed but gives no indication of how or why that happened. Commonly, remoting issues involve something in the networking or system environment terminating the connection from outside the process; the trick is to determine what is doing that. In one instance (JENKINS-52922), Nush Ahmd discovered that setting hudson.slaves.ChannelPinger.pingIntervalSeconds kept the channel from getting disconnected. In JENKINS-48895, sfabian pointed to Windows sleep/hibernate options. Other cases come down to various timeouts.
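When an idle timeout somewhere on the path is the cause, the ping interval is the knob to turn: pings must arrive more often than the shortest idle timeout between agent and master. The sketch below picks an interval below that timeout and renders it as a JVM system-property flag. The property name comes from the text above; the helper name and the 0.5 safety fraction are hypothetical, and the idle timeouts are whatever your load balancers, proxies, or firewalls use. Where the flag belongs (typically the master's JVM arguments, for example in jenkins.xml on Windows) should be confirmed against the Jenkins documentation for your version.

```python
from typing import Iterable

PING_PROPERTY = "hudson.slaves.ChannelPinger.pingIntervalSeconds"


def ping_interval_flag(idle_timeouts_s: Iterable[int], fraction: float = 0.5) -> str:
    """Return a -D flag whose ping interval is below every idle timeout.

    idle_timeouts_s: idle timeouts (seconds) of devices on the agent-master path
    fraction: hypothetical safety margin; 0.5 means two pings per timeout window
    """
    shortest = min(idle_timeouts_s)
    interval = max(1, int(shortest * fraction))  # never emit an interval of 0
    return f"-D{PING_PROPERTY}={interval}"
```

For the 60-second ELB timeout described earlier, `ping_interval_flag([60])` yields an interval of 30 seconds.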
One thing that can help is increasing agent or master logging output. You can read about it here: https://github.com/jenkinsci/remoting/blob/master/docs/logging.md . In summary, create a java.util.logging properties file and reference it with the agent's `-loggingConfig` parameter, for example: `-loggingConfig jenkins-logging.properties`.
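A minimal sketch that writes such a properties file and returns the matching agent arguments. The logger names hudson.remoting and org.jenkinsci.remoting are the package names that appear in the stack traces in this issue; the chosen levels, the console handler, and the helper name are assumptions, and the remoting logging.md linked above remains the authoritative reference. FINE-level output can be verbose, so it is worth reverting once the disconnect has been captured.

```python
from pathlib import Path
from typing import List

# java.util.logging configuration; levels here are illustrative assumptions.
LOGGING_PROPERTIES = """\
handlers = java.util.logging.ConsoleHandler
.level = INFO
java.util.logging.ConsoleHandler.level = ALL
hudson.remoting.level = FINE
org.jenkinsci.remoting.level = FINE
"""


def write_logging_config(path: str = "jenkins-logging.properties") -> List[str]:
    """Write the properties file and return the arguments to add to the agent."""
    config_path = Path(path)
    config_path.write_text(LOGGING_PROPERTIES, encoding="utf-8")
    return ["-loggingConfig", str(config_path)]
```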
Without further information it is difficult to diagnose anything from this side. Frequently the error is environmental.
Closing for lack of sufficient diagnostics and information to reproduce after no response for quite a while.
Seeing the same thing. Tried setting the display and power settings with no effect.
When will this be fixed please?
From the slave log:
Feb 27, 2018 9:28:49 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Feb 27, 2018 9:28:49 PM hudson.remoting.UserRequest perform
WARNING: LinkageError while performing UserRequest:RPCRequest(9,join)
java.lang.NoClassDefFoundError: hudson/util/ProcessKillingVeto
at hudson.util.ProcessTree$OSProcess.getVeto(ProcessTree.java:242)
at hudson.util.ProcessTree$WindowsOSProcess.killRecursively(ProcessTree.java:429)
at hudson.util.ProcessTree.killAll(ProcessTree.java:146)
at hudson.Proc$LocalProc.destroy(Proc.java:384)
at hudson.Proc$LocalProc.join(Proc.java:357)
at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1304)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:896)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:870)
at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:829)
at hudson.remoting.UserRequest.perform(UserRequest.java:208)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:360)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:98)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: hudson.util.ProcessKillingVeto
at java.net.URLClassLoader.findClass(Unknown Source)
at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:157)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 22 more
Feb 27, 2018 9:28:50 PM hudson.remoting.Request$2 run
WARNING: Failed to send back a reply to the request hudson.remoting.Request$2@1d3c6b1
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:667)
at hudson.remoting.Request$2.run(Request.java:372)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:98)
at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.ClosedChannelException
at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1800(BIONetworkLayer.java:48)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:264)
... 4 more
Feb 27, 2018 9:28:59 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect
INFO: Restarting agent via jenkins.slaves.restarter.WinswSlaveRestarter@1c0296c