Jenkins / JENKINS-62885

ClosedChannelException with a history of a Java heap dump a few days before


    Details

    • Type: Bug
    • Status: Fixed but Unreleased
    • Priority: Major
    • Resolution: Not A Defect
    • Component/s: remoting
    • Labels:
      None
    • Environment:
      Agent: Windows Datacenter 2016 with 16GB RAM from Azure
      Server: a custom Docker image off jenkins/jenkins:lts running with a Docker daemon on RedHat 7 with 32GB RAM from Azure

      Description

      We have a long-running task (Fortify's sourceanalyzer) that stays mute (prints no console output) while it chugs on the analysis for some 16 hours (as it did on 2020-05-20). It started getting disrupted at around 5.5 hours into the run after the latest Jenkins update to 2.242 (or a couple of versions before it) and/or the Jenkins plugin updates of the last month.

      15:38:13 Analyze...
      15:38:13 + "/cygdrive/e/fortify/bin/sourceanalyzer" -Xmx12000M -debug -logfile "fortify\\fortify-scan.log" -b "PROJECT" -64 -scan -f "fortify\\fortify-PROJECT.fpr"
      21:15:58 FATAL: command execution failed
      21:15:58 java.nio.channels.ClosedChannelException
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209)
      21:15:58 	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:221)
      21:15:58 	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
      21:15:58 	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
      21:15:58 	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:784)
      21:15:58 	at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:172)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:314)
      21:15:58 	at hudson.remoting.Channel.close(Channel.java:1493)
      21:15:58 	at hudson.remoting.Channel.close(Channel.java:1446)
      21:15:58 	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:877)
      21:15:58 	at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:113)
      21:15:58 	at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:768)
      21:15:58 	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      21:15:58 	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      21:15:58 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      21:15:58 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      21:15:58 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      21:15:58 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      21:15:58 	at java.lang.Thread.run(Thread.java:748)
      21:15:58 Caused: java.io.IOException: Backing channel 'JNLP4-connect connection from CLIENTADDR/CLIENTADDR:50112' is disconnected.
      21:15:58 	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
      21:15:58 	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
      21:15:58 	at com.sun.proxy.$Proxy102.isAlive(Unknown Source)
      21:15:58 	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1147)
      21:15:58 	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1139)
      21:15:58 	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
      21:15:58 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
      21:15:58 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
      21:15:58 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
      21:15:58 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
      21:15:58 	at hudson.model.Build$BuildExecution.build(Build.java:206)
      21:15:58 	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
      21:15:58 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
      21:15:58 	at hudson.model.Run.execute(Run.java:1880)
      21:15:58 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      21:15:58 	at hudson.model.ResourceController.execute(ResourceController.java:97)
      21:15:58 	at hudson.model.Executor.run(Executor.java:428)
      21:15:58 FATAL: Unable to delete script file C:\windows\TEMP\jenkins8068258613156310901.bat
      21:15:58 java.nio.channels.ClosedChannelException
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209)
      21:15:58 	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:221)
      21:15:58 	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
      21:15:58 	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
      21:15:58 	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:784)
      21:15:58 	at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:172)
      21:15:58 	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:314)
      21:15:58 	at hudson.remoting.Channel.close(Channel.java:1493)
      21:15:58 	at hudson.remoting.Channel.close(Channel.java:1446)
      21:15:58 	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:877)
      21:15:58 	at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:113)
      21:15:58 	at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:768)
      21:15:58 	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      21:15:58 	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      21:15:58 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      21:15:58 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      21:15:58 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      21:15:58 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      21:15:58 	at java.lang.Thread.run(Thread.java:748)
      21:15:58 Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@42036b30:JNLP4-connect connection from CLIENTADDR/CLIENTADDR:50112": Remote call on JNLP4-connect connection from CLIENTADDR/CLIENTADDR:50112 failed. The channel is closing down or has closed down
      21:15:58 	at hudson.remoting.Channel.call(Channel.java:991)
      21:15:58 	at hudson.FilePath.act(FilePath.java:1069)
      21:15:58 	at hudson.FilePath.act(FilePath.java:1058)
      21:15:58 	at hudson.FilePath.delete(FilePath.java:1543)
      21:15:58 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:123)
      21:15:58 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
      21:15:58 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
      21:15:58 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
      21:15:58 	at hudson.model.Build$BuildExecution.build(Build.java:206)
      21:15:58 	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
      21:15:58 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
      21:15:58 	at hudson.model.Run.execute(Run.java:1880)
      21:15:58 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      21:15:58 	at hudson.model.ResourceController.execute(ResourceController.java:97)
      21:15:58 	at hudson.model.Executor.run(Executor.java:428)
      21:15:58 Build step 'Execute Windows batch command' marked build as failure
      

      My Jenkins agent's jar is dated 2020-05-05 and reports version 4.3,

      $ unzip -p /cygdrive/e/jenkins_agent/agent.jar jenkins/remoting/jenkins-version.properties
      version=4.3
      

      Its wrapper (WinSW) log shows no records between 2020-06-27, two days ago, and today's disconnect on 2020-06-29 at 21:16.

      2020-06-27 12:48:40,059 DEBUG - Completed. Exit code is 0
      2020-06-29 21:16:13,033 DEBUG - Starting WinSW in the CLI mode
      2020-06-29 21:16:20,828 INFO  - Restarting the service with id 'jenkins_agent'
      2020-06-29 21:16:20,987 DEBUG - Completed. Exit code is 0
      2020-06-29 21:16:21,338 DEBUG - Starting WinSW in the CLI mode
      2020-06-29 21:16:24,014 INFO  - Restarting the service with id 'jenkins_agent'
      2020-06-29 21:16:24,162 INFO  - Stopping jenkins_agent
      2020-06-29 21:16:24,176 DEBUG - ProcessKill 10260
      2020-06-29 21:16:24,492 INFO  - Found child process: 14484 Name: conhost.exe
      2020-06-29 21:16:24,933 INFO  - Stopping process 14484
      2020-06-29 21:16:24,942 INFO  - Send SIGINT 14484
      2020-06-29 21:16:24,949 WARN  - SIGINT to 14484 failed - Killing as fallback
      2020-06-29 21:16:24,957 INFO  - Stopping process 10260
      2020-06-29 21:16:24,963 INFO  - Send SIGINT 10260
      2020-06-29 21:16:24,984 WARN  - SIGINT to 10260 failed - Killing as fallback
      2020-06-29 21:16:24,994 INFO  - Finished jenkins_agent
      2020-06-29 21:16:24,997 DEBUG - Completed. Exit code is 0
      2020-06-29 21:16:26,007 DEBUG - Starting WinSW in the service mode
      2020-06-29 21:16:26,052 DEBUG - Completed. Exit code is 0
      2020-06-29 21:16:26,202 DEBUG - Checking the potentially runaway process with PID=10260
      2020-06-29 21:16:26,212 DEBUG - No runaway process with PID=10260. The process has been already stopped.
      2020-06-29 21:16:26,399 INFO  - Started process 13932
      2020-06-29 21:16:26,451 DEBUG - Forwarding logs of the process System.Diagnostics.Process (java) to winsw.SizeBasedRollingLogAppender
      2020-06-29 21:16:26,472 INFO  - Recording PID of the started process:13932. PID file destination is E:\jenkins_agent\jenkins_agent.pid
      2020-06-29 21:16:39,440 DEBUG - Starting WinSW in the CLI mode
      2020-06-29 21:16:39,710 DEBUG - User requested the status of the process with id 'jenkins_agent'
      2020-06-29 21:16:39,718 DEBUG - Completed. Exit code is 0
      

      The agent's error log reveals an out-of-memory error. (I wonder whether this is due to our agent's Windows Datacenter 2016 machine experiencing temporary fluctuations in available memory. Could it be an Azure Spot instance? I don't know.)
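
      A minimal way to sanity-check this from the agent host, using only stock Windows tooling (a sketch; the PID 13932 is merely the agent process recorded in the WinSW log above and would have to be replaced with the live one). Because the error further below is "unable to create new native thread", the thread count is as interesting as the heap:

      rem Thread count and working set of the agent JVM (PID taken from the WinSW log above)
      wmic process where "ProcessId=13932" get Name,ThreadCount,WorkingSetSize
      rem Free and total physical memory of the machine, in KB
      wmic OS get FreePhysicalMemory,TotalVisibleMemorySize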

      Jun 27, 2020 12:48:35 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful
        Agent address: SERVERHOST
        Agent port:    50000
        Identity:      07:d6:1b:ab:77:2a:99:d2:bd:bd:06:17:f7:03:2e:ed
      Jun 27, 2020 12:48:35 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Jun 27, 2020 12:48:35 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to SERVERHOST:50000
      Jun 27, 2020 12:48:35 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Jun 27, 2020 12:48:35 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: 07:d6:1b:ab:77:2a:99:d2:bd:bd:06:17:f7:03:2e:ed
      Jun 27, 2020 12:48:37 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Jun 29, 2020 9:11:58 PM hudson.remoting.Channel$1 handle
      SEVERE: Failed to execute command UserRequest:hudson.remoting.PingThread$Ping@38f129d5 (channel JNLP4-connect connection to SERVERHOST/SERVERADDR:50000)
      java.lang.OutOfMemoryError: unable to create new native thread
              at java.lang.Thread.start0(Native Method)
              at java.lang.Thread.start(Thread.java:717)
              at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
              at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
              at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
              at hudson.remoting.DelegatingExecutorService.submit(DelegatingExecutorService.java:42)
              at hudson.remoting.InterceptingExecutorService.submit(InterceptingExecutorService.java:46)
              at hudson.remoting.InterceptingExecutorService.submit(InterceptingExecutorService.java:41)
              at hudson.remoting.Request.execute(Request.java:348)
              at hudson.remoting.Channel$1.handle(Channel.java:606)
              at hudson.remoting.AbstractByteBufferCommandTransport.processCommand(AbstractByteBufferCommandTransport.java:203)
              at hudson.remoting.AbstractByteBufferCommandTransport.receive(AbstractByteBufferCommandTransport.java:189)
              at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onRead(ChannelApplicationLayer.java:187)
              at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecv(ApplicationLayer.java:206)
              at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:668)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SSLEngineFilterLayer.java:369)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngineFilterLayer.java:117)
              at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:668)
              at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:136)
              at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$2200(BIONetworkLayer.java:48)
              at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:283)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
              at java.lang.Thread.run(Thread.java:748)
      
      Jun 29, 2020 9:11:58 PM hudson.remoting.Channel$1 handle
      SEVERE: This command is created here
      Command UserRequest:hudson.remoting.PingThread$Ping@38f129d5 created at
              at hudson.remoting.Command.<init>(Command.java:81)
              at hudson.remoting.Request.<init>(Request.java:112)
              at hudson.remoting.Request.<init>(Request.java:107)
              at hudson.remoting.UserRequest.<init>(UserRequest.java:77)
              at hudson.remoting.Channel.callAsync(Channel.java:1028)
              at hudson.remoting.PingThread.ping(PingThread.java:109)
              at hudson.remoting.PingThread.run(PingThread.java:89)
      
      Jun 29, 2020 9:15:58 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Jun 29, 2020 9:16:08 PM hudson.util.ProcessTree getKillers
      WARNING: Failed to obtain killers
      hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@71ab9337:JNLP4-connect connection to SERVERHOST/SERVERADDR:50000": Remote call on JNLP4-connect connection to SERVERHOST/SERVERADDR:50000 failed. The channel is closing down or has closed down
              at hudson.remoting.Channel.call(Channel.java:991)
              at hudson.util.ProcessTree.getKillers(ProcessTree.java:198)
              at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:261)
              at hudson.util.ProcessTree$WindowsOSProcess.killRecursively(ProcessTree.java:546)
              at hudson.util.ProcessTree.killAll(ProcessTree.java:182)
              at hudson.Proc$LocalProc.destroy(Proc.java:387)
              at hudson.Proc$LocalProc.join(Proc.java:360)
              at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1321)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:931)
              at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:905)
              at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:857)
              at hudson.remoting.UserRequest.perform(UserRequest.java:211)
              at hudson.remoting.UserRequest.perform(UserRequest.java:54)
              at hudson.remoting.Request$2.run(Request.java:369)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: hudson.remoting.Channel$OrderlyShutdown: Command Close created at
              at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1314)
              at hudson.remoting.Channel$1.handle(Channel.java:606)
              at hudson.remoting.AbstractByteBufferCommandTransport.processCommand(AbstractByteBufferCommandTransport.java:203)
              at hudson.remoting.AbstractByteBufferCommandTransport.receive(AbstractByteBufferCommandTransport.java:189)
              at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onRead(ChannelApplicationLayer.java:187)
              at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecv(ApplicationLayer.java:206)
              at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:668)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SSLEngineFilterLayer.java:369)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngineFilterLayer.java:117)
              at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:668)
              at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:136)
              at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$2200(BIONetworkLayer.java:48)
              at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:283)
              ... 4 more
      Caused by: Command Close created at
              at hudson.remoting.Command.<init>(Command.java:70)
              at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1308)
              at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1306)
              at hudson.remoting.Channel.close(Channel.java:1479)
              at hudson.remoting.Channel.close(Channel.java:1446)
              at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:877)
              at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:113)
              at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:768)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              ... 1 more
      
      Jun 29, 2020 9:16:09 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Performing onReconnect operation.
      Jun 29, 2020 9:16:09 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1 onReconnect
      INFO: Restarting agent via jenkins.slaves.restarter.WinswSlaveRestarter@6ebaa9e7
      Jun 29, 2020 9:16:21 PM hudson.Launcher$RemoteLaunchCallable$1 join
      INFO: Failed to synchronize IO streams on the channel hudson.remoting.Channel@71ab9337:JNLP4-connect connection to SERVERHOST/SERVERADDR:50000
      hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@71ab9337:JNLP4-connect connection to SERVERHOST/SERVERADDR:50000": Remote call on JNLP4-connect connection to SERVERHOST/SERVERADDR:50000 failed. The channel is closing down or has closed down
              at hudson.remoting.Channel.call(Channel.java:991)
              at hudson.remoting.Channel.syncIO(Channel.java:1730)
              at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1328)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:931)
              at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:905)
              at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:857)
              at hudson.remoting.UserRequest.perform(UserRequest.java:211)
              at hudson.remoting.UserRequest.perform(UserRequest.java:54)
              at hudson.remoting.Request$2.run(Request.java:369)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: hudson.remoting.Channel$OrderlyShutdown: Command Close created at
              at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1314)
              at hudson.remoting.Channel$1.handle(Channel.java:606)
              at hudson.remoting.AbstractByteBufferCommandTransport.processCommand(AbstractByteBufferCommandTransport.java:203)
              at hudson.remoting.AbstractByteBufferCommandTransport.receive(AbstractByteBufferCommandTransport.java:189)
              at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onRead(ChannelApplicationLayer.java:187)
              at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecv(ApplicationLayer.java:206)
              at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:668)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SSLEngineFilterLayer.java:369)
              at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngineFilterLayer.java:117)
              at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:668)
              at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:136)
              at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$2200(BIONetworkLayer.java:48)
              at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:283)
              ... 4 more
      Caused by: Command Close created at
              at hudson.remoting.Command.<init>(Command.java:70)
              at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1308)
              at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1306)
              at hudson.remoting.Channel.close(Channel.java:1479)
              at hudson.remoting.Channel.close(Channel.java:1446)
              at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:877)
              at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:113)
              at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:768)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              ... 1 more
      
      Jun 29, 2020 9:16:21 PM hudson.remoting.Request$2 run
      INFO: Failed to send back a reply to the request hudson.remoting.Request$2@6ed67717: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@71ab9337:JNLP4-connect connection to SERVERHOST/SERVERADDR:50000": channel is already closed
      Jun 29, 2020 9:16:28 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using E:\jenkins_agent\remoting as a remoting work directory
      Jun 29, 2020 9:16:28 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
      INFO: Both error and output logs will be printed to E:\jenkins_agent\remoting
      Jun 29, 2020 9:16:31 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up agent: AGENTHOST
      Jun 29, 2020 9:16:31 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Jun 29, 2020 9:16:31 PM hudson.remoting.Engine startEngine
      INFO: Using Remoting version: 4.3
      Jun 29, 2020 9:16:31 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using E:\jenkins_agent\remoting as a remoting work directory
      Jun 29, 2020 9:16:31 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [https://SERVERHOST:8083/]
      Jun 29, 2020 9:16:31 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
      Jun 29, 2020 9:16:31 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful
        Agent address: SERVERHOST
        Agent port:    50000
        Identity:      07:d6:1b:ab:77:2a:99:d2:bd:bd:06:17:f7:03:2e:ed
      Jun 29, 2020 9:16:31 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Jun 29, 2020 9:16:31 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to SERVERHOST:50000
      Jun 29, 2020 9:16:31 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Jun 29, 2020 9:16:31 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: 07:d6:1b:ab:77:2a:99:d2:bd:bd:06:17:f7:03:2e:ed
      Jun 29, 2020 9:16:33 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      

      A 2020-06-26 agent output log mentions a Java heap dump,

      [thread 4380 also had an error]
      [thread 9376 also had an error]
      
      [error occurred during error reporting (null), id 0xc0000005]
      #
      # There is insufficient memory for the Java Runtime Environment to continue.
      # Native memory allocation (malloc) failed to allocate 165776 bytes for Chunk::new
      # An error report file with more information is saved as:
      # E:\jenkins_agent\hs_err_pid13852.log
      #
      # Compiler replay data is saved as:
      # E:\jenkins_agent\replay_pid13852.log
      An unrecoverable stack overflow has occurred.
      An unrecoverable stack overflow has occurred.
      [thread 8496 also had an error]
      

      I will attach the dump log and err files.

      The jenkins_agent.xml file (minus some commented-out cruft) follows.

      <service>
        <id>jenkins_agent</id>
        <name>jenkins_agent</name>
        <description>This service runs an agent for Jenkins automation server.</description>
        <executable>e:\java8\jre\bin\java.exe</executable>
        <arguments>-Xrs  -jar "%BASE%\agent.jar" -jnlpUrl https://SERVERHOST:8083/computer/AGENTHOST/slave-agent.jnlp -secret @"%BASE%\secret-file" -workDir "%BASE%"</arguments>
        <logmode>rotate</logmode>
        <onfailure action="restart" />
        <extensions>
          <!-- This is a sample configuration for the RunawayProcessKiller extension. -->
          <extension enabled="true"
                     className="winsw.Plugins.RunawayProcessKiller.RunawayProcessKillerExtension"
                     id="killOnStartup">
            <pidfile>%BASE%\jenkins_agent.pid</pidfile>
            <stopTimeout>5000</stopTimeout>
            <stopParentFirst>false</stopParentFirst>
          </extension>
        </extensions>
      
        <!-- See referenced examples for more options -->
      </service>
      

        Attachments

        1. hs_err_pid13852.log
          32 kB
        2. remoting.log.1
          1 kB
        3. remoting.log.2
          13 kB
        4. replay_pid13852.log
          219 kB

          Activity

          ilatypov Ilguiz Latypov added a comment - edited

          After seeing my earlier channel exceptions disrupting Fortify at ~5 hours, and Oleg Nenashev's slide deck on Remoting, I thought I would be a smart cookie and added a short (100 s) ping interval parameter to my server's startup java command line via JAVA_OPTS. It did show up in the process list after the server restart, but the 5-hour failure occurred again, and I opened this ticket.

          -Dhudson.slaves.ChannelPinger.pingIntervalSeconds=100
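
          For reference, a minimal sketch of how such a system property can be passed through JAVA_OPTS to a container based on jenkins/jenkins:lts (the image name and port mappings below are placeholders, not our actual deployment):

          docker run -d --name jenkins \
              -e JAVA_OPTS="-Dhudson.slaves.ChannelPinger.pingIntervalSeconds=100" \
              -p 8083:8083 -p 50000:50000 \
              my-custom-jenkins:lts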
          
          jthompson Jeff Thompson added a comment -

          The OutOfMemoryError is significant and is the first thing to investigate and resolve.

          Are you using WebSockets? Otherwise there are no significant changes in the 4.3 version. What Agent / Remoting versions did you upgrade from?

          Given the OutOfMemoryError I doubt this is a problem with Remoting. As you mention, it could be an issue with your system configuration or utilization. These can also crop up from the behavior of plugins or build scripts.

          jthompson Jeff Thompson added a comment -

          The behavior of the ping thread can be weird. As Oleg recommended in one of his documentation pieces a few years back, if you change it and it doesn't resolve the issue, you should change it back.

          ilatypov Ilguiz Latypov added a comment - edited

          One of these changes got the channel stable again; it sustained the 18+ hour job.

          • Adding -Xms256M -Xmx512M to <arguments> beside -Xrs in jenkins_agent.xml and restarting the Jenkins agent Windows service, as shown in the sketch after this list. (Thanks to James Fairweather's comment in JENKINS-44132.)
          • Turning up the verbosity of the otherwise mute command in the job.
          • We have instances of an unrelated Java application running from time to time on the same machine as the agent. Running that application without an allocation limit is known to produce its own "out of memory" exceptions and to show significant memory consumption (a few GB) in Process Explorer. So we exited those instances, capped them with -Xmx2G, and restarted them.
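
          A sketch of the adjusted <arguments> line in jenkins_agent.xml (same placeholder host names and paths as in the file quoted in the description):

            <!-- agent JVM heap capped at 512 MB, next to the existing -Xrs -->
            <arguments>-Xrs -Xms256M -Xmx512M -jar "%BASE%\agent.jar" -jnlpUrl https://SERVERHOST:8083/computer/AGENTHOST/slave-agent.jnlp -secret @"%BASE%\secret-file" -workDir "%BASE%"</arguments>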

          I am not aware of us using WebSockets. As for the agent version, its time stamp shows 2020-05-05 and it reports version 4.3. The channel sustained a long build on 2020-05-21. The agent config history begins and ends on 2020-05-05. I did not have an automatic agent download option enabled in jenkins_agent.xml. The update history, or an option to revert updates to a point in time, is not directly visible in the Manage Jenkins pages.

          jthompson Jeff Thompson added a comment -

          I'm glad you were able to resolve that.


            People

            Assignee:
            jthompson Jeff Thompson
            Reporter:
            ilatypov Ilguiz Latypov
            Votes:
            0
            Watchers:
            2

              Dates

              Created:
              Updated:
              Resolved: