Type: Bug
Resolution: Unresolved
Priority: Major
Environment: Jenkins 2.249.1, Amazon EC2 plugin 1.53
Similar to https://issues.jenkins-ci.org/browse/JENKINS-61314 (I'll link it)
I'm finding my EC2 Agents sometimes lose their connection during a large unstash (don't ask why; I'm working on that).
At that point the Agent remains offline. I can see on the Agent side that it's trying to ping the server every 5 minutes without success.
I'll add some logs in the comments.
[JENKINS-63778] Windows agents drop connection on large unstash
Agent log (from the server)
EC2 (AWS Dev Account) - Windows Test Executor (i-056d044cb9d06f9f9) booted at 1601008317000
EC2 (AWS Dev Account) - Windows Test Executor (i-056d044cb9d06f9f9) booted at 1601008317000
Connecting to (10.161.10.40) with WinRM as Administrator
Waiting for WinRM to come up. Sleeping 10s.
Waiting for WinRM to come up. Sleeping 10s.
Waiting for WinRM to come up. Sleeping 10s.
WinRM service responded. Waiting for WinRM service to stabilize on EC2 (AWS Dev Account) - Windows Test Executor (i-056d044cb9d06f9f9)
WinRM should now be ok on EC2 (AWS Dev Account) - Windows Test Executor (i-056d044cb9d06f9f9)
Connected with WinRM.
Creating tmp directory if it does not exist
remoting.jar sent remotely. Bootstrapping it
Launching via WinRM:java -Xmx2048m -jar C:\Windows\Temp\remoting.jar -workDir c:\jenkins
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 4.5
This is a Windows agent
ERROR: Failed to monitor for Free Swap Space
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:320)
at hudson.remoting.Request$1.get(Request.java:239)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:64)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
ERROR: Failed to monitor for Free Temp Space
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:320)
at hudson.remoting.Request$1.get(Request.java:239)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:64)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
ERROR: Failed to monitor for Free Disk Space
java.util.concurrent.TimeoutException
at hudson.remoting.Request$1.get(Request.java:320)
at hudson.remoting.Request$1.get(Request.java:239)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:64)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
Agent successfully connected and online
ERROR: Connection terminated
hudson.remoting.FastPipedInputStream$ClosedBy: The pipe was closed at...
at hudson.remoting.FastPipedOutputStream.error(FastPipedOutputStream.java:101)
at hudson.remoting.FastPipedOutputStream.close(FastPipedOutputStream.java:90)
at hudson.plugins.ec2.util.Closeables.closeQuietly(Closeables.java:23)
at hudson.plugins.ec2.win.winrm.WindowsProcess$2.run(WindowsProcess.java:146)
Caused: java.io.IOException: Pipe is already closed
at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:156)
at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:140)
at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:89)
at hudson.remoting.ChunkedOutputStream.drain(ChunkedOutputStream.java:85)
at hudson.remoting.ChunkedOutputStream.write(ChunkedOutputStream.java:54)
at java.base/java.io.OutputStream.write(OutputStream.java:122)
at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:45)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
at hudson.remoting.Channel.send(Channel.java:766)
at hudson.remoting.Channel.close(Channel.java:1488)
at hudson.remoting.Channel.close(Channel.java:1455)
at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:874)
at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:110)
at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:765)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
ERROR: Connection terminated
java.io.EOFException
at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2842)
at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3337)
at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:925)
at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:368)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:142)
at hudson.remoting.Command.readFrom(Command.java:128)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
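For anyone reading the server-side log: the "booted at" value looks like an epoch timestamp in milliseconds. A minimal Python sketch for decoding it follows. It assumes the value is epoch milliseconds, that AEST is a fixed UTC+10 offset, and that the termination time quoted in the job console output in this issue (Fri Sep 25 16:28:40 AEST 2020) belongs to the same agent session. The helper name `epoch_ms_to_aest` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: AEST is treated as a fixed UTC+10 offset.
AEST = timezone(timedelta(hours=10), "AEST")

def epoch_ms_to_aest(epoch_ms):
    """Convert an epoch-milliseconds value to an AEST datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, AEST)

# "booted at 1601008317000" from the server-side agent log.
booted = epoch_ms_to_aest(1601008317000)

# Termination time as reported in the job console output of this issue.
terminated = datetime(2020, 9, 25, 16, 28, 40, tzinfo=AEST)

uptime = terminated - booted
print(booted.isoformat())  # 2020-09-25T14:31:57+10:00
print(uptime)              # 1:56:43
```

Under those assumptions the agent had been up for a little under two hours when the ping thread requested termination.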
Log on the Agent side
Sep 24, 2020 12:04:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600912811820 hasn't completed by 1600913051827
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 24, 2020 12:09:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600913111821 hasn't completed by 1600913351832
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 24, 2020 12:14:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600913411822 hasn't completed by 1600913651827
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 24, 2020 12:19:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600913711822 hasn't completed by 1600913951829
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 24, 2020 12:24:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600914011822 hasn't completed by 1600914251828
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 24, 2020 12:29:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600914311823 hasn't completed by 1600914551837
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 24, 2020 12:34:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600914611824 hasn't completed by 1600914851830
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 24, 2020 12:39:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600914911824 hasn't completed by 1600915151831
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 24, 2020 12:44:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600915211824 hasn't completed by 1600915451831
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
Sep 24, 2020 12:49:11 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel channel.
java.util.concurrent.TimeoutException: Ping started at 1600915511824 hasn't completed by 1600915751832
at hudson.remoting.PingThread.ping(PingThread.java:134)
at hudson.remoting.PingThread.run(PingThread.java:90)
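The "Ping started at ... hasn't completed by ..." values in the agent log can be used to check the ping cadence. The sketch below is a rough Python helper, assuming the numbers are epoch milliseconds. The names `ping_windows` and `summarize` are illustrative and not part of Jenkins. It parses two entries copied from the log above and reports how long each ping waited before timing out and how far apart consecutive pings started.

```python
import re

# Two entries copied verbatim from the agent-side log.
LOG = """\
java.util.concurrent.TimeoutException: Ping started at 1600912811820 hasn't completed by 1600913051827
java.util.concurrent.TimeoutException: Ping started at 1600913111821 hasn't completed by 1600913351832
"""

PING_RE = re.compile(r"Ping started at (\d+) hasn't completed by (\d+)")

def ping_windows(log_text):
    """Return (start_ms, deadline_ms) pairs for every failed ping in the log."""
    return [(int(start), int(end)) for start, end in PING_RE.findall(log_text)]

def summarize(log_text):
    """Return (timeouts, intervals), both rounded to whole seconds."""
    windows = ping_windows(log_text)
    # How long each ping waited before it was declared dead.
    timeouts = [round((end - start) / 1000) for start, end in windows]
    # Gap between the start times of consecutive pings.
    intervals = [round((b[0] - a[0]) / 1000) for a, b in zip(windows, windows[1:])]
    return timeouts, intervals

timeouts, intervals = summarize(LOG)
print(timeouts, intervals)  # [240, 240] [300]
```

On these entries each ping gives up after about 240 seconds and a new one starts about 300 seconds after the previous one, which matches the "every 5 minutes" pattern described in the report.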
We are experiencing the same issue very frequently when unstashing a large number of files. We use the Artifact Manager on S3 plugin and we run our workload through Kubernetes.
We have also been experiencing this issue since the recent Jenkins upgrade. Any quick help would be appreciated, thanks!
Example failure from job console
[Pipeline] unstash
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] cleanWs
EC2 (AWS Dev Account) - Windows Test Executor (i-056d044cb9d06f9f9) was marked offline: Connection was broken: java.io.EOFException
at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2842)
at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3337)
at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:925)
at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:368)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:142)
at hudson.remoting.Command.readFrom(Command.java:128)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
[Checks API] No suitable checks publisher found.
Also: hudson.remoting.ProxyException: hudson.model.Computer$TerminationRequest: Termination requested at Fri Sep 25 16:28:40 AEST 2020 by Thread[Ping thread for channel hudson.remoting.Channel@68806821:EC2 (AWS Dev Account) - Windows Test Executor (i-056d044cb9d06f9f9),5,main] [id=7700]
at hudson.model.Computer.recordTermination(Computer.java:229)
at hudson.model.Computer.disconnect(Computer.java:495)
at hudson.slaves.SlaveComputer.disconnect(SlaveComputer.java:759)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:198)
at hudson.remoting.PingThread.run(PingThread.java:101)
hudson.remoting.ProxyException: java.io.IOException: Unable to create live FilePath for EC2 (AWS Dev Account) - Windows Test Executor (i-056d044cb9d06f9f9)
at org.jenkinsci.plugins.workflow.support.steps.FilePathDynamicContext.get(FilePathDynamicContext.java:64)
at org.jenkinsci.plugins.workflow.support.steps.FilePathDynamicContext.get(FilePathDynamicContext.java:47)
at org.jenkinsci.plugins.workflow.steps.DynamicContext$Typed.get(DynamicContext.java:94)
at org.jenkinsci.plugins.workflow.cps.ContextVariableSet.get(ContextVariableSet.java:138)
at org.jenkinsci.plugins.workflow.cps.CpsThread.getContextVariable(CpsThread.java:135)
at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:297)
at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:67)
at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:264)
at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:263)
Caused: hudson.remoting.ProxyException: org.codehaus.groovy.runtime.InvokerInvocationException: java.io.IOException: Unable to create live FilePath for EC2 (AWS Dev Account) - Windows Test Executor (i-056d044cb9d06f9f9)
at org.jenkinsci.plugins.workflow.cps.CpsStepContext.replay(CpsStepContext.java:496)
at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:317)
at org.jenkinsci.plugins.workflow.cps.DSL.invokeDescribable(DSL.java:417)
at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:182)
at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
at jdk.internal.reflect.GeneratedMethodAccessor518.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:163)
at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:157)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:142)
at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:161)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:165)
at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
Caused: hudson.remoting.ProxyException: java.lang.IllegalArgumentException: Failed to prepare cleanWs step
at org.jenkinsci.plugins.workflow.cps.DSL.invokeDescribable(DSL.java:419)
at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:182)
at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
at jdk.internal.reflect.GeneratedMethodAccessor518.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:163)
at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:157)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:142)
at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:161)
at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:165)
at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
at WorkflowScript.withWorkspace(WorkflowScript:551)
at WorkflowScript.withWorkspaceFromStash(WorkflowScript:565)
at WorkflowScript.run(WorkflowScript:342)
at __cps.transform__(Native Method)
at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:86)
at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixName(FunctionCallBlock.java:78)
at jdk.internal.reflect.GeneratedMethodAccessor510.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
at com.cloudbees.groovy.cps.Next.step(Next.java:83)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:185)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:400)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:96)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:312)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:276)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:136)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Finished: FAILURE