-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Environment:
AWS/ECS - EC2 mode
Jenkins controller runs as a container in ECS
Jenkins: 2.346.2
OS: Linux - 4.14.281-144.502.amzn1.x86_64
---
plugin version
amazon-ecs:1.41
====================================
Jenkins agents are launched via the ecs plugin
Remoting version: 4.13.2
Launcher: ECSLauncher
Communication Protocol: JNLP4-connect
Both agent and controller run in the same ECS cluster and the agents communicate with the controller via a Network Load Balancer listening on port 50000Environment: AWS/ECS - EC2 mode Jenkins controller runs as a container in ECS Jenkins: 2.346.2 OS: Linux - 4.14.281-144.502.amzn1.x86_64 --- plugin version amazon-ecs:1.41 ==================================== Jenkins agents are launched via the ecs plugin Remoting version: 4.13.2 Launcher: ECSLauncher Communication Protocol: JNLP4-connect Both agent and controller run in the same ECS cluster and the agents communicate with the controller via a Network Load Balancer listening on port 50000
The jenkins agents launched on ECS via the ecs plugin go offline randomly. This issue can happen once a day or 2-3 times per week, it is very random. Sometimes all the agents can go offline at the same time, preventing any builds running from Jenkins.
See below the logs from one of the agent that went offline
Inbound agent connected from ip-xx-xx-xx-xx.eu-west-1.compute.internal/xx.xx.xx.xx:46260 Remoting version: 4.13.2 Launcher: ECSLauncher Communication Protocol: JNLP4-connect This is a Unix agent Waiting for agent to connect: ecs-cloud-ecs-zmvlb Agent successfully connected and online ERROR: Failed to monitor for Architecture java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Free Disk Space java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Clock Difference java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Free Swap Space java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Response Time java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:56) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Free Temp Space java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Architecture java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Clock Difference java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Free Disk Space java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Free Swap Space java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Free Temp Space ERROR: Failed to monitor for Response Time java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:56) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Architecture java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Free Temp Space ERROR: ERROR: Failed to monitor for Clock Difference java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Response Time java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) Failed to monitor for Free Disk Space java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:56) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Failed to monitor for Free Swap Space java.util.concurrent.TimeoutException at hudson.remoting.Request$1.get(Request.java:321) at hudson.remoting.Request$1.get(Request.java:240) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) ERROR: Connection terminated java.nio.channels.ClosedChannelException at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:240) at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:221) at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825) at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:289) at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:177) at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:279) at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:501) at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:244) at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:196) at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:209) at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:793) at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:172) at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:342) at hudson.remoting.Channel.close(Channel.java:1494) at hudson.remoting.Channel.close(Channel.java:1447) at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:923) at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:112) at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:803) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829)