JENKINS-69286

ECS nodes go offline

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • amazon-ecs-plugin
    • None

      Jenkins agents launched on ECS via the ECS plugin go offline at random. This can happen anywhere from once a day to two or three times per week, with no discernible pattern. Sometimes all agents go offline at the same time, which prevents Jenkins from running any builds.
      Below is the log from one of the agents that went offline:

      Inbound agent connected from ip-xx-xx-xx-xx.eu-west-1.compute.internal/xx.xx.xx.xx:46260
      Remoting version: 4.13.2
      Launcher: ECSLauncher
      Communication Protocol: JNLP4-connect
      This is a Unix agent
      Waiting for agent to connect: ecs-cloud-ecs-zmvlb
      Agent successfully connected and online
      ERROR: Failed to monitor for Architecture
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Free Disk Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Clock Difference
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Free Swap Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Response Time
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:56)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Free Temp Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Architecture
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Clock Difference
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Free Disk Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Free Swap Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Free Temp Space
      ERROR: Failed to monitor for Response Time
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:56)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Architecture
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Free Temp Space
      ERROR: ERROR: Failed to monitor for Clock Difference
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Response Time
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      Failed to monitor for Free Disk Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.ResponseTimeMonitor$1.monitor(ResponseTimeMonitor.java:56)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Failed to monitor for Free Swap Space
      java.util.concurrent.TimeoutException
      	at hudson.remoting.Request$1.get(Request.java:321)
      	at hudson.remoting.Request$1.get(Request.java:240)
      	at hudson.remoting.FutureAdapter.get(FutureAdapter.java:66)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:112)
      	at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
      	at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
      ERROR: Connection terminated
      java.nio.channels.ClosedChannelException
      	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:240)
      	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:221)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825)
      	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:289)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:177)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:279)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:501)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:244)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:196)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:209)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:793)
      	at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:172)
      	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:342)
      	at hudson.remoting.Channel.close(Channel.java:1494)
      	at hudson.remoting.Channel.close(Channel.java:1447)
      	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:923)
      	at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:112)
      	at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:803)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
      	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      	at java.base/java.lang.Thread.run(Thread.java:829)
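
For triage, it may help to tally which node monitors timed out before the channel was closed. The following is a minimal Python sketch, not part of Jenkins or the ECS plugin. The function name `count_monitor_failures` and the sample text are illustrative assumptions, and the pattern only matches the "Failed to monitor for ..." lines seen in the log above.

```python
import re
from collections import Counter

# Matches lines such as "ERROR: Failed to monitor for Free Disk Space".
# The "ERROR:" prefix is not required, because interleaved output from
# concurrent monitor threads can drop or duplicate it.
MONITOR_FAILURE = re.compile(r"Failed to monitor for (.+?)\s*$")


def count_monitor_failures(log_text: str) -> Counter:
    """Return how many times each node monitor reported a failure."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MONITOR_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical excerpt in the same shape as the agent log.
    sample = """\
ERROR: Failed to monitor for Architecture
java.util.concurrent.TimeoutException
ERROR: Failed to monitor for Free Disk Space
ERROR: ERROR: Failed to monitor for Clock Difference
Failed to monitor for Free Disk Space
ERROR: Connection terminated
"""
    for monitor, count in count_monitor_failures(sample).most_common():
        print(f"{monitor}: {count}")
```

If every monitor fails in each round, as appears to be the case here, that would suggest the remoting channel itself had stopped responding rather than any single check being slow.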

            roehrijn2 Jan Roehrich
            saalrdc Alex Gay
            Votes: 0
            Watchers: 3