INFRA-2548

EC2 Ubuntu instances are disconnected after a while

      Description

      ERROR: Failed to monitor for Free Disk Space
      java.util.concurrent.TimeoutException
      at hudson.remoting.Request$1.get(Request.java:316)
      at hudson.remoting.Request$1.get(Request.java:240)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
      ERROR: Failed to monitor for Free Temp Space
      java.util.concurrent.TimeoutException
      at hudson.remoting.Request$1.get(Request.java:316)
      at hudson.remoting.Request$1.get(Request.java:240)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
      ERROR: Failed to monitor for Architecture
      java.util.concurrent.TimeoutException
      at hudson.remoting.Request$1.get(Request.java:316)
      at hudson.remoting.Request$1.get(Request.java:240)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
      ERROR: Failed to monitor for Free Swap Space
      ERROR: java.util.concurrent.TimeoutException
      at hudson.remoting.Request$1.get(Request.java:316)
      at hudson.remoting.Request$1.get(Request.java:240)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)

            Activity

            olblak Olivier Vernin added a comment - https://gist.github.com/olblak/9f560a5fabc9b83279557fddc0388d99
            jglick Jesse Glick added a comment -

            Recent PR builds in https://ci.jenkins.io/job/Plugins/job/mercurial-plugin/ are all failing with various errors on EC2 agents—channel not responding, out of memory running Docker, etc.

            mramonleon Ramon Leon added a comment -

            I've seen the stack trace in the description many times when testing a local master with Ubuntu agents. It has not caused problems so far; only the monitors stop working.

            jglick Jesse Glick added a comment -

            Basil Crow suggests that the root cause may be insufficient VM memory for all of the JVMs needed for typical builds.

            slide_o_mix Alex Earl added a comment -

            I believe Tim Jacomb increased the memory for the VMs. Is this issue still happening?

            jglick Jesse Glick added a comment -

            Not sure offhand. It would be nice to have some sort of metrics on how often agents of a particular label/type get low-level disconnection errors of this kind.

            timja Tim Jacomb added a comment -

            Not sure about disconnections, but broken connections are still happening on high-memory and Windows agents. I haven't seen any on regular Linux agents since the changes, though.


              People

              Assignee:
              naveenboni Naveen Boni
              Reporter:
              olblak Olivier Vernin
              Votes:
              4
              Watchers:
              8