Infrastructure / INFRA-2548

EC2 Ubuntu instances are disconnected after a while


    Details


      Description

      ERROR: Failed to monitor for Free Disk Space
      java.util.concurrent.TimeoutException
      at hudson.remoting.Request$1.get(Request.java:316)
      at hudson.remoting.Request$1.get(Request.java:240)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
      ERROR: Failed to monitor for Free Temp Space
      java.util.concurrent.TimeoutException
      at hudson.remoting.Request$1.get(Request.java:316)
      at hudson.remoting.Request$1.get(Request.java:240)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
      ERROR: Failed to monitor for Architecture
      java.util.concurrent.TimeoutException
      at hudson.remoting.Request$1.get(Request.java:316)
      at hudson.remoting.Request$1.get(Request.java:240)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)
      ERROR: Failed to monitor for Free Swap Space
      ERROR: java.util.concurrent.TimeoutException
      at hudson.remoting.Request$1.get(Request.java:316)
      at hudson.remoting.Request$1.get(Request.java:240)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114)
      at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:78)
      at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:306)


            Activity

            slide_o_mix Alex Earl added a comment -

            I believe Tim Jacomb increased the memory for the VMs. Is this issue still happening?

            jglick Jesse Glick added a comment -

            Not sure offhand. It would be nice to have metrics on how often agents of a particular label or type hit low-level disconnection errors of this kind.

            timja Tim Jacomb added a comment -

            Not sure about disconnections, but broken connections are still happening on the high-memory and Windows agents. I haven't seen any on regular Linux agents since the changes, though.

            dduportal Damien Duportal added a comment -

            For information, this issue is still happening on ci.jenkins.io, on EC2 agents and also on ACI agents.

            dduportal Damien Duportal added a comment -

            First, we switched the ACI agents to Kubernetes, and we no longer see the error in those cases (ref. INFRA-2918).

            Second, this issue does not seem to have recurred on EC2 since we updated the controller's memory settings in https://github.com/jenkins-infra/jenkins-infra/pull/1789

            Please feel free to reopen this issue and reassign it to me if the error happens again.


              People

              Assignee:
              dduportal Damien Duportal
              Reporter:
              olblak Olivier Vernin
              Votes:
              4
              Watchers:
              9
