Restarted the master and all slaves went offline

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: remoting

      We restarted the master and all the slaves (330 of them) went offline. We were able to bring back several slaves manually and through a job that restarts slaves remotely. Please review both the error logs (master and slave) attached.

      [Posting parts of the error log here to make it easier for others to search for the same problem.]

      On the master:
      Caused by: java.lang.OutOfMemoryError: unable to create new native thread

      On the slave:
      WARNING hudson.remoting.AbstractByteArrayCommandTransport
      Failed to construct Command
      java.io.EOFException
      at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)

          [JENKINS-27046] Restarted the master and all slaves went offline

          Aditya Inapurapu created issue -

          Aditya Inapurapu added a comment -

          We restarted the master again after removing a few plug-ins, and we have a new error message. The previous EOF-related error is no longer in the log.
          Aditya Inapurapu made changes -
          Attachment New: logFromMaster2.txt [ 28632 ]
          Aditya Inapurapu made changes -
          Priority Original: Major [ 3 ] New: Minor [ 4 ]

          Aditya Inapurapu added a comment -

          The default physical memory allocated in Red Hat to a user profile is 1 GB, which is quite low for our needs. We increased this to 3 GB and the issue did not recur. We changed the priority of this defect to Minor.

          We will leave the ticket open so that some enhancements to Jenkins can be made using this information. Perhaps Jenkins could check the physical memory allocation on startup or restart.
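
          A rough sketch of what such a startup check might look like, written in Python for brevity rather than as Jenkins code. The ticket does not say which per-user limit was raised from 1 GB to 3 GB, so the two limits inspected here (`RLIMIT_NPROC` and `RLIMIT_AS`) and both thresholds are assumptions chosen for illustration. It relies on the standard `resource` module, which is only available on Unix-like systems.

```python
import resource

# Hypothetical thresholds, chosen only for illustration.
MIN_PROCESSES = 4096
MIN_ADDRESS_SPACE_BYTES = 3 * 1024**3  # 3 GB, the value the reporter settled on


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def is_below(limit, minimum):
    """True when the limit is finite and smaller than the required minimum."""
    return limit != resource.RLIM_INFINITY and limit < minimum


def check_limits(get_limit=resource.getrlimit):
    """Return warning strings for soft limits that fall below the thresholds.

    get_limit is injectable so the check can be exercised without
    changing the real limits of the current process.
    """
    checks = [
        ("max user processes/threads", resource.RLIMIT_NPROC, MIN_PROCESSES),
        ("max address space (bytes)", resource.RLIMIT_AS, MIN_ADDRESS_SPACE_BYTES),
    ]
    warnings = []
    for label, which, minimum in checks:
        soft, _hard = get_limit(which)
        if is_below(soft, minimum):
            warnings.append(
                f"{label}: soft limit {describe(soft)} is below {minimum}"
            )
    return warnings
```

          Calling `check_limits()` as the user that runs the master returns an empty list when neither limit is below its threshold. Otherwise it returns one message per low limit, which could be logged before any slaves are started.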

          Daniel Beck added a comment -

          The default physical memory allocated in Red Hat to a user profile is 1 GB, which is quite low for our needs.

          Disk space or RAM? This looks more like a limitation of your specific environment than any sensible vendor default.


          Aditya Inapurapu added a comment (edited) -

          Daniel,
          Yes, this was caused by a limitation in the environment. At the same time, could Jenkins possibly add a more appropriate log entry like 'Cannot start more slaves. Memory allocation of user is too low.'? An error like 'java.io.EOFException', 'Trying to unexport an object that's already unexported', 'ERROR: Connection terminated' did not lead us in the right direction.

          Daniel Beck made changes -
          Component/s New: remoting [ 15489 ]
          Component/s Original: core [ 15593 ]

          Daniel Beck added a comment -

          Slave log says that the following is the original error:

          Caused by: java.lang.OutOfMemoryError: unable to create new native thread

          (Java exceptions are always read bottom to top, which is weird but a well-known convention)

          And in that situation, everything's horribly broken. Doing nicer error reporting is probably not worth the effort.
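
          The bottom-to-top reading described above can be sketched as a small log helper that pulls out the last `Caused by:` line of a Java stack trace, which is the innermost cause in the chain. The function name `root_cause` and the sample trace are hypothetical. Only the `Caused by:` line in the sample comes from the attached logs, and the wrapper exception above it is made up for illustration.

```python
CAUSE_PREFIX = "Caused by:"

# Made-up trace: only the final line is taken from the ticket.
SAMPLE_TRACE = """java.io.IOException: hypothetical wrapper exception
    at example.Hypothetical.method(Hypothetical.java:1)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
"""


def root_cause(trace):
    """Return the text of the last 'Caused by:' line, or None if there is none."""
    causes = [
        line.strip()[len(CAUSE_PREFIX):].strip()
        for line in trace.splitlines()
        if line.strip().startswith(CAUSE_PREFIX)
    ]
    # Java prints the outermost exception first, so the last entry
    # in the chain is the original error.
    return causes[-1] if causes else None


print(root_cause(SAMPLE_TRACE))
```

          Run against a full master or slave log, a helper like this would surface the `OutOfMemoryError` directly instead of the `EOFException` and "Connection terminated" messages that appear earlier in the output.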

          R. Tyler Croy made changes -
          Workflow Original: JNJira [ 161238 ] New: JNJira + In-Review [ 180625 ]

            Assignee: Unassigned
            Reporter: Aditya Inapurapu (adityai)
            Votes: 0
            Watchers: 2