Today at ~10:50 CEST, all of my slaves went down, and it is impossible to relaunch them. All of them suddenly show the following error in the log:
[07/08/16 15:01:58] [SSH] Starting slave process: cd "/localworkspaces/coverity/jenkins" && /opt/java1.7_x86_64/bin/java -Xmx2g -Xms2g -verbose:gc -Xloggc:/tmp/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+HeapDumpOnOutOfMemoryError -jar slave.jar
<===[JENKINS REMOTING CAPACITY]===>channel started
hudson.util.IOException2: Slave JVM has not reported exit code. Is it still running?
    at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:984)
    at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:137)
    at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:725)
    at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Remote call on socvm458 failed
    at hudson.remoting.Channel.call(Channel.java:789)
    at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:516)
    at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:389)
    at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:976)
    ... 7 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
[07/08/16 15:02:01] Launch failed - cleaning up connection
[07/08/16 15:02:01] [SSH] Connection closed.
ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
Caused by: java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
I have already tried increasing the heap size and passing debugging options to the JVM, as described here: https://cloudbees.zendesk.com/hc/en-us/articles/204529970-Java-Heap-Out-of-Memory-Exception
... but to no effect. All my slaves are down, and I can't get them running again.
This is the gc.log:
Heap
 PSYoungGen total 611648K, used 41943K [0x00000007d5560000, 0x0000000800000000, 0x0000000800000000)
  eden space 524288K, 8% used [0x00000007d5560000,0x00000007d7e55da8,0x00000007f5560000)
  from space 87360K, 0% used [0x00000007faab0000,0x00000007faab0000,0x0000000800000000)
  to space 87360K, 0% used [0x00000007f5560000,0x00000007f5560000,0x00000007faab0000)
 ParOldGen total 1398144K, used 0K [0x0000000780000000, 0x00000007d5560000, 0x00000007d5560000)
  object space 1398144K, 0% used [0x0000000780000000,0x0000000780000000,0x00000007d5560000)
 PSPermGen total 21248K, used 6741K [0x000000077ae00000, 0x000000077c2c0000, 0x0000000780000000)
  object space 21248K, 31% used [0x000000077ae00000,0x000000077b4957e0,0x000000077c2c0000)
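For anyone who wants to read the heap summary above more quickly, here is a minimal Python sketch that extracts the total and used sizes per region and prints the occupancy. The regular expression and the names `GC_LOG_SUMMARY`, `heap_usage` and `percent_used` are my own assumptions for illustration; the numbers are copied from the gc.log above, and the pattern only covers the "total ...K, used ...K" lines of this particular output.

```python
import re

# Region totals copied from the gc.log heap summary above (sizes in K).
GC_LOG_SUMMARY = (
    "PSYoungGen total 611648K, used 41943K "
    "ParOldGen total 1398144K, used 0K "
    "PSPermGen total 21248K, used 6741K"
)

# Assumed line shape: "<region> total <n>K, used <m>K".
# Other collectors or JVM versions may print a different layout.
REGION_RE = re.compile(r"(\w+)\s+total\s+(\d+)K,\s+used\s+(\d+)K")


def heap_usage(text):
    """Return {region: (total_k, used_k)} for every region found in text."""
    return {
        name: (int(total), int(used))
        for name, total, used in REGION_RE.findall(text)
    }


def percent_used(total_k, used_k):
    """Occupancy in percent; 0.0 when the region has no capacity."""
    return 100.0 * used_k / total_k if total_k else 0.0


if __name__ == "__main__":
    for region, (total, used) in heap_usage(GC_LOG_SUMMARY).items():
        print(f"{region}: {used}K of {total}K ({percent_used(total, used):.1f}%)")
```

The old generation is reported as completely unused, so the slave JVM itself does not look short of heap in this log. That would be consistent with the OutOfMemoryError coming from somewhere other than the slave process, but this is only an inference from this one log, not something I have confirmed.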
Update: after I restarted Jenkins, the slaves came back up as well.
So, unless you want to do a postmortem analysis, this can be closed.