-
Bug
-
Resolution: Fixed
-
Major
-
Jenkins ver. 2.176.3
EC2 Plugin 1.45
-
-
1.66
We run a Jenkins system that launches up to 120 Ubuntu 18.04 EC2 nodes and terminates them after an idle timeout of 5 minutes. After several days of running nodes, the thread count grows until the system runs out of memory and eventually crashes with Out of Memory errors. The following thread appears to be created for each EC2 node but never destroyed:
"Thread-10000" daemon prio=5 RUNNABLE
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
com.trilead.ssh2.crypto.cipher.CipherInputStream.fill_buffer(CipherInputStream.java:41)
com.trilead.ssh2.crypto.cipher.CipherInputStream.internal_read(CipherInputStream.java:52)
com.trilead.ssh2.crypto.cipher.CipherInputStream.getBlock(CipherInputStream.java:79)
com.trilead.ssh2.crypto.cipher.CipherInputStream.read(CipherInputStream.java:108)
com.trilead.ssh2.transport.TransportConnection.receiveMessage(TransportConnection.java:232)
com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:706)
com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
java.lang.Thread.run(Thread.java:748)
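To track how many of these threads accumulate over time, one option is to count the live threads whose stack contains the `TransportManager.receiveLoop` frame from the trace above. The following is a minimal sketch, not part of the plugin: the class name `SshReceiverThreadCount` and the helper `countThreadsWithFrame` are hypothetical, and the code only sees threads of the JVM it runs in, so it would have to be run inside the Jenkins process to be useful.

```java
import java.util.Map;

public class SshReceiverThreadCount {
    // Frame taken from the stack trace in this report.
    public static final String CLASS_NAME = "com.trilead.ssh2.transport.TransportManager";
    public static final String METHOD_NAME = "receiveLoop";

    /** Counts the threads whose stack contains the given class and method. */
    public static long countThreadsWithFrame(Map<Thread, StackTraceElement[]> traces,
                                             String className, String methodName) {
        long count = 0;
        for (StackTraceElement[] stack : traces.values()) {
            for (StackTraceElement frame : stack) {
                if (className.equals(frame.getClassName())
                        && methodName.equals(frame.getMethodName())) {
                    count++;
                    break; // count each thread only once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Thread.getAllStackTraces() snapshots every live thread in the current JVM.
        long n = countThreadsWithFrame(Thread.getAllStackTraces(), CLASS_NAME, METHOD_NAME);
        System.out.println("Threads in " + CLASS_NAME + "." + METHOD_NAME + ": " + n);
    }
}
```

If the count keeps rising while the number of connected EC2 nodes stays flat, that would be consistent with the leak described here.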
Below is a snippet of our cloud config (scrubbed of all user-specific data):
<clouds>
<hudson.plugins.ec2.EC2Cloud plugin="ec2@1.45">
<name>cloud-name</name>
<useInstanceProfileForCredentials>false</useInstanceProfileForCredentials>
<roleArn></roleArn>
<roleSessionName></roleSessionName>
<credentialsId>cred</credentialsId>
<privateKey>
<privateKey>privatekey</privateKey>
</privateKey>
<instanceCap>140</instanceCap>
<templates>
<hudson.plugins.ec2.SlaveTemplate>
<ami>ami-id</ami>
<description>description</description>
<zone></zone>
<securityGroups>security-groups</securityGroups>
<remoteFS>/home/user</remoteFS>
<type>size</type>
<ebsOptimized>true</ebsOptimized>
<monitoring>true</monitoring>
<t2Unlimited>false</t2Unlimited>
<labels>custom-label</labels>
<mode>NORMAL</mode>
<initScript>#!/bin/bash -xe
echo "Hello world"
</initScript>
<tmpDir></tmpDir>
<userData></userData>
<numExecutors>1</numExecutors>
<remoteAdmin>user</remoteAdmin>
<jvmopts></jvmopts>
<subnetId>subnet-ids</subnetId>
<idleTerminationMinutes>5</idleTerminationMinutes>
<iamInstanceProfile>iam-profile</iamInstanceProfile>
<deleteRootOnTermination>true</deleteRootOnTermination>
<useEphemeralDevices>false</useEphemeralDevices>
<customDeviceMapping>/dev/sda1=:100:true:gp2</customDeviceMapping>
<instanceCap>120</instanceCap>
<stopOnTerminate>false</stopOnTerminate>
<tags>
<hudson.plugins.ec2.EC2Tag>
<name>Name</name>
<value>ec2name</value>
</hudson.plugins.ec2.EC2Tag>
<hudson.plugins.ec2.EC2Tag>
<name>tag</name>
<value>ec2tag</value>
</hudson.plugins.ec2.EC2Tag>
</tags>
<connectionStrategy>PRIVATE_IP</connectionStrategy>
<associatePublicIp>false</associatePublicIp>
<useDedicatedTenancy>false</useDedicatedTenancy>
<amiType class="hudson.plugins.ec2.UnixData"/>
<launchTimeout>2147483647</launchTimeout>
<connectBySSHProcess>true</connectBySSHProcess>
<maxTotalUses>-1</maxTotalUses>
<nextSubnet>0</nextSubnet>
</hudson.plugins.ec2.SlaveTemplate>
<region>region</region>
<noDelayProvisioning>false</noDelayProvisioning>
</hudson.plugins.ec2.EC2Cloud>
</clouds>
This appears to be related to the connectBySSHProcess option: when we set it to false, we no longer see the thread leak.