Type: Bug
Resolution: Unresolved
Priority: Major
None
Environment
Ubuntu 22.04
Docker jenkins/jenkins:latest-jdk17 (currently Jenkins 2.420)
LDAP Plugin version 694.vc02a_69c9787f
External Monitor Job Type Plugin version 207.v98a_a_37a_85525
Introduction
We have the following configuration:
- Jenkins server on internal IP 10.156.0.107
- LDAP Server #1 on internal IP 10.156.0.119
- LDAP Server #2 on internal IP 10.156.15.194
The LDAP plugin runs on the Jenkins server. In the Jenkins security settings, LDAP is configured as the security realm, with LDAP server #2 as the first server and LDAP server #1 as the second.
There are about 500 External Jobs configured on the Jenkins server.
The external jobs do not all run on hosts in the same subnet. Hosts in the same subnet as Jenkins reach it via $URL, which /etc/hosts maps to Jenkins' internal IP; for all other hosts, the name is resolved via DNS.
On each host, the external jobs report their results to Jenkins with a call of this form:
$JAVA -jar /usr/local/jenkins/jenkins-cli.jar -s "https://${USER}:${PASSWORD}@${URL}" -webSocket set-external-build-result --job "${SERVER}_${JOBNAME}" -r "$status" -d "$runtime" -l - < ${TMPFILE}.gz;
Because our Jenkins runs under Docker behind a reverse proxy, we pass the -webSocket option. According to the CLI help, it works like -http but uses WebSocket, which works better with most reverse proxies.
So far, so good: Jenkins works fine, and LDAP authentication of both users and external jobs works as expected.
Now to the problem
Overnight, the RAM on LDAP server #2 fills up.
The LDAP plugin seems to be the cause. We can reproduce the behavior by making LDAP server #1 the first server in the Jenkins security settings: the RAM on LDAP server #1 then fills up instead.
Troubleshooting approaches
Apparently, the calls from the 500 jobs occasionally leave LDAP connections from the Jenkins server open. We can see them on the LDAP server with:
    ss | grep 10.156.0.107
    tcp  ESTAB  0  0  [::ffff:10.156.15.194]:ldap  [::ffff:10.156.0.107]:50104
    tcp  ESTAB  0  0  [::ffff:10.156.15.194]:ldap  [::ffff:10.156.0.107]:50884
    tcp  ESTAB  0  0  [::ffff:10.156.15.194]:ldap  [::ffff:10.156.0.107]:50108
    tcp  ESTAB  0  0  [::ffff:10.156.15.194]:ldap  [::ffff:10.156.0.107]:50454
    tcp  ESTAB  0  0  [::ffff:10.156.15.194]:ldap  [::ffff:10.156.0.107]:50102
    tcp  ESTAB  0  0  [::ffff:10.156.15.194]:ldap  [::ffff:10.156.0.107]:33438
    tcp  ESTAB  0  0  [::ffff:10.156.15.194]:ldap  [::ffff:10.156.0.107]:33032
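To check whether these connections accumulate over time, a small helper like the one below could be run periodically on the LDAP server. This is only a sketch and is not part of Jenkins or the LDAP plugin. The function name `count_ldap_conns` is hypothetical, and it assumes that the LDAP service listens on the standard port 389, which `ss` shows as `ldap` (or as `389` with `-n`). It reads `ss` output on stdin and counts established connections whose peer is the given Jenkins IP.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ESTAB LDAP connections from one peer IP.
# Reads `ss` output on stdin (live or from a saved capture).
# Assumes LDAP on port 389, printed by ss as ":ldap" or ":389".
count_ldap_conns() {
  awk -v ip="$1" '
    {
      # Find the State column; its position depends on whether ss
      # printed a Netid column (plain `ss`) or not (`ss -t`).
      for (i = 1; i <= NF; i++) if ($i == "ESTAB") break
      if (i > NF) next
      local_addr = $(i + 3)
      peer = $(i + 4)
      # Keep only sockets whose local side is the LDAP port.
      if (local_addr !~ /:(ldap|389)$/) next
      # Normalise the peer: drop ":port", surrounding brackets
      # and the IPv4-mapped "::ffff:" prefix.
      sub(/:[^:]*$/, "", peer)
      if (substr(peer, 1, 1) == "[") peer = substr(peer, 2)
      if (substr(peer, length(peer)) == "]") peer = substr(peer, 1, length(peer) - 1)
      sub(/^::ffff:/, "", peer)
      if (peer == ip) n++
    }
    END { print n + 0 }
  '
}
```

For example, running `ss -t | count_ldap_conns 10.156.0.107` every few minutes and appending the result with a timestamp to a log file would show whether the count grows along with the external job runs and drops only when the LDAP service is restarted.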
On the servers where the external jobs run, we also see stuck jenkins-cli processes, some of which have been running since July (see the START column). We find them with:
    ps aux | grep jenkins-cli.jar
    root 1400923 0.1 0.0 5771952 11768 ? Sl Jul23  87:28 /usr/bin/java -jar /usr/local/jenkins/jenkins-cli.jar -s https://xxx:xxx@xxx/ -webSocket set-external-build-result --job yyy -r 0 -d 56355 -l -
    root 2185574 0.1 0.5 5772016 69072 ? Sl Aug23  15:00 /usr/bin/java -jar /usr/local/jenkins/jenkins-cli.jar -s https://xxx:xxx@xxx/ -webSocket set-external-build-result --job yyy -r 0 -d 58553 -l -
    root 3095506 0.1 0.1 5771952 13112 ? Sl Jul20 107:18 /usr/bin/java -jar /usr/local/jenkins/jenkins-cli.jar -s https://xxx:xxx@xxx/ -webSocket set-external-build-result --job yyy -r 0 -d 56254 -l -
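To find hung CLI clients like these without reading through `ps aux`, the sketch below filters `ps` output for `jenkins-cli.jar` processes older than a threshold. The function name `stale_cli_pids` and the default threshold of one hour are assumptions for illustration. The sketch relies on the `etimes` (elapsed seconds) output field of the procps `ps` that ships with Ubuntu, and it only matches command lines whose executable ends in `java`, so the `grep` process itself is ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PIDs of jenkins-cli.jar processes that have
# been running longer than $1 seconds (default 3600).
# Expects input in the form produced by: ps -eo pid=,etimes=,args=
stale_cli_pids() {
  local max_age="${1:-3600}"
  awk -v max="$max_age" '
    # $1 = PID, $2 = elapsed seconds, $3 = executable path
    $3 ~ /java$/ && /jenkins-cli\.jar/ && ($2 + 0) > (max + 0) { print $1 }
  '
}
```

For example, `ps -eo pid=,etimes=,args= | stale_cli_pids 3600` prints one PID per line. If a JDK is installed on the host, a thread dump of such a process (for example, `jstack <pid>`) might show where the client is blocked, which could be useful additional information for this issue.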
Workaround
As a workaround, we set up a cron job on LDAP server #2 that stops and restarts the LDAP service every night. This frees the memory, after which the cycle starts over.
I have attached a screenshot of the memory usage of LDAP server #2 to this issue.
Please let me know if you need further information.