
java.lang.OutOfMemoryError: unable to create new native thread

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Minor
    • Component: git-client-plugin
    • Labels: None

      Please see https://issues.jenkins.io/browse/JENKINS-65873?focusedCommentId=424033&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-424033 and that ticket in general. This issue only happens when there is a git checkout stage in the pipeline. One suggestion there - https://github.com/jenkinsci/remoting/pull/505#issuecomment-1062281877 - is to downgrade the git client plugin to 3.7.0. The suspicion seems to be somewhere on the git side of things. Any ideas, suggestions, or things we can do to help debug?

          [JENKINS-68199] java.lang.OutOfMemoryError: unable to create new native thread

          Mark Waite added a comment - - edited

          I have no ideas to offer. I don't know why chanjetsdp suggested downgrading to git client plugin 3.7.0. There have been changes since git client plugin 3.7.0 (new versions of JGit, enhancements to support automatic selection of the git implementation for best performance), but none that would lead me to suspect significantly altered memory use.

          If the log file reports that JGit is being used instead of command line git, then you might try disabling the git plugin performance improvements that are based on repository size. That would force it to always use command line git even if it recognized that JGit would be faster due to the small size of the repository.
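
          For reference, a per-job way to keep the tool choice fixed (a sketch only, not the global setting described above; the tool name 'Default', branch, URL, and credentials ID are placeholders that would need to match the actual Global Tool Configuration and repository) is to pin gitTool in the checkout step:

          // Hypothetical sketch: pin this checkout to a named command-line git installation so the
          // repository-size-based tool chooser cannot pick a different implementation for it.
          // 'Default' must match a git installation defined in Manage Jenkins > Global Tool Configuration.
          checkout([$class: 'GitSCM',
                    gitTool: 'Default',
                    branches: [[name: '*/master']],
                    extensions: [],
                    userRemoteConfigs: [[url: 'git@git.example.com:org/repo.git',      // placeholder
                                         credentialsId: 'example-credentials']]])      // placeholder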


          Donald Gobin added a comment -

          Hi Mark,

          Thanks for responding. Regarding JGit usage, it looks like we do not use it?

          This makes me wonder now if the problem is actually with git itself leaking. Again, we only see this problem when the pipeline includes a git checkout.


          Mark Waite added a comment -

          Since command line git is run as a separate process and is short-lived, I don't think there is much opportunity for a memory leak there. Changing the global setting to disable the performance optimization should be harmless and may be interesting if it actually results in a different behavior in the case you're seeing.


          Donald Gobin added a comment -

          I don't think there is a memory leak in the classic sense either, but rather too many git processes chewing up RAM and causing the error. Our Jenkins hosts are quite busy, so it's definitely possible that at any point in time there could be a lot of spawned git processes eating up system RAM.

          In terms of changing configuration, we don't have the option to disable the performance optimization because we're using git and not JGit. Are you thinking that maybe we should switch to JGit and disable the performance optimization to see if the problem goes away? Maybe this would reduce system memory consumption by moving some of the git checkouts inside the JVM?
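
          Since "unable to create new native thread" is usually the JVM reporting that the operating system refused to create a thread (process/thread limits such as ulimit -u, kernel threads-max/pid_max, or a cgroup pids limit) rather than heap exhaustion, it may be worth capturing those numbers on the busy host around checkout time. A hedged diagnostic sketch (the node label is a placeholder):

          // Hypothetical diagnostic sketch: run on the node that reports the error and compare the
          // counts against the limits to see whether thread/process creation, not RAM, is the bottleneck.
          node('busy-agent') {                                    // placeholder label
              sh '''
                  echo "git processes:      $(pgrep -c git || true)"
                  echo "total threads:      $(ps -eLf | wc -l)"
                  echo "ulimit -u (nproc):  $(ulimit -u)"
                  echo "kernel threads-max: $(cat /proc/sys/kernel/threads-max)"
                  echo "kernel pid_max:     $(cat /proc/sys/kernel/pid_max)"
              '''
          }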

           


          Mark Waite added a comment -

          I must be missing the context. The git checkout for a job happens on an agent, not on the controller. Usually, the agent is only performing a single job before it exits. I assumed that the message about being unable to create a new native thread was being reported by the agent, not by the controller. Is the message happening on the agent or on the controller?

          My suggestion to disable the performance optimization was offered just in case the git plugin was ignoring the fact that you have not enabled JGit. Switching to JGit on the controller will switch from forking a separate git process to instead perform the git operations inside the Jenkins process (controller or agent, as required by the context). We found in performance testing that it was faster to use JGit with small repositories and to use CLI git with large repositories. Your mileage may vary (as it does in almost all performance related topics).
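
          For completeness, if a JGit-based checkout is worth trying, a minimal sketch (assuming a JGit installation has been added under Global Tool Configuration; the tool name 'jgit' and the repository URL are assumptions/placeholders):

          // Hypothetical sketch: perform the checkout with JGit inside the Jenkins JVM instead of
          // forking a command-line git process. Requires JGit to be enabled in Global Tool Configuration.
          checkout([$class: 'GitSCM',
                    gitTool: 'jgit',                              // assumed name of the JGit installation
                    branches: [[name: '*/master']],
                    userRemoteConfigs: [[url: 'https://git.example.com/org/small-repo.git']]])   // placeholder

          As noted above, this is only likely to pay off for small repositories; large ones generally check out faster with command-line git.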


          Donald Gobin added a comment - - edited

          The checkout happens on both. There is a checkout on the master to get the Jenkinsfile pipeline definition, libs, etc. from the repo (it actually does a full checkout on both sides, see https://issues.jenkins.io/browse/JENKINS-64199). The checkout is then done again on the agent, where the pipeline can work with the repo contents to do its work.

          By the way, I linked this ticket to the other https://issues.jenkins.io/browse/JENKINS-65873 where the latest update seems to point to a kernel bug and a workaround in jdk 18 (we're on Jenkins LTS 2.303 right now). But again, we're only able to reproduce this problem when we have a git checkout stage in our pipeline – without this stage, we do not see the error.
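
          To help narrow down whether a given machine is in the affected kernel/JDK combination discussed in JENKINS-65873, recording versions on the controller and the agents could help; a hedged sketch (the label is a placeholder, and the affected version ranges are documented in that ticket, not here):

          // Hypothetical sketch: capture kernel and JVM versions on a node so they can be compared
          // with the kernel bug / JDK workaround discussed in JENKINS-65873.
          node('busy-agent') {                                    // placeholder label; repeat on the controller
              sh '''
                  uname -r
                  java -version 2>&1
              '''
          }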

           


          Vincent Latombe added a comment -

          See JENKINS-65873. The problem lies between the JDK and the linux kernel. It's not a Java bug.


          Donald Gobin added a comment -

          vlatombe This could be part of the problem, but I'm not sure it's the whole story. Note that we did not see this problem prior to the middle of last year, and we've been running Jenkins for much longer. If it was a linux bug that was there all along, why didn't we see it earlier? And again, we cannot reproduce without the git stage in the pipeline. So, two pieces of evidence: 1) it did not start to happen until the middle of last year, and 2) it does not happen unless there is a git checkout stage in the pipeline.

          markewaite the reason it was suggested to downgrade to git client plugin 3.7.0 is that that version predates the middle of last year, when our problem started to happen. I acknowledge that the plugin, in our case, just spawns the git executable, and this should not cause a leak, at least not on the linux OS process side, but I don't know what else the plugin does internally.


          rhinoceros.xn added a comment -

          After adding a sleep(10) before the git checkout, this problem no longer occurs.

          Maybe a sleep(10) before the git or checkout step is a workaround.

          sleep(10)
          checkout changelog: false, poll: false, scm: ........
          
          OR
          
          sleep(10)
          git branch: 'master', credentialsId: '******', url: 'git@git.yourcomampy.com:xx/zz.git'


            Assignee: Unassigned
            Reporter: Donald Gobin (dg424)
            Votes: 0
            Watchers: 5
