• Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • git-plugin, pipeline

      This issue closely resembles JENKINS-43106.

      We have a pipeline job that runs in parallel on 10 different executors, multiple times every night.

      For the past week, the jobs have been getting stuck on the GitSCM checkout, which is configured as follows:

      dir(sourceDir) {
          deleteDir()  // start from an empty directory
          echo "Checking out ${commitId} from ${url}"
          checkout changelog: updateChanges, scm: [
              $class: 'GitSCM', branches: [[name: commitId]],
              userRemoteConfigs: [[url: url]]]
          // trim() drops the trailing newline from the git output
          commitId = sh(returnStdout: true, script: "git rev-parse HEAD").trim()
          echo "Checked out ${commitId} from ${url}"
      }

      When this code is run in another job, it never fails.

      The difference is that, in the failing case, a "manager" job runs the following:

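      def call(testRunners, maxNumberOfTests, buildId) {
          def parallelRuns = [:]
          def numberOfRuns = 0
          // availableExecutors and randomBit are not defined in this snippet
          for (int i = 0; i < availableExecutors; i++) {
              // parallel expects String branch names
              parallelRuns["runner-" + i] = {
                  waitUntil {
                      build job: 'TestRunner', parameters: [
                          string(name: 'sessionId', value: buildId),
                          string(name: 'randomBit', value: "${randomBit}")
                      ], propagate: false
                      // Assumption: the counter is incremented per run; the increment
                      // is missing from the snippet as posted.
                      numberOfRuns++
                      return (numberOfRuns > maxNumberOfTests)
                  }
              }
          }
          parallel parallelRuns
      }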

      On the other hand, it passes when a single pipeline runs the same function in parallel.

      Randomly, the TestRunner job hangs on checkout: I can see the first echo, but the git server is never contacted (according to the git-daemon logs).

          [JENKINS-56838] pipeline job hangs forever at checkout GitSCM

          Mark Waite added a comment -

          The conditions which caused JENKINS-46106 seemed to be specifically connected to Jira plugin version 2.5.0. Since your list of installed plugins does not include Jira plugin 2.5.0, I assume it is not the same condition as JENKINS-46106. I don't have any suggestions of experiments which might help isolate the problem.

          Others might be able to analyze a thread dump from the Jenkins server when the checkout hangs. I don't have that skill.
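
          As a rough illustration of what gathering such a thread dump could look like, the sketch below could be pasted into the Jenkins Script Console while a checkout is hanging. It only uses the standard Java `Thread.getAllStackTraces()` call; the helper name `threadsMentioning` and the `'gitclient'` search string are assumptions chosen for illustration (the latter based on the package name seen in the git client plugin stack frames), and the script only sees threads of the JVM it runs in.

```groovy
// Sketch: list live threads whose stack contains a class name matching `needle`.
// `threadsMentioning` is a hypothetical helper, not part of Jenkins.
def threadsMentioning(String needle) {
    Thread.getAllStackTraces().findAll { thread, frames ->
        frames.any { it.className.contains(needle) }
    }
}

// 'gitclient' is an assumed filter; adjust it to whatever the hang involves.
threadsMentioning('gitclient').each { thread, frames ->
    println "${thread.name} (${thread.state})"
    frames.each { println "    at ${it}" }
}
```

          A full dump of all threads is also available from the controller's `/threadDump` page.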


          Tsvi Mostovicz added a comment - - edited

          From the support logs I found the following hint.

          2019-04-02 02:42:31.157+0000 [id=10666] WARNING hudson.Proc$LocalProc#join: Process leaked file descriptors. See https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information
          java.lang.Exception
            at hudson.Proc$LocalProc.join(Proc.java:334)
            at hudson.Proc.joinWithTimeout(Proc.java:170)
            at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2311)
            at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2248)
            at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2244)
           at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1777)
            at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1789)
           at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$9.sparseCheckout(CliGitAPIImpl.java:2675)
            at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$9.execute(CliGitAPIImpl.java:2595)
            at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1228)
            at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:120)
            at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.lambda$doRetrieve$1(SCMSourceRetriever.java:147)
           at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.retrySCMOperation(SCMSourceRetriever.java:98)
            at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.doRetrieve(SCMSourceRetriever.java:146)
            at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.retrieve(SCMSourceRetriever.java:87)
            at org.jenkinsci.plugins.workflow.libs.LibraryAdder.retrieve(LibraryAdder.java:157)
            at org.jenkinsci.plugins.workflow.libs.LibraryAdder.add(LibraryAdder.java:138)
           at org.jenkinsci.plugins.workflow.libs.LibraryDecorator$1.call(LibraryDecorator.java:125)
           at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1065)
            at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:603)
           at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:581)
           at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:558)
           at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:298)
           at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:268)
            at groovy.lang.GroovyShell.parseClass(GroovyShell.java:688)
            at groovy.lang.GroovyShell.parse(GroovyShell.java:700)
            at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.lambda$doParse$0(CpsGroovyShell.java:135)
            at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:136)
            at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.doParse(CpsGroovyShell.java:132)
            at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.reparse(CpsGroovyShell.java:127)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.parseScript(CpsFlowExecution.java:560)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.start(CpsFlowExecution.java:521)
            at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:320)
            at hudson.model.ResourceController.execute(ResourceController.java:97)
            at hudson.model.Executor.run(Executor.java:429)

           


          Tsvi Mostovicz added a comment -

          This appears to be because Git SCM iterates over all of the older builds when checking out.

          (See https://github.com/jenkinsci/git-plugin/blob/34fa174566716c8c6a1ab392a0cf3b5c05fc4d41/src/main/java/hudson/plugins/git/GitSCM.java#L1136)

          As we were not clearing out old builds, the cost of iterating over all of them grew over time until checkout would time out after 1 hour.

          Clearing out old builds resolved the issue for us.

          Note that the File variable passed to the function was set to null, as the scm step function calling it had updateChanges set to false.

          A possible enhancement to the GitSCM plugin would be to skip calculating the history if the changelog file passed to checkout is null.

          As I'm not really sure how this might affect other parts of the code, I'm wary of implementing it.
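
          For readers who want old builds cleared automatically rather than by hand, one way this might be configured in a scripted Pipeline is sketched below. The retention count of 50 is an arbitrary placeholder, not a value taken from this issue, and the right number depends on how much history the job actually needs.

```groovy
// Scripted Pipeline sketch: keep only the most recent builds of this job.
// numToKeepStr: '50' is an assumed placeholder value, not from this issue.
properties([
    buildDiscarder(logRotator(numToKeepStr: '50'))
])
```

          The setting takes effect once the job has run with it, so existing history would shrink on a subsequent build rather than immediately.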

          Tsvi Mostovicz added a comment -

          markewaite, please see the comment I added. I solved the issue for us, but I wonder whom I should tag for the enhancement I proposed. (I don't mind putting in a PR myself, but I have never developed for Jenkins, so I'll need some handholding.)

          Mark Waite added a comment -

          I suspect this is related to JENKINS-19022, git plugin mistakenly retains list of SHA1's of all preceding builds in later builds, bloating memory use. We've attempted at least 3 different times to resolve JENKINS-19022, each time abandoning due to incompatibilities that are introduced by the proposed changes.

          Reducing the number of builds retained in history is the most direct workaround for the problem. Other workarounds exist as well, like groovy scripts that will remove old BuildData records.

          Unless you're ready for months and months of work to implement the fix without causing compatibility problems, I'd recommend you prefer retaining fewer builds rather than attempting to make a code change in the git plugin for this case.
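
          Before removing anything, it may help to see which jobs carry the most accumulated git build data. The read-only Script Console sketch below is one possible way to check; it assumes the git plugin's `BuildData` action and its `buildsByBranchName` map are where the history accumulates, and it inspects only the last build of each job. It does not delete or modify any records.

```groovy
import hudson.model.Job
import hudson.plugins.git.util.BuildData
import jenkins.model.Jenkins

// Read-only sketch: report how many branch entries the last build's
// BuildData actions hold for each job. Nothing is deleted or saved.
Jenkins.instance.getAllItems(Job).each { job ->
    def last = job.lastBuild
    if (last == null) {
        return  // job has never been built
    }
    // sum() returns null when there are no BuildData actions, hence the ?: 0
    def entries = last.getActions(BuildData).sum { it.buildsByBranchName.size() } ?: 0
    println "${job.fullName}: ${entries} BuildData branch entries in last build"
}
```

          Jobs that report unusually large counts would be the first candidates for a lower build retention limit.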


          Tsvi Mostovicz added a comment -

          Based on my reading of JENKINS-19022, this seems to be caused by the same issue.

            Assignee: Unassigned
            Reporter: Tsvi Mostovicz
            Votes: 0
            Watchers: 3
