JENKINS-56838

pipeline job hangs forever at checkout GitSCM


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: git-plugin, pipeline

    Description

      This issue closely resembles JENKINS-43106.

      We have a pipeline job that runs in parallel on 10 different executors, multiple times every night.

      For the past week, the jobs have been getting stuck on the GitSCM checkout, which is configured as follows:

      dir(sourceDir) {
          deleteDir()
          echo "Checking out ${commitId} from ${url}"
          checkout changelog: updateChanges, scm: [
              $class: 'GitSCM', branches: [[name: commitId]],
              userRemoteConfigs: [[url: url]]]
          commitId = sh returnStdout: true, script: "git rev-parse HEAD"
          echo "Checked out ${commitId} from ${url}"
      }

      When this code runs in another job, it never fails.

      The difference is that, in the failing case, a "manager" job runs the following:

      def call(testRunners, maxNumberOfTests, buildId) {
          def parallelRuns = [:]
          def numberOfRuns = 0
          // availableExecutors, randomBit and maxNumberOfRuns are not defined
          // in this excerpt; they are presumably set elsewhere.
          for (int i = 0; i < availableExecutors; i++) {
              parallelRuns[i] = {
                  waitUntil {
                      build job: 'TestRunner', parameters: [
                          string(name: 'sessionId', value: buildId),
                          string(name: 'randomBit', value: "${randomBit}")
                      ], propagate: false
                      return (numberOfRuns > maxNumberOfRuns)
                  }
              }
          }
          parallel parallelRuns
      }

      By contrast, the checkout succeeds when a single pipeline runs the same function in parallel.

      The TestRunner job hangs on checkout at random: the first echo appears in the console output, but the git-daemon logs show no access to the Git server.

          Activity

            markewaite Mark Waite added a comment -

            The conditions which caused JENKINS-46106 seemed to be specifically connected to Jira plugin version 2.5.0. Since your list of installed plugins does not include Jira plugin 2.5.0, I assume this is not the same condition as JENKINS-46106. I don't have any suggestions for experiments that might help isolate the problem.

            Others might be able to analyze a thread dump from the Jenkins server when the checkout hangs. I don't have that skill.
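
As a starting point for that kind of analysis, a thread dump can be narrowed down to checkout-related threads from the Jenkins script console. The following is a minimal sketch and not something taken from this report: it relies only on the standard JVM call `Thread.getAllStackTraces()`, the package prefixes it filters on are the git plugin and git client plugin packages that appear in the stack trace in this issue, and the variable names are illustrative.

```groovy
// Sketch for the Jenkins script console (Manage Jenkins > Script Console).
// Prints the stack of every live thread that currently has a frame inside
// git plugin or git client plugin code, which is where a hung checkout
// would be expected to sit.
def prefixes = ['hudson.plugins.git.', 'org.jenkinsci.plugins.gitclient.']

Thread.getAllStackTraces().each { thread, frames ->
    // Keep only threads with at least one frame in the packages of interest.
    def inGitCode = frames.any { frame ->
        prefixes.any { frame.className.startsWith(it) }
    }
    if (inGitCode) {
        println "\"${thread.name}\" state=${thread.state}"
        frames.each { println "    at ${it}" }
        println ''
    }
}
```

Running it a few times while a checkout is stuck and comparing the output would show whether the same thread stays in the same frames, which hints at where the time is going.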

            tsvi Tsvi Mostovicz added a comment - - edited

            From the support logs I found the following hint.

            2019-04-02 02:42:31.157+0000 [id=10666] WARNING hudson.Proc$LocalProc#join: Process leaked file descriptors. See https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information
            java.lang.Exception
              at hudson.Proc$LocalProc.join(Proc.java:334)
              at hudson.Proc.joinWithTimeout(Proc.java:170)
              at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2311)
              at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2248)
              at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2244)
              at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1777)
              at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1789)
              at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$9.sparseCheckout(CliGitAPIImpl.java:2675)
              at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$9.execute(CliGitAPIImpl.java:2595)
              at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1228)
              at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:120)
              at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.lambda$doRetrieve$1(SCMSourceRetriever.java:147)
              at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.retrySCMOperation(SCMSourceRetriever.java:98)
              at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.doRetrieve(SCMSourceRetriever.java:146)
              at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.retrieve(SCMSourceRetriever.java:87)
              at org.jenkinsci.plugins.workflow.libs.LibraryAdder.retrieve(LibraryAdder.java:157)
              at org.jenkinsci.plugins.workflow.libs.LibraryAdder.add(LibraryAdder.java:138)
              at org.jenkinsci.plugins.workflow.libs.LibraryDecorator$1.call(LibraryDecorator.java:125)
              at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1065)
              at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:603)
              at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:581)
              at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:558)
              at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:298)
              at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:268)
              at groovy.lang.GroovyShell.parseClass(GroovyShell.java:688)
              at groovy.lang.GroovyShell.parse(GroovyShell.java:700)
              at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.lambda$doParse$0(CpsGroovyShell.java:135)
              at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:136)
              at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.doParse(CpsGroovyShell.java:132)
              at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.reparse(CpsGroovyShell.java:127)
              at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.parseScript(CpsFlowExecution.java:560)
              at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.start(CpsFlowExecution.java:521)
              at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:320)
              at hudson.model.ResourceController.execute(ResourceController.java:97)
              at hudson.model.Executor.run(Executor.java:429)

             

            tsvi Tsvi Mostovicz added a comment -

            This appears to be because Git SCM parses through all of the older builds when checking out.

            (See https://github.com/jenkinsci/git-plugin/blob/34fa174566716c8c6a1ab392a0cf3b5c05fc4d41/src/main/java/hudson/plugins/git/GitSCM.java#L1136)

            As we were not clearing out old builds, the performance hit from parsing all of them grew over time until checkout would time out after 1 hour.

            Clearing out old builds resolved the issue for us.
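
One way to keep the build history short automatically is a build discarder on the job itself. The snippet below is a sketch for a scripted pipeline, assuming the TestRunner job is defined by a pipeline script; the limit of 50 builds is an arbitrary illustrative value, not a figure from this report.

```groovy
// Sketch: place near the top of the TestRunner pipeline script (scripted pipeline).
// Jenkins then deletes older builds, so the git plugin has less build
// history to walk through on each checkout.
properties([
    buildDiscarder(logRotator(numToKeepStr: '50'))  // 50 is an arbitrary example value
])
```

In a declarative pipeline the equivalent would be `options { buildDiscarder(logRotator(numToKeepStr: '50')) }`.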

             

            Note that the changelog File passed to the function was null, because the checkout step calling it had changelog set to false (through our updateChanges variable).

            A possible enhancement to the git plugin would be to skip calculating the history when the changelog file passed to checkout is null.

            As I'm not really sure how this might affect other parts of the code, I'm wary of implementing it.

             

            tsvi Tsvi Mostovicz added a comment -

            markewaite please see the comment I added. That solved the issue for us, but I wonder whom I should tag for the enhancement I proposed. (I don't mind submitting a PR myself, but I have never developed for Jenkins, so I'll need some hand-holding.)

            markewaite Mark Waite added a comment -

            I suspect this is related to JENKINS-19022: the git plugin mistakenly retains the list of SHA1s of all preceding builds in later builds, bloating memory use. We've attempted at least 3 different times to resolve JENKINS-19022, each time abandoning the attempt because of incompatibilities introduced by the proposed changes.

            Reducing the number of builds retained in history is the most direct workaround for the problem. Other workarounds exist as well, like groovy scripts that will remove old BuildData records.
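
Before changing retention settings or removing records, it can help to see which jobs carry the most git build data. The script-console sketch below is read-only and is not one of the scripts referred to above; it assumes the git plugin's `hudson.plugins.git.util.BuildData` action and its `buildsByBranchName` map, and the report format is made up for illustration.

```groovy
import hudson.model.Job
import hudson.plugins.git.util.BuildData
import jenkins.model.Jenkins

// Read-only sketch for the Jenkins script console: for each job, count the
// builds kept in history and the branch entries stored in their BuildData.
Jenkins.instance.getAllItems(Job).each { job ->
    int builds = 0
    int branchEntries = 0
    job.builds.each { run ->
        builds++
        // A run can carry several BuildData actions (one per repository).
        run.getActions(BuildData).each { data ->
            branchEntries += data.buildsByBranchName.size()
        }
    }
    if (branchEntries > 0) {
        println "${job.fullName}: ${builds} builds, ${branchEntries} BuildData branch entries"
    }
}
```

Note that walking every build loads its record from disk, so on a large instance this is best run during a quiet period.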

            Unless you're ready for months and months of work to implement the fix without causing compatibility problems, I'd recommend retaining fewer builds rather than attempting a code change in the git plugin for this case.

            tsvi Tsvi Mostovicz added a comment -

            Based on my reading of JENKINS-19022, it seems this is caused by the same issue.

            People

              Assignee: Unassigned
              Reporter: Tsvi Mostovicz
              Votes: 0
              Watchers: 3