[JENKINS-32767] GitSCMSource can hang in branch indexing, blocking builds

      Hi,
      Yesterday our corporate Git repo server was unavailable for a while. The machine was stuck somehow: it kept accepting connections but never responded.

      Since then (for 13 hours as of writing), the pipeline multibranch `Branch Indexing` process has been "building" (actually stuck) on the master.

      The thing is: I've been unable to kill this build. Looking at the stack trace, it's most likely the classic Java I/O issue: the thread is waiting on a socket read, which is uninterruptible.

      Another problem: since there seems to be a lock shared between the branch jobs of that multibranch pipeline, I can't even manually start a build on the modified branch.

      So my only option for now is to wait for a calmer moment and restart the whole instance, which isn't really possible during the work day.

      Maybe a solution or workaround like the one for the `Hard kill` issue could be introduced for this situation. It is rare but very annoying since, as explained, a build won't even start when triggered manually.

      I'll attach some thread dumps.

      If you need anything else, just let me know.

      Thanks for that wonderful tool!


          Jesse Glick added a comment -

          Specifically

          "Executor #-1 for master : executing jenkins.branch.MultiBranchProject$BranchIndexing@…" … state=RUNNABLE cpu=0% (running in native)
              at java.net.SocketInputStream.socketRead0(Native Method)
              at …
              at org.eclipse.jgit.util.IO.readFully(IO.java:246)
              at org.eclipse.jgit.transport.PacketLineIn.readLength(PacketLineIn.java:186)
              at org.eclipse.jgit.transport.PacketLineIn.readString(PacketLineIn.java:138)
              at org.eclipse.jgit.transport.BasePackConnection.readAdvertisedRefsImpl(BasePackConnection.java:195)
              at org.eclipse.jgit.transport.BasePackConnection.readAdvertisedRefs(BasePackConnection.java:176)
              at org.eclipse.jgit.transport.TransportGitAnon$TcpFetchConnection.<init>(TransportGitAnon.java:194)
              at org.eclipse.jgit.transport.TransportGitAnon.openFetch(TransportGitAnon.java:120)
              at org.eclipse.jgit.transport.FetchProcess.executeImp(FetchProcess.java:136)
              at org.eclipse.jgit.transport.FetchProcess.execute(FetchProcess.java:122)
              at org.eclipse.jgit.transport.Transport.fetch(Transport.java:1138)
              at org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:130)
              at org.jenkinsci.plugins.gitclient.JGitAPIImpl.fetch(JGitAPIImpl.java:674)
              at jenkins.plugins.git.AbstractGitSCMSource.retrieve(AbstractGitSCMSource.java:171)
              at jenkins.scm.api.SCMSource.fetch(SCMSource.java:143)
              at jenkins.branch.MultiBranchProject.computeChildren(MultiBranchProject.java:295)
              at com.cloudbees.hudson.plugins.folder.computed.ComputedFolder.updateChildren(ComputedFolder.java:151)
              at com.cloudbees.hudson.plugins.folder.computed.FolderComputation.run(FolderComputation.java:106)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:408)
          

          The immediate problem I see is that socket reads in the Oracle JRE are not interruptible, so using a native SCM client is a bad idea, unless that client is very careful to impose read timeouts on all connections, which apparently the JGit implementation is failing to do. (This code is not calling setSoTimeout.) Would be better to use the CLI Git implementation (cf. JENKINS-31924).
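
          To illustrate the read-timeout point, here is a minimal sketch using only `java.net`; it is not JGit or Jenkins code. The class and method names are hypothetical, and the local `ServerSocket` that never calls `accept()` merely stands in for a Git server that accepts connections but never replies. With `setSoTimeout` set, the blocked read gives up with a `SocketTimeoutException` instead of hanging indefinitely.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

    /**
     * Opens a local server that completes TCP handshakes (via the OS backlog)
     * but never sends a byte, then reads from it with a read timeout.
     * Returns true if the read timed out, false if data or EOF arrived first.
     */
    static boolean readTimesOutAgainstSilentServer(int timeoutMillis) {
        try (ServerSocket silentServer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                     new Socket(silentServer.getInetAddress(), silentServer.getLocalPort())) {
            // Without this call, the read below would block until the peer
            // answers or the connection is closed.
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read();
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOutAgainstSilentServer(500));
    }
}
```

          The timeout value (500 ms here) is arbitrary; a real client would pick one suited to its network and would need to apply it to every connection it opens.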

          Other builds are blocked on the cache lock:

          "Executor #-1 for master : executing something/master #123" … state=WAITING cpu=86%
              - waiting on <…> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
              - locked <…> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
                owned by "Executor #-1 for master : executing jenkins.branch.MultiBranchProject$BranchIndexing@…" …
              at …
              at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
              at jenkins.plugins.git.AbstractGitSCMSource.retrieve(AbstractGitSCMSource.java:120)
              at jenkins.scm.api.SCMSource.fetch(SCMSource.java:233)
              at org.jenkinsci.plugins.workflow.multibranch.SCMBinder.create(SCMBinder.java:77)
              at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:206)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:408)
          

          This code should be using lockInterruptibly, I think.
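
          As a rough sketch of what `lockInterruptibly` would change (this is not the plugin's code; `cacheLock` and the class and method names are hypothetical stand-ins): a thread waiting in `ReentrantLock.lock()` ignores interrupts until it acquires the lock, whereas one waiting in `lockInterruptibly()` throws `InterruptedException` and can abort.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class InterruptibleLockSketch {

    /**
     * The calling thread holds the lock (standing in for the stuck indexing
     * thread). A second thread waits for it with lockInterruptibly(), is then
     * interrupted, and we report whether it gave up waiting.
     */
    static boolean waiterCanBeInterrupted() {
        ReentrantLock cacheLock = new ReentrantLock(); // hypothetical stand-in for the cache lock
        AtomicBoolean gaveUp = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);

        cacheLock.lock(); // held for the whole time the waiter is blocked
        try {
            Thread waiter = new Thread(() -> {
                started.countDown();
                try {
                    cacheLock.lockInterruptibly(); // blocks, but responds to interrupt()
                    try {
                        // the protected work would happen here
                    } finally {
                        cacheLock.unlock();
                    }
                } catch (InterruptedException e) {
                    gaveUp.set(true); // the waiting build can now be aborted cleanly
                }
            });
            waiter.start();
            started.await();
            waiter.interrupt();   // comparable to a user aborting the blocked build
            waiter.join(5000);
            return gaveUp.get() && !waiter.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            cacheLock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter interrupted: " + waiterCanBeInterrupted());
    }
}
```

          This only makes the waiting builds abortable; it does not unblock the thread that holds the lock while stuck in the socket read.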


          Mark Waite added a comment -

          I'm not sure why the AbstractGitSCMSource is choosing JGit as the implementation, since command line git should be the default implementation for the git client plugin and the git plugin.


          Antonio Muñiz added a comment -

          If the Default Git installation is removed and only jgit is added to the list (Manage Jenkins » Tools Config), then JGit is used everywhere (I didn't check the code, but that's what I experienced some time ago).

          Mark Waite added a comment - edited

          Pull request 424 allows AbstractGitSCMSource to use command line git. That should resolve this issue for users that are willing to use the command line git implementation. The command line git implementation is the definitive implementation of the git client API. There are very few things which the JGit API can do that the command line git API can't do. There are quite a few things which the command line git API can do that the JGit API cannot do.

          The command line git implementation in AbstractGitSCMSource has been merged to the master branch and will be included in the next release of the git plugin.

          The git client plugin will switch from using JGit 3.7 to JGit 4.4 when git client plugin 2.0.0 is released (likely before end of September 2016). I don't know if the blocking behavior has been changed in JGit 4.4 and will rely on the experience of others to decide if the problem is resolved with JGit 4.4.


          Mark Waite added a comment -

          That pull request has been merged. Git plugin 2.6.0 will include the change to command line git which provides at least one path to resolve this (use git plugin 2.6.0 with its default implementation).


          Mark Waite added a comment -

          Included in git plugin 2.6.0, released 2 Sep 2016, at least by virtue of command line git being made the default implementation for the multi-branch plugin.


          David Taylor added a comment -

          Similar errors are still popping up in 2018, no resolution yet:

          https://issues.jenkins-ci.org/browse/JENKINS-48933?attachmentViewMode=list


            markewaite Mark Waite
            batmat Baptiste Mathus