Jenkins / JENKINS-25353

Git operations fail due to "dead", but lingering lock file

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component: git-client-plugin

      From time to time I see the following error. I suppose it can be caused by a job abort during the git checkout stage.

      java.io.IOException: Could not checkout 154d73b8218b3a4c0db7808853565ca5ed0b8999
      	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:966)
      	at hudson.model.AbstractProject.checkout(AbstractProject.java:1253)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:622)
      	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:528)
      	at hudson.model.Run.execute(Run.java:1745)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      	at hudson.model.ResourceController.execute(ResourceController.java:89)
      	at hudson.model.Executor.run(Executor.java:240)
      Caused by: hudson.plugins.git.GitLockFailedException: Could not lock repository. Please try again
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1619)
      	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
      	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:328)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      	at java.util.concurrent.FutureTask.run(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      	at hudson.remoting.Engine$1$1.run(Engine.java:63)
      	at java.lang.Thread.run(Unknown Source)
      Caused by: hudson.plugins.git.GitException: Command "C:\Program Files (x86)\Git\bin\git.exe checkout -f 154d73b8218b3a4c0db7808853565ca5ed0b8999" returned status code 128:
      stdout: 
      stderr: fatal: Unable to create 'e:/jenkins/workspace/xxx/.git/index.lock': File exists.
      
      If no other git process is currently running, this probably means a
      git process crashed in this repository earlier. Make sure no other git
      process is running and remove the file manually to continue.
      
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1437)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$500(CliGitAPIImpl.java:87)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1616)
      	... 12 more
      


          Daniel Beck added a comment -

          What exactly is the bug here? I mean, it even says

          If no other git process is currently running, this probably means a
          git process crashed in this repository earlier. Make sure no other git
          process is running and remove the file manually to continue.

          This does not look like something that could be resolved automatically.


          Pavel Baranchikov added a comment -

          My understanding is that Jenkins prevents multiple builds in the same workspace. So we can be sure that if a new build is triggered, no other git process should access the workspace or the .git repository. And I believe no other users are interacting with git on the Jenkins slave.

          My suggestion is to unlock the git repository (removing the .git/index.lock file) before checkout.


          Daniel Beck added a comment -

          > My understanding, is that Jenkins prevents multiple builds in the same workspace

          Only if the workspace is not a custom workspace (and details like this should be transparent to the SCM).

          > And I believe, that no other users are interacting git at Jenkins slave.

          Given how many weird setups I've seen, I don't think this is necessarily universal.

          That said, it may be possible to make sure the file is deleted when aborting a build during a Git operation (if it's ensured Git gets properly killed).


          Pavel Baranchikov added a comment -

          I agree with the approach of force-unlocking the git repository when a job is aborted.
          So far today, I've seen this error 3 times on my build server.


          Mark Waite added a comment - - edited

          Reliably removing the lock file and not harming some other use model seems complicated and risky. If a custom workspace is involved, there may be multiple git processes operating in the directory. If a typical workspace is involved, there may be build steps running which start a git process and expect the build step to complete before the git process has completed.

          What if we removed lock files as part of the optional "clean" step? Would that have resolved the case you detected? Would you have been willing to clean the workspace to remove the lock file?

          Wiping the workspace should be one way to work around this problem, since wiping the workspace will reconstruct the git repository from a freshly cloned copy.


          Pavel Baranchikov added a comment -

          I have no objections to a separate "clean" step. For now, I've set "wipe out repository and force clean", but it takes too much time to reconstruct.


          Pavel Baranchikov added a comment -

          I tried on my local PC (openSUSE) and found that index.lock is only left in the .git directory when git is killed with SIGKILL. There is no index.lock file when git is killed with SIGTERM.

          The operation causing the index.lock file creation is

          git checkout
          


          Pavel Baranchikov added a comment -

          Also, the issue reproduces on Windows only: just abort a git checkout operation using CliGit.


          Pavel Baranchikov added a comment -

          My suggestion is to let the InterruptedException propagate out of the org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(ArgumentListBuilder, File, EnvVars, Integer) method call and perform cleanup (removing the index.lock file) in the execute() method of the anonymous CheckoutCommand created in org.jenkinsci.plugins.gitclient.CliGitAPIImpl.checkout().

          I suggest adding a new exception, GitInterruptedException, to indicate that a git operation was interrupted, and using that information to remove the lock file.

          What's your opinion?


          Mark Waite added a comment -

          I think the extraction you propose makes sense.

          I'm not sure why we need a new exception. Isn't the InterruptedException already enough to detect this case, without adding a new Exception? Can you explain further?


          Pavel Baranchikov added a comment -

          InterruptedException is caught in the method org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(ArgumentListBuilder, File, EnvVars, Integer), where it is wrapped in a GitException to add a message. Nevertheless, the method is declared to throw InterruptedException, so it would be no problem to make the following change:

          diff --git a/src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java b/src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          index c55806e..3fdd2ca 100644
          --- a/src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          +++ b/src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          @@ -1447,6 +1447,8 @@ public class CliGitAPIImpl extends LegacyCompatibleGitAPIImpl {
                       throw e;
                   } catch (IOException e) {
                       throw new GitException("Error performing command: " + command, e);
          +        } catch (InterruptedException e) {
          +            throw e;
                   } catch (Throwable t) {
                       throw new GitException("Error performing git command", t);
                   }
          

          Do you prefer this approach instead of creating separate exception class?


          Mark Waite added a comment -

          I think throwing InterruptedException is the better approach. Would one of the callers then be expected to handle the InterruptedException and remove the lock file if it is found?


          Pavel Baranchikov added a comment -

          Yes, I suggest performing this in the execute() method of the CheckoutCommand in org.jenkinsci.plugins.gitclient.CliGitAPIImpl.checkout(), as checkout is the command I've found creating the index.lock file.


          Mark Waite added a comment -

          Will be included in the next git-client-plugin release after 1.12.0.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Pavel Baranchikov
          Path:
          src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          http://jenkins-ci.org/commit/git-client-plugin/70a8747dbb2d7d11627e1620e783b383f959454f
          Log:
          JENKINS-25353: Cleaned up index.lock file when checkout is aborted


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Mark Waite
          Path:
          src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          src/test/java/org/jenkinsci/plugins/gitclient/GitAPITestCase.java
          http://jenkins-ci.org/commit/git-client-plugin/cfe6e69eb59d3e5349a9cb348368994d7517c83e
          Log:
          Test index.lock removal - JENKINS-25353

          Compare: https://github.com/jenkinsci/git-client-plugin/compare/f9a5a2ccb6a5...cfe6e69eb59d


          Mark Waite added a comment -

          Fixed in git-client-plugin 1.13.0 released 18 Dec 2014


          Jan Hudec added a comment -

          It still happens with 1.16.1 when the job is cancelled during Git operation.


          Ron MacNeil added a comment - - edited

          Could we just have an optional "clean lock file(s) before checking out" additional behaviour for the Git SCM plugin please? I think that's exactly what folks are looking for.

          Presently, this can be emulated with a pre-checkout "del /f /s index.lock" (or similar) via the pre-checkout-buildstep plugin.

          I appreciate the cleverness of the clean-up-on-exception idea but, as Pavel points out, it doesn't work in all cases and I, for one, would like something that works in all cases.


          Mark Waite added a comment -

          I don't think an optional "clean lock file(s) before checking out" behaviour will fix the problem completely for Windows. If the git process which created the lock file is still running, the file cannot be deleted, because the Windows file system will not allow open files to be deleted.


          Ron MacNeil added a comment -

          True, but if we assume the common case where jobs have their own isolated workspaces, then the only place a conflicting Git process could come from would be a previous run of the same job. For that to happen, the process would have to have somehow escaped Jenkins' process tracking/killing behaviour, something we've never seen in our particular environment and, I submit, is unlikely in general.

          On the other hand, we have evidence that folks are still coming up against abandoned lock files, whether due to processes being KILL'ed or otherwise. So the optional "clean locks" behaviour would seem to at least put those folks in a better place than they're in now.

          Either way, thank you for your work on Jenkins, much appreciated.


          Jan Hudec added a comment -

          On Windows the lock file can't be deleted while it is still held, which is what makes unconditional removal safe. Unless somebody is messing with the Jenkins workspaces by hand, there is no process running in them when a build starts, so it still does fix the issue.


          Nicolae Dragos Sava added a comment -

          Hi guys, I managed to reproduce a possible cause for this bug (there could be other causes also):
          I am certain the build triggers are one of the problems: "poll SCM" and "build periodically" collide. Polling works fine, but when "build periodically" fires, the build fails with the lock issue.
          Replicate: set poll SCM to every 5 minutes and build periodically to every 5 minutes, and watch what happens.
          Note: I am using the latest plugins (as of 22 Mar 2016).


          matthew giardina added a comment -

          Is there any progress on this?


          Mark Waite added a comment - - edited

          No, and not likely to be any progress in the near term.

          You can install the "pre-scm build" plugin and add a pre-scm step to unconditionally remove lock files in the .git directory, if you're seeing this frequently enough to justify changing your job definition.


          trejkaz added a comment -

          Copying from JENKINS-47652, commands leading up to the error were:

          [macosx]  > git rev-parse --is-inside-work-tree # timeout=10
          [macosx]  > git config remote.origin.url git@_git-server_:_app_/core.git # timeout=10
          [macosx]  > git --version # timeout=10
          [macosx]  > git fetch --tags --progress git@_git-server_:_app_/core.git +refs/pull/50/head:refs/remotes/origin/PR-50 +refs/heads/master:refs/remotes/origin/master --prune # timeout=40
          [macosx]  > git config core.sparsecheckout # timeout=10
          [macosx]  > git checkout -f 72d57d4d7854d9b2b9eb78e0fe158869b3f04f80 # timeout=30
          [macosx]  > git merge 0abbc94cdff38bfe76dad45562011f6f703143ce # timeout=10
          [macosx]  > git config core.sparsecheckout # timeout=10
          [macosx]  > git checkout -f 72d57d4d7854d9b2b9eb78e0fe158869b3f04f80 # timeout=30
          

          The sequence of commands seems odd to me, but surely it wasn't locked before SCM commands started, or the first `git checkout` would have failed, so I'm not sure a pre-SCM step would work either, unless there is a way to do it half way through the SCM step? Or are we saying that this log represents multiple SCM steps, so the pre would run before each? (Jenkins' logs make what's being done very unclear.)


            Assignee: Nicolas De Loof (ndeloof)
            Reporter: Pavel Baranchikov (pbaranchikov)
            Votes: 8
            Watchers: 15