  1. Jenkins
  2. JENKINS-48258

git client plugin occasionally fails with "text file busy" error


Details

    Description

      Occasionally I get the following git error early in some of my pipeline builds, but I have not been able to reproduce the issue during testing.

      Perhaps the createUnixGitSSH method in CliGitAPIImpl.java should explicitly close the PrintWriter before returning? Or maybe it just needs to flush the stream, and the script should be executed as "sh tmpfile.sh" instead of "./tmpfile.sh"?

      using GIT_SSH to set credentials 
       > git ls-remote -h -t git.example.org:/var/git/foo.git # timeout=10
      hudson.plugins.git.GitException: Command "git ls-remote -h -t git.example.org:/var/git/foo.git" returned status code 128:
      stdout: 
      stderr: fatal: cannot exec '/tmp/ssh4044675960910064496.sh': Text file busy
      fatal: unable to fork
      
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1970)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1689)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1600)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1591)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.getRemoteReferences(CliGitAPIImpl.java:2785)
      	at jenkins.plugins.git.AbstractGitSCMSource.retrieve(AbstractGitSCMSource.java:708)
      	at jenkins.scm.api.SCMSource.fetch(SCMSource.java:598)
      	at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.retrieve(SCMSourceRetriever.java:80)
      	at org.jenkinsci.plugins.workflow.libs.LibraryAdder.retrieve(LibraryAdder.java:153)
      	at org.jenkinsci.plugins.workflow.libs.LibraryAdder.add(LibraryAdder.java:134)
      	at org.jenkinsci.plugins.workflow.libs.LibraryDecorator$1.call(LibraryDecorator.java:125)
      	at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1065)
      	at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:603)
      	at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:581)
      	at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:558)
      	at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:298)
      	at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:268)
      	at groovy.lang.GroovyShell.parseClass(GroovyShell.java:688)
      	at groovy.lang.GroovyShell.parse(GroovyShell.java:700)
      	at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.doParse(CpsGroovyShell.java:129)
      	at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.reparse(CpsGroovyShell.java:123)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.parseScript(CpsFlowExecution.java:517)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.start(CpsFlowExecution.java:480)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:269)
      	at hudson.model.ResourceController.execute(ResourceController.java:97)
      	at hudson.model.Executor.run(Executor.java:421)
      org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
      WorkflowScript: Loading libraries failed
      
      1 error
      

      Attachments

        Activity

          markewaite Mark Waite added a comment -

          The try-with-resources technique used in CliGitAPIImpl has been more reliable (for me, at least) than explicit calls to close files. It was used to resolve one or more file handle leaks a few years ago.

          The command that is being called by the plugin is git ls-remote -h -t git.example.org:/var/git/foo.git. The plugin does not execute that shell script directly, but relies on git to execute the shell script. The environment variables set prior to that call refer to the temporary file which was written (and closed) prior to the call of the git command.

          If you find a repeatable way to see the problem, please update the bug report with that information.

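          The pattern Mark describes can be illustrated with a minimal, self-contained sketch (hypothetical class and method names, not the actual plugin source): the PrintWriter is flushed and closed when the try-with-resources block exits, before the script is marked executable and handed to git via GIT_SSH.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.Charset;

public class GitSshSketch {
    // Simplified sketch of the createUnixGitSSH pattern: the wrapper script is
    // written inside try-with-resources, so the stream is flushed and closed
    // before the file is marked executable and handed to command-line git.
    public static File writeSshWrapper(File key, String user) throws IOException {
        File ssh = File.createTempFile("ssh", ".sh");
        try (PrintWriter w = new PrintWriter(ssh, Charset.defaultCharset().toString())) {
            w.println("#!/bin/sh");
            w.println("ssh -i \"" + key.getAbsolutePath() + "\" -l \"" + user
                    + "\" -o StrictHostKeyChecking=no \"$@\"");
        } // w.close() runs here, even if an exception is thrown
        ssh.setExecutable(true, true);
        return ssh;
    }
}
```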

          mbmop Mostyn Bramley-Moore added a comment -

          Thanks for the tip. I now suspect that this is not a Jenkins issue, but a git-on-NFS issue in my cloud setup. I have created a tmpfs for /tmp in the meantime while I investigate.

          Marking this closed.
          magnusbaeck Magnus Bäck added a comment - edited

          FWIW, one of our users just experienced a very similar error, just with a different path and for a fetch operation rather than ls-remote. We are running Jenkins 2.60.3 with version 2.5.0 of the git client plugin. No NFS shares are involved in our case, just plain ext4 partitions mounted into the Docker container where Jenkins runs.

          git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/*
           hudson.plugins.git.GitException: Command "git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/*" returned status code 128:
           stdout: 
           stderr: fatal: cannot exec '/srv/jenkins/caches/git-a67f10a9b25be4c33f1701c0caff217e@tmp/ssh4168691036775888323.sh': Text file busy
           fatal: unable to fork
          at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1924)
           at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1643)
           at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:71)
           at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:352)
           at jenkins.plugins.git.AbstractGitSCMSource.doRetrieve(AbstractGitSCMSource.java:344)
           at jenkins.plugins.git.AbstractGitSCMSource.retrieve(AbstractGitSCMSource.java:524)
           at jenkins.scm.api.SCMSource.fetch(SCMSource.java:598)
           at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.retrieve(SCMSourceRetriever.java:80)
           at org.jenkinsci.plugins.workflow.libs.LibraryAdder.retrieve(LibraryAdder.java:150)
           at org.jenkinsci.plugins.workflow.libs.LibraryAdder.add(LibraryAdder.java:131)
           at org.jenkinsci.plugins.workflow.libs.LibraryDecorator$1.call(LibraryDecorator.java:114)
           at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1065)
           at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:603)
           at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:581)
           at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:558)
           at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:298)
           at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:268)
           at groovy.lang.GroovyShell.parseClass(GroovyShell.java:688)
           at groovy.lang.GroovyShell.parse(GroovyShell.java:700)
           at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.doParse(CpsGroovyShell.java:129)
           at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.reparse(CpsGroovyShell.java:123)
           at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.parseScript(CpsFlowExecution.java:516)
           at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.start(CpsFlowExecution.java:479)
           at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:252)
           at hudson.model.ResourceController.execute(ResourceController.java:97)
           at hudson.model.Executor.run(Executor.java:405)
          

           

          markewaite Mark Waite added a comment -

          magnusbaeck thanks for the report. I suspect the case you're seeing is due to multiple threads attempting to scan for repository changes on a multi-branch pipeline. Can you confirm the caches/git-xxx directory is pointing to a remote repository which is used by one or more multi-branch pipeline jobs?

          I don't have a fix for it, or even a workaround, since that message would usually mean that one thread is replacing the content of the temporary credentials file while another thread has invoked command-line git with environment variables that point it at that file as its credentials source.

          It is also possible that command-line git was invoked after Java wrote the file but before Java closed it. That would surprise me very much, since the file is written from within a try-with-resources block, which should close the file on exit from the block.

          magnusbaeck Magnus Bäck added a comment -

          Can you confirm the caches/git-xxx directory is pointing to a remote repository which is used by one or more multi-branch pipeline jobs?

          It's the git repository used for a shared pipeline library, which indeed is used by concurrently running pipeline jobs, but each pipeline is single-branch.

          markewaite Mark Waite added a comment -

          magnusbaeck thanks for the confirmation. Access to that cache is controlled by a lock, so there should only be one git process accessing the cache at a time. If multiple processes are writing to that cache at the same time, that is unexpected.

          pixman20 pixman20 added a comment -

          markewaite, I am having the same issue as magnusbaeck. We have a few hundred multi-branch pipeline jobs (created by the Bitbucket Branch Source Plugin) that all refer to a shared library. They all run during the night, so it's easily possible that multiple jobs are trying to pull the shared library at the same time. It's worth noting that I am now using a git reference repository to speed up cloning, but I may have seen this issue prior to adding the reference repository as well. In theory the reference repo is irrelevant anyway, since the failure is on /tmp/ssh....
          My stack trace is slightly different from the one in the description, so I'm pasting it here in case it helps:

          hudson.plugins.git.GitException: Command "git ls-remote -h -t ssh://git@somegit:somport/someproject/somerepo.git" returned status code 128:
          stdout: 
          stderr: fatal: cannot exec '/tmp/ssh1395686403629394139.sh': Text file busy
          fatal: unable to fork
          
          	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1990)
          	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1709)
          	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1620)
          	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1611)
          	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.getRemoteReferences(CliGitAPIImpl.java:2825)
          	at jenkins.plugins.git.AbstractGitSCMSource.retrieve(AbstractGitSCMSource.java:708)
          	at jenkins.scm.api.SCMSource.fetch(SCMSource.java:598)
          	at org.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.retrieve(SCMSourceRetriever.java:80)
          	at org.jenkinsci.plugins.workflow.libs.LibraryAdder.retrieve(LibraryAdder.java:153)
          	at org.jenkinsci.plugins.workflow.libs.LibraryStep$Execution.run(LibraryStep.java:205)
          	at org.jenkinsci.plugins.workflow.libs.LibraryStep$Execution.run(LibraryStep.java:154)
          	at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
          	at hudson.security.ACL.impersonate(ACL.java:274)
          	at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          	at java.lang.Thread.run(Thread.java:745)
          
          markewaite Mark Waite added a comment - edited

          pixman20 as far as I can tell, the references to Text file busy - unable to fork consistently indicate that the file being executed (/tmp/shXXXX.sh) is open for write. In the context of the git client plugin, that file is closed by the 'try with resources' code before the git command is invoked which will use the shell script. If it were consistently left open for write, I would expect much more frequent failures.

          Is the /tmp directory a network based mount, or is it using a file system configuration which might cause the file to remain open longer than expected?

          Is the Linux open file limit sufficiently large that the limit for open files is not being exceeded during that high load period? I doubt that is the problem, since the message would be "too many open files" rather than "Text file busy".


          mbmop Mostyn Bramley-Moore added a comment -

          Reopening, since this still occurs after I removed NFS from the equation.

          This might be easier to debug if the git client plugin used more distinctive temp file names; I've submitted a pull request: https://github.com/jenkinsci/git-client-plugin/pull/293

          tsniatowski Tomasz Śniatowski added a comment -

          The issue here appears to be caused by a Java bug, https://bugs.openjdk.java.net/browse/JDK-8068370. The issue can occur when multiple threads open a file for writing, close it, and then execute it (each thread using its own file). Even if all files are closed "properly", due to how file handles work around fork/exec, a child process in one thread may inherit the handle to another thread's open file, and thus break that thread's later subprocess call.

          The two usual ways to avoid this issue are to open files with O_CLOEXEC so they get auto-closed in child processes, or to close open file descriptors on the child side of fork(). Java appears to do neither of those things, and a simple test program like the ones in the linked OpenJDK bug shows that the problem is still present on at least OpenJDK Java 8 and Oracle Java 9.

          I had a go at possible workarounds, and the options don't look thrilling. GIT_SSH_COMMAND does the same thing as GIT_SSH, failing when someone has the file open for writing. Patching the JDK is not feasible, and it looks like all files are opened in the unsafe manner, so there is no "advanced" API to try. One thing that does work is creating the file outside Java, in a child process, or making a copy of the file (with a subprocess cp) and using that; Java's Files.copy triggers the same race too. I'll attach a testcase that shows how threads+files+subprocesses don't work properly, and shows these two workarounds.

          The nature of this problem is such that it will randomly trigger for people who have a pipeline that creates a large number of sub-builds at the exact same time, and there doesn't appear to be a good user workaround right now other than not using SSHUserPrivateKey credentials at all. I think there should be some workaround in the git plugin, even if it's ugly.

          markewaite do you think createUnixGitSSH could be made to do a cp internally to avoid this race? Something like

          $ diff -u a/CliGitAPIImpl.java b/CliGitAPIImpl.java
          --- a/CliGitAPIImpl.java        2018-01-08 16:01:40.711864222 +0100
          +++ b/CliGitAPIImpl.java        2018-01-08 16:04:27.638664845 +0100
          @@ -1936,6 +1936,7 @@
           
               private File createUnixGitSSH(File key, String user) throws IOException {
                   File ssh = createTempFile("ssh", ".sh");
          +        File ssh_copy = new File(ssh.toString() + "-copy");
                   try (PrintWriter w = new PrintWriter(ssh, Charset.defaultCharset().toString())) {
                       w.println("#!/bin/sh");
                       // ${SSH_ASKPASS} might be ignored if ${DISPLAY} is not set
          @@ -1945,8 +1946,12 @@
                       w.println("fi");
                       w.println("ssh -i \"" + key.getAbsolutePath() + "\" -l \"" + user + "\" -o StrictHostKeyChecking=no \"$@\"");
                   }
          -        ssh.setExecutable(true, true);
          -        return ssh;
          +        // Java doesn't know how to create a file properly in a way that's safe when another thread runs a subprocess.
          +        // See JENKINS-48258, https://bugs.openjdk.java.net/browse/JDK-8068370
          +        new ProcessBuilder("cp", ssh.toString(), ssh_copy.toString()).start().waitFor();
          +        ssh.delete();
          +        ssh_copy.setExecutable(true, true);
          +        return ssh_copy;
               }

          (draft, untested)
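          For reference, the copy step from the draft above can be written as a compilable, standalone sketch (hypothetical class and method names, not an actual plugin change): the copy is created entirely by a child cp process, so no JVM thread ever holds a write handle to the file that git will execute.

```java
import java.io.File;
import java.io.IOException;

public class CopyWorkaroundSketch {
    // Sketch of the subprocess-cp workaround: the copy is written entirely by
    // the child cp process, so the JVM never opens it for writing and a
    // concurrent fork/exec cannot inherit a write handle (see JDK-8068370).
    public static File copyOutsideJvm(File script) throws IOException, InterruptedException {
        File copy = new File(script.toString() + "-copy");
        Process cp = new ProcessBuilder("cp", script.toString(), copy.toString()).start();
        if (cp.waitFor() != 0) {
            throw new IOException("cp failed for " + script);
        }
        copy.setExecutable(true, true);
        return copy;
    }
}
```

          This assumes a Unix host with cp on the PATH, which matches the createUnixGitSSH code path being discussed.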
          markewaite Mark Waite added a comment -

          tsniatowski I'm willing to provide a prototype git client plugin build which uses that technique so that you and others who see the problem can investigate if the technique helps the problem.


          mbmop Mostyn Bramley-Moore added a comment -

          markewaite: sure. We're trialing a shell-script git wrapper that Tomasz wrote using this same idea; it seems to be working so far, but I want to wait another week or so to watch for errors.

          mbmop Mostyn Bramley-Moore added a comment -

          Things are still looking good with the test shell-script git wrapper, so I'm fairly confident that tsniatowski has found the root cause and that his suggested workaround would work if implemented in the git client plugin.

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Mostyn Bramley-Moore
          Path:
          src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          http://jenkins-ci.org/commit/git-client-plugin/d4e55360e2dd5f41bc556d7c181ecc29e4258109
          Log:
          JENKINS-48258 add distinctive temp file prefix to aid debugging

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Mark Waite
          Path:
          src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          http://jenkins-ci.org/commit/git-client-plugin/4241e818dbe27fb78093094f29ebcd2203166610
          Log:
          Merge pull request #293 from mostynb/text_file_busy_debugging

          JENKINS-48258 add distinctive temp file prefix to aid debugging

          Compare: https://github.com/jenkinsci/git-client-plugin/compare/1bfc00f0d56a...4241e818dbe2
          markewaite Mark Waite added a comment -

          mbmop what were the results of your tests with the change from tsniatowski? Is tsniatowski interested in submitting a pull request that could be used by others?


          We haven't seen the error since we implemented the local workaround, so that's over a month now. It looks like it has definitely helped.

          Unfortunately it's a shell wrapper and not really a usable patch:

          $ which git
          /usr/local/bin/git
          $ cat /usr/local/bin/git
          #!/bin/sh
          # Wrapper to avoid "fatal: cannot exec '/tmp/aa': Text file busy" errors when
          # git is called with GIT_SSH pointing to something that's still open for writing
          # (see https://issues.jenkins-ci.org/browse/JENKINS-48258)
          GIT_SSH_COPY=
          if [ -n "$GIT_SSH" ]; then
            GIT_SSH_COPY="$(mktemp "${GIT_SSH}.XXXXX")"
            cp -a "$GIT_SSH" "$GIT_SSH_COPY"
            GIT_SSH="$GIT_SSH_COPY"
          fi
          /usr/bin/git "$@"
          RET=$?
          if [ -n "$GIT_SSH_COPY" ]; then
            rm -f "$GIT_SSH_COPY"
          fi
          exit $RET
          

          I'm not quite sure how to solve this purely in Java in the git plugin code, as my tests indicate that merely creating the file on the Java side introduces a race condition that can bring this issue back. A quick hacky workaround could turn this from a 0.1% issue into a 0.001% one, but it would still break occasionally. A more substantial change of approach is likely needed on Linux, avoiding the "write file, then make GIT_SSH point to it" pattern altogether.

          tsniatowski Tomasz Śniatowski added a comment -

          It seems like it should be possible to eliminate the race with the use of a lock to prevent processes from being forked while the GIT_SSH target script is open for writing.

          Roughly:

          • In hudson.Launcher, add a static ReadWriteLock field using a ReentrantReadWriteLock.
          • Hold the read lock in hudson.Launcher.ProcStarter.start() when calling launch().
          • Also hold the read lock in the various deprecated final hudson.Launcher.launch() overloads.
          • Expose the (write) lock publicly for use in preventing launching.
          • When writing files that will be executed, hold the write lock from when the file is created until it is closed.  Specifically, hold the lock for the lifetime of the PrintWriter in each of org.jenkinsci.plugins.gitclient.CliGitAPIImpl.createUnixSshAskpass(), .createUnixStandardAskpass(), and .createUnixGitSSH().

          Does this seem like a viable approach?  Worth developing a patch?
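          Roughly, in code (all class, field, and method names below are hypothetical illustrations of the idea, not actual Jenkins core or git client plugin APIs; assumes a Unix-like system):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the locking proposal above. The class, field, and
// method names are illustrative only; they are not actual Jenkins core or
// git client plugin APIs.
public class ScriptLaunchLock {

    // Shared by every thread in the JVM: the writer side guards script
    // creation, the reader side guards process launching.
    static final ReentrantReadWriteLock SCRIPT_LOCK = new ReentrantReadWriteLock();

    // Hold the write lock for the entire open-write-close window, so no
    // other thread can fork (and have the child inherit the still-open
    // file descriptor) while the script is open for writing.
    static Path createScript(String contents) throws IOException {
        SCRIPT_LOCK.writeLock().lock();
        try {
            Path script = Files.createTempFile("ssh", ".sh");
            Files.write(script, contents.getBytes()); // opens, writes, closes
            script.toFile().setExecutable(true);
            return script; // descriptor is closed before the lock is released
        } finally {
            SCRIPT_LOCK.writeLock().unlock();
        }
    }

    // Hold the read lock across the fork: many launches may run in
    // parallel, but none may overlap a script that is being written.
    static Process launchProcess(String... cmd) throws IOException {
        SCRIPT_LOCK.readLock().lock();
        try {
            return new ProcessBuilder(cmd).start();
        } finally {
            SCRIPT_LOCK.readLock().unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        Path script = createScript("#!/bin/sh\nexec ssh \"$@\"\n");
        Process p = launchProcess("sh", "-c", "exit 0");
        System.out.println("launched process exited with " + p.waitFor());
        Files.delete(script);
    }
}
```

          The key property is that the write lock spans the whole open-write-close window of the script, while every fork in the JVM takes the read lock, so no child process can be created while a script descriptor is open for writing.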

           

          dscunkel Christopher Unkel added a comment -
          markewaite Mark Waite added a comment -

          dscunkel the git client plugin already asks the Java libraries to close the script before the process which will reference it is forked. Unfortunately, a bug in the Java libraries causes that request to close the file to sometimes be ignored.

          Refer to the earlier comment from tsniatowski for more details.

          Refer to PR313 for the work-around as a pull request. Since I have never seen the problem, I can't reliably test that the patch fixes it. If you've seen the problem, please download the pull request build, test it, and report your results on that pull request.


          markewaite: as I understand the issue described in JDK-8068370, the problem isn't that the file descriptor for the file isn't closed, but rather that the open file descriptor is inherited by a child process forked by another thread, so there's a race between writing the GIT_SSH file contents and launching processes:

          1. In thread 1: open GIT_SSH script file for writing.
          2. In thread 2: fork process; child inherits open file descriptor of GIT_SSH file.
          3. In thread 1: close GIT_SSH file.
          4. In thread 1: fork a process to run git.  The GIT_SSH file is still open in the earlier child, so ETXTBSY results.

          So if a lock can preclude step 2 from happening between step 1 and step 3, the race would be fixed.  That said, PR313 has a more local fix and would seem preferable if it works.

          I see this issue, but only intermittently, so it will probably take a month of testing to be confident the PR is a solution.

          dscunkel Christopher Unkel added a comment -
          markewaite Mark Waite added a comment -

          dscunkel as far as I understand the git client plugin code, the temporary file is opened, written, and closed in the same thread that runs the command line git process. As far as I can tell, there is only a single thread which creates the script and then runs command line git in a subprocess. I can't explain why the temporary file is occasionally still open when the script is executed by command line git.


          markewaite: "thread 2" in the above could be any other thread in the Java virtual machine.   From a Java language point of view, the PrintWriter object and the underlying FileOutputStream would seem to be local to the thread that is executing createUnixGitSSH() and subsequently asks for the git CLI process to be launched.  However, on a Unix JVM implementation, underlying the FileOutputStream is an open file descriptor to the script.  File descriptors always have process scope. They are not thread local.

          So when I see the bug I'm guessing that thread 2 is a different thread launching a shell build step from some unrelated job.  On Unix Runtime.exec() is implemented with a fork(), producing a child process, followed by an exec() in the child process.  The child process inherits all open file descriptors.  If badly timed, this includes the open file descriptor for the SSH script, and that's how the script file is still open: it's open in a child process totally unrelated to what the git client plugin code is doing.  By the time the git client code runs git, the script is closed in the Jenkins process itself.

          dscunkel Christopher Unkel added a comment -
          markewaite Mark Waite added a comment - edited

          dscunkel each invocation of command line git which needs a credentials file is allocated a unique temporary file by a call to Files.createTempFile(). The process which will consume that unique temporary file is not forked until after the JVM has been told to close the unique temporary file. I'm sure I must be missing something, but the sequence of calls tries very, very hard to assure that the unique temporary file is closed before the call to command line git.

          I think you're suggesting that the fork of a potentially unrelated process is holding file descriptors open which were opened by the git client plugin. If that's the case, then it will need someone with much better threading skills than mine to propose a change to the Jenkins core which does not damage process fork performance and still resolves this case.

          A Stack Overflow article claims that Java on Unix sets the FD_CLOEXEC flag when it execs a process. I still don't have an explanation of the problem, just observations that seem to contradict the rational reasons this problem happens.


          markewaite: this bug cannot be understood by thinking about the behavior of one git client thread in isolation. 

          The straight-line code in the git client is correct and it does always close the unique temporary file before the call to command-line git.  The problem is that there are other threads in the JVM and they may also run commands and make subprocesses.  The mechanics of making subprocesses creates duplicates of open files.  It is one of the duplicates that is open, not the version opened by the git client code.

          To be more explicit, imagine that I have two build jobs running:  job 1 needs to do a git checkout, and job 2 needs to run make.  Say that Jenkins is running as process ID 1000, thread 1 is running the git checkout, and thread 2 is running the make.  Here's a thread/process execution interleaving in which the bug manifests:

          1. Process 1000, thread 1: open ssh123456.sh for writing, file descriptor 4
          2. Process 1000, thread 2: fork in preparation to run make, creating process 1001.  Inherits file descriptor 4 open for writing to ssh123456.sh.
          3. Process 1001: exec() make.
          4. Process 1000, thread 1: write contents of ssh123456.sh.
          5. Process 1000, thread 1: close ssh123456.sh.  Process 1000 no longer has ssh123456.sh open for writing.  However, this does not close file descriptor 4 in process 1001 (running make), hence ssh123456.sh is still open somewhere on the system for writing.
          6. Process 1000, thread 1: fork() in preparation to run git, creating process 1002.
          7. Process 1002: exec() git.
          8. Process 1002: fork in preparation to run the GIT_SSH script, creating process 1003.
          9. Process 1003: exec() ssh123456.sh --> ETXTBSY.  ssh123456.sh is open for writing as file descriptor 4 in process 1001 (make).

          So the script file is not open in the Jenkins process, but it is nonetheless open somewhere on the system, hence ETXTBSY.  The fact that some other, totally unrelated code can make a copy of the file descriptor and break things is why it's a Java runtime bug.  A combination of vfork() and the close-on-exec flag would ensure that file descriptor 4 is closed in process 1001 at step 3, closing the stray copy.  That's what's being contemplated as the fix in the JVM.

          One workaround is what's in PR313: copy the script using cp, which doesn't create children, so can't have stranded an open file descriptor to its destination.  Another is what I proposed, which is to use a lock to ensure that steps 2 and 3 above cannot happen between steps 1 and 5.
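          To illustrate the cp-based approach (this sketch is hypothetical and not the actual PR313 code; it assumes a Unix system with cp on the PATH, and all names are made up):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of the cp-based workaround (not the actual PR313
// code). Assumes a Unix-like system with cp available on the PATH.
public class CopyScriptWorkaround {

    // Copy the just-written GIT_SSH script with an external cp process.
    // cp opens its destination in a fresh child that forks no children of
    // its own, so no stray inherited descriptor can keep the copy open
    // for writing when git later execs it.
    static Path copyWithCp(Path original) throws IOException, InterruptedException {
        Path copy = Files.createTempFile("ssh-copy", ".sh");
        Process cp = new ProcessBuilder("cp", original.toString(), copy.toString())
                .inheritIO().start();
        if (cp.waitFor() != 0) {
            throw new IOException("cp failed for " + original);
        }
        copy.toFile().setExecutable(true);
        return copy; // point GIT_SSH at the copy, not at the original
    }

    public static void main(String[] args) throws Exception {
        Path original = Files.createTempFile("ssh", ".sh");
        Files.write(original, "#!/bin/sh\nexec ssh \"$@\"\n".getBytes());
        Path copy = copyWithCp(original);
        System.out.println(new String(Files.readAllBytes(copy)));
        Files.delete(original);
        Files.delete(copy);
    }
}
```

          The point is that even if some unrelated fork inherited the descriptor of the original script, the file git actually execs (the copy) was never open for writing in the Jenkins JVM at all.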

           

           

          dscunkel Christopher Unkel added a comment -

          Yup, that's it.

          Other discussion aside, I like the fix in PR313.  It's a bit ugly to have to execute 'cp', but I think it will solve the problem completely and do so with a fix local to the git client.

          As for the Stack Overflow article, I'm not sure either, but I have at least two theories:

          • There's more than one JVM in the world: maybe openjdk doesn't set FD_CLOEXEC but other JVMs do.
          • Even with FD_CLOEXEC there's still a short race window between fork() in the parent and the exec() in the child.  On Linux this can be solved by using vfork() instead of fork(), but the POSIX semantics don't guarantee such.
          dscunkel Christopher Unkel added a comment -
          joeferr joe ferr added a comment -

          Apologies if this is the wrong place for this.  I'm an end user, not a contributor.  Seems like the PR has been ready for quite some time.  This issue is actually huge for us.  We have a Jenkins-based Kubernetes continuous delivery system (homegrown, I grew it!) that uses the multi-scm plugin and around 5 git repos per environment.  We deploy to 8 environments for our production deploy, so we are getting 40+ repos with every deploy, with up to 30 deploys a day... so this impacts us at least once a day, often multiple times a day.  Any status updates on this?  At my company this is getting very high visibility.

          markewaite Mark Waite added a comment -

          joeferr please download the pull request build and install it in your environment. If it resolves the problem, please note that in the pull request or in this thread. I can't duplicate the problem. I'm relying on users to report their success with the proposed change.

          Evaluating a pull request by running it in your environment is a great way to be a contributor without compiling anything and with an easy way to "fall back" to the previous version if the pull request build doesn't meet your needs.

          joeferr joe ferr added a comment -

          markewaite perfect, will do.  I'll let you know the outcome.  Thanks


          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: presPetkov
          Path: src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          http://jenkins-ci.org/commit/git-client-plugin/1199beea0743e60e09c987b1306cc0a8fdfb6879
          Log:
          Jenkins 48258 ssh text file busy (#2)

          • JENKINS-48258 Use a copy of ssh file
          • Corrected exception handling as per bug report
          • Fixed error
          • Fixed error

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Mark Waite
          Path: src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
          http://jenkins-ci.org/commit/git-client-plugin/315ad4e689897dedb20c900b8f32f1ce607fb32c
          Log: Merge pull request #313 from presPetkov/JENKINS-48258-busy-ssh-text-file

          Jenkins 48258 busy ssh text file fix

          Compare: https://github.com/jenkinsci/git-client-plugin/compare/98eea34dc4e2...315ad4e68989

          *NOTE:* This service has been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

          Functionality will be removed from GitHub.com on January 31st, 2019.
          joeferr joe ferr added a comment -

          markewaite I built/installed your PR code into two of our Jenkins instances last week as a "does this break anything" test.  Those instances get a lot of use, but probably not enough parallel clones for us to have seen this issue.  Yesterday I installed it into the instance where we saw the problem, and so far so good.  I'll let you know if anything changes, but so far it appears to have fixed the issue in our testing.  Thanks, Joe

          markewaite Mark Waite added a comment -

          This change is included in git client plugin 3.0.0-beta3 and is available from the experimental update center.

          akmjenkins ASHOK MOHANTY added a comment - edited

          Any plan for a GA date (when v3.x will be available)?

          markewaite Mark Waite added a comment -

          I've created a git client plugin 3.0 milestone and a git plugin 4.0 milestone in GitHub to show the pull requests that I intend to evaluate before release. I've set the due date for those milestones as Feb 28, 2019, in hopes that provides enough time to evaluate those pull requests, thoroughly.

          I intend to release another beta of the plugins within the next few days. Please help by testing the beta when it is released. Report your test results either in bug reports like this one or on the pull requests that were the focus of your testing.

          pavenova Pavel Novak added a comment -

          Hi, I noticed this issue still occurs, mostly randomly and not very frequently.

          Anyway, can you provide a status update?
          Is a fix planned?
          If so, when can we expect it?

          Thanks in advance.

          markewaite Mark Waite added a comment - edited

          pavenova I provided the status update in a preceding comment. That status report is still correct. The change that is intended to fix this problem is included in the current git client plugin 3.0.0 beta releases.

          I can't duplicate the problem and have never been able to duplicate the problem.

          Could you please provide a status report of your experiences using the git client plugin 3.0.0 beta release and the git plugin 4.0.0 beta release?

          If you haven't installed the beta release of the plugin, when can I expect that you will install the beta release of the plugin?

          How long will you run the beta release in your environment to assure that the issue is resolved?

          While I'm happy to hear that you're interested in the fix, I'm much more interested in test results from users that can duplicate the problem. The fix has been included in the beta release for multiple months.


          FYI, I've been running a snapshot build of beta2 + this fix for nearly 8 months.  Prior to this fix I encountered the bug several times a week.  It has not happened a single time since installing the snapshot version.  It has completely resolved the issue for me.

          dscunkel Christopher Unkel added a comment -
          markewaite Mark Waite added a comment -

          Thanks dscunkel!

          pavenova Pavel Novak added a comment - edited

          markewaite 

          Hello, Mark, 
          I apologize, but I can't afford to install a beta version.

          Anyway, thanks for the feedback about the update.

          I noticed it looks like a solution was found, good job.

          From my experience, I have not found any relation to the environment: we are running on RHEL and tried Jenkins LTS 2.107.3 and 2.150.2, and the issue occurs on both machines, randomly and not very frequently.
          I wonder whether it has some relation to the credentials being used (e.g. credentials stored at the global level and then used in parallel), but I am not sure.

          joeferr joe ferr added a comment -

Way above in this thread I mentioned that I compiled and deployed from the pull request, and it fixed the issue. We have a large amount of parallel GitLab activity on this Jenkins instance: lots of pipeline builds supporting many microservices, and most jobs fetch from multiple git repos.

Just wanted to let you know that someone "upgraded" our instance to the latest released version, and as expected the problem came back.

I installed the latest beta (4) a few days back and everything is great again; just another real-world confirmation that the fix works fine.

          Thanks,

          Joe

          markewaite Mark Waite added a comment -

          Thanks very much joeferr!

          joeferr joe ferr added a comment -

markewaite After installing beta4, our GitLab integration from Jenkins stopped working. There seem to be lots of reported issues with the gitlab plugin, so I'm not sure what broke it (it may be unrelated to your changes). Here's a ticket that we found:

https://github.com/jenkinsci/gitlab-plugin/issues/893.

We have hundreds of pipeline jobs that use this, and they all stopped reporting pipeline status back to GitLab.

I turned on logging for the gitlab plugin and I'm seeing this.

If you do think it's related to your changes, let me know if I can do any testing to help troubleshoot.

           

          Build does not contain build data.
          Feb 05, 2019 9:46:44 AM INFO com.dabsquared.gitlabjenkins.util.CommitStatusUpdater retrieveGitlabProjectIds
           

          pavenova Pavel Novak added a comment -

joeferr

Yes, I can confirm that. Once we upgraded to the latest (git plugin 4.0.0-rc and git client plugin 3.0.0-rc), git operations were not working correctly.

Downgrading to 3.9.1 and 2.7.2 solved that issue, but (of course) the "Text file busy" problem persists.

          sebawo Sebastian Wojas added a comment - - edited

markewaite when are you planning to release the fixed version (3.0, not beta) of git-plugin?

          markewaite Mark Waite added a comment - - edited

sebawo when I've resolved the blocking bugs identified in the git client plugin 3.0 milestone and in the git plugin 4.0 milestone. You're welcome to help test the beta versions and report your results.

          markewaite Mark Waite added a comment -

          Released with git client plugin 3.0.0 and git plugin 4.0.0 on Nov 2, 2019.

          akmjenkins ASHOK MOHANTY added a comment -

Can I upgrade only the git client plugin to v3.0.0 (right now we have git plugin v3.9.1), or should we upgrade git client plugin 3.0.0 and git plugin 4.0.0 together?

          markewaite Mark Waite added a comment -

          Upgrade the two plugins together. Git plugin 3.x might work with git client plugin 3.0.0, but you'll be the first (and possibly only) person to test that configuration. It is intended that git plugin 4.0.0 and git client plugin 3.0.0 upgrade together.

          It is much better to travel with large groups of people in this case. Large groups of people have tested git client plugin 3.0.0 with git plugin 4.0.0. I'm not aware of anyone that has tested git client plugin 3.0.0 with git plugin 3.x releases.


People

  Assignee: Unassigned
  Reporter: mbmop Mostyn Bramley-Moore
  Votes: 8
  Watchers: 27