
stopping a job while in reconcile will not stop the reconcile (the job itself will however stop)

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • p4-plugin
    • None

      We had set up our sync to use AutoCleanImpl as the sync method for our project, but noticed that when we stopped a job that was in the middle of a reconcile, the job in Jenkins would stop normally while the agent carried on with the reconcile.

      Here is a stack trace obtained by monitoring an agent that was effectively idle, on which we had just cancelled a job during a reconcile:

      pool-1-thread-146 for JNLP4-connect connection to jenkis-server-url/10.144.6.28:20555 id=17503
      java.io.WinNTFileSystem.list(Native Method)
      java.io.File.list(File.java:1134)
      java.io.File.listFiles(File.java:1219)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:726)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.traverseDirs(ClientSystemFileMatchCommands.java:790)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientSystemFileMatchCommands.reconcileAdd(ClientSystemFileMatchCommands.java:588)
      com.perforce.p4java.impl.mapbased.rpc.func.client.ClientFunctionDispatcher.dispatch(ClientFunctionDispatcher.java:220)
      com.perforce.p4java.impl.mapbased.rpc.packet.RpcPacketDispatcher.dispatch(RpcPacketDispatcher.java:160)
      com.perforce.p4java.impl.mapbased.rpc.OneShotServerImpl.execMapCmdList(OneShotServerImpl.java:363)
      com.perforce.p4java.impl.mapbased.rpc.OneShotServerImpl.execStreamingMapCommand(OneShotServerImpl.java:428)
      com.perforce.p4java.impl.mapbased.client.Client.reconcileFiles(Client.java:1806)
      org.jenkinsci.plugins.p4.client.ClientHelper.tidyClean(ClientHelper.java:570)
      org.jenkinsci.plugins.p4.client.ClientHelper.tidyAutoCleanImpl(ClientHelper.java:492)
      org.jenkinsci.plugins.p4.client.ClientHelper.tidyWorkspace(ClientHelper.java:436)
      org.jenkinsci.plugins.p4.tasks.CheckoutTask.task(CheckoutTask.java:163)
      org.jenkinsci.plugins.p4.tasks.AbstractTask.retryTask(AbstractTask.java:202)
      org.jenkinsci.plugins.p4.tasks.AbstractTask.tryTask(AbstractTask.java:185)
      org.jenkinsci.plugins.p4.tasks.CheckoutTask.invoke(CheckoutTask.java:157)
      org.jenkinsci.plugins.p4.tasks.CheckoutTask.invoke(CheckoutTask.java:32)
      hudson.FilePath$FileCallableWrapper.call(FilePath.java:3052)
      hudson.remoting.UserRequest.perform(UserRequest.java:212)
      hudson.remoting.UserRequest.perform(UserRequest.java:54)
      hudson.remoting.Request$2.run(Request.java:369)
      hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      java.util.concurrent.FutureTask.run(FutureTask.java:264)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
      hudson.remoting.Engine$1$$Lambda$68/0x00000008001cd840.run(Unknown Source)
      java.lang.Thread.run(Thread.java:834)
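
The trace above shows the agent thread deep inside a recursive traverseDirs call, with nothing in that path that observes a cancellation. As a minimal sketch, and not the actual p4java code, a directory walk could check the thread's interrupt flag at each directory so that an aborted job stops the traversal. The class name, the exception type, and the method body below are hypothetical. The sketch also assumes that aborting the job causes the agent thread to be interrupted.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not how p4java implements traverseDirs. */
public class InterruptibleTraversal {

    /** Unchecked signal that the walk was abandoned because the thread was interrupted. */
    public static class TraversalCancelledException extends RuntimeException {
        public TraversalCancelledException(String message) {
            super(message);
        }
    }

    /**
     * Recursively collects regular files under dir. The interrupt flag is checked
     * once per directory and is left set, so callers further up can still see it.
     */
    public static void traverseDirs(File dir, List<File> found) {
        if (Thread.currentThread().isInterrupted()) {
            throw new TraversalCancelledException("traversal cancelled at " + dir);
        }
        File[] children = dir.listFiles(); // null if dir is not a readable directory
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                traverseDirs(child, found);
            } else {
                found.add(child);
            }
        }
    }

    public static void main(String[] args) {
        List<File> found = new ArrayList<>();
        traverseDirs(new File(args.length > 0 ? args[0] : "."), found);
        System.out.println(found.size() + " files found");
    }
}
```

Checking once per directory keeps the overhead negligible while bounding how long a cancelled walk can keep running to the time needed to list a single directory.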

      Starting a new job on this agent would cause a conflict, as the agent still held file handles on some of the files while the next job also attempted to perform a reconcile:
      ERROR: P4: Task Exception: com.perforce.p4java.exception.P4JavaException: com.perforce.p4java.exception.P4JavaException: hudson.AbortException: P4JAVA: Error(s):11:41:32 operating system will not allow deletion of file e:\jwrk\stg\workspace\XXXXX-Win64-Mono\Assets_Game\Environments\Architecture\Mesh\Keyhole_Tower_01\Base\Floor_Base_Interior_01\.Sources\Floor_Base_Interior_01.ZTL on client.
      When we looked for this handle on the node, we found that the agent process itself was holding on to that file.

      Lastly, we noticed that some tmp files were left in the source tree, which suggests a Perforce operation that did not clean up after itself.

      We worked around the problem by no longer doing reconciles at all; we now force sync systematically, which is faster anyway for large projects.

          [JENKINS-60430] stopping a job while in reconcile will not stop the reconcile (the job itself will however stop)

          Rob Petti added a comment -

          perforce-plugin is deprecated, and it looks like you are using p4-plugin anyway. Please double check the plugin name before filing tickets in the future.


          Karl Wirth added a comment -

          Hi newtopian - Thanks for letting us know about this. Please let me know which version of the plugin you are using and which version of P4D it is connected to.


          Eric Daigneault added a comment -

          Of course:

          Jenkins 2.190.3 on CentOS 7 with OpenJDK Runtime Environment, 1.8.0_232-b09

          Agent on Windows 10 on openjdk-hotspot-win64-11.0.4-11

          P4 plugin 1.10.7

          p4d:

          Server date: 2019/12/11 09:50:08 -0500 EST
          Server uptime: 936:31:10
          Server version: P4D/LINUX26X86_64/2019.1/1876401 (2019/10/30)

          Karl Wirth added a comment -

          Hi newtopian - Thanks.

          Regarding the reconcile not stopping: I did some testing here, and I think this is more a problem on the server than with p4-plugin. Perforce commands run in a loop, and when the client-side connection drops they need to reach a safe point in the command before they check whether the network connection is still there and die if needed. This is usually when the database locks are released or at the end of processing an argument.

          If I kill (CTRL+C) a reconcile that takes 30 seconds, run using P4 at the command line, I still see the command in p4 monitor for about 25 seconds in total.

          If I kill the Jenkins job I see similar timings (approximately 21 seconds); however, it is easier to be accurate about the moment you killed a command at the command line and harder to be consistent via Jenkins.

          So from this testing I think the command line and p4-plugin behavior are comparable.
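
The safe-point behavior described above can be illustrated with a small sketch. This is a hypothetical model, not Perforce server code: a loop that looks at a cancellation flag only after it finishes each unit of work. The names SafePointLoop, run, and workPerItem are invented for the example. It shows why a command can keep running for a while after the cancel request arrives.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of a command loop that only honours cancellation at safe points. */
public class SafePointLoop {

    /**
     * Runs up to `items` units of work. The cancel flag is consulted only after a
     * unit has finished (the "safe point"), so work already in progress always
     * completes. Returns the number of units that were completed.
     */
    public static int run(int items, AtomicBoolean cancelRequested, Runnable workPerItem) {
        int completed = 0;
        for (int i = 0; i < items; i++) {
            workPerItem.run(); // cannot be cut short once started
            completed++;
            if (cancelRequested.get()) {
                break; // safe point: stop before starting the next unit
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        AtomicBoolean cancel = new AtomicBoolean(false);
        // The cancel request arrives during the first unit, which still runs to completion.
        int done = run(5, cancel, () -> cancel.set(true));
        System.out.println("units completed after cancel: " + done); // prints 1
    }
}
```

In this model the delay between the cancel request and the actual stop is bounded by the duration of one unit of work, so a long-running unit would produce a multi-second lag like the timings reported above.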

           

          For the file handle, do you know if the file handle was released when the reconcile completed?

          Also why did you need to workaround it? Was killing the jobs a frequent need?

          I have tested here and was not able to reproduce the problem on Windows 10 but it may be related to the types/sizes of files etc. Was .Sources\Floor_Base_Interior_01.ZTL a special file? For example a symlink/junction?

           

          For the p4j*.tmp files, is it possible they were from an earlier plugin version? They used to be created and sometimes not cleaned up when using symlinks. That should have been fixed in 1.10.6:

              https://github.com/jenkinsci/p4-plugin/blob/master/RELEASE.md


          Karl Wirth added a comment -

          Hi newtopian - I was going through my old cases and saw that this one is still open. Are you able to answer the questions above? Thanks in advance.

          Karl


          Eric Daigneault added a comment -

          Hi Karl,

          For the file handle, do you know if the file handle was released when the reconcile completed?

          The handle was not released, and it caused the next job to fail on sync because the next reconcile was trying to repair the file.

          Also why did you need to workaround it? Was killing the jobs a frequent need?

          It is not a frequent need, no, but anything that affects the next build is considered a blocking issue. Build jobs must be independent of each other unless explicitly specified in the job's config, hence the workaround. Besides, the clean option is really not practical beyond the simplest hello-world project. On very large projects a reconcile takes longer than a wipe and re-sync, or just a force sync, which is the option we are currently using as a replacement (force sync with the occasional wipe and re-sync). It is not as clean, but we gained some precious minutes in the build process.

          For the p4j*.tmp files, is it possible they were from an earlier plugin version?

          It's possible, but that would mean that the p4clean is not doing its job, as a great many jobs have run on these machines since the last update.

          Was .Sources\Floor_Base_Interior_01.ZTL a special file? For example a symlink/junction?

          No, it was a normal file (ZBrush, I believe) weighing around 50MB. As we are mostly on Windows, we stay clear of symlinks and such; they are such a pain on Windows!

            Assignee: Unassigned
            Reporter: Eric Daigneault (newtopian)
            Votes: 0
            Watchers: 3