JENKINS-65418

Cancelling p4 sync results in locked files

      Description

      Hello,

       

      When a p4 sync is cancelled, files in the workspace remain locked and no further build can run until the Jenkins service is restarted.

      Tested with the latest P4 plugin (1.11.4); the issue is 100% reproducible in our environment.

      This issue seems very similar to https://issues.jenkins.io/browse/JENKINS-37487, which has been marked as resolved.

      Could you please help fix the issue?

       

      [Pipeline] End of Pipeline
      ERROR: P4: Task Exception: java.io.IOException: Unable to delete file: C:\Jenkins\workspace\Vulcan\1.7\Unity_2
      Finished: FAILURE
      

      thanks a lot,

      Marius

          Activity

          maneamarius Marius created issue -
          p4karl Karl Wirth added a comment -

          Hi Marius - Thanks for highlighting this. How did you test cancelling the job? Did you abort the job during a sync of a big file?

          What's the organic reason that this occurs in your environment? For example, do you need to abort jobs in certain circumstances?

          maneamarius Marius added a comment -

          We don't have a specific need for this feature, but it sometimes happens that people realize they started the job with the wrong parameters and try to cancel it right away.

          It has happened several times before, and we can only fix it by restarting the Jenkins service (which can be a problem if builds are running, since some of them take hours to complete).

          Hence the need to have this fixed.

          Regarding your other question, the sync is not of a big file, but it is a rather large repo (see below):

           

          13:01:28 P4 Task: syncing files at change: 41019
          13:01:28 (p4):cmd:... p4 sync -p -q C:\Jenkins\workspace\Vulcan\1.7\Unity/...@41019
          13:01:29 ... totalFileSize 20500235028
          13:01:29 ... totalFileCount 36087
          13:01:29
          13:01:29 ... totalFileSize 20500235028
          13:01:29 ... totalFileCount 36087
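
For scale, the summary lines in the log above can be turned into readable figures. This is a minimal sketch that assumes the console log format shown here; `sync_summary` is a hypothetical helper, not part of the P4 plugin.

```python
import re

# Hypothetical helper: pull the last totalFileSize / totalFileCount values
# out of a P4 plugin console log like the one quoted above.
def sync_summary(console_log: str) -> dict:
    summary = {}
    for key in ("totalFileSize", "totalFileCount"):
        matches = re.findall(rf"\.\.\. {key} (\d+)", console_log)
        if matches:
            summary[key] = int(matches[-1])
    return summary

log = """\
13:01:29 ... totalFileSize 20500235028
13:01:29 ... totalFileCount 36087
"""

info = sync_summary(log)
size_gib = info["totalFileSize"] / 2**30                        # bytes to GiB
avg_kib = info["totalFileSize"] / info["totalFileCount"] / 1024  # mean file size in KiB
print(f"{size_gib:.2f} GiB in {info['totalFileCount']} files, about {avg_kib:.0f} KiB per file")
```

So the sync covers about 19 GiB spread over roughly 36,000 files, rather than a single large file.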

          p4karl Karl Wirth added a comment -

          Hi Marius,

          Thanks for the information. I'll try and repro this here and will let you know if I can.

          p4karl Karl Wirth made changes -
          Assignee Karl Wirth [ p4karl ]
          p4karl Karl Wirth made changes -
          Labels: changed from "p4 p4-plugin p4plugin plugin" to "P4_SUPPORT p4 p4-plugin p4plugin plugin"
          maneamarius Marius added a comment -

          Thank you, I appreciate your help with this!

          p4karl Karl Wirth made changes -
          Attachment HangingLock.png [ 54657 ]
          p4karl Karl Wirth added a comment - - edited

          I was easily able to reproduce this:

          (1) Point at a P4D server with millions of files.

          (2) Start any job with a forced sync on a Windows agent.

          (3) Click the red abort icon.

          (4) Try to delete the workspace folder; this fails with an error that a file is locked.

          (5) Run the Sysinternals 'handle' program, which shows the file that is locked (see attachment HangingLock.png).
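
Step (5) can be automated by filtering saved `handle` output for files under the workspace. This is a minimal sketch; it assumes the line format seen in the handle capture quoted later in this thread (`7C4: File  (RW-)   <path>`), and `open_files_under` is a hypothetical helper, not part of Sysinternals or the P4 plugin.

```python
import re
from pathlib import PureWindowsPath

# One handle entry: "<hex id>: File  (<flags>)   <path>".
# Assumed from the capture quoted in this thread, not from a format specification.
HANDLE_LINE = re.compile(r"^\s*([0-9A-Fa-f]+):\s+File\s+\(([^)]*)\)\s+(.+?)\s*$")

def open_files_under(handle_output: str, workspace: str) -> list[str]:
    """Return paths from handle output that lie inside the workspace folder."""
    root = PureWindowsPath(workspace)
    held = []
    for line in handle_output.splitlines():
        match = HANDLE_LINE.match(line)
        if not match:
            continue  # process headers, timestamps, netstat rows
        path = PureWindowsPath(match.group(3))
        if path == root or root in path.parents:
            held.append(str(path))
    return held

sample = r"""
java.exe pid: 11256 DESKTOP-E4NIR9M\karl
  7C4: File  (RW-)   E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows\longfiles_devbranch\cumque\ratione.txt
"""
workspace = r"E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows"
print(open_files_under(sample, workspace))
```

`PureWindowsPath` compares paths case-insensitively, which matches Windows file-name semantics and lets the script run on any platform against saved output.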

           

          p4karl Karl Wirth made changes -
          Assignee Karl Wirth [ p4karl ]
          p4karl Karl Wirth made changes -
          Labels: changed from "P4_SUPPORT p4 p4-plugin p4plugin plugin" to "P4_A P4_VERIFY p4 p4-plugin p4plugin plugin"
          p4karl Karl Wirth added a comment -

          Hi Marius

          Thanks again for highlighting this. I was easily able to reproduce it, so I have marked it for the developers to review during their next sprint planning session. I don't know when this will be.

          Regards.

          Karl

          maneamarius Marius added a comment -

          Hey Karl,

           

          Thanks a lot for helping out!

          I look forward to hearing from you soon regarding a fix.

          Have a great weekend,

          Marius

          p4karl Karl Wirth added a comment -

          Hi Marius - Thanks and happy to be of assistance.

          maneamarius Marius added a comment -

          Hey Karl,

          Just checking if you have any updates on this?

          thanks a lot,

          Marius

          p4karl Karl Wirth added a comment -

          Hi Marius - The developers looked at the code and could see that we tear down the connection correctly, so the hypothesis is that this is a Windows JVM or P4Java problem. They asked me to do some extra testing, which I have just finished; the results are below.

          Realistically I don't expect a fix for this until at least the summer. However, if this starts to become a recurrent problem, please let me know.

          For developers:

          (1) In my testing it takes approx 7 minutes for the lock to be released on my test system (Windows 10, java version "1.8.0_261").

          (2) There is no open connection to the Perforce server once abort has occurred.
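
The "approx 7 minutes" in (1) can be cross-checked against the timestamps in the evidence below. This is a minimal sketch, assuming the capture timestamps are day/month/year (04/05/2021 being 4 May 2021, matching the agent log) and that the agent log and the capture output share one clock. `elapsed_seconds` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S.%f"  # assumed day/month/year, as in the capture output

def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two capture-style timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()

# Agent log "Task Aborted!" at 1:16:02 PM, rewritten in the capture format.
abort_logged = "04/05/2021 13:16:02.00"
handle_first_seen = "04/05/2021 13:16:21.67"  # first sample showing the file handle
handle_gone = "04/05/2021 13:23:12.15"        # sample marked as handle gone

print(f"{elapsed_seconds(handle_first_seen, handle_gone):.2f} s")  # 410.48 s
print(f"{elapsed_seconds(abort_logged, handle_gone):.2f} s")       # 430.15 s
```

That is roughly 6 min 50 s between the first and last handle observations, and about 7 min 10 s after the abort was logged.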

          Evidence:

          Console log:
          P4 Task: syncing files at change: 1377926
          (p4):cmd:... p4 sync -p -q E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows/..___
          ... totalFileSize 13468377
          ... totalFileCount 66267
          
          Build was aborted
          Aborted by anonymous
          (p4):stop:exceptionP4: ABORT called!
          duration: 0m 57s
          
          
          Agent log:
          May 04, 2021 1:14:15 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          May 04, 2021 1:16:02 PM org.jenkinsci.plugins.p4.tasks.AbstractTask retryTask
          WARNING: P4: Task Aborted!
          
          OS logging (netstat and handle):
          E:\filestore\Jenkins>capture.bat
          Press [CTRL+C] to stop...]
          
          04/05/2021 13:15:51.76
          java.exe pid: 11256 DESKTOP-E4NIR9M\karl
            TCP    10.153.120.240:64303   10.5.41.112:1666       ESTABLISHED
            <NOTE - NO HANDLE HERE>
          
          04/05/2021 13:16:21.67
          java.exe pid: 11256 DESKTOP-E4NIR9M\karl
            <NOTE - NO CONNECTION HERE>
            7C4: File  (RW-)   E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows\longfiles_devbranch\cumque\ratione.txt

          Note: 5 mins later handle still there:

          04/05/2021 13:20:42.20
          java.exe pid: 11256 DESKTOP-E4NIR9M\karl
            7C4: File  (RW-)   E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows\longfiles_devbranch\cumque\ratione.txt

          04/05/2021 13:23:02.14

          04/05/2021 13:23:12.15
             <HANDLE GONE>
          

          Thread dump for the same scenario, taken during the next test (netstat still showed no connection and handle still showed the held lock):

          May 04, 2021 1:29:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          
          May 04, 2021 1:30:56 PM org.jenkinsci.plugins.p4.tasks.AbstractTask retryTask
          WARNING: P4: Task Aborted!
          May 04, 2021 1:30:56 PM org.jenkinsci.plugins.p4.tasks.AbstractTask retryTask
          SEVERE: P4 Task: attempt: 1
          May 04, 2021 1:30:59 PM org.jenkinsci.plugins.p4.tasks.AbstractTask tryTask
          SEVERE: P4: Task Exception: hudson.AbortException: P4: Task Aborted!
          
          2021-05-04 13:31:17
          Full thread dump Java HotSpot(TM) Client VM (25.261-b12 mixed mode):

          "AWT-Windows" #34 daemon prio=6 os_prio=0 tid=0x46b7f800 nid=0x312c runnable [0x485bf000]
             java.lang.Thread.State: RUNNABLE
                  at sun.awt.windows.WToolkit.eventLoop(Native Method)
                  at sun.awt.windows.WToolkit.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "Java2D Disposer" #32 daemon prio=10 os_prio=2 tid=0x46b7f400 nid=0x4440 in Object.wait() [0x4849f000]
             java.lang.Thread.State: WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a88e820> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  - locked <0x1a88e820> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  at sun.java2d.Disposer.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "pool-1-thread-11" #27 daemon prio=5 os_prio=0 tid=0x4699f400 nid=0x4198 waiting on condition [0x476cf000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "pool-1-thread-7" #23 daemon prio=5 os_prio=0 tid=0x469a2800 nid=0x5374 waiting on condition [0x4751f000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "pool-1-thread-6" #22 daemon prio=5 os_prio=0 tid=0x469a1400 nid=0x2fcc waiting on condition [0x4748f000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "Ping thread for channel hudson.remoting.Channel@3d3b00:Windows10Swarm201" #21 daemon prio=5 os_prio=0 tid=0x469a1c00 nid=0xddc waiting on condition [0x473ff000]
             java.lang.Thread.State: TIMED_WAITING (sleeping)
                  at java.lang.Thread.sleep(Native Method)
                  at hudson.remoting.PingThread.run(PingThread.java:94)

          "pool-1-thread-3" #17 daemon prio=5 os_prio=0 tid=0x46900800 nid=0x5d38 waiting on condition [0x470bf000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "pool-1-thread-2" #16 daemon prio=5 os_prio=0 tid=0x45fee000 nid=0x5cfc waiting on condition [0x46fef000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "RemoteInvocationHandler [#1]" #14 daemon prio=5 os_prio=0 tid=0x45e83400 nid=0x64c in Object.wait() [0x46ecf000]
             java.lang.Thread.State: TIMED_WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a71f678> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  - locked <0x1a71f678> (a java.lang.ref.ReferenceQueue$Lock)
                  at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:600)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                  at java.util.concurrent.FutureTask.run(Unknown Source)
                  at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:111)
                  at java.lang.Thread.run(Unknown Source)

          " tyrus-jdk-client-1" #12 daemon prio=5 os_prio=0 tid=0x46895400 nid=0x1284 runnable [0x46daf000]
             java.lang.Thread.State: RUNNABLE
                  at sun.nio.ch.Iocp.getQueuedCompletionStatus(Native Method)
                  at sun.nio.ch.Iocp.access$300(Unknown Source)
                  at sun.nio.ch.Iocp$EventHandlerTask.run(Unknown Source)
                  at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "Thread-2" #11 daemon prio=5 os_prio=0 tid=0x45fde400 nid=0xe94 runnable [0x46d1f000]
             java.lang.Thread.State: RUNNABLE
                  at sun.nio.ch.Iocp.getQueuedCompletionStatus(Native Method)
                  at sun.nio.ch.Iocp.access$300(Unknown Source)
                  at sun.nio.ch.Iocp$EventHandlerTask.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "Thread-1" #10 prio=5 os_prio=0 tid=0x45f39000 nid=0x34f4 in Object.wait() [0x4630f000]
             java.lang.Thread.State: TIMED_WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a777b90> (a hudson.remoting.Channel)
                  at hudson.remoting.Channel.join(Channel.java:1182)
                  - locked <0x1a777b90> (a hudson.remoting.Channel)
                  at hudson.remoting.Engine.runWebSocket(Engine.java:633)
                  at hudson.remoting.Engine.run(Engine.java:469)

          "Service Thread" #7 daemon prio=9 os_prio=0 tid=0x45a2ec00 nid=0x5db0 runnable [0x00000000]
             java.lang.Thread.State: RUNNABLE

          "C1 CompilerThread0" #6 daemon prio=9 os_prio=2 tid=0x45a0d000 nid=0x6d0 waiting on condition [0x00000000]
             java.lang.Thread.State: RUNNABLE

          "Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x45a0bc00 nid=0x3074 runnable [0x00000000]
             java.lang.Thread.State: RUNNABLE

          "Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x45a0a000 nid=0xe00 waiting on condition [0x00000000]
             java.lang.Thread.State: RUNNABLE

          "Finalizer" #3 daemon prio=8 os_prio=1 tid=0x0151b400 nid=0x5a8c in Object.wait() [0x458ef000]
             java.lang.Thread.State: WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a570fb0> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  - locked <0x1a570fb0> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

          "Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x01505800 nid=0x59e4 in Object.wait() [0x4585f000]
             java.lang.Thread.State: WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a571150> (a java.lang.ref.Reference$Lock)
                  at java.lang.Object.wait(Unknown Source)
                  at java.lang.ref.Reference.tryHandlePending(Unknown Source)
                  - locked <0x1a571150> (a java.lang.ref.Reference$Lock)
                  at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)

          "main" #1 prio=5 os_prio=0 tid=0x0144fc00 nid=0x2218 in Object.wait() [0x02eaf000]
             java.lang.Thread.State: WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
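
As far as the dump above shows (it is cut off at the "main" thread), none of the listed threads has a stack frame from P4Java or the P4 plugin. The check can be scripted; this is a minimal sketch, assuming `com.perforce.` and `org.jenkinsci.plugins.p4.` are the package prefixes of interest. `threads_with_p4_frames` and the second sample thread are hypothetical and only illustrate what a hit would look like.

```python
import re

# Package prefixes treated as "P4 code" (an assumption for this sketch).
# org.jenkinsci.plugins.p4 appears in the agent log; com.perforce is the P4Java namespace.
P4_PREFIXES = ("com.perforce.", "org.jenkinsci.plugins.p4.")

HEADER = re.compile(r'"([^"]*)" #\d+')  # thread header: "name" #id ...
FRAME = re.compile(r"^\s*at\s+(\S+)")   # stack frame: at pkg.Class.method(...)

def threads_with_p4_frames(dump: str) -> list[str]:
    """Names of threads whose stack contains at least one P4 frame."""
    found = []
    current = None
    for line in dump.splitlines():
        # Check the frame first, so a header fused onto the end of a frame
        # line is only applied to the lines that follow it.
        frame = FRAME.match(line)
        if frame and current is not None and frame.group(1).startswith(P4_PREFIXES):
            if current not in found:
                found.append(current)
        header = HEADER.search(line)
        if header:
            current = header.group(1)
    return found

from_dump = '''\
"Ping thread for channel hudson.remoting.Channel@3d3b00:Windows10Swarm201" #21 daemon prio=5
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at hudson.remoting.PingThread.run(PingThread.java:94)
'''

# Made-up thread and class name, only to show what a hit would look like.
hypothetical = '''\
"hypothetical-p4-sync-thread" #99 daemon prio=5
   java.lang.Thread.State: RUNNABLE
        at com.perforce.p4java.HypotheticalClass.write(Unknown Source)
'''

print(threads_with_p4_frames(from_dump))                 # []
print(threads_with_p4_frames(from_dump + hypothetical))  # ['hypothetical-p4-sync-thread']
```

An empty result on a complete dump would fit the observation that the connection is torn down while the file handle stays open, but it would not identify which component still holds the handle.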
          
          maneamarius Marius added a comment -

          Hi Karl,

           

          Thanks for the detailed explanation!

          Yes, this problem is still happening and it's very annoying, because we need to restart the Jenkins slave service to get rid of it.

          If there's any way to prioritize the fix for it, we'd really appreciate it.

           

          thanks a lot,

          Marius

          p4karl Karl Wirth made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          p4karl Karl Wirth added a comment -

          Hi Marius - Thanks. I don't have any say in the scheduling, but I will raise this with the product managers again.

          p4karl Karl Wirth added a comment -

          Note - I have seen a new case of this. In that case the sync was still running in the background, so it overlapped the new sync when the job was rerun.
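
          To illustrate how a cancelled task can keep running in the background, here is a minimal, self-contained Java sketch. It is not the P4 plugin's code: `OverlappingSyncSketch`, `FakeSync` and `stillRunningAfterCancel` are hypothetical names made up for this example. It only shows the general JVM behaviour that `Future.cancel(true)` merely interrupts the worker thread, so a task that swallows the interrupt carries on (and would keep any file handles it holds) after the caller believes it has been cancelled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class OverlappingSyncSketch {

    /** Hypothetical stand-in for a long-running sync; NOT the P4 plugin's implementation. */
    static final class FakeSync implements Runnable {
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch finished = new CountDownLatch(1);
        private final boolean honoursInterrupt;

        FakeSync(boolean honoursInterrupt) {
            this.honoursInterrupt = honoursInterrupt;
        }

        @Override
        public void run() {
            started.countDown();
            try {
                // Roughly one second of simulated work in 50 ms slices.
                for (int i = 0; i < 20; i++) {
                    try {
                        Thread.sleep(50);
                    } catch (InterruptedException e) {
                        if (honoursInterrupt) {
                            return; // stop promptly when cancelled
                        }
                        // Otherwise the interrupt is swallowed and the work continues.
                    }
                }
            } finally {
                finished.countDown();
            }
        }
    }

    /** Returns true if the task had not finished 200 ms after cancel(true) was called. */
    static boolean stillRunningAfterCancel(boolean honoursInterrupt) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            FakeSync sync = new FakeSync(honoursInterrupt);
            Future<?> future = pool.submit(sync);
            sync.started.await();   // make sure the task is really running
            future.cancel(true);    // only interrupts the worker thread
            return !sync.finished.await(200, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("honours interrupt, still running:  " + stillRunningAfterCancel(true));
        System.out.println("swallows interrupt, still running: " + stillRunningAfterCancel(false));
    }
}
```

          If the real sync behaves like the second case, a rerun of the job could start while the earlier sync is still writing to the workspace, which would be consistent with the overlap described above. That link is an assumption, not something confirmed in this ticket.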

          p4karl Karl Wirth added a comment -

          Please consider this a top priority when planning the next sprint.

          maneamarius Marius added a comment -

          Hey guys, I appreciate your help and I look forward to the fix!

          Please keep us posted.

          maneamarius Marius added a comment -

          Hello,

          Just checking to see if you have any updates regarding this issue.

          thanks a lot,


            People

            Assignee:
            Unassigned
            Reporter:
            maneamarius Marius
            Votes:
            3
            Watchers:
            5

              Dates

              Created:
              Updated: