Hello,

       

      When a p4 sync is cancelled, the files remain locked and it is not possible to run another build until the Jenkins service is restarted.

      Tested with the latest P4 plugin (1.11.4); the issue is 100% reproducible in our environment.

      This issue seems very similar to https://issues.jenkins.io/browse/JENKINS-37487 which has been marked as resolved.

       

      Could you please help fix the issue?

       

      [Pipeline] End of Pipeline
      ERROR: P4: Task Exception: java.io.IOException: Unable to delete file: C:\Jenkins\workspace\Vulcan\1.7\Unity_2
      Finished: FAILURE
      

      thanks a lot,

      Marius

          [JENKINS-65418] Cancelling p4 sync results in locked files

          Karl Wirth added a comment -

          Hi maneamarius - Thanks for highlighting this. How did you test cancelling the job? Did you abort the job during a sync of a big file?

          What's the organic reason that this occurs in your environment? For example, do you need to abort jobs in certain circumstances?


          Marius added a comment -

          We don't have a specific need for this feature, but it sometimes happens that people realize they started the job with the wrong parameters and try to cancel it right away.

          It has happened several times before, and only restarting the Jenkins service fixes it (which can be a problem if we have builds running, since some of them take hours to complete).

          Hence, the need to have this fixed.

          Regarding your other question, the sync is not on a big file, but it's a rather large repo (see below)

           

          13:01:28 P4 Task: syncing files at change: 41019
          13:01:28 (p4):cmd:... p4 sync -p -q C:\Jenkins\workspace\Vulcan\1.7\Unity/...@41019
          13:01:29 ... totalFileSize 20500235028
          13:01:29 ... totalFileCount 36087
          13:01:29
          13:01:29 ... totalFileSize 20500235028
          13:01:29 ... totalFileCount 36087


          Karl Wirth added a comment -

          Hi maneamarius,

          Thanks for the information. I'll try and repro this here and will let you know if I can.


          Marius added a comment -

          Thank you, I appreciate your help with this!


          Karl Wirth added a comment - - edited

          I was easily able to reproduce this.

          (1) Point at a P4D server with millions of files.

          (2) Start any job with a forced sync using a Windows slave.

          (3) Click the red abort icon.

          (4) Try to delete the workspace folder and get an error that a file is locked.

          (5) Run the Sysinternals program 'handle', which shows the file that is locked.

           


          Karl Wirth added a comment -

          Hi maneamarius

          Thanks again for highlighting this. I was easily able to reproduce it, so I have marked it for the developers to review during their next sprint planning session. I don't know when this will be.

          Regards.

          Karl


          Marius added a comment -

          Hey Karl,

           

          Thanks a lot for helping out!

          I look forward to hearing from you soon regarding a fix.

          Have a great weekend,

          Marius


          Karl Wirth added a comment -

          Hi maneamarius - Thanks and happy to be of assistance.


          Marius added a comment -

          Hey Karl,

          Just checking if you have any updates on this?

          thanks a lot,

          Marius


          Karl Wirth added a comment -

          Hi maneamarius - The developers looked at the code and could see that we tear down the connection correctly, so the hypothesis is that this is a Windows JVM or P4Java problem. They asked me to do some extra testing, which I have just finished; the results are below.

          Realistically, I don't expect a fix for this until at least the summer. However, if this starts to become a recurrent problem, please let me know.

          For developers:

          (1) In my testing it takes approx 7 minutes for the lock to be released on my test system (Windows 10, java version "1.8.0_261").

          (2) There is no open connection to the Perforce server once abort has occurred.

          Evidence:

          Console log:
          P4 Task: syncing files at change: 1377926
          (p4):cmd:... p4 sync -p -q E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows/..___
          ... totalFileSize 13468377
          ... totalFileCount 66267
          
          Build was aborted
          Aborted by anonymous
          (p4):stop:exceptionP4: ABORT called!
          duration: 0m 57s
          
          
          Agent log:
          May 04, 2021 1:14:15 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          May 04, 2021 1:16:02 PM org.jenkinsci.plugins.p4.tasks.AbstractTask retryTask
          WARNING: P4: Task Aborted!
          
          OS logging (netstat and handle):
          E:\filestore\Jenkins>capture.bat
          Press [CTRL+C] to stop...]
          
          04/05/2021 13:15:51.76
          java.exe pid: 11256 DESKTOP-E4NIR9M\karl
            TCP    10.153.120.240:64303   10.5.41.112:1666       ESTABLISHED
            <NOTE - NO HANDLE HERE>
          
          04/05/2021 13:16:21.67
          java.exe pid: 11256 DESKTOP-E4NIR9M\karl
            <NOTE - NO CONNECTION HERE>
            7C4: File  (RW-)   E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows\longfiles_devbranch\cumque\ratione.txt
          
          Note: 5 mins later handle still there:
          
          
          04/05/2021 13:20:42.20
          java.exe pid: 11256 DESKTOP-E4NIR9M\karl
            7C4: File  (RW-)   E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows\longfiles_devbranch\cumque\ratione.txt
          
          04/05/2021 13:23:02.14
          
          04/05/2021 13:23:12.15
             <HANDLE GONE>
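
The capture above mixes timestamps, netstat rows and `handle` rows. A minimal sketch of filtering such a capture for file handles under a workspace is shown below. It assumes `handle` prints rows in exactly the `7C4: File  (RW-)   <path>` shape seen in this log (other versions or options may differ), and the names `HandleEntry`, `parse_handle_line` and `locked_files_under` are hypothetical helpers, not part of any Jenkins or Perforce tool.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class HandleEntry(NamedTuple):
    handle_id: str  # hexadecimal handle value, e.g. "7C4"
    kind: str       # object type reported by handle, e.g. "File"
    access: str     # access flags inside the parentheses, e.g. "RW-"
    path: str       # path of the open object


# Assumed row shape, taken from the capture in this thread:
#   7C4: File  (RW-)   E:\...\ratione.txt
_HANDLE_ROW = re.compile(
    r"^\s*(?P<id>[0-9A-Fa-f]+):\s+(?P<kind>\w+)\s+"
    r"\((?P<access>[^)]*)\)\s+(?P<path>.+?)\s*$"
)


def parse_handle_line(line: str) -> Optional[HandleEntry]:
    """Return a HandleEntry for a handle row, or None for any other line."""
    match = _HANDLE_ROW.match(line)
    if match is None:
        return None
    return HandleEntry(
        match.group("id"),
        match.group("kind"),
        match.group("access"),
        match.group("path"),
    )


def locked_files_under(lines: Iterable[str], workspace: str) -> List[HandleEntry]:
    """Collect file handles whose path lies below the given workspace folder."""
    # Windows paths are case-insensitive, so compare lowercased strings.
    root = workspace.rstrip("\\/").lower() + "\\"
    entries = (parse_handle_line(line) for line in lines)
    return [
        entry
        for entry in entries
        if entry is not None
        and entry.kind == "File"
        and entry.path.lower().startswith(root)
    ]
```

Fed the lines of a capture like the one above, `locked_files_under` would return only the `ratione.txt` row, which makes it easier to see when the handle appears and disappears across samples.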
          

          Stack trace for the same scenario, taken during the next test (netstat still showed no connection and handle still showed the held lock):

          May 04, 2021 1:29:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          
          May 04, 2021 1:30:56 PM org.jenkinsci.plugins.p4.tasks.AbstractTask retryTask
          WARNING: P4: Task Aborted!
          May 04, 2021 1:30:56 PM org.jenkinsci.plugins.p4.tasks.AbstractTask retryTask
          SEVERE: P4 Task: attempt: 1
          May 04, 2021 1:30:59 PM org.jenkinsci.plugins.p4.tasks.AbstractTask tryTask
          SEVERE: P4: Task Exception: hudson.AbortException: P4: Task Aborted!
          
          2021-05-04 13:31:17
          Full thread dump Java HotSpot(TM) Client VM (25.261-b12 mixed mode):

          "AWT-Windows" #34 daemon prio=6 os_prio=0 tid=0x46b7f800 nid=0x312c runnable [0x485bf000]
             java.lang.Thread.State: RUNNABLE
                  at sun.awt.windows.WToolkit.eventLoop(Native Method)
                  at sun.awt.windows.WToolkit.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "Java2D Disposer" #32 daemon prio=10 os_prio=2 tid=0x46b7f400 nid=0x4440 in Object.wait() [0x4849f000]
             java.lang.Thread.State: WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a88e820> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  - locked <0x1a88e820> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  at sun.java2d.Disposer.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "pool-1-thread-11" #27 daemon prio=5 os_prio=0 tid=0x4699f400 nid=0x4198 waiting on condition [0x476cf000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "pool-1-thread-7" #23 daemon prio=5 os_prio=0 tid=0x469a2800 nid=0x5374 waiting on condition [0x4751f000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "pool-1-thread-6" #22 daemon prio=5 os_prio=0 tid=0x469a1400 nid=0x2fcc waiting on condition [0x4748f000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "Ping thread for channel hudson.remoting.Channel@3d3b00:Windows10Swarm201" #21 daemon prio=5 os_prio=0 tid=0x469a1c00 nid=0xddc waiting on condition [0x473ff000]
             java.lang.Thread.State: TIMED_WAITING (sleeping)
                  at java.lang.Thread.sleep(Native Method)
                  at hudson.remoting.PingThread.run(PingThread.java:94)

          "pool-1-thread-3" #17 daemon prio=5 os_prio=0 tid=0x46900800 nid=0x5d38 waiting on condition [0x470bf000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "pool-1-thread-2" #16 daemon prio=5 os_prio=0 tid=0x45fee000 nid=0x5cfc waiting on condition [0x46fef000]
             java.lang.Thread.State: TIMED_WAITING (parking)
                  at sun.misc.Unsafe.park(Native Method)
                  - parking to wait for  <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
                  at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
                  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
                  at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "RemoteInvocationHandler [#1]" #14 daemon prio=5 os_prio=0 tid=0x45e83400 nid=0x64c in Object.wait() [0x46ecf000]
             java.lang.Thread.State: TIMED_WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a71f678> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  - locked <0x1a71f678> (a java.lang.ref.ReferenceQueue$Lock)
                  at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:600)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
                  at java.util.concurrent.FutureTask.run(Unknown Source)
                  at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:111)
                  at java.lang.Thread.run(Unknown Source)

          " tyrus-jdk-client-1" #12 daemon prio=5 os_prio=0 tid=0x46895400 nid=0x1284 runnable [0x46daf000]
             java.lang.Thread.State: RUNNABLE
                  at sun.nio.ch.Iocp.getQueuedCompletionStatus(Native Method)
                  at sun.nio.ch.Iocp.access$300(Unknown Source)
                  at sun.nio.ch.Iocp$EventHandlerTask.run(Unknown Source)
                  at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "Thread-2" #11 daemon prio=5 os_prio=0 tid=0x45fde400 nid=0xe94 runnable [0x46d1f000]
             java.lang.Thread.State: RUNNABLE
                  at sun.nio.ch.Iocp.getQueuedCompletionStatus(Native Method)
                  at sun.nio.ch.Iocp.access$300(Unknown Source)
                  at sun.nio.ch.Iocp$EventHandlerTask.run(Unknown Source)
                  at java.lang.Thread.run(Unknown Source)

          "Thread-1" #10 prio=5 os_prio=0 tid=0x45f39000 nid=0x34f4 in Object.wait() [0x4630f000]
             java.lang.Thread.State: TIMED_WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a777b90> (a hudson.remoting.Channel)
                  at hudson.remoting.Channel.join(Channel.java:1182)
                  - locked <0x1a777b90> (a hudson.remoting.Channel)
                  at hudson.remoting.Engine.runWebSocket(Engine.java:633)
                  at hudson.remoting.Engine.run(Engine.java:469)

          "Service Thread" #7 daemon prio=9 os_prio=0 tid=0x45a2ec00 nid=0x5db0 runnable [0x00000000]
             java.lang.Thread.State: RUNNABLE

          "C1 CompilerThread0" #6 daemon prio=9 os_prio=2 tid=0x45a0d000 nid=0x6d0 waiting on condition [0x00000000]
             java.lang.Thread.State: RUNNABLE

          "Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x45a0bc00 nid=0x3074 runnable [0x00000000]
             java.lang.Thread.State: RUNNABLE

          "Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x45a0a000 nid=0xe00 waiting on condition [0x00000000]
             java.lang.Thread.State: RUNNABLE

          "Finalizer" #3 daemon prio=8 os_prio=1 tid=0x0151b400 nid=0x5a8c in Object.wait() [0x458ef000]
             java.lang.Thread.State: WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a570fb0> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  - locked <0x1a570fb0> (a java.lang.ref.ReferenceQueue$Lock)
                  at java.lang.ref.ReferenceQueue.remove(Unknown Source)
                  at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

          "Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x01505800 nid=0x59e4 in Object.wait() [0x4585f000]
             java.lang.Thread.State: WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
                  - waiting on <0x1a571150> (a java.lang.ref.Reference$Lock)
                  at java.lang.Object.wait(Unknown Source)
                  at java.lang.ref.Reference.tryHandlePending(Unknown Source)
                  - locked <0x1a571150> (a java.lang.ref.Reference$Lock)
                  at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)

          "main" #1 prio=5 os_prio=0 tid=0x0144fc00 nid=0x2218 in Object.wait() [0x02eaf000]
             java.lang.Thread.State: WAITING (on object monitor)
                  at java.lang.Object.wait(Native Method)
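
Since the test above saw the lock released on its own after roughly 7 minutes, one possible stopgap (not a fix for the underlying problem) is to retry the deletion for a bounded time instead of restarting the agent. The sketch below is a hypothetical helper, not part of the P4 plugin: the name `wait_until_deletable`, the 600-second default timeout and the 10-second polling interval are all assumptions chosen for illustration. It relies on Python raising `PermissionError` when Windows refuses to delete a file that another handle still holds open.

```python
import os
import time
from typing import Callable


def wait_until_deletable(
    path: str,
    timeout_s: float = 600.0,   # assumption: a little above the ~7 minutes observed
    interval_s: float = 10.0,   # assumption: polling interval between attempts
    remove: Callable[[str], None] = os.remove,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Try to delete `path`, retrying while the file is locked.

    Returns True once the file is gone, False if it is still locked
    when the timeout expires.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            remove(path)
            return True
        except FileNotFoundError:
            # Already deleted by someone else: nothing left to do.
            return True
        except PermissionError:
            # The file is still held open or otherwise not deletable yet.
            if clock() >= deadline:
                return False
            sleep(interval_s)
```

The `remove`, `sleep` and `clock` parameters exist only so the retry logic can be exercised without a real locked file; in normal use the defaults would be left alone.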
          


          Marius added a comment -

          Hi Karl,

           

          Thanks for the detailed explanation!

          Yes, this problem is still happening and it's very annoying, because we need to restart the Jenkins slave service to get rid of it.

          If there's any way to prioritize the fix for it, we'd really appreciate it.

           

          thanks a lot,

          Marius

          Karl Wirth added a comment -

          Hi maneamarius - Thanks. I don't have any say on the scheduling, but I will raise this with the product managers again.

          Karl Wirth added a comment -

          Note - I have seen a new case of this. In that case the cancelled sync was still running in the background, so it overlapped the new sync when the job was rerun.
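
          As a rough illustration of the overlap described above (a generic Java sketch, not the p4-plugin or P4Java code): a cancelled background task only stops once it reacts to the interrupt, so a caller that wants to avoid two syncs touching the same workspace has to wait for the old worker to finish before starting a new one. The class and method names here are hypothetical, and the `Thread.sleep` loop is just a stand-in for transferring files.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: cancel a long-running "sync" and wait until it has really stopped.
class CancelSyncSketch {

    /** Returns true if the worker observed the interrupt and the pool fully terminated. */
    static boolean cancelAndWait() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean stopped = new AtomicBoolean(false);

        Future<?> sync = pool.submit(() -> {
            started.countDown();
            try {
                while (true) {
                    Thread.sleep(50); // stand-in for transferring one file
                }
            } catch (InterruptedException e) {
                stopped.set(true); // the worker noticed the cancel and exits
            }
        });

        started.await();   // make sure the "sync" is actually running
        sync.cancel(true); // request cancellation by interrupting the worker thread
        pool.shutdown();

        // Only after this returns true is it safe to start another sync on the same files.
        boolean terminated = pool.awaitTermination(5, TimeUnit.SECONDS);
        return terminated && stopped.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("old sync fully stopped: " + cancelAndWait());
    }
}
```

          If the worker ignored the interrupt, `awaitTermination` would time out and the old task would keep running alongside any new one, which matches the overlapping behaviour seen in this case.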

          Karl Wirth added a comment -

          Please consider this a top priority when planning the next sprint.

          Marius added a comment -

          Hey guys, I appreciate your help and I look forward to the fix!

          Please keep us posted.

          Marius added a comment -

          hello,

          Just checking to see if you have any updates regarding this issue?

          thanks a lot,

          Karl Wirth added a comment -

          Requested that this be investigated in the upcoming sprint.

          Elliot added a comment - - edited

          p4karl  I investigated this issue when I experienced it and found it to be P4Java not properly disposing of streams... 

          I wrote an issue about it here: https://github.com/perforce/p4java/issues/2

           

          I also sent in a support ticket. They responded with "that's definitely interesting we will investigate" and haven't gotten back to me since. Here's my case info:

          Resource leak in p4java causing issues with Jenkins p4-plugin - Case# 00947568

          Edit: they just replied and said they have been unable to reproduce it. This is definitely a tough one to reproduce: it took me a while to get it to happen and to get to the agent before a GC ran. It helps to give the JVM a lot of memory so that GC runs less often.
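
          To make the suspected failure mode concrete, here is a minimal Java sketch of the general pattern (this is not the actual P4Java code; the class and method names are hypothetical). On Windows, a `java.io.FileInputStream` that is never closed keeps the file handle open, so deleting the file fails until the stream object is garbage collected. That would be consistent with the problem going away after a GC or an agent restart. Closing the stream deterministically with try-with-resources avoids depending on the GC.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

// Hypothetical sketch of a leaked stream versus a properly closed one.
class StreamLeakSketch {

    // Leaky pattern: the stream is never closed, so the OS handle stays open
    // until the stream object happens to be garbage collected.
    static int readFirstByteLeaky(File file) throws IOException {
        InputStream in = new FileInputStream(file);
        return in.read();
    }

    // Safe pattern: try-with-resources closes the stream (and releases the
    // handle) as soon as the block exits, even if read() throws.
    static int readFirstByteSafe(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sync-sketch", ".txt");
        Files.write(file.toPath(), new byte[] {'x'});

        int first = readFirstByteSafe(file);
        // The handle is already released here, so the delete does not depend on a GC run.
        boolean deleted = file.delete();
        System.out.println("first byte: " + first + ", deleted: " + deleted);
    }
}
```

          Swapping `readFirstByteSafe` for `readFirstByteLeaky` in `main` is a way to try the leaky variant; whether the delete then fails depends on the operating system and on GC timing, which is part of why this is hard to reproduce.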

          Sandeep Kumar added a comment - - edited

          maneamarius, elliotdematteis could you please try this with P4 plugin version 1.14.4 or later?

          Sandeep Kumar added a comment -

          Please feel free to reopen this issue if it still occurs in p4 plugin version >= 1.15.1.

            Unassigned Unassigned
            maneamarius Marius
            Votes: 6
            Watchers: 9
