Hello,
When a p4 sync is cancelled, files in the workspace stay locked and it is not possible to run another build until the Jenkins service is restarted.
Tested with the latest P4 plugin (1.11.4); the issue is 100% reproducible in our environment.
This issue seems very similar to https://issues.jenkins.io/browse/JENKINS-37487 which has been marked as resolved.
Could you please help fix the issue?
[Pipeline] End of Pipeline
ERROR: P4: Task Exception: java.io.IOException: Unable to delete file: C:\Jenkins\workspace\Vulcan\1.7\Unity_2
Finished: FAILURE
thanks a lot,
Marius
[JENKINS-65418] Cancelling p4 sync results in locked files
We don't have a specific need for this feature, but it sometimes happens that people realize they started the job with the wrong parameters and try to cancel it right away.
It has happened several times before, and only restarting the Jenkins service fixes it (which can be a problem if we have builds running, since some of them take hours to complete).
Hence the need to have this fixed.
Regarding your other question, the sync is not on a big file, but it's a rather large repo (see below):
13:01:28 P4 Task: syncing files at change: 41019
13:01:28 (p4):cmd:... p4 sync -p -q C:\Jenkins\workspace\Vulcan\1.7\Unity/...@41019
13:01:29 ... totalFileSize 20500235028
13:01:29 ... totalFileCount 36087
13:01:29
13:01:29 ... totalFileSize 20500235028
13:01:29 ... totalFileCount 36087
Hi maneamarius,
Thanks for the information. I'll try and repro this here and will let you know if I can.
Easily able to reproduce this.
(1) Point at P4D server with millions of files.
(2) Start any job with a forced Sync using a Windows slave.
(3) Click the red abort icon.
(4) Try to delete the workspace folder; you get an error that a file is locked.
(5) Running the Sysinternals 'handle' program shows the file that is locked.
Hi maneamarius
Thanks again for highlighting this. I was easily able to reproduce it, so I have marked it for the developers to review during their next sprint planning sessions. I don't know when this will be.
Regards.
Karl
Hey Karl,
Thanks a lot for helping out!
I look forward to hearing from you soon, regarding a fix
have a great weekend,
Marius
Hey Karl,
Just checking if you have any updates on this?
thanks a lot,
Marius
Hi maneamarius - The developers looked at the code and could see that we tear down the connection correctly, so the hypothesis is that this is a Windows JVM or P4Java problem. They asked me to do some extra testing, which I have just finished; the results are below.
Realistically I don't expect a fix for this until at least the summer. However, if this starts to become a recurrent problem, please let me know.
For developers:
(1) In my testing it takes approx 7 minutes for the lock to be released on my test system (Windows 10, java version "1.8.0_261").
(2) There is no open connection to the Perforce server once abort has occurred.
Evidence:
Console log:
P4 Task: syncing files at change: 1377926
(p4):cmd:... p4 sync -p -q E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows/..___
... totalFileSize 13468377
... totalFileCount 66267
Build was aborted
Aborted by anonymous
(p4):stop:exception
P4: ABORT called!
duration: 0m 57s

Agent log:
May 04, 2021 1:14:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
May 04, 2021 1:16:02 PM org.jenkinsci.plugins.p4.tasks.AbstractTask retryTask
WARNING: P4: Task Aborted!

OS logging (netstat and handle):
E:\filestore\Jenkins>capture.bat
Press [CTRL+C] to stop...]
04/05/2021 13:15:51.76
java.exe pid: 11256 DESKTOP-E4NIR9M\karl
TCP 10.153.120.240:64303 10.5.41.112:1666 ESTABLISHED
<NOTE - NO HANDLE HERE>
04/05/2021 13:16:21.67
java.exe pid: 11256 DESKTOP-E4NIR9M\karl
<NOTE - NO CONNECTION HERE>
7C4: File (RW-) E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows\longfiles_devbranch\cumque\ratione.txt

Note: 5 mins later handle still there:
04/05/2021
04/05/2021 13:20:42.20
java.exe pid: 11256 DESKTOP-E4NIR9M\karl
7C4: File (RW-) E:\filestore\Jenkins\Windows10Swarm201\workspace\BigSwarmOnWindows\longfiles_devbranch\cumque\ratione.txt
04/05/2021 13:23:02.14
04/05/2021 13:23:12.15
<HANDLE GONE>
Stack trace for same scenario taken for the next test (netstat still showed no connection and handle shows held lock):
May 04, 2021 1:29:34 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
May 04, 2021 1:30:56 PM org.jenkinsci.plugins.p4.tasks.AbstractTask retryTask
WARNING: P4: Task Aborted!
May 04, 2021 1:30:56 PM org.jenkinsci.plugins.p4.tasks.AbstractTask retryTask
SEVERE: P4 Task: attempt: 1
May 04, 2021 1:30:59 PM org.jenkinsci.plugins.p4.tasks.AbstractTask tryTask
SEVERE: P4: Task Exception: hudson.AbortException: P4: Task Aborted!
2021-05-04 13:31:17
Full thread dump Java HotSpot(TM) Client VM (25.261-b12 mixed mode):

"AWT-Windows" #34 daemon prio=6 os_prio=0 tid=0x46b7f800 nid=0x312c runnable [0x485bf000]
   java.lang.Thread.State: RUNNABLE
    at sun.awt.windows.WToolkit.eventLoop(Native Method)
    at sun.awt.windows.WToolkit.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"Java2D Disposer" #32 daemon prio=10 os_prio=2 tid=0x46b7f400 nid=0x4440 in Object.wait() [0x4849f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x1a88e820> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    - locked <0x1a88e820> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at sun.java2d.Disposer.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"pool-1-thread-11" #27 daemon prio=5 os_prio=0 tid=0x4699f400 nid=0x4198 waiting on condition [0x476cf000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
    at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
    at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"pool-1-thread-7" #23 daemon prio=5 os_prio=0 tid=0x469a2800 nid=0x5374 waiting on condition [0x4751f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
    at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
    at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"pool-1-thread-6" #22 daemon prio=5 os_prio=0 tid=0x469a1400 nid=0x2fcc waiting on condition [0x4748f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
    at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
    at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"Ping thread for channel hudson.remoting.Channel@3d3b00:Windows10Swarm201" #21 daemon prio=5 os_prio=0 tid=0x469a1c00 nid=0xddc waiting on condition [0x473ff000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at hudson.remoting.PingThread.run(PingThread.java:94)

"pool-1-thread-3" #17 daemon prio=5 os_prio=0 tid=0x46900800 nid=0x5d38 waiting on condition [0x470bf000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
    at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
    at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"pool-1-thread-2" #16 daemon prio=5 os_prio=0 tid=0x45fee000 nid=0x5cfc waiting on condition [0x46fef000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x1a7532e8> (a java.util.concurrent.SynchronousQueue$TransferStack)
    at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(Unknown Source)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(Unknown Source)
    at java.util.concurrent.SynchronousQueue.poll(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
    at hudson.remoting.Engine$1$$Lambda$14/19512613.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"RemoteInvocationHandler [#1]" #14 daemon prio=5 os_prio=0 tid=0x45e83400 nid=0x64c in Object.wait() [0x46ecf000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x1a71f678> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    - locked <0x1a71f678> (a java.lang.ref.ReferenceQueue$Lock)
    at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:600)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:111)
    at java.lang.Thread.run(Unknown Source)

" tyrus-jdk-client-1" #12 daemon prio=5 os_prio=0 tid=0x46895400 nid=0x1284 runnable [0x46daf000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.Iocp.getQueuedCompletionStatus(Native Method)
    at sun.nio.ch.Iocp.access$300(Unknown Source)
    at sun.nio.ch.Iocp$EventHandlerTask.run(Unknown Source)
    at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"Thread-2" #11 daemon prio=5 os_prio=0 tid=0x45fde400 nid=0xe94 runnable [0x46d1f000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.Iocp.getQueuedCompletionStatus(Native Method)
    at sun.nio.ch.Iocp.access$300(Unknown Source)
    at sun.nio.ch.Iocp$EventHandlerTask.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

"Thread-1" #10 prio=5 os_prio=0 tid=0x45f39000 nid=0x34f4 in Object.wait() [0x4630f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x1a777b90> (a hudson.remoting.Channel)
    at hudson.remoting.Channel.join(Channel.java:1182)
    - locked <0x1a777b90> (a hudson.remoting.Channel)
    at hudson.remoting.Engine.runWebSocket(Engine.java:633)
    at hudson.remoting.Engine.run(Engine.java:469)

"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x45a2ec00 nid=0x5db0 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread0" #6 daemon prio=9 os_prio=2 tid=0x45a0d000 nid=0x6d0 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x45a0bc00 nid=0x3074 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x45a0a000 nid=0xe00 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=1 tid=0x0151b400 nid=0x5a8c in Object.wait() [0x458ef000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x1a570fb0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    - locked <0x1a570fb0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(Unknown Source)
    at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x01505800 nid=0x59e4 in Object.wait() [0x4585f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x1a571150> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Unknown Source)
    at java.lang.ref.Reference.tryHandlePending(Unknown Source)
    - locked <0x1a571150> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)

"main" #1 prio=5 os_prio=0 tid=0x0144fc00 nid=0x2218 in Object.wait() [0x02eaf000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
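Since the testing above shows the handle being released on its own after several minutes, one possible stopgap (not a fix, and not something the P4 plugin is known to do) is to retry the workspace delete for a bounded time instead of failing on the first attempt. The sketch below is only an illustration: the class name, method name, timeout, and pause values are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

final class RetryDeleteSketch {

    /**
     * Tries to delete a single file, retrying while the delete fails
     * (for example because another handle still holds it on Windows).
     * Returns true once the file is gone, false if the timeout expires
     * or the calling thread is interrupted.
     */
    static boolean deleteWithRetry(Path file, Duration timeout, Duration pause) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                Files.deleteIfExists(file);
                return true; // deleted, or it was already absent
            } catch (IOException stillHeld) {
                if (System.nanoTime() - deadline >= 0) {
                    return false; // gave up: file is still locked
                }
            }
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the abort signal
                return false;
            }
        }
    }

    /** Small self-check using a temporary file that nothing holds open. */
    static boolean demo() {
        try {
            Path file = Files.createTempFile("retry-delete", ".txt");
            boolean deleted = deleteWithRetry(
                    file, Duration.ofSeconds(5), Duration.ofMillis(100));
            return deleted && Files.notExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deleted: " + demo());
    }
}
```

With a lock that lingers for around 7 minutes, the timeout would have to be at least that long, which is why this is a workaround rather than a substitute for releasing the handle on abort.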
Hi Karl,
Thanks for the detailed explanation!
Yes, this problem is still happening and it's very annoying, because we need to restart the Jenkins slave service to get rid of it.
If there's any way to prioritize the fix for it, we'd really appreciate it.
thanks a lot,
Marius
Hi maneamarius - Thanks. I don't have any say on the scheduling but I will raise this with the product managers again.
Note - I have seen a new case of this. In that case the sync was still running in the background, so it overlapped the new sync when the job was rerun.
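To illustrate the overlap described in this note: an abort only prevents a rerun from colliding with the old sync if the caller waits until the worker thread has actually stopped. The following is a generic Java sketch of that pattern using `java.util.concurrent`; it is not the plugin's or P4Java's code, and the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class AbortAndWaitSketch {

    /** Starts a stand-in "sync" loop, aborts it, and reports whether it stopped. */
    static boolean abortAndConfirmStopped() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean cleanedUp = new AtomicBoolean(false);

        Future<?> sync = worker.submit(() -> {
            try {
                started.countDown();
                // Stand-in for transferring files one by one.
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cleanedUp.set(true); // a real task would close its streams here
            }
        });

        try {
            // Make sure the task is running before aborting it.
            if (!started.await(5, TimeUnit.SECONDS)) {
                return false;
            }
            sync.cancel(true);  // request the abort by interrupting the worker
            worker.shutdown();  // accept no further work
            // Block until the worker has really exited before allowing a rerun.
            boolean terminated = worker.awaitTermination(5, TimeUnit.SECONDS);
            return terminated && cleanedUp.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("stopped cleanly: " + abortAndConfirmStopped());
    }
}
```

The point of the design is the final `awaitTermination`: `cancel(true)` alone only requests the stop, so a new sync started immediately afterwards could still overlap the old one.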
Hey guys, I appreciate your help and I look forward to the fix!
Please keep us posted
hello,
Just checking to see if you have any updates regarding this issue?
thanks a lot,
p4karl I investigated this issue when I experienced it and found it to be P4Java not properly disposing of streams...
I wrote an issue about it here: https://github.com/perforce/p4java/issues/2
I also sent in a support ticket, they responded with "that's definitely interesting we will investigate" and they haven't gotten back to me. Here's my case info:
Resource leak in p4java causing issues with Jenkins p4-plugin - Case# 00947568
edit: they just replied and said they have been unable to reproduce it. This is definitely a tough one to reproduce: it took me a while to get it to happen and to get to the agent before a GC ran. It helps to give the JVM a lot of memory so that it runs GC less often.
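For readers unfamiliar with this class of leak, here is a minimal Java sketch of the general idea, not P4Java's actual code. A stream that is never closed keeps its OS file handle until the object happens to be finalized by a GC, which on Windows blocks deletion in the meantime; closing it with try-with-resources releases the handle deterministically, including when the write is interrupted by an exception. The class, method, and file names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamDisposalSketch {

    /** Writes data and always closes the stream, even if the write throws. */
    static void writeAndClose(Path target, byte[] data) {
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a file, then deletes it; the delete works because no handle is left open. */
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("sync-sketch");
            Path file = dir.resolve("synced.txt");
            writeAndClose(file, "payload".getBytes(StandardCharsets.UTF_8));
            boolean deleted = Files.deleteIfExists(file);
            Files.delete(dir);
            return deleted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deleted: " + demo());
    }
}
```

The locking effect of an unclosed stream is specific to Windows; on Linux or macOS the delete would typically succeed either way, so the difference only shows up on a Windows agent.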
Please feel free to open it if this issue occurs in p4 plugin version >= 1.15.1
Hi maneamarius - Thanks for highlighting this. How did you test cancelling the job? Did you abort the job during a sync of a big file?
What's the organic reason that this occurs in your environment? For example, do you need to abort jobs in certain circumstances?