-
Bug
-
Resolution: Unresolved
-
Blocker
-
None
-
Jenkins 2.332.3
Throttle plugin 2.8
All other plugins at the latest available versions
The builds slow down to a crawl. Every 'node(...) { }' statement takes multiple seconds to enter, and other operations also appear very slow. The UI itself is fairly responsive until you try to change some configuration; then a single change (e.g. putting a node offline) can take several minutes.
The thread dump shows that executors mostly wait for Queue.maintain; judging by how slow they are, I assume this happens on every step.
AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#425841]
"AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#425841]" Id=1116475 Group=main RUNNABLE
    at java.base@11.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1323)
    at java.base@11.0.11/java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:738)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$TimingFlowNodeStorage.getNode(CpsFlowExecution.java:1841)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getNode(CpsFlowExecution.java:1174)
    at hudson.plugins.throttleconcurrents.ThrottleJobProperty.getThrottledPipelineRunsForCategory(ThrottleJobProperty.java:340)
    at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.throttleCheckForCategoriesOnNode(ThrottleQueueTaskDispatcher.java:132)
    at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.canTakeImpl(ThrottleQueueTaskDispatcher.java:101)
    at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.canTake(ThrottleQueueTaskDispatcher.java:62)
    at hudson.model.queue.QueueTaskDispatcher.canTake(QueueTaskDispatcher.java:101)
    at hudson.model.Queue$JobOffer.getCauseOfBlockage(Queue.java:276)
    at hudson.model.Queue.maintain(Queue.java:1637)
    at hudson.model.Queue$1.call(Queue.java:330)
    at hudson.model.Queue$1.call(Queue.java:327)
    at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:109)
    at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:99)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
    at java.base@11.0.11/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:121)
    at java.base@11.0.11/java.lang.Thread.run(Thread.java:829)

    Number of locked synchronizers = 1
    - java.util.concurrent.locks.ReentrantLock$NonfairSync@6a4b9f26
Executors are waiting on the lock held by Periodic Jenkins queue maintenance, like this:
Executor #7 for Win-CrashReports-Node
"Executor #7 for Win-CrashReports-Node" Id=1116435 Group=main WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@6a4b9f26 owned by "AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#425841]" Id=1116475
    at java.base@11.0.11/jdk.internal.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@6a4b9f26
    at java.base@11.0.11/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at java.base@11.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
    at java.base@11.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
    at java.base@11.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
    at java.base@11.0.11/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
    at hudson.model.Queue._withLock(Queue.java:1454)
    at hudson.model.Queue.withLock(Queue.java:1312)
    at hudson.model.Executor.run(Executor.java:352)
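To make the pattern in the two dumps above concrete, here is a toy model of it: one thread holds a single shared lock while doing slow work, and every other thread that needs the same lock has to wait. This is only a sketch, not Jenkins code; `simulate`, `queue_lock`, and the sleep that stands in for slow dispatcher checks are all made up for illustration.

```python
import threading
import time


def simulate(executors: int = 3, maintain_seconds: float = 0.2) -> list:
    """Return the order in which threads finish their locked section."""
    queue_lock = threading.Lock()    # hypothetical stand-in for the one Queue lock
    lock_taken = threading.Event()   # set once "maintenance" owns the lock
    finished = []                    # only appended to while holding queue_lock

    def maintain():
        with queue_lock:
            lock_taken.set()
            time.sleep(maintain_seconds)  # stand-in for slow per-item checks
            finished.append("maintain")

    def executor(n: int):
        lock_taken.wait()            # start only after maintenance holds the lock
        with queue_lock:             # blocks until maintenance releases it
            finished.append(f"executor-{n}")

    threads = [threading.Thread(target=maintain)]
    threads += [threading.Thread(target=executor, args=(n,)) for n in range(executors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished


if __name__ == "__main__":
    print(simulate())
```

In this model no executor can finish before the maintenance thread does, so the longer a maintenance pass takes, the longer every executor stalls. That matches what the dumps show, assuming the real executors take the same lock on each step.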
For some reason there are also timer threads operating on the Queue (here, scheduling a build) in parallel with the Periodic queue maintenance thread, and they are blocked by it as well:
jenkins.util.Timer [#7]
"jenkins.util.Timer [#7]" Id=57 Group=main WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@6a4b9f26 owned by "AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#425841]" Id=1116475
    at java.base@11.0.11/jdk.internal.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@6a4b9f26
    at java.base@11.0.11/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
    at java.base@11.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
    at java.base@11.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
    at java.base@11.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
    at java.base@11.0.11/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
    at hudson.model.Queue.schedule2(Queue.java:570)
    at jenkins.model.ParameterizedJobMixIn.scheduleBuild2(ParameterizedJobMixIn.java:159)
    at jenkins.model.ParameterizedJobMixIn.scheduleBuild(ParameterizedJobMixIn.java:117)
    at jenkins.model.ParameterizedJobMixIn$ParameterizedJob.scheduleBuild(ParameterizedJobMixIn.java:391)
    at hudson.triggers.TimerTrigger.run(TimerTrigger.java:67)
    at hudson.triggers.Trigger.checkTriggers(Trigger.java:292)
    at hudson.triggers.Trigger$Cron.doRun(Trigger.java:234)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)
    at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
    at java.base@11.0.11/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base@11.0.11/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base@11.0.11/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base@11.0.11/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base@11.0.11/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base@11.0.11/java.lang.Thread.run(Thread.java:829)

    Number of locked synchronizers = 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@4ff95144
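Both waiting threads above name the same lock (ReentrantLock$NonfairSync@6a4b9f26) and the same owner. For anyone who wants to check a full dump the same way, a small script along these lines could group blocked threads by lock and owner. It is a rough sketch that only assumes the thread header format seen in the dumps above; `summarize_waiters` is a hypothetical helper name, not part of Jenkins.

```python
import re
from collections import defaultdict

# Header of a blocked thread, as printed in the dumps above:
#   "<name>" Id=<n> Group=<g> WAITING on <lock> owned by "<owner>" Id=<m>
WAITING = re.compile(
    r'"(?P<thread>[^"]+)" Id=\d+ Group=\S+ WAITING on (?P<lock>\S+) '
    r'owned by "(?P<owner>[^"]+)" Id=\d+'
)


def summarize_waiters(dump: str) -> dict:
    """Map (lock, owner thread) -> sorted names of threads waiting on that lock."""
    waiters = defaultdict(list)
    for match in WAITING.finditer(dump):
        waiters[(match["lock"], match["owner"])].append(match["thread"])
    return {key: sorted(names) for key, names in waiters.items()}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_waiters.py < threaddump.txt
    for (lock, owner), names in summarize_waiters(sys.stdin.read()).items():
        print(f"{lock} held by {owner}: {len(names)} waiting")
        for name in names:
            print(f"    {name}")
```

The regular expression does not depend on line breaks, so it should work whether the dump is pasted as one long line or in the usual multi-line layout.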
We use throttle in some areas of our scripted pipeline, and we run a lot of node steps. There are also many freestyle builds. In total, the queue contains 1200 items right now, and we have around 30 static build nodes with around 6 executors each. The Jenkins server has 8 CPU cores and 16 GB of RAM. The server itself is very responsive, but the builds and any node config operations are extremely slow, barely even running. There are around 20 throttle categories, plus category overrides on 20 nodes for 3 categories each.
I tried using the throttle plugin sparingly, but the scripted pipeline node step was still really slow, so I am not sure whether the problem is in the plugin or in Jenkins core.
UPDATE: I removed most of the plugins I could think of that might interfere with the queue (including Priority Sorter and Scoring Load Balancer) and stopped using the throttle plugin at some crucial points (we can't really abandon it), but performance still degrades as the queue grows. There seem to be periodic CPU usage spikes caused by queue maintenance, which periodically block all the builds.
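For a sense of scale, here is a back-of-the-envelope estimate based on the numbers above. The queue maintenance trace goes through Queue$JobOffer.getCauseOfBlockage into QueueTaskDispatcher.canTake, which suggests one dispatcher check per (queue item, executor offer) pair. Assuming the worst case, where every queued item is buildable and every executor is idle (an overestimate; the real count per pass is likely lower), the number of checks in a single pass would be roughly:

```python
def worst_case_dispatcher_checks(queue_items: int, nodes: int, executors_per_node: int) -> int:
    """Upper bound on canTake-style checks in one pass.

    Assumes every queue item is checked against every executor,
    which is a simplification made for this estimate.
    """
    return queue_items * nodes * executors_per_node


# Figures from the report: 1200 queued items, ~30 nodes, ~6 executors each.
print(worst_case_dispatcher_checks(1200, 30, 6))  # 216000
```

Each check that reaches the throttle plugin also does further work per category (the trace shows getThrottledPipelineRunsForCategory calling CpsFlowExecution.getNode), so even a fraction of this upper bound could plausibly keep the queue lock held for a long time.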
- relates to
-
JENKINS-69132 nested 'node' step always jumps to a new node unnecessarily
- Open