
Massive Jenkins slowdown when jobs in Queue (due to Queue.maintain())

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: core
    • Environment: Ubuntu 12.04
      Jenkins 1.509.3
      Up-to-date plugins

      As soon as more than a handful of builds get queued, the entire GUI grinds to a halt.

      The reason is that the executor thread running the Queue.maintain() method holds the exclusive lock on the queue while it runs a very time-consuming loop building the list of applicable hosts matching a certain label.

      Due to this, every Jenkins GUI page and every method that needs access to the Queue is delayed by ~30 seconds, with the delay growing as more builds enter the queue, since Queue.maintain() is called more often.

      The server only becomes responsive again once the entire queue is empty. Setting the server to "shutdown now" does not help.
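      To make the pattern concrete, here is a minimal Java sketch of the contention described above (the class and member names are simplified stand-ins, not the actual Jenkins source): maintenance holds the exclusive queue monitor while it iterates every node for every buildable item, so even a request that merely reads the queue blocks until the loop finishes.

      import java.util.ArrayList;
      import java.util.List;

      // Illustrative sketch only; names loosely mirror the Jenkins Queue API.
      class QueueContentionSketch {
          interface Node { boolean canTake(Object item); } // stand-in for Node.canTake()

          private final List<Node> nodes = new ArrayList<>();
          private final List<Object> buildables = new ArrayList<>();

          public synchronized void maintain() {
              for (Object item : buildables) {
                  for (Node node : nodes) {        // O(items x nodes) while holding the monitor
                      if (node.canTake(item)) {    // label matching runs for every pair
                          break;                   // hand the item to an executor...
                      }
                  }
              }
          }

          public synchronized int countBuildables() {
              return buildables.size();            // UI reads like this one block behind maintain()
          }
      }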

      A usual stack trace when this occurs looks like this (first from /threadDump; the second from jstack during a different time):
      "Executor #6 for musxbird038" prio=10 tid=0x00007fe108024800 nid=0x7008 runnable [0x00007fe0f5a99000]
         java.lang.Thread.State: RUNNABLE
            at hudson.model.Slave.getLabelString(Slave.java:245)
            at hudson.model.Node.getAssignedLabels(Node.java:241)
            at hudson.model.Label.matches(Label.java:168)
            at hudson.model.Label.getNodes(Label.java:193)
            at hudson.model.Label.contains(Label.java:405)
            at hudson.model.Node.canTake(Node.java:322)
            at hudson.model.Queue$JobOffer.canTake(Queue.java:250)
            at hudson.model.Queue.maintain(Queue.java:1032)
            - locked <0x00000000e01d3490> (a hudson.model.Queue)
            at hudson.model.Queue.pop(Queue.java:863)
            - locked <0x00000000e01d3490> (a hudson.model.Queue)
            at hudson.model.Executor.grabJob(Executor.java:285)
            at hudson.model.Executor.run(Executor.java:206)
            - locked <0x00000000e01d3490> (a hudson.model.Queue)

      "Executor #0 for musxbird006" Id=591 Group=main RUNNABLE
            at java.util.TreeMap.successor(TreeMap.java:1975)
            at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1101)
            at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
            at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
            at hudson.model.Label$2.resolve(Label.java:159)
            at hudson.model.Label$2.resolve(Label.java:157)
            at hudson.model.labels.LabelAtom.matches(LabelAtom.java:149)
            at hudson.model.labels.LabelExpression$Binary.matches(LabelExpression.java:124)
            at hudson.model.Label.matches(Label.java:157)
            at hudson.model.Label.matches(Label.java:168)
            at hudson.model.Label.getNodes(Label.java:193)
            at hudson.model.Label.contains(Label.java:405)
            at hudson.model.Node.canTake(Node.java:322)
            at hudson.model.Queue$JobOffer.canTake(Queue.java:250)
            at hudson.model.Queue.maintain(Queue.java:1032)
            - locked hudson.model.Queue@2962c1e0
            at hudson.model.Queue.pop(Queue.java:863)
            - locked hudson.model.Queue@2962c1e0
            at hudson.model.Executor.grabJob(Executor.java:285)
            at hudson.model.Executor.run(Executor.java:206)
            - locked hudson.model.Queue@2962c1e0

      As you can see, the Queue.maintain() method does finish successfully, but takes more than 30 seconds to do so. The server does not stop working and returns to normal once the queue has been fully processed.

      We have ~20 nodes with 12 executor slots each (= 240 executor threads). An equal number of jobs is running, but not all of them consume CPU time on the host (most are idling, waiting for certain events).

      This issue has occurred since upgrading from 1.509.1 to 1.509.3.

      Thanks in advance.

          [JENKINS-20046] Massive Jenkins slowdown when jobs in Queue (due to Queue.maintain())

          Daniel Beck added a comment -

          Do you still experience this issue? Are you still on 1.509.3? What happens when you temporarily reduce online executor count so the total is around ~50 or so?

          Cannot reproduce on 1.532.3 with the queue sometimes ~7000 items long. So either it's resolved, or it really seems to be more related to the number of executors (and possibly labels/nodes) you have, of which I only have ~15 and 4 respectively. Another instance has ~14 nodes with ~40 executors total, and sometimes queues of up to 10 items, and also doesn't have that problem. Same Jenkins version.


          Oleg Nenashev added a comment -

          The queue handling time depends on the extensions you use: QueueTaskDispatchers, JobProperties, Nodes, ...
          For example, there was a severe performance issue in the Throttle Concurrent Builds Plugin several months ago.
          Operations like canTake() lock the queue and may take a long time, hence the issue is valid.

          BTW, the only way to fix it inside the core is to implement support for multiple queues within the core (there was a feature request for this case). It would be useful for features like priorities, but the implementation in the current core seems to be extremely hard.

          @Martin
          If the issue still affects you, please provide a list of the plugins you use.


          Martin Schröder added a comment -

          Hi Daniel, hi Oleg.

          We are currently using Jenkins 1.509.4 on our servers. The servers have up to 50 build hosts connected to them, with roughly 1000 executors spawned in total.

          During peaks in test-execution load, a significant queue can accumulate. During these times, we found that even a handful of builds in the queue could lock up the servers so badly that they did not start new builds in a timely manner, causing the queue to grow even longer and thus worsening the problem.

          In short: (high number of executors = expensive queue maintenance) + queued builds = constant synchronization on the Queue.

          This also causes the entire GUI to become unresponsive, as it waits for the Queue to be free before rendering the "nodes" side-panel.

          We are not currently experiencing this issue, because we are running a modified version of Jenkins in which the majority of the "synchronized" calls have been replaced with read/write locks that allow several methods simultaneous access to the queue.
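          A minimal sketch of that read/write-lock approach (illustrative only; the actual patches in the branches linked below differ in detail):

          import java.util.ArrayList;
          import java.util.List;
          import java.util.concurrent.locks.ReadWriteLock;
          import java.util.concurrent.locks.ReentrantReadWriteLock;

          // Illustrative sketch, not the actual patch.
          class ReadWriteQueueSketch {
              private final ReadWriteLock lock = new ReentrantReadWriteLock();
              private final List<Object> items = new ArrayList<>();

              // Readers (UI pages, REST calls) proceed concurrently with each other.
              public int size() {
                  lock.readLock().lock();
                  try {
                      return items.size();
                  } finally {
                      lock.readLock().unlock();
                  }
              }

              // Only structural changes such as the maintenance pass take the write lock.
              public void maintain() {
                  lock.writeLock().lock();
                  try {
                      // scheduling work on 'items'...
                  } finally {
                      lock.writeLock().unlock();
                  }
              }
          }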

          You can find the patches against 1.509.4 and 1.554.1 here:

          https://github.com/HedAurabesh/jenkins/tree/queue-1.509.4
          https://github.com/HedAurabesh/jenkins/tree/queue-1.554.1

          We intend to release a formal pull request to Jenkins soon.

          The only side-effect of this change is that the "scheduled builds" side-panel on the job overview sometimes renders a build as BOTH scheduled AND in progress. But that is a small price to pay for the server not getting stuck, and it is nothing a quick "F5" can't fix.

          Best regards,
          Martin


          Oleg Nenashev added a comment - edited

          Hi Martin,

          A PR for such a change would be useful in any case.

          Regarding the web UI...
          On our installations we have increased the refresh timeout of the internal queue cache (JENKINS-19691, see https://github.com/jenkinsci/jenkins/pull/1221).
          Together with a periodic cache refresh via a kick-starter task, this solution greatly improves the responsiveness of the UI.
          BTW, it also leads to glitches in the side-panel.
          Unfortunately, "F5" does not help in that case.


          Hans-Juergen Hafner added a comment -

          Hello Martin,

          are there any news regarding the pull request?


          Martin Schröder added a comment - edited

          Hi Hans-Jürgen.

          Yes, we're currently preparing a patch against 1.580.1.

          It will include the improved locking mechanism and some further changes to optimize the speed of the Queue maintenance (Queue.maintain()).

          Additionally, we're preparing a (completely separate) patch and pull request to completely overhaul the scheduling logic. Basically, the current Queue.maintain() algorithm scales as O(m*n), where m is the number of (non-blocked) jobs in the queue and n is the number of (non-busy) executors in the system.

          When you have a long queue and many executors, this function essentially live-locks, taking several seconds to finish. It is also inherently sequential: only one thread can run it at a time.
          And then, to complete the one-two knockout punch, remember that EVERY executor needs to run a Queue.maintain() before it can mark itself as free.
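          To put rough numbers on that O(m*n) behaviour (the figures here are assumed for illustration, not measured from this report): with m = 500 queued items and n = 1,000 free executors, one maintain() pass performs on the order of 500,000 canTake() checks; at even 0.1 ms per label-expression match, that is ~50 seconds spent holding the exclusive queue lock.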

          Taken together, this means that Queue.maintain() is currently the biggest hurdle for scaling up Jenkins. And without the patch above, the Web UI also completely locks up, as even the read functions needed to display the index page have to wait for Queue.maintain() to finish!

          Best regards,
          Martin.


          Jesse Glick added a comment -

          The patch in JENKINS-27565 means that the web UI will never block waiting for Queue.maintain. There may still be a need to optimize the implementation of that method to reduce server load, of course.
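          For readers unfamiliar with the approach, the idea behind such a change can be sketched as follows (hypothetical names; the actual JENKINS-27565 patch differs in detail): the maintenance pass publishes an immutable snapshot under the lock, and UI code reads the last published snapshot without blocking.

          import java.util.ArrayList;
          import java.util.Collections;
          import java.util.List;

          // Illustrative sketch of lock-free reads via snapshots.
          class SnapshotQueueSketch {
              private final List<Object> items = new ArrayList<>();           // guarded by 'this'
              private volatile List<Object> snapshot = Collections.emptyList();

              public synchronized void maintain() {
                  // ... scheduling work mutating 'items' under the lock ...
                  snapshot = Collections.unmodifiableList(new ArrayList<>(items));
              }

              // Never blocks; may be slightly stale, which is fine for rendering the UI.
              public List<Object> getApproximateItems() {
                  return snapshot;
              }
          }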


          Hans-Juergen Hafner added a comment -

          Any news regarding this issue?

          We added some timestamps and observed that RunExclusiveRunListener sometimes takes a very long time:

          jenkins.advancedqueue.RunExclusiveThrottler$RunExclusiveRunListener@60aea85e started on Mar 1, 2016 10:16:08 AM
          jenkins.advancedqueue.RunExclusiveThrottler$RunExclusiveRunListener@60aea85e done. Took 1 min 13 sec

          Is this behaviour related to the queue maintenance algorithm?


          Daniel Beck added a comment -

          Does this issue still occur in recent Jenkins versions? If so, are plugins installed that do complex operations on the queue, e.g. Priority Sorter?


          Martin Schröder added a comment -

          Hi Daniel.

          The new LTS versions work much better, given that they include the general gist of my earlier patchset against Jenkins 1.509.4:

          • dropping "synchronized" in the queue in favour of locks
          • unifying queue maintenance into a single thread, where "pile-ups" are avoided because one maintenance pass can handle all requests that accumulated during the previous pass (see the sketch after this list).
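          A minimal sketch of that single-thread, coalescing maintenance idea (illustrative names, not the actual core code):

          import java.util.concurrent.atomic.AtomicBoolean;

          // Callers merely set a flag; a single dedicated thread performs one
          // maintain() pass that serves every request accumulated since the
          // previous pass.
          class CoalescingMaintainerSketch {
              private final AtomicBoolean pending = new AtomicBoolean(false);

              // Called by many executors and UI actions; cheap and non-blocking.
              public void scheduleMaintenance() {
                  pending.set(true);
              }

              // Run by the single maintenance thread in a loop.
              public void maintainLoop() throws InterruptedException {
                  while (!Thread.currentThread().isInterrupted()) {
                      if (pending.getAndSet(false)) {
                          doMaintain();      // one pass covers all accumulated requests
                      } else {
                          Thread.sleep(50);  // simple poll; a real implementation would park/signal
                      }
                  }
              }

              private void doMaintain() { /* scheduling work */ }
          }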

          Both changes together add up to a huge performance benefit. Additionally, we have internally tweaked the source code even more:

          • we use read/write locks instead of only exclusive locks
          • we made queue maintenance lazy, so that it constantly checks only the bottom X items and the others only irregularly
          • we cache the "canTake()" calls, since each executor asks the same question for each project on its node, and the answer never changes during one Queue.maintain() (see the sketch after this list)
          • we unchained the "Cloud" feature from the Queue, which can dynamically bring nodes online when a label is overloaded. There is no need to lock the Queue for it to work right.
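          A minimal sketch of the per-pass canTake() cache mentioned above (illustrative names, not our actual patch): within one maintain() pass the answer for a given (node, task) pair never changes, so it is computed once and reused.

          import java.util.HashMap;
          import java.util.Map;

          // Illustrative sketch only.
          class CanTakeCacheSketch {
              interface Node { boolean canTake(Object task); }

              private final Map<String, Boolean> cache = new HashMap<>();

              boolean cachedCanTake(Node node, String nodeName, Object task, String taskName) {
                  String key = nodeName + "|" + taskName;                     // one entry per (node, task) pair
                  return cache.computeIfAbsent(key, k -> node.canTake(task));
              }

              void onMaintainFinished() {
                  cache.clear(); // answers may change between passes, so drop the cache
              }
          }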

          These, plus a few other, smaller changes have greatly improved UI responsiveness and overall job throughput.

          Unfortunately, despite all this, a single Queue.maintain() can still take up to 2 minutes once the queue exceeds some 1,000 entries on 150 build hosts (with about 3,000 executors). While this is not purely Jenkins core's fault (we use queue prioritization and a much more detailed and flexible job scheduler), it still shows that the maintenance algorithm scales woefully in some areas.


          Daniel Beck added a comment -

          Pinging stephenconnolly: maybe some of the suggestions from mhschroe's comment could be implemented in Jenkins?


          Tidhar Klein Orbach added a comment -

          Hi,
          any update regarding this? We want to upgrade to 2.7.1; does this issue still occur on that version?

          Thanks


          Oleg Nenashev added a comment -

          tizkiko The queue engine has been significantly optimized in 1.609.x.
          There is still room for improvement, but current versions have much better throughput with default settings.


          Florian Manschwetus added a comment -

          Hi,

          it seems that we have hit this issue too. We schedule all of our integration tests in one parallel statement (644 so far). We have done this quite often for smaller blocks of tasks (<100) without serious problems; last weekend we extended it to run the full regression tests, comprising 644 individual test suites. Doing so, we observed Jenkins degrading to time-lapse-like execution: each individual pipeline statement takes minutes until its output makes it to the build console, delaying the next step. The full test execution is therefore slowed down massively, leaving the test system, which should be loaded as much as possible, almost idle!


          Regards,

          Florian


          Oleg Nenashev added a comment -

          manschwetus I am not convinced this issue is related to the Queue. "Each individual pipeline statement takes minutes till the output makes it to build console" likely means that the build gets to the executor (the Queue is passed), but then hangs on startup. I am not sure what the root cause is; it maybe requires another ticket for Pipeline (CC jglick).


          Florian Manschwetus added a comment -

          Hi Oleg,

          maybe I missed an important detail: the master has a java process running at 25%, which on this 4-core system essentially means that a single thread of the master loads one core at 100% and is therefore CPU-limited, while the slave actually executing the task is completely idle.


          Oleg Nenashev added a comment -

          manschwetus Maybe. But I need to know which thread consumes the CPU (and to get the stack trace, of course). E.g. you can get this info from the Monitoring plugin or from JDK tools. Before that I cannot say anything.


          Florian Manschwetus added a comment - edited

          Hi Oleg,

          could you elaborate a bit on how to collect that information? Maybe there is a readme or FAQ link you could provide?

          Regards,

          Florian

          Update: I installed the Monitoring plugin; it works.


          Florian Manschwetus added a comment -

          Hi Oleg,

          the copyLogs task seems to be extremely busy. Is it possible that it iterates over all branches of the pipeline and not only the active ones?


          Florian Manschwetus added a comment -

          Here is the stack trace of the by far busiest thread:

          WorkflowRun.copyLogs [#5]
          sun.misc.Unsafe.park(Native Method)
          java.util.concurrent.locks.LockSupport.park(Unknown Source)
          java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
          java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
          java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
          java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
          java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          java.lang.Thread.run(Unknown Source)


          Florian Manschwetus added a comment -

          Hi Oleg,

          it seems to spend a lot of time in LinearBlockHoppingScanner:120-123:

          WorkflowRun.copyLogs [#5] (test/dev #127)
          org.jenkinsci.plugins.workflow.graphanalysis.LinearBlockHoppingScanner.next(LinearBlockHoppingScanner.java:123)
          org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:212)
          org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:94)
          org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:255)
          org.jenkinsci.plugins.workflow.graphanalysis.LinearScanner.findFirstMatch(LinearScanner.java:135)
          org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:274)
          org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.isRunning(LogActionImpl.java:153)
          org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.getLogText(LogActionImpl.java:128)
          org.jenkinsci.plugins.workflow.job.WorkflowRun.copyLogs(WorkflowRun.java:441)
          org.jenkinsci.plugins.workflow.job.WorkflowRun.access$600(WorkflowRun.java:125)
          org.jenkinsci.plugins.workflow.job.WorkflowRun$3.run(WorkflowRun.java:313)
          java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
          java.util.concurrent.FutureTask.runAndReset(Unknown Source)
          java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
          java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
          java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          java.lang.Thread.run(Unknown Source)


          Florian Manschwetus added a comment -

          Hi Oleg,

          maybe calling Collection.contains() in a loop is not a good idea when it is done often on larger sets; it makes the whole scan O(n^2).

          https://github.com/jenkinsci/workflow-api-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/graphanalysis/LinearBlockHoppingScanner.java#L123
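          For illustration (not the plugin's actual code): List.contains() is O(n), so calling it once per visited node makes the scan O(n^2) overall, while a HashSet membership test is amortized O(1), which brings the scan back to O(n).

          import java.util.HashSet;
          import java.util.List;
          import java.util.Set;

          // Illustrative sketch only.
          class ContainsSketch {
              static int countVisited(List<String> visited, List<String> nodes) {
                  Set<String> visitedSet = new HashSet<>(visited);  // one-time O(n) copy
                  int hits = 0;
                  for (String node : nodes) {
                      if (visitedSet.contains(node)) {              // O(1) instead of List's O(n)
                          hits++;
                      }
                  }
                  return hits;
              }
          }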

          Regards,

          Florian


          Jesse Glick added a comment -

          manschwetus your issue is totally unrelated: JENKINS-40934


          Oleg Nenashev added a comment -

          I doubt it makes sense to keep this issue open. The queue has changed significantly in 1.609.x, and there are other performance tweaks like the recent https://github.com/jenkinsci/jenkins/pull/3038/files by jimilian. The inputs here are probably not relevant anymore.

          I would just close this issue as Resolved.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Akbashev Alexander
          Path:
          core/src/main/java/hudson/model/Queue.java
          test/src/test/java/hudson/model/QueueTest.java
          http://jenkins-ci.org/commit/jenkins/be0238644911948da4123b5338f0299198dcc048
          Log:
          JENKINS-20046 - Do not query queue dispatchers from UI (#3038)

          • Do not query queue dispatchers from UI
          • Address comments from review
          • Restore old constructors and mark them as @deprecated
          • Optimise query from UI even more
          • Check non-concurrent builds in getCauseOfBlockageForItem


          Daniel Beck added a comment -

          Does jimilian's change towards 2.85 resolve this issue?


          Florian Straub added a comment - edited

          We still have this behavior running v2.92 ...

          Update: ... but actually we managed to fix it by enabling LDAP caching in "global security"!


          Oleg Nenashev added a comment -

          I am closing this issue, since A LOT of queue performance patches have been applied since 1.509.x. If anybody sees performance degradation on recent versions of Jenkins, I suggest opening new tickets so that we can handle the other cases separately.


            Assignee: Unassigned
            Reporter: Martin Schröder (mhschroe)
            Votes: 13
            Watchers: 22
