Status: Open
Environment: openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
OS: Amazon Linux 2018.03
Bitbucket branch source: 2.2.12, 2.4.0, 2.4.2
A few times a day, the executors on our build agents fill up and do not "release" completed tasks. The job finishes and is marked as success or failure, but its executors are still listed as active on the node. Other jobs then queue up waiting for a free slot.
After tracing through this for a while, I think the problem is caused by the Bitbucket Branch Source plugin. This is backed up by the fact that none of the jobs whose code is hosted on GitHub show this behaviour.
A thread dump taken when this happens shows 10 threads stuck in TIMED_WAITING, like this one:
"com.cloudbees.jenkins.plugins.bitbucket.hooks.PullRequestHookProcessor$1 Thu Mar 07 15:45:43 UTC 2019 / jenkins.util.Timer 1" Id=20 Group=main TIMED_WAITING
I have attached a partial thread dump from when this happened yesterday, taken while we were rolled back to plugin version 2.2.12.
I'm not sure what is causing this behaviour or how to resolve it, so any help would be greatly appreciated. I have looked at the plugin source code referenced in the stack trace in the thread dump, and I can see that the threads are waiting because of an API rate limit. However, our jobs shouldn't be calling Bitbucket more than two or three times per build: initial receipt of the webhook, checkout of the code, and possibly a push of a tag. Not every task needs Bitbucket, yet each individual task holds on to its executor. The attached image Screenshot-stuck-executors.png shows part of the executor status when this happens. Some of our older build jobs have node blocks every few commands, while the newer jobs have a single node block wrapping the whole job, but both kinds still hang in the executor list.
This has been happening to us for a month or so now, and I can't think of any configuration change we made other than upgrading the plugin. I have tried rolling back to previous versions of the plugin to see if that fixes the issue, but nothing has made a difference so far.
I can work around this by increasing the number of executors on the affected build agent to a very large number and waiting a few hours for the stuck executors to clear on their own.