Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Component/s: disk-usage-plugin
Description
The attached thread dump shows two deadlocked threads:
- "jenkins.util.Timer 7" daemon prio=5 BLOCKED
- "Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED
Jenkins 1.650
This happens as follows:
- I use the Job DSL plugin to create a bunch of jobs. Many have the SCM poll trigger enabled, causing them to run as soon as they are created (possibly also when updated, judging by the stack trace).
- As part of job creation, Job.updateNextBuildNumber is called (via the Next Build Number plugin's Job DSL integration).
The two events above create a deadlock.
The first thread locks the Job, since updateNextBuildNumber() is synchronized, and then calls AbstractLazyLoadRunMap.getByNumber(); the synchronized block in that method is new as of 1.646 (https://github.com/jenkinsci/jenkins/commit/d5167025a204750633c931ea8c1fff8d7561ab9c#diff-383116e240993025e5b727359e61db09).
The second thread enters AbstractLazyLoadRunMap.getByNumber() first, and the lazy load it triggers ends up calling AbstractProject.save(). But a Job is an AbstractProject (the same object), so save() needs the monitor the first thread already holds... deadlock.
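For what it's worth, the lock-order inversion can be reproduced outside Jenkins. Below is a minimal, purely illustrative Java sketch (the job and runMap monitors are hypothetical stand-ins for the Job and AbstractLazyLoadRunMap locks, not actual Jenkins code); both threads end up BLOCKED just like in the attached dump:
{code:java}
// Hypothetical sketch of the lock ordering described above; all names are illustrative.
public class LockOrderInversion {
    private static final Object job = new Object();     // stand-in for the Job/AbstractProject monitor
    private static final Object runMap = new Object();  // stand-in for the AbstractLazyLoadRunMap monitor

    public static void main(String[] args) {
        Thread dslBuilder = new Thread(() -> {
            synchronized (job) {            // like the synchronized Job.updateNextBuildNumber()
                pause();
                synchronized (runMap) {     // like the synchronized block in getByNumber() (new in 1.646)
                    System.out.println("DSL builder finished");
                }
            }
        }, "DSL Job Builder");

        Thread timer = new Thread(() -> {
            synchronized (runMap) {         // like getByNumber()/load() holding the run map lock
                pause();
                synchronized (job) {        // like the save() triggered while lazy-loading a build
                    System.out.println("timer finished");
                }
            }
        }, "jenkins.util.Timer");

        dslBuilder.start();
        timer.start();                      // both threads block forever: locks taken in opposite order
    }

    private static void pause() {
        try { Thread.sleep(200); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
{code}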
Here are the relevant parts of both stack traces for convenience; the frames causing the deadlock are Job.updateNextBuildNumber() and AbstractLazyLoadRunMap.getByNumber() in the first thread, and AbstractLazyLoadRunMap.getByNumber() and AbstractProject.save() in the second:
"Executor #1 for master : executing DSL Job Builder #390" daemon prio=5 BLOCKED
jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:356)
jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:332)
jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:274)
jenkins.model.lazy.LazyBuildMixIn.getLastBuild(LazyBuildMixIn.java:238)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:993)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:144)
hudson.model.Job.updateNextBuildNumber(Job.java:422)
org.jvnet.hudson.plugins.nextbuildnumber.JobDslExtension.notifyItemUpdated(JobDslExtension.java:30)
"jenkins.util.Timer 7" daemon prio=5 BLOCKED
hudson.model.AbstractProject.save(AbstractProject.java:305)
hudson.model.Job.addProperty(Job.java:523)
hudson.model.AbstractProject.addProperty(AbstractProject.java:785)
hudson.plugins.disk_usage.DiskUsageUtil.addProperty(DiskUsageUtil.java:58)
hudson.plugins.disk_usage.BuildDiskUsageAction.<init>(BuildDiskUsageAction.java:38)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:31)
hudson.plugins.disk_usage.DiskUsageBuildActionFactory.createFor(DiskUsageBuildActionFactory.java:21)
hudson.model.Actionable.createFor(Actionable.java:107)
hudson.model.Actionable.getAllActions(Actionable.java:98)
hudson.model.Run.onLoad(Run.java:343)
hudson.model.RunMap.retrieve(RunMap.java:224)
hudson.model.RunMap.retrieve(RunMap.java:56)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:479)
jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:461)
jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:367)
jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:332)
jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:274)
jenkins.model.lazy.LazyBuildMixIn.getLastBuild(LazyBuildMixIn.java:238)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:993)
hudson.model.AbstractProject.getLastBuild(AbstractProject.java:144)
hudson.views.AbstractBuildTrendFilter.matches(AbstractBuildTrendFilter.java:71)
hudson.views.AbstractIncludeExcludeJobFilter.doFilter(AbstractIncludeExcludeJobFilter.java:68)
hudson.views.AbstractIncludeExcludeJobFilter.filter(AbstractIncludeExcludeJobFilter.java:57)
hudson.model.ListView.getItems(ListView.java:195)
hudson.model.ListView.getItems(ListView.java:67)
jenkins.advancedqueue.jobinclusion.strategy.ViewBasedJobInclusionStrategy.isJobInView(ViewBasedJobInclusionStrategy.java:182)
jenkins.advancedqueue.jobinclusion.strategy.ViewBasedJobInclusionStrategy.contains(ViewBasedJobInclusionStrategy.java:149)
jenkins.advancedqueue.PriorityConfiguration.getJobGroup(PriorityConfiguration.java:241)
jenkins.advancedqueue.PriorityConfiguration.getPriorityInternal(PriorityConfiguration.java:225)
jenkins.advancedqueue.PriorityConfiguration.getPriority(PriorityConfiguration.java:203)
jenkins.advancedqueue.sorter.AdvancedQueueSorter.onNewItem(AdvancedQueueSorter.java:136)
jenkins.advancedqueue.sorter.AdvancedQueueSorterQueueListener.onEnterWaiting(AdvancedQueueSorterQueueListener.java:46)
hudson.model.Queue$WaitingItem.enter(Queue.java:2348)
hudson.model.Queue.scheduleInternal(Queue.java:599)
hudson.model.Queue.schedule2(Queue.java:555)
jenkins.model.ParameterizedJobMixIn.scheduleBuild2(ParameterizedJobMixIn.java:138)
jenkins.model.ParameterizedJobMixIn.scheduleBuild(ParameterizedJobMixIn.java:94)
hudson.model.AbstractProject.scheduleBuild(AbstractProject.java:836)
My Job DSL setup is rather complicated and I have not been able to extract a simple test case for this; if I do, I'll attach it.
Update: 1.645 also deadlocks; AbstractLazyLoadRunMap.load() is synchronized there as well. I am confused about why this only started happening after I upgraded from 1.645, yet continues to happen now that I have downgraded back. Maybe a newer version of some other plugin makes it more likely (e.g. the SVN plugin polling more aggressively).
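Not sure if it helps anyone, but since we keep reading thread dumps: the JVM can also report monitor deadlocks itself via JMX. A minimal sketch (plain java.lang.management, nothing Jenkins-specific; it only sees deadlocks in the JVM it runs in, so it would have to be executed inside the Jenkins process, e.g. adapted for the Script Console):
{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Finds threads that are deadlocked on object monitors or ownable synchronizers.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlocked threads found");
            return;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            System.out.println(info.getThreadName() + " is blocked on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
{code}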
Attachments
Issue Links
- is blocking: JENKINS-22767 AbstractLazyLoadRunMap.getById subject to race condition with .load (Resolved)
I've had a similar situation (Jenkins 1.642.3) and have a thread dump with information about the acquired locks (attached).
The disk-usage-plugin thread does some "computation" in a synchronized section, while other request threads are blocked waiting on that synchronized section.
I don't see a classic deadlock there. My suspicion is rather that the cause (in addition to the synchronized block newly added to fix JENKINS-22767) was the large number of historical builds (7K+). It could simply take the disk-usage-plugin a long time to calculate the required data, while the several other request threads for that project were just waiting for the first one to finish. However, I'm not sure this is related only to disk-usage-plugin: after removing it (and restarting), I observed the same situation with the thread handling the side panel request (below). I'm not sure why sidepanel.jelly should also take so much time at that point.
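To illustrate the distinction I mean, here is a hypothetical sketch (not the plugin's real code): one thread holds a synchronized section for a long computation while the other threads are merely BLOCKED waiting on it. In a thread dump this looks similar to the deadlock above, but it clears on its own once the first thread finishes:
{code:java}
// Hypothetical illustration of lock contention (not deadlock); names are made up.
public class ContentionNotDeadlock {
    private static final Object projectLock = new Object();

    // Stand-in for a long disk-usage calculation over 7K+ historical builds.
    private static void recalculate() {
        synchronized (projectLock) {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            // Each "request" thread shows up BLOCKED on projectLock until the first one finishes;
            // unlike the deadlock above, progress resumes eventually.
            new Thread(ContentionNotDeadlock::recalculate, "request-" + i).start();
        }
    }
}
{code}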