• Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • core
    • None
    • Jenkins 1.611, prioritysorter 3.2

      Unknown if core or plugin issue.
      Had a group of jobs stuck waiting in the queue.
      Hovering over them shows triggers and wait time as usual, but no indication of what they're waiting for.

      Suspicious log entry:
      May 21, 2015 4:43:19 PM hudson.util.DescribableList buildDependencyGraph
      SEVERE: Failed to build dependency graph for hudson.model.FreeStyleProject@57fdabe[MyStuckJob]
      java.lang.NullPointerException
      at hudson.tasks.Fingerprinter$FingerprintAction.getFingerprints(Fingerprinter.java:373)
      at hudson.tasks.Fingerprinter$FingerprintAction.getDependencies(Fingerprinter.java:403)
      at hudson.tasks.Fingerprinter$FingerprintAction.getDependencies(Fingerprinter.java:390)
      at hudson.tasks.Fingerprinter.buildDependencyGraph(Fingerprinter.java:157)
      at hudson.util.DescribableList.buildDependencyGraph(DescribableList.java:219)
      at hudson.model.Project.buildDependencyGraph(Project.java:207)
      at hudson.model.DependencyGraph.build(DependencyGraph.java:95)
      at jenkins.model.Jenkins.rebuildDependencyGraph(Jenkins.java:3748)
      at jenkins.model.Jenkins$25.call(Jenkins.java:3770)
      at jenkins.model.Jenkins$25.call(Jenkins.java:3766)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      Possibly caused by deleting some other jobs - they weren't supposed to be in the dependency tree, but coincidentally included identical files that were fingerprinted at some point.

      Queuing strategy is Weighted Fair Queuing.
      Weren't in the queue on startup, unless lack of JENKINS-28486 pulling them in would count.

          [JENKINS-28532] Jobs get stuck in the queue

          Magnus Sandberg added a comment -

          PrioritySorter only orders the jobs in the queue and does not interfere with execution/scheduling (unless you use "Run Exclusive"), so this does look like a problem somewhere else.

          Totally off topic: you are the first one I have seen using anything other than "Absolute" sorting. I would be happy if you could share your thoughts on the queueing strategy; feel free to email me. Thanks,

          James Howe added a comment - - edited

          Downgraded plugin to 2.12. Restarted.
          Dependency graph errors not seen. Items already stuck in queue went through, though a few hours later they were stuck again.
          Status now shows as (pending—???)

          Manually deleted from disk fingerprints that mentioned jobs I'd recently deleted. Reloaded.
          As above.

          Opened and saved the config of a job that was stuck without making changes.
          Immediately all jobs that were stuck began executing.
          Next time the job came through the queue, it and some others got stuck again.


          James Howe added a comment - - edited

          Doesn't appear to have gotten stuck again since then.

          Jobs got stuck again for a few hours. Fewer than before.
          This time clicking on one of them was enough to have them immediately execute.


          James Howe added a comment -

          Got stuck with a blank status, and a config save no longer pushes them through.
          Maybe there are two related issues here.

          May 28, 2015 2:12:32 PM FINE PrioritySorter.Queue.Items
          Blocking: Id: 1093, JobName: MyJob, jobGroupId: 1, reason: <none>, priority: 4, weight: 5.9999997E-4, status: BLOCKED


          Magnus Sandberg added a comment -

          The log line you have only tells us that jenkins-core has changed the status of the job to BLOCKED, nothing more.

          James Howe added a comment -

          I think the non-fixable blocking may be due to a detected circular dependency.
          Job A has job B listed as upstream, but job B should actually be a few layers downstream of job A.
          Both jobs, and all jobs downstream and in-between, are blocked with no message.

          This would have been computed using historical fingerprints?
          Any way to sort this out?
          At the moment I just have to cancel them all and manually trigger the bottom one.


          James Howe added a comment -

          I disabled hudson.tasks.Fingerprinter.enableFingerprintsInDependencyGraph, which I had not realised had been set.
          No further problems.
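For readers unfamiliar with this kind of switch, here is a minimal sketch of how a boolean JVM system property is typically read. It assumes the flag named above is an ordinary system property; the class `FingerprintFlagCheck` is hypothetical and is not Jenkins code.

```java
public class FingerprintFlagCheck {
    // Property name taken from the comment above; treating it as a plain
    // JVM system property is an assumption of this sketch.
    static final String FLAG =
            "hudson.tasks.Fingerprinter.enableFingerprintsInDependencyGraph";

    // Boolean.getBoolean returns true only when the system property exists
    // and equals "true" (case-insensitive); otherwise it returns false.
    static boolean isEnabled() {
        return Boolean.getBoolean(FLAG);
    }

    public static void main(String[] args) {
        System.out.println(FLAG + " = " + isEnabled());
    }
}
```

Under that assumption, the flag would usually be controlled from the JVM command line, for example by removing a `-Dhudson.tasks.Fingerprinter.enableFingerprintsInDependencyGraph=true` argument from the Jenkins startup options and restarting.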

          I suspect it's purely using the timestamp of a fingerprint rather than any other relationship, which led to my nonsensical dependencies.
          Perhaps there should be a way to detect these cycles, to remove the offending fingerprint, or to override the detected dependency.
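To illustrate what such cycle detection could look like, here is a minimal depth-first search sketch over a job dependency map. It is not how Jenkins builds or checks its dependency graph; the class `DependencyCycleCheck`, the map layout (job name to list of downstream job names), and the job names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DependencyCycleCheck {

    /** Returns one cycle as a list of job names (first == last), or an empty list if there is none. */
    static List<String> findCycle(Map<String, List<String>> downstream) {
        Set<String> visited = new HashSet<>();
        List<String> path = new ArrayList<>();
        for (String job : downstream.keySet()) {
            List<String> cycle = visit(job, downstream, visited, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private static List<String> visit(String job, Map<String, List<String>> downstream,
                                      Set<String> visited, List<String> path) {
        // A job already on the current path means we followed an edge back to it.
        int seenAt = path.indexOf(job);
        if (seenAt >= 0) {
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(job);
            return cycle;
        }
        // Visited earlier and not on the current path: already fully explored.
        if (!visited.add(job)) {
            return Collections.emptyList();
        }
        path.add(job);
        for (String next : downstream.getOrDefault(job, Collections.<String>emptyList())) {
            List<String> cycle = visit(next, downstream, visited, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1); // backtrack
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Hypothetical layout: JobB is a few layers downstream of JobA,
        // but a stale fingerprint also records JobB as upstream of JobA.
        Map<String, List<String>> downstream = new LinkedHashMap<>();
        downstream.put("JobA", Arrays.asList("Intermediate"));
        downstream.put("Intermediate", Arrays.asList("JobB"));
        downstream.put("JobB", Arrays.asList("JobA"));
        System.out.println(findCycle(downstream));
    }
}
```

A check like this could at least name the jobs involved, which would give the queue something more useful to display than a blank reason or "???".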


          Daniel Beck added a comment -

          jameshowe Could you explain how to reproduce the issue on a new Jenkins instance? Even if that happened, there should be a user-visible explanation of the behavior. "???" isn't exactly helpful.


          James Howe added a comment -

          Not easily, this instance has been up for many years, and anything could have happened to mix up the fingerprints.

          There are also two different cases I've detailed above.
          The one with the blank reason, and the one with "???".
          Each has a different workaround to recover.


            emsa23 Magnus Sandberg
            jameshowe James Howe