JENKINS-23244

Slave build history page has no data and spawns a ton of very long-lived blocking threads on the master

      Description

      So I went to try to see the usage for a slave on builds.apache.org, and the page had no builds on it. I eventually noticed the "Calculation in progress" bit and thought "Oh, ok, I'll leave this up and check again later". That was a mistake. Now there are 30+ threads on the master like the ones in https://gist.github.com/abayer/88e390e3f0859f8b64e2 - i.e., a whole ton of HTTP POST requests to /computer/foo/timeline/data, all but one blocking on the one that's running, and the one that's running takes a long time to finish.

      This means (a) that the build history page for a slave is useless and (b) that we're churning CPU/IO and, I'm guessing, doing so repeatedly without caching, since when I check it now, even an hour and a half later, there's no data on the page.
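      To illustrate the caching concern, here is a minimal Java sketch of a per-slave result cache in which concurrent requests for the same key share one computation and later requests reuse the stored result. `TimelineCache` and its `compute` function are hypothetical names for illustration only; this is not Jenkins code, and whether the real timeline data can safely be cached this way is an assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache: at most one computation per key, result reused afterwards. */
final class TimelineCache<V> {
    private final ConcurrentMap<String, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final Function<String, V> compute;

    TimelineCache(Function<String, V> compute) {
        this.compute = compute;
    }

    V get(String key) {
        CompletableFuture<V> created = new CompletableFuture<>();
        CompletableFuture<V> existing = cache.putIfAbsent(key, created);
        if (existing != null) {
            // Another caller already computed (or is computing) this key: wait for it.
            return existing.join();
        }
        try {
            V value = compute.apply(key);
            created.complete(value);
            return value;
        } catch (RuntimeException e) {
            // Do not cache failures; let the next caller retry.
            cache.remove(key, created);
            created.completeExceptionally(e);
            throw e;
        }
    }

    /** Drops the cached value, for example when a new build finishes on that slave. */
    void invalidate(String key) {
        cache.remove(key);
    }
}
```

      Waiting callers still occupy their request threads until the first computation finishes, so this only avoids repeating the expensive work; it does not by itself free the blocked threads.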

            Activity

            kohsuke Kohsuke Kawaguchi added a comment -

            Adjusting the priority since it only affects relatively unvisited pages of large deployments.

            kohsuke Kohsuke Kawaguchi added a comment -

            Looking at the thread dump, the call stack indicates this call resulted in loading all the build records (via AbstractLazyLoadRunMap.all), which looks suspicious.
            I'd think this operation would only require walking newer build records.

            "Handling POST /computer/hadoop4/timeline/data/ : http-bio-8090-exec-895" Id=22685 Group=main RUNNABLE
            	at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
            	at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
            	at java.io.File.exists(File.java:813)
            	at hudson.model.RunMap.retrieve(RunMap.java:219)
            	at hudson.model.RunMap.retrieve(RunMap.java:59)
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:687)
            	-  locked hudson.model.RunMap@3fa6ce65
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:649)
            	-  locked hudson.model.RunMap@3fa6ce65
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:381)
            	at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:219)
            	at hudson.tasks.Fingerprinter$FingerprintAction.compact(Fingerprinter.java:360)
            	at hudson.tasks.Fingerprinter$FingerprintAction.onLoad(Fingerprinter.java:349)
            	at hudson.model.Run.onLoad(Run.java:337)
            	at hudson.model.RunMap.retrieve(RunMap.java:223)
            	at hudson.model.RunMap.retrieve(RunMap.java:59)
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:687)
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:670)
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.all(AbstractLazyLoadRunMap.java:622)
            	-  locked hudson.model.RunMap@3fa6ce65
            	at jenkins.model.lazy.AbstractLazyLoadRunMap.entrySet(AbstractLazyLoadRunMap.java:277)
            	at java.util.AbstractMap$2$1.<init>(AbstractMap.java:378)
            	at java.util.AbstractMap$2.iterator(AbstractMap.java:377)
            	at hudson.util.RunList.iterator(RunList.java:97)
            	at com.google.common.collect.Iterables$15.apply(Iterables.java:1128)
            	at com.google.common.collect.Iterables$15.apply(Iterables.java:1125)
            	at com.google.common.collect.Iterators$8.next(Iterators.java:812)
            	at com.google.common.collect.Iterators$MergingIterator.<init>(Iterators.java:1306)
            	at com.google.common.collect.Iterators.mergeSorted(Iterators.java:1274)
            	at com.google.common.collect.Iterables$14.iterator(Iterables.java:1113)
            	at com.google.common.collect.Iterables$UnmodifiableIterable.iterator(Iterables.java:94)
            	at com.google.common.collect.Iterables$6.iterator(Iterables.java:585)
            	at hudson.util.RunList$2.iterator(RunList.java:210)
            	at hudson.util.RunList$2.iterator(RunList.java:210)
            	at com.google.common.collect.Iterables$6.iterator(Iterables.java:585)
            	at hudson.util.RunList.iterator(RunList.java:97)
            	at hudson.model.BuildTimelineWidget.doData(BuildTimelineWidget.java:63)
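
            As a rough illustration of "walking newer build records" instead of loading everything, the following Java sketch walks from the newest build number downwards and stops after a fixed number of records. `BuildRecord` and `LazyBuildStore` are hypothetical stand-ins, not Jenkins classes, and the cut-off (`maxInspected`) is an assumed parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical stand-in for a build record; not a Jenkins class. */
final class BuildRecord {
    final int number;
    final String node;

    BuildRecord(int number, String node) {
        this.number = number;
        this.node = node;
    }
}

/** Hypothetical lazy store: a record is loaded only when it is asked for. */
final class LazyBuildStore {
    private final int newest;
    private final IntFunction<BuildRecord> loader;
    int loads = 0; // counts how many records were actually loaded

    LazyBuildStore(int newest, IntFunction<BuildRecord> loader) {
        this.newest = newest;
        this.loader = loader;
    }

    private BuildRecord load(int number) {
        loads++;
        return loader.apply(number);
    }

    /** Walks newest to oldest and stops after inspecting maxInspected records. */
    List<BuildRecord> newestOnNode(String node, int maxInspected) {
        List<BuildRecord> result = new ArrayList<>();
        int inspected = 0;
        for (int n = newest; n >= 1 && inspected < maxInspected; n--, inspected++) {
            BuildRecord b = load(n);
            if (b.node.equals(node)) {
                result.add(b);
            }
        }
        return result;
    }
}
```

            With 1000 builds and `maxInspected` set to 10, only the ten newest records are loaded, however many older ones exist on disk.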
            
            abayer Andrew Bayer added a comment -

            fwiw, it's now looking a lot better - no blocked threads, build history's showing up for all slaves now, so far as I can tell.

            danielbeck Daniel Beck added a comment -

            Andrew Bayer: What changed?

            abayer Andrew Bayer added a comment -

            Nothing - just time after startup and first attempt to load it.

            jglick Jesse Glick added a comment -

            Yup.

            pupssman Ivan Kalinin added a comment -

            We are still experiencing a great deal of trouble with the slave build history page.

            I just tried to open it for one slave and the whole Jenkins master locked up on the UI side.

            The thread that calls `AbstractLazyLoadRunMap.load` goes on forever (yes, we have a great many builds), but somehow other threads from the UI pool keep getting locked. Eventually, Jenkins became unresponsive altogether – but the jobs were still running.

            Maybe we could use a separate thread pool for this kind of stuff so it won't lock all the UI threads?

            BTW, we are running current LTS
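
            A minimal Java sketch of the separate-pool idea, assuming a small bounded executor dedicated to expensive queries: the request thread waits only up to a timeout and gets a fallback value if the pool is saturated or the query is too slow. `SlowQueryPool` is a hypothetical name and this is not how Jenkins handles these requests.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical bounded pool for expensive queries, isolated from request threads. */
final class SlowQueryPool {
    private final ThreadPoolExecutor executor;

    SlowQueryPool(int threads, int queueSize) {
        executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                r -> {
                    Thread t = new Thread(r, "slow-query");
                    t.setDaemon(true); // never keeps the JVM alive on its own
                    return t;
                },
                new ThreadPoolExecutor.AbortPolicy()); // reject when the queue is full
    }

    /** Runs the task in the pool; returns fallback on saturation, timeout, or failure. */
    <T> T run(Callable<T> task, long timeoutMillis, T fallback) {
        Future<T> future;
        try {
            future = executor.submit(task);
        } catch (RejectedExecutionException e) {
            return fallback; // pool and queue are full
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow task
            return fallback;
        } catch (ExecutionException e) {
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

            Note that a separate pool only limits how many threads do the expensive work. If other requests block on the same lock the slow task holds (as the thread dumps in this issue suggest), they would still wait, so this is at best a partial mitigation.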

            lkarnasiewicz Lukasz Karnasiewicz added a comment -

            Steps to reproduce:
            1. Display the slave's build history page and wait for it to render; there should be a small progress bar with a "Computation in progress" hint
            2. Request any other page (e.g. the main page) - it will hang

            A sample thread dump illustrating the problem is attached.
            Thread 30745 is processing the request for the slave's build history (http://jenkins/computer/slave_name/builds)
            All other requests now hang on jenkins.model.lazy.AbstractLazyLoadRunMap.load for up to 2 minutes in our case.

            mmitche Matthew Mitchell added a comment -

            I'm seeing this in our installation. It severely impacts the responsiveness of the system.

            mmitche Matthew Mitchell added a comment -

            (FYI this installation is around 6-7k builds a day)

            Even in the case of walking newer builds, it seems like this would be super expensive. Maybe it's better to keep an index of build name/number to machine to avoid loading the metadata at all?
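
            A minimal Java sketch of such an index, assuming it is updated whenever a build is assigned to a machine and queried without touching any build record. `NodeBuildIndex` and `BuildRef` are hypothetical names, and persistence across restarts is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical in-memory index from machine name to the builds that ran on it. */
final class NodeBuildIndex {

    /** Lightweight reference to a build: job name plus build number, no metadata. */
    static final class BuildRef {
        final String job;
        final int number;

        BuildRef(String job, int number) {
            this.job = job;
            this.number = number;
        }
    }

    private final Map<String, List<BuildRef>> byNode = new ConcurrentHashMap<>();

    /** Called when a build starts on a machine (assumed hook, not a Jenkins API). */
    void record(String node, String job, int number) {
        byNode.computeIfAbsent(node, k -> new CopyOnWriteArrayList<>())
              .add(new BuildRef(job, number));
    }

    /** Returns up to limit most recently recorded builds for the machine, newest first. */
    List<BuildRef> latest(String node, int limit) {
        // Copy first so concurrent record() calls cannot disturb the slice below.
        List<BuildRef> snapshot = new ArrayList<>(byNode.getOrDefault(node, Collections.emptyList()));
        int from = Math.max(0, snapshot.size() - limit);
        List<BuildRef> tail = new ArrayList<>(snapshot.subList(from, snapshot.size()));
        Collections.reverse(tail);
        return tail;
    }
}
```

            The history page could then resolve only the handful of builds it is about to display, instead of scanning every job's records to find the ones that ran on a given machine.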

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Akbashev Alexander
            Path:
            core/src/main/resources/hudson/model/BuildTimelineWidget/control.jelly
            http://jenkins-ci.org/commit/jenkins/2a0ac4f0989407a20e277444a7737e9c5f7ea78a
            Log:
            [FIX JENKINS-23244] Slave build history page has no data and spawns a ton of very long-lived blocking threads on the master (#2584)

            Mainly, the commit does two things:
            1) Show only selected (visible) builds
            2) Query builds one by one, not in parallel
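
            The actual change is in `control.jelly` and is not reproduced here. As a rough Java analogue of the first point only, and assuming "visible" means the builds whose start time falls inside the currently displayed time window, a sorted map lets a query return just that slice. `VisibleBuilds` and its map of start timestamps to build labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/** Hypothetical helper: restrict a query to builds inside the visible time window. */
final class VisibleBuilds {

    /**
     * Returns the labels of builds whose start time lies in [from, to], oldest first.
     * buildsByStart maps a start timestamp (milliseconds) to a build label.
     */
    static List<String> inWindow(NavigableMap<Long, String> buildsByStart, long from, long to) {
        if (from > to) {
            return new ArrayList<>(); // empty window; subMap would reject it
        }
        return new ArrayList<>(buildsByStart.subMap(from, true, to, true).values());
    }
}
```

            Builds outside the window are never visited, so the cost depends on how many builds are on screen rather than on the total history.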

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Akbashev Alexander
            Path:
            core/src/main/resources/hudson/model/BuildTimelineWidget/control.jelly
            http://jenkins-ci.org/commit/jenkins/4421d1b94d143956475f20a03c63fb1a367321f2
            Log:
            [FIX JENKINS-23244] Slave build history page has no data and spawns a ton of very long-lived blocking threads on the master (#2584)

            Mainly, the commit does two things:
            1) Show only selected (visible) builds
            2) Query builds one by one, not in parallel
            (cherry picked from commit 2a0ac4f0989407a20e277444a7737e9c5f7ea78a)


              People

              Assignee:
              jimilian Alexander A
              Reporter:
              abayer Andrew Bayer
              Votes:
              8
              Watchers:
              18
