Jenkins / JENKINS-59793

Possible thread leak 'QueueSubTaskMetrics' in metrics - Allow finishing builds when SubTask badly fails


Details

    • jenkins-2.205

    Description

      In one large instance we see about 3000 threads with the following stack trace:

       

      "QueueSubTaskMetrics [#11342]" #5508106 daemon prio=5 os_prio=0 tid=0x00007efcf085a800 nid=0x52c7 in Object.wait() [0x00007efbccb32000]
         java.lang.Thread.State: WAITING (on object monitor)
          at java.lang.Object.wait(Native Method)
          at java.lang.Object.wait(Object.java:502)
          at hudson.remoting.AsyncFutureImpl.get(AsyncFutureImpl.java:75)
          - locked <0x0000000512c01d10> (a hudson.model.queue.FutureImpl)
          at jenkins.metrics.impl.JenkinsMetricProviderImpl.lambda$asSupplier$3(JenkinsMetricProviderImpl.java:1142)
          at jenkins.metrics.impl.JenkinsMetricProviderImpl$$Lambda$388/1851215464.get(Unknown Source)
          at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)

         Locked ownable synchronizers:
          - <0x00000004dc401fb0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

       

      Far fewer jobs are running or waiting (130), yet the number of these threads never decreases.
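One way to confirm that the count only ever grows is to sample it periodically and compare it with the number of running or waiting jobs. A minimal sketch, assuming it runs inside the affected JVM; the class and method names are hypothetical (not part of Jenkins or the Metrics plugin), and the name prefix is taken from the thread dump above.

```java
// Hypothetical diagnostic helper (not part of Jenkins or the Metrics plugin):
// counts live threads whose name starts with a given prefix.
class ThreadLeakProbe {

    // Number of live threads whose name starts with the prefix.
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    // Same count, restricted to threads waiting without a timeout (state
    // WAITING), which is how the leaked threads appear in the dump.
    static long countWaitingThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .filter(t -> t.getState() == Thread.State.WAITING)
                .count();
    }

    public static void main(String[] args) {
        String prefix = "QueueSubTaskMetrics";
        System.out.println(prefix + " threads: " + countThreads(prefix)
                + " (waiting: " + countWaitingThreads(prefix) + ")");
    }
}
```

Sampled a few times, a healthy instance should show a count that roughly tracks the queue; here it stays far above 130 and keeps climbing.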

      After investigating the code:

      • The Metrics plugin listens to the queue with a QueueListener that creates threads named QueueSubTaskMetrics.
      • All 3000 threads are waiting for a future to complete, blocked in this loop inside AsyncFutureImpl.get():
          while (!completed) {
              wait();
          }
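The pattern in the stack trace can be reproduced in isolation: a supplier that calls `Future.get()` without a timeout, run through `CompletableFuture.supplyAsync` on a pool that starts a new thread whenever all existing threads are busy. A minimal sketch, assuming the plugin's `asSupplier` lambda behaves roughly like `blockingSupplier` below; the class, method and thread names are hypothetical, and a never-completed `CompletableFuture` stands in for the future of a subtask that failed badly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.function.Supplier;

// Hypothetical reproduction of the leak pattern seen in the thread dump.
class UnboundedWaitDemo {

    // Assumed to resemble the asSupplier lambda in the stack trace; the
    // plugin's actual code may differ.
    static <T> Supplier<T> blockingSupplier(Future<T> future) {
        return () -> {
            try {
                return future.get(); // no timeout: waits until the future completes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            } catch (ExecutionException e) {
                return null;
            }
        };
    }

    // Submits `tasks` suppliers that all wait on a future nobody completes
    // and returns how many pool threads they occupy.
    static int threadsAfter(int tasks) {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "QueueSubTaskMetrics-demo");
            t.setDaemon(true);
            return t;
        });
        CompletableFuture<String> neverDone = new CompletableFuture<>();
        for (int i = 0; i < tasks; i++) {
            CompletableFuture.supplyAsync(blockingSupplier(neverDone), pool);
        }
        // No worker can return, so every task was given its own thread.
        int size = pool.getPoolSize();
        // Release the workers only so that this demo can shut down cleanly.
        neverDone.complete("released");
        pool.shutdown();
        return size;
    }

    public static void main(String[] args) {
        System.out.println("threads held: " + threadsAfter(5));
    }
}
```

If, as the issue title suggests, nothing completes the future when a subtask fails badly, each such subtask pins one thread for the life of the JVM, which matches the steady growth reported above.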

      I think this could be avoided by calling the get method with a timeout instead of blocking indefinitely: https://github.com/jenkinsci/metrics-plugin/commit/e803bc3b82b54bfe27e66a6393dedf53bdf1896e#diff-b02885f3ba6b4982b5322b73e664c0b6R1049
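The timeout suggestion can be sketched with `Future.get(long, TimeUnit)`. This is a minimal illustration under assumptions, not necessarily the change that was eventually merged (the issue title describes allowing builds to finish when a subtask fails badly); the class name, the `Optional` return type and the timeout values are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical bounded variant of the supplier: waits at most `timeout`,
// then gives up so the worker thread goes back to its pool.
class BoundedWait {

    static <T> Supplier<Optional<T>> boundedSupplier(Future<T> future, long timeout, TimeUnit unit) {
        return () -> {
            try {
                return Optional.ofNullable(future.get(timeout, unit));
            } catch (TimeoutException e) {
                // Not completed in time: report "no value" instead of
                // keeping this thread waiting forever.
                return Optional.empty();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Optional.empty();
            } catch (ExecutionException e) {
                // The task itself failed; there is no value to report.
                return Optional.empty();
            }
        };
    }

    public static void main(String[] args) {
        CompletableFuture<String> neverDone = new CompletableFuture<>();
        Optional<String> result = boundedSupplier(neverDone, 100, TimeUnit.MILLISECONDS).get();
        System.out.println(result.isPresent()); // false, after roughly 100 ms
    }
}
```

The trade-off is the choice of timeout: too short and a slow but healthy subtask is reported as having no value; too long and threads still pile up, although each one is eventually released.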

       

       


          Activity

            mramonleon Ramon Leon added a comment - Added what was done to the title to make the issue easier to find in searches.

            olivergondza Oliver Gondža added a comment - mramonleon, thanks for the prompt fix. I would appreciate it if you could report back on whether this fully mitigated the problem in the affected production deployment.
            mramonleon Ramon Leon added a comment - It will take some time to land; I can give you some feedback by the end of December.

            raihaan Raihaan Shouhell added a comment - We seem to be seeing this issue with Metrics as well. Will this be backported to the LTS release? olivergondza We are currently on 2.176.2.

            olivergondza Oliver Gondža added a comment - This was backported to 2.204.1. The RC is expected on Dec 4th, and the final release on Dec 18th.

            People

              Assignee: Ramon Leon (mramonleon)
              Reporter: Ramon Leon (mramonleon)
              Votes: 1
              Watchers: 8
