Throttle Concurrent Builds blocking Jenkins queue

      Jenkins stopped responding to browser requests for Jenkins pages, and I think this may be caused by the recent upgrade to Throttle Concurrent Builds 1.8.1.

      Requests were getting blocked waiting on 0x00000004181e4520

      "Handling GET /jenkins/ : RequestHandlerThread[#171]" daemon prio=10 tid=0x00000000168ee800 nid=0x193b waiting for monitor entry [0x000000004335b000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      	at hudson.model.Queue.getItems(Queue.java:687)
      	- waiting to lock <0x00000004181e4520> (a hudson.model.Queue)
      	at hudson.model.Queue$CachedItemList.get(Queue.java:216)
      	at hudson.model.Queue.getApproximateItemsQuickly(Queue.java:717)
      	at hudson.model.View.getApproximateQueueItemsQuickly(View.java:483)
      	at sun.reflect.GeneratedMethodAccessor355.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      

      That lock appears to be held by a thread running Throttle Concurrent Builds code.
      Further dumps taken after 10, 20 and 30 minutes showed this same stack trace.

      "Thread-126" daemon prio=10 tid=0x00002aaae0529800 nid=0x1785 runnable [0x0000000046590000]
         java.lang.Thread.State: RUNNABLE
      	at java.util.WeakHashMap$HashIterator.hasNext(WeakHashMap.java:875)
      	at java.util.AbstractCollection.toArray(AbstractCollection.java:139)
      	at java.util.ArrayList.<init>(ArrayList.java:164)
      	at hudson.plugins.throttleconcurrents.ThrottleJobProperty.getCategoryProjects(ThrottleJobProperty.java:141)
      	- locked <0x000000041a79b778> (a java.util.HashMap)
      	at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.canRun(ThrottleQueueTaskDispatcher.java:118)
      	at hudson.plugins.throttleconcurrents.ThrottleQueueTaskDispatcher.canRun(ThrottleQueueTaskDispatcher.java:90)
      	at hudson.model.Queue.isBuildBlocked(Queue.java:937)
      	at hudson.model.Queue.maintain(Queue.java:1006)
      	- locked <0x00000004181e4520> (a hudson.model.Queue)
      	at hudson.model.Queue$1.call(Queue.java:303)
      	at hudson.model.Queue$1.call(Queue.java:300)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:69)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:104)
      	at java.lang.Thread.run(Thread.java:724)
      
         Locked ownable synchronizers:
      	- None
      

      CPU usage was at ~100% for this thread for the 30 minutes that I was watching it before I restarted Jenkins.
      (6021 = 0x1785)

        PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                   
       6021 rcbuild_  35  10 18.0g 6.2g  32m R 99.7 19.7  30:45.97 java     
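
The match between the two numbers above can be checked mechanically: the `nid` in a thread dump is hexadecimal, while top prints the id in decimal. A minimal Java sketch of that conversion follows; the class and method names (`NidConverter`, `nidToDecimal`) are made up for this illustration.

```java
// Sketch: convert the hexadecimal nid from a thread dump into the
// decimal id that top displays. Names here are illustrative only.
final class NidConverter {

    /** Parses a value such as "0x1785", with or without the 0x prefix. */
    static long nidToDecimal(String nid) {
        String hex = (nid.startsWith("0x") || nid.startsWith("0X"))
                ? nid.substring(2)
                : nid;
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        // nid of "Thread-126" in the dump above
        System.out.println(nidToDecimal("0x1785")); // prints 6021
    }
}
```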
      

      I have rolled back to Throttle Concurrent Builds 1.8 for now.

      I am still learning how to investigate thread dumps, but please let me know if there is anything I can do to help.

          [JENKINS-21044] Throttle Concurrent Builds blocking Jenkins queue

          Oleg Nenashev added a comment -

          Finally, I have reproduced the issue.
          It occurs on configuration-reload operations with many jobs.

          • According to the comments above, the internal categories cache is being dumped to hudson.plugins.throttleconcurrents.ThrottleJobProperty.xml
          • The property's loading operations are not synchronized
          • It seems that ThrottleQueueTaskDispatcher tries to access the data before the plugin's configuration has finished loading. In my case this happens while the queue is being loaded

          I'll create the PR soon.
          BTW, we will have to add an additional lock object to prevent issues on persistence.
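
The idea of a dedicated lock object can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the plugin's code: the class `CategoryCacheSketch`, its `lock` field, and the methods `add` and `snapshot` are hypothetical names. The point it shows is that every read and write of the weak-keyed maps goes through the same monitor, so a reader never iterates a `WeakHashMap` while another thread is modifying it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch only: a category -> projects cache where all access is guarded
// by one private lock object. All names are hypothetical.
final class CategoryCacheSketch<P> {

    // Dedicated lock so that loading, updating and reading never overlap.
    private final Object lock = new Object();

    // Weak keys let projects be garbage collected once nothing else
    // references them; WeakHashMap itself is not thread-safe.
    private final Map<String, Map<P, Void>> byCategory = new HashMap<>();

    void add(String category, P project) {
        synchronized (lock) {
            byCategory
                .computeIfAbsent(category, k -> new WeakHashMap<>())
                .put(project, null);
        }
    }

    /** Returns a copy, so callers iterate outside the lock safely. */
    List<P> snapshot(String category) {
        synchronized (lock) {
            Map<P, Void> projects = byCategory.get(category);
            return projects == null
                    ? new ArrayList<>()
                    : new ArrayList<>(projects.keySet());
        }
    }
}
```

Returning a copy keeps the time spent under the lock short, which matters when the caller is on the queue maintenance path.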

          Oleg Nenashev added a comment -

          I have attached a local build with a probable fix.
          https://github.com/jenkinsci/throttle-concurrent-builds-plugin/pull/13

          I've tried about 10 restarts with big queues, and the issue did not reproduce. BTW, it would be great to have a lightweight unit test.

          Jesse Glick added a comment -

          While that PR may work, I think it would be better to just make the cache be transient. I never had any intention of its being persisted to disk, so if that is what is happening, that was purely an accident (XStream automagically finding stuff and saving it). The cache would just need to be recreated if and when a job category is changed or the cache is requested, but that is pretty simple synchronization.
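
A rough Java sketch of the transient-cache approach described above follows. It is an illustration under stated assumptions, not the plugin's implementation: the class `TransientCacheSketch` and its fields and methods are hypothetical, and plain `java.io` serialization stands in for XStream so that the example is self-contained (XStream likewise skips `transient` fields). The cache is dropped whenever a category changes and rebuilt lazily on the next request.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: persisted settings plus a derived cache that is never saved.
final class TransientCacheSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Persisted state: job name -> category (hypothetical layout).
    private final Map<String, String> jobCategories = new HashMap<>();

    // Derived state: transient, so it is not written out with the object.
    private transient Map<String, List<String>> jobsByCategory;

    synchronized void setCategory(String job, String category) {
        jobCategories.put(job, category);
        jobsByCategory = null; // invalidate; rebuilt on next request
    }

    synchronized List<String> jobsIn(String category) {
        if (jobsByCategory == null) {
            Map<String, List<String>> rebuilt = new HashMap<>();
            for (Map.Entry<String, String> e : jobCategories.entrySet()) {
                rebuilt.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                       .add(e.getKey());
            }
            jobsByCategory = rebuilt;
        }
        List<String> jobs =
                jobsByCategory.getOrDefault(category, Collections.emptyList());
        return new ArrayList<>(jobs);
    }

    synchronized boolean cacheIsBuilt() {
        return jobsByCategory != null;
    }

    /** Serializes and deserializes the object, as a save/load stand-in. */
    static TransientCacheSketch roundTrip(TransientCacheSketch in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream oin = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientCacheSketch) oin.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a round trip the cache field is empty again, and the first call to `jobsIn` rebuilds it from the persisted map, so nothing derived ever reaches the disk.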

          Oleg Nenashev added a comment -

          @Jesse
          I've made the cache transient.
          All other changes just provide a safe migration procedure in case there is any cache data on disk.
          The load procedure is synchronized to avoid concurrent access. After that, the code just re-saves the configuration in order to purge the wrong data from the configs.

          Dirk Kuypers added a comment -

          We have been testing the attached version in our production environment for about 4 hours now. It has worked like a charm so far.

          We are consolidating two Jenkins masters into one machine, now with about 15 slaves, 3000 jobs altogether, and quite a few continuously running jobs with concurrent builds that put a heavy load on the roughly 100 cores. We even had severe problems with blocked threads when we rolled back to 1.8.0! Funnily enough, I had been using version 1.8.1 on "my" old master without problems before (even more jobs, same number of nodes), and using the Throttle Concurrent Builds plugin was "my" idea.

          Oleg Nenashev added a comment -

          The new version works for me as well.

          @abayer, can you confirm the merge?

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          src/main/java/hudson/plugins/throttleconcurrents/ThrottleJobProperty.java
          http://jenkins-ci.org/commit/throttle-concurrent-builds-plugin/9b7562d4b08e0a4202130d43307082553142df82
          Log:
          [FIXED JENKINS-21044] - Throttling blocks the Jenkins queue

          Seems the issue was in improper usage of WeakHashMap (see analysis from @centic).
          I've managed to reproduce the behavior in the following case:

          • There is a big number of jobs/configurations with throttling
          • The builds queue is not empty
            // Seems that ThrottleQueueTaskDispatcher tries to access the data before the complete loading of the plugin's configuration.

          This fix provides an explicit locking of any load operations + manual cleanup of erroneous cache data, which goes to persistence in 1.8.1
          Resolves https://issues.jenkins-ci.org/browse/JENKINS-21044

          Signed-off-by: Oleg Nenashev <o.v.nenashev@gmail.com>

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          src/main/java/hudson/plugins/throttleconcurrents/ThrottleJobProperty.java
          http://jenkins-ci.org/commit/throttle-concurrent-builds-plugin/c453516716079248d74ce588efc0293669e6e1a7
          Log:
          JENKINS-21044 - Don't create a new HashMap after the load operation

          Signed-off-by: Oleg Nenashev <o.v.nenashev@gmail.com>

          Code changed in jenkins
          User: Andrew Bayer
          Path:
          src/main/java/hudson/plugins/throttleconcurrents/ThrottleJobProperty.java
          http://jenkins-ci.org/commit/throttle-concurrent-builds-plugin/70107b4222502935a9e46beffa31daae2e99e50b
          Log:
          Merge pull request #13 from synopsys-arc-oss/JENKINS_21044_fix

          [FIXED JENKINS-21044] - Throttling blocks the Jenkins queue

          Compare: https://github.com/jenkinsci/throttle-concurrent-builds-plugin/compare/dc16282a90b7...70107b422250

          Andrew Bayer added a comment -

          Thanks, Oleg!

            oleg_nenashev Oleg Nenashev
            gcummings Geoff Cummings