  1. Jenkins
  2. JENKINS-66484

Concurrency limit drifts after a while

      Description

      We recently upgraded our Jenkins instances from kubernetes-plugin 1.27.8 to 1.30.1. We have a concurrency limit of 2 set on our Kubernetes cloud. As expected, two agents can spawn concurrently after Jenkins starts up, and this correct behavior continues for some time. But after a random number of days, only a single agent is spawned at a time, and concurrent jobs wait in the queue until that agent is torn down. It is as if the concurrency limit were set to 1. Increasing the limit to 3 lets the plugin spawn 2 agents concurrently.

      I've created a FINEST logger on org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits and noticed that when the issue happens, the Kubernetes global limit never goes back to 0/2. It stays at 1/2 even when no job is running (or at 1/3 if I increase the concurrency limit to 3).
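For anyone who wants to collect the same logs, here is a minimal sketch of raising that logger to FINEST with plain java.util.logging. The class name `FinestLoggerSketch` and the console handler are assumptions for illustration only; on a real Jenkins instance a log recorder would normally be configured from the UI instead.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper, not part of Jenkins or the kubernetes-plugin. */
public class FinestLoggerSketch {
    // Logger name taken from the class mentioned in this ticket.
    static final String NAME =
            "org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits";

    // Keep a strong reference so the configured logger is not garbage collected.
    private static Logger logger;

    public static Logger enable() {
        logger = Logger.getLogger(NAME);
        logger.setLevel(Level.FINEST);
        // The handler has its own level; without this, FINEST records are dropped.
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST);
        logger.addHandler(handler);
        return logger;
    }

    public static void main(String[] args) {
        enable().finest("FINEST logging enabled for " + NAME);
    }
}
```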

      My guess is that https://github.com/jenkinsci/kubernetes-plugin/pull/939 has a race condition and that, with specific timing, the global count https://github.com/jenkinsci/kubernetes-plugin/pull/939/files#diff-4877a6b83daf403574dc28dca505926a6b3ad326b84f891f278a4424a68f4b84R103 is not properly decremented.
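The observed symptom is consistent with one reserved slot never being released. Below is a minimal sketch, assuming a simple counter-based limit. The class `LimitDriftSketch` and its methods are hypothetical and are not the plugin's actual implementation; the sketch does not reproduce the suspected race itself, only the effect that a single missed decrement would have on the available capacity.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a global provisioning limit; not the plugin's code. */
public class LimitDriftSketch {
    private final int limit;
    private final AtomicInteger inUse = new AtomicInteger();

    public LimitDriftSketch(int limit) {
        this.limit = limit;
    }

    /** Tries to reserve one slot; returns false once the limit is reached. */
    public boolean register() {
        while (true) {
            int current = inUse.get();
            if (current >= limit) {
                return false;
            }
            if (inUse.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Releases one slot, never going below zero. */
    public void unregister() {
        inUse.updateAndGet(n -> Math.max(0, n - 1));
    }

    /** Formats the counter in the same "used/limit" style as quoted in this ticket. */
    public String status() {
        return inUse.get() + "/" + limit;
    }

    /** How many agents can still start when `leaked` slots were never released. */
    public static int effectiveCapacityAfterLeak(int limit, int leaked) {
        LimitDriftSketch limits = new LimitDriftSketch(limit);
        for (int i = 0; i < leaked; i++) {
            limits.register(); // reserved, but the matching unregister() never happens
        }
        int started = 0;
        while (limits.register()) {
            started++;
        }
        return started;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCapacityAfterLeak(2, 0)); // 2
        System.out.println(effectiveCapacityAfterLeak(2, 1)); // 1
        System.out.println(effectiveCapacityAfterLeak(3, 1)); // 2
    }
}
```

With one leaked slot, a limit of 2 behaves like a limit of 1, and a limit of 3 behaves like a limit of 2, which matches what we see on the affected instances.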

      I'll try to write a test case that demonstrates the issue.

          Activity

          mbarbero Mikaël Barbero added a comment -

          So far I have not managed to modify https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/test/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesProvisioningLimitsTest.java so that it reproduces the issue, even though the logger in the ticket's description points to an issue in the org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits class.

          Any ideas?

          mbarbero Mikaël Barbero added a comment -

          We faced the issue again yesterday on another instance. This time, the "race condition" seems to have been triggered at startup (the instance had been restarted while the job queue was not empty).

          I've pushed a PR with my code, https://github.com/jenkinsci/kubernetes-plugin/pull/1028, which sanitizes the concurrency lock mechanism.

          Hope it will trigger some discussions.

          mbarbero Mikaël Barbero added a comment -

          I've updated the PR; it now passes all checks (there was a problem on JDK 8 with an earlier patch).

          This is a really annoying behavior. Is there anything else we could do to help move this forward? Thanks!

          mbarbero Mikaël Barbero added a comment -

          At least 3 of our instances have been hit by this one (see https://bugs.eclipse.org/bugs/show_bug.cgi?id=575285). 

          Please let me know if there is anything we can do to get some traction on this issue. Thanks!


            People

            Assignee:
            Unassigned
            Reporter:
            mbarbero Mikaël Barbero
            Votes:
            0
            Watchers:
            2
