  1. Jenkins
  2. JENKINS-63731

jcloud plugin does not respect max instances setting

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: jclouds-plugin
    • Labels:
      None

      Description

      I'm using the digitalocean2 backend. I have set "Max no. of instances" to 3, but I am watching it spin up 8 instances in response to a job queue. I'm not sure whether it has always done this, and I don't know why it's doing it now. There are no errors in jenkins.log. The attached screenshots show all 8 nodes happily registered and now idle for the next 30 minutes, in direct contradiction to the 3-instance maximum.

       

      In the logs, I can see a whole bunch of "Instance cap reached" messages (seven of them). My guess is that several workers were launched independently, each seeing that the instance cap of 3 hadn't been reached, without coordinating with each other. I think we had this problem with the AWS plugin years ago as well, but since these nodes represent money I'm spending, this is a really critical problem.

       

      2020-09-20 18:16:15.780+0000 [id=105463] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: >> adding node location(nyc1) name(basic-672)
      image(70119657) hardware(c-4)
      2020-09-20 18:16:15.862+0000 [id=105464] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: >> adding node location(nyc1) name(basic-142)
      image(70119657) hardware(c-4)
      2020-09-20 18:16:25.191+0000 [id=38] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:16:35.367+0000 [id=34] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:16:45.193+0000 [id=32] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:16:55.204+0000 [id=37] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:17:05.268+0000 [id=39] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:17:15.185+0000 [id=32] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:17:25.198+0000 [id=39] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast

       

       

        Attachments

          Activity

          zzzeek mike bayer added a comment - - edited

          Yeah, I looked in my logs and can see that it has often spun up just the three instances, but occasionally it has spun up eight.

          I know why the number is eight. A worker thread in Jenkins says, "let's spin up three instances, since we have none". Then another worker thread says the same thing at the same time, so now there are six provisioning. Then a third worker comes in and says "let's spin up three instances!", but it sees one of them, so it only launches two.

          I've set the number to two. The theory is that I will now sometimes get only two nodes, sometimes five. Or maybe three.
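
          The arithmetic behind that theory can be sketched as a small deterministic model. This is only an illustration of the suspected check-then-act race, not code from the jclouds plugin; the class and method names are hypothetical, and the "seen" counts (0, 0, 1) are the ones assumed in the explanation above.

```java
// Hypothetical model of the suspected race; not jclouds-plugin code.
final class CapOvershootDemo {

    /**
     * Each entry in seenCounts is the number of nodes one provisioning call
     * observed when it checked the cap. Every call then launches
     * (cap - seen) nodes, unaware of launches still in flight elsewhere.
     */
    static int totalLaunched(int cap, int[] seenCounts) {
        int total = 0;
        for (int seen : seenCounts) {
            total += Math.max(0, cap - seen);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] seen = {0, 0, 1}; // assumed: two calls see no nodes, a third sees one
        System.out.println("cap 3: " + totalLaunched(3, seen)); // 3 + 3 + 2 = 8
        System.out.println("cap 2: " + totalLaunched(2, seen)); // 2 + 2 + 1 = 5
    }
}
```

          Under the same interleaving, a cap of 3 gives 8 nodes and a cap of 2 gives 5, which matches the numbers in this comment.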

          zzzeek mike bayer added a comment -

          I have tried:

           

          1. Upgraded to Jenkins 2.249.3, which seems to have made some changes to NodeProvisioner. Did not change behavior.

          2. Tried the "delay before spooling up" setting, hoping it would see that there are already too many nodes. This timer seems to kick in long after it has decided to spin up far too many nodes. Did not change behavior.

          3. I'm using a label expression. There seems to be a doubling of the problem because provision is called both for the "null" label and for my custom "fast" label. Tried setting "only build jobs with label expressions matching this node"; Jenkins still spins up twice as many nodes as required, for the "null" label as well as the "fast" label. Did not change behavior.

           

          When I set the max number of instances to 4 and a lot of jobs come in at once, I get as many as 16 nodes, all charging me 11 cents an hour.

           

          It certainly looks like the Jenkins NodeProvisioner rains a whole bunch of concurrent requests onto JCloudsCloud.provision(), and JCloudsCloud makes itself a method-private plannedNodeList() that IMO looks like it should be much more global than that. getRunningNodesCount() is not returning the right number because all the provision() calls are concurrent. But I haven't dug in looking at thread idents and so forth.
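
          If that diagnosis is right, one possible direction is a shared counter that every provision() call must reserve slots from before launching anything. The following is a minimal sketch under that assumption; InstanceCapReservation and its methods are hypothetical names for illustration and are not part of the jclouds plugin or Jenkins.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; the name and API are illustrative, not plugin code.
final class InstanceCapReservation {
    private final int cap;
    private final AtomicInteger reserved = new AtomicInteger(0);

    InstanceCapReservation(int cap) {
        this.cap = cap;
    }

    /** Atomically reserve up to `wanted` slots; returns how many were granted. */
    int reserve(int wanted) {
        while (true) {
            int current = reserved.get();
            int granted = Math.min(wanted, cap - current);
            if (granted <= 0) {
                return 0; // cap already reached, launch nothing
            }
            // Only one caller can win this update; losers re-read and retry.
            if (reserved.compareAndSet(current, current + granted)) {
                return granted;
            }
        }
    }

    /** Give slots back when a launch fails or a node is terminated. */
    void release(int count) {
        reserved.addAndGet(-count);
    }

    public static void main(String[] args) throws InterruptedException {
        InstanceCapReservation slots = new InstanceCapReservation(3);
        AtomicInteger launched = new AtomicInteger();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            // Each simulated provisioning call asks for the full cap.
            workers[i] = new Thread(() -> launched.addAndGet(slots.reserve(3)));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("launched = " + launched.get()); // never exceeds the cap of 3
    }
}
```

          Because the check and the reservation happen in a single compare-and-set, concurrent callers cannot each conclude that the cap has room for a full batch.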

           

          It would be nice to get a developer to poke in here and confirm that yes, this seems like what is happening, so at least I know if I'm looking in the right direction.

           

           

           


            People

            Assignee:
            felfert Fritz Elfert
            Reporter:
            zzzeek mike bayer
            Votes:
            0
            Watchers:
            1

              Dates

              Created:
              Updated: