
JENKINS-63731: jclouds plugin does not respect max instances setting

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: jclouds-plugin
    • Labels: None

      I'm using the digitalocean2 backend. I have set "Max no. of instances" to 3, yet I am watching it spin up 8 instances in response to a job queue. I'm not sure it has done this every time, and I'm not really sure why it's doing it now. There are no errors in jenkins.log. The attached screenshots illustrate all 8 nodes happily registered, now idle for the next 30 minutes, in direct contradiction to the max setting of 3 instances.


      In the logs, I can see a whole bunch of messages saying the instance cap was reached (seven of them). I would guess that several workers were launched independently, each seeing that the instance cap of 3 hadn't been reached yet, without coordinating with each other. I think we had this problem with the AWS plugin years ago as well, but as these nodes represent money I'm spending, this is a really critical problem.
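      If that theory is correct, this is a classic unsynchronized check-then-act. A minimal sketch of the suspected pattern (the names cap and launchNodes are hypothetical; this is not the plugin's actual code):

      // Hypothetical sketch of the suspected race, not the plugin's actual code.
      // Each provisioning thread checks the cap and launches nodes without seeing
      // the nodes that other threads are still in the middle of launching.
      int cap = 3;
      int running = getRunningNodesCount(); // stale the moment it is read
      if (running < cap) {
          launchNodes(cap - running);       // hypothetical launch helper
      }
      // Thread A reads running=0 and launches 3, thread B also reads 0 and
      // launches 3, thread C reads 1 and launches 2: 3 + 3 + 2 = 8 nodes,
      // well past the cap of 3.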


      2020-09-20 18:16:15.780+0000 [id=105463] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: >> adding node location(nyc1) name(basic-672) image(70119657) hardware(c-4)
      2020-09-20 18:16:15.862+0000 [id=105464] INFO o.jclouds.logging.jdk.JDKLogger#logInfo: >> adding node location(nyc1) name(basic-142) image(70119657) hardware(c-4)
      2020-09-20 18:16:25.191+0000 [id=38] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:16:35.367+0000 [id=34] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:16:45.193+0000 [id=32] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:16:55.204+0000 [id=37] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:17:05.268+0000 [id=39] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:17:15.185+0000 [id=32] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast
      2020-09-20 18:17:25.198+0000 [id=39] INFO j.p.jclouds.compute.JCloudsCloud#provision: Instance cap reached while adding capacity for label fast



          mike bayer added a comment - edited

          Yeah, I looked in my logs and can see that it often spun up just the three instances, but occasionally it spun up eight.

          I know why the number is eight. A worker thread in Jenkins says, "let's spin up three instances, since we have none." Then another worker thread says the same thing at the same time; now there are six provisioning. Then a third worker comes in and says "let's spin up three instances!", but it sees one of them already registered, so it only launches two. That's 3 + 3 + 2 = 8.

          I've set the number to two. The theory is that now I will sometimes get only two nodes, sometimes five. Or maybe three.


          mike bayer added a comment -

          I have tried:


          1. Upgrading to Jenkins 2.249.3, which seems to have made some changes to NodeProvisioner. Did not change the behavior.

          2. The "delay before spooling up" setting, hoping it would notice there are already too many nodes; this timer seems to fire long after the decision to spin up far too many nodes has been made. Did not change the behavior.

          3. I'm using a label expression. There seems to be a doubling of the problem because provision() is called both for the "null" label and for my custom "fast" label. I tried setting "only build jobs with label expressions matching this node"; Jenkins still spins up twice as many nodes as required, for the "null" label as well as the "fast" label. Did not change the behavior (a sketch of a possible guard follows below).
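          As a hypothetical illustration of the guard I'd expect for item 3 (a fragment of a Cloud subclass; getLabelAtoms() is an assumed helper, not the plugin's API), the cloud could decline the unlabeled workload so only the per-label provisioner for "fast" asks it for capacity:

          // Sketch only, not the plugin's actual code: decline the unlabeled
          // (null-label) workload so only the NodeProvisioner for the matching
          // label asks this cloud for capacity.
          @Override
          public boolean canProvision(hudson.model.Label label) {
              // getLabelAtoms() is an assumed helper returning this cloud's labels.
              return label != null && label.matches(getLabelAtoms());
          }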


          When I set the max # of instances to 4 and a lot of jobs come in at once, I get as many as 16 nodes, all charging me 11 cents an hour.


          It certainly looks like the Jenkins NodeProvisioner rains a whole bunch of concurrent requests onto JCloudsCloud.provision(), and JCloudsCloud builds itself a method-private plannedNodeList that IMO should be much more global than that. getRunningNodesCount() is not returning the real count because all the provision() calls are concurrent. But I haven't dug in looking at thread idents and so forth.
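          For what it's worth, here is a hypothetical sketch of the kind of shared accounting that would close that window: a cap check that also counts planned-but-not-yet-registered nodes in a field shared across provision() calls. All names here (CappedCloud, doProvision, getRunningNodesCount as abstract) are assumptions, not the plugin's actual code:

          import hudson.model.Computer;
          import hudson.model.Label;
          import hudson.model.Node;
          import hudson.slaves.Cloud;
          import hudson.slaves.NodeProvisioner;

          import java.util.ArrayList;
          import java.util.Collection;
          import java.util.List;
          import java.util.concurrent.atomic.AtomicInteger;

          // Hypothetical sketch: a Cloud that tracks planned-but-not-yet-registered
          // nodes in a shared field so concurrent provision() calls can see each
          // other's pending capacity.
          public abstract class CappedCloud extends Cloud {
              private final int instanceCap;
              private final AtomicInteger pendingNodes = new AtomicInteger();

              protected CappedCloud(String name, int instanceCap) {
                  super(name);
                  this.instanceCap = instanceCap;
              }

              // Assumed to exist, as in JCloudsCloud: counts registered nodes.
              protected abstract int getRunningNodesCount();

              // Assumed helper that performs the actual cloud API launch.
              protected abstract Node doProvision() throws Exception;

              @Override
              public Collection<NodeProvisioner.PlannedNode> provision(Label label, int excessWorkload) {
                  List<NodeProvisioner.PlannedNode> planned = new ArrayList<>();
                  while (excessWorkload > 0) {
                      int pending = pendingNodes.get();
                      // Count in-flight nodes against the cap, not just registered ones.
                      if (getRunningNodesCount() + pending >= instanceCap) {
                          break;
                      }
                      // Reserve a slot atomically; if another thread raced us,
                      // loop around and re-check against the updated count.
                      if (!pendingNodes.compareAndSet(pending, pending + 1)) {
                          continue;
                      }
                      planned.add(new NodeProvisioner.PlannedNode(name + "-node",
                              Computer.threadPoolForRemoting.submit(() -> {
                                  try {
                                      return doProvision();
                                  } finally {
                                      // Release the slot once the node is registered
                                      // (or the launch failed).
                                      pendingNodes.decrementAndGet();
                                  }
                              }), 1));
                      excessWorkload--;
                  }
                  return planned;
              }
          }

          The compare-and-set loop means a thread that loses the race re-reads the updated pending count before reserving another slot, so the combined running + pending total never passes the cap.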


          It would be nice to get a developer to poke in here and confirm that, yes, this seems like what is happening, so at least I know I'm looking in the right direction.



          Fritz Elfert added a comment -

          Sorry for the very late reply. You are looking in the right direction. Is this still relevant? If yes, I will look at what I can do about it.


          mike bayer added a comment -

          This is still relevant: while I had to migrate to local machines here instead of cloud machines due to this issue, if it could be fixed I'd have the option to move back to cloud servers. I was not able to figure out where to look for this, and it looks like some kind of architectural-level approach to concurrency is the source of the issue.


          Fritz Elfert added a comment -

          Ok, bad news:

          In preparation for work on this, I reactivated my account on DigitalOcean and attempted a few tests. However, these failed immediately with a NumberFormatException. It turns out that DigitalOcean was a bit vague in their specification of Droplet IDs: their developer docs say the ID is an integer, without actually defining the maximum value. The jclouds developers who implemented the digitalocean2 module used a Java object of type Integer (which has 32 bits), but the actual numbers returned by the DigitalOcean REST API no longer fit into 32 bits.

          So: at the moment, the jclouds digitalocean2 module is completely useless. I'll investigate more, report a bug in their bugtracker, and probably will have to contribute a fix to jclouds myself.
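          To illustrate the failure mode (the ID below is hypothetical, chosen as one past Integer.MAX_VALUE):

          // Demonstrates why a 32-bit Integer can no longer hold a Droplet ID.
          public class DropletIdOverflow {
              public static void main(String[] args) {
                  // Hypothetical droplet ID, one past Integer.MAX_VALUE (2147483647).
                  String dropletId = "2147483648";
                  System.out.println(Long.parseLong(dropletId));   // fine: fits in a 64-bit long
                  System.out.println(Integer.parseInt(dropletId)); // throws NumberFormatException
              }
          }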


          Your problem can be addressed before that, but not with DigitalOcean at the moment. I will attempt to solve this in a generic way.


            Assignee: Fritz Elfert (felfert)
            Reporter: mike bayer (zzzeek)