[JENKINS-26020] Will not start builds even though there are available slots on executor

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical
    • Component/s: core
    • Labels: None
    • Environment: LTS 1.580.1

      Sometimes our nodes cannot start new builds even though free executor slots are available.

      A workaround for the slaves is to disconnect and reconnect the slave, after which builds are scheduled on it again.

      I have observed that when this happens, the affected slave has fewer running threads than an idle slave.

      I am attaching thread dumps taken while this was happening and after doing a disconnect/reconnect.

      We have seen this issue on Windows (JNLP) slaves and Linux (SSH) slaves, as well as on the master node, which runs Linux.
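The thread-count observation above could be checked mechanically by diffing the two attached dumps. The following is a minimal sketch, not a tool used in this issue: it assumes each thread's entry in a dump starts with the thread name in double quotes (as in jstack-style output), and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: each thread entry starts with its name in double quotes,
# e.g. "Ping thread for channel ..." followed by state information.
THREAD_HEADER = re.compile(r'^\s*"([^"]+)"')


def thread_names(dump_text):
    """Return a Counter mapping thread name -> number of occurrences."""
    names = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            names[match.group(1)] += 1
    return names


def missing_threads(stuck_dump, healthy_dump):
    """Threads that appear in the healthy dump but not (or less often) in the stuck one."""
    # Counter subtraction keeps only positive differences.
    return thread_names(healthy_dump) - thread_names(stuck_dump)
```

Running `missing_threads` on the dump taken while the node is stuck and the one taken after the reconnect would list the threads that only exist on the working node. Thread names that embed counters (pool or handler numbers) may differ between runs, so the result probably needs manual review.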

          Christian Bremer added a comment - A $200 bounty is up for grabs for this issue at: https://freedomsponsors.org/issue/598/will-not-start-builds-even-though-there-are-available-slots-on-executor

          Daniel Beck added a comment -

          Any interesting errors getting logged?

          Christian Bremer added a comment -

          We get ~5000 JnlpSlaveHandshake errors per hour:

          Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error
          WARNING: TCP slave agent connection handler #150398 with /10.33.21.14:62740 is aborted: generic_AESL-JENKINS07 is already connected to this master. Rejecting this connection.

          We get these errors at all times, including when we can schedule on all slaves.

          Christian Bremer added a comment -

          Other than that, I see no errors in the log at the times when scheduling on a node fails, although I might have missed them since our logs are flooded.
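Since the handshake warnings flood the log, one way to look for other errors is to filter them out first. The sketch below is only an illustration under assumptions: it assumes the two-line record layout shown in the log excerpt above (a date/logger/method header followed by the message), and the function names are hypothetical.

```python
import re
from collections import Counter

# Header line of a log record as shown in the excerpt above, e.g.
# "Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error".
HEADER = re.compile(
    r'^[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M (\S+) (\S+)$'
)
# Node name inside the "already connected" rejection message.
REJECTED = re.compile(r': (\S+) is already connected to this master')
NOISY_LOGGER = "jenkins.slaves.JnlpSlaveHandshake"


def split_records(lines):
    """Group log lines into records, each starting at a header line."""
    record = []
    for line in lines:
        line = line.rstrip()
        if HEADER.match(line) and record:
            yield record
            record = []
        if line:
            record.append(line)
    if record:
        yield record


def filter_handshake_noise(lines):
    """Return (records from other loggers, rejection count per node)."""
    kept, rejected = [], Counter()
    for record in split_records(lines):
        header = HEADER.match(record[0])
        if header and header.group(1) == NOISY_LOGGER:
            for line in record[1:]:
                match = REJECTED.search(line)
                if match:
                    rejected[match.group(1)] += 1
            continue  # drop the noisy record
        kept.append(record)
    return kept, rejected
```

The per-node counts would also show which slaves produce most of the rejected connections, which may help when checking for duplicate slave processes.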

          Oleg Nenashev added a comment -

          > Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error
          > WARNING: TCP slave agent connection handler #150398 with /10.33.21.14:62740 is aborted: generic_AESL-JENKINS07 is already connected to this master. Rejecting this connection.

          This seems to be unrelated. Such an issue usually happens when two jenkins-slave processes are running. On Windows machines it rarely happens, for example after improper service termination. You can also configure the Jenkins slave to use a longer interval between reconnect attempts.

          Oleg Nenashev added a comment - edited

          The Windows service issue should be fixed by JENKINS-39231. I do not see anything else we can diagnose here.

          Oleg Nenashev added a comment -

          I see no way to proceed with this issue without more info. There are also no other commenters or voters, so I'm closing it as Incomplete. Feel free to reopen it if you have additional info.

            Assignee: Unassigned
            Reporter: Christian Bremer (ki82)
            Votes: 0
            Watchers: 2