Jenkins / JENKINS-26020

Will not start builds even though there are available slots on executor

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Incomplete
    • core
    • None
    • LTS 1.580.1

    Description

      Sometimes our nodes won't be able to start new builds even though there are free slots available.

      A workaround for the slaves is to disconnect/connect the slave and it will start to schedule builds again.

      I have observed that when this happens on a slave, that slave has fewer running threads than an idle slave.

      Attaching thread dumps taken when this happens and after doing a disconnect/connect.

      We have seen this issue on Windows (JNLP) slaves and Linux (SSH) slaves, as well as on the master node, which runs Linux.
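
      One way to compare the two kinds of thread dump mentioned above is to diff thread names between a stuck node and a healthy one. The following is a minimal sketch, assuming each thread stack starts with a line that opens with the quoted thread name (as jstack output does). The function names and the sample thread names in the usage note are hypothetical and are not taken from the attached dumps.

```python
import re
from collections import Counter

# Assumption: each thread stack opens with a line starting with "<thread name>".
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def thread_names(dump_text):
    """Count threads by name, with numeric parts replaced by N."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line.strip())
        if match:
            # Collapse e.g. "pool-1-thread-3" and "pool-1-thread-4" together.
            counts[re.sub(r"\d+", "N", match.group("name"))] += 1
    return counts


def missing_threads(stuck_dump, healthy_dump):
    """Return thread groups that occur less often in the stuck dump."""
    stuck = thread_names(stuck_dump)
    healthy = thread_names(healthy_dump)
    return {
        name: healthy[name] - stuck[name]
        for name in healthy
        if healthy[name] > stuck[name]
    }
```

      Calling missing_threads(stuck_text, healthy_text) on the contents of two dumps returns a mapping from normalised thread name to the number of threads of that kind missing on the stuck node, which narrows down which thread is worth inspecting first.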

          Activity

            ki82 Christian Bremer added a comment - $200 is up for grabs for this issue at: https://freedomsponsors.org/issue/598/will-not-start-builds-even-though-there-are-available-slots-on-executor
            danielbeck Daniel Beck added a comment -

            Any interesting errors getting logged?

            ki82 Christian Bremer added a comment -

            We get ~5000 JnlpSlaveHandshake errors per hour:

            Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error
            WARNING: TCP slave agent connection handler #150398 with /10.33.21.14:62740 is aborted: generic_AESL-JENKINS07 is already connected to this master. Rejecting this connection.

            We get these errors all the time, including when builds can be scheduled on all slaves.

            ki82 Christian Bremer added a comment -

            Other than that, I see no errors in the log at the times when it fails to schedule on a node, although I might have missed some since our logs are flooded.
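
            Because the handshake warnings flood the log, filtering them out may make other errors easier to spot. Below is a minimal sketch, assuming the two-line record layout shown in the excerpt above (a timestamp/source header followed by the message). The function names are hypothetical, and the regular expressions are modelled only on that one excerpt.

```python
import re
from collections import Counter

# Header line of a log record, modelled on the excerpt in this issue:
# "Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error"
HEADER = re.compile(
    r"^[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M (?P<source>\S+) \S+$"
)
REJECTED = re.compile(r"is aborted: (?P<agent>\S+) is already connected")


def split_records(lines):
    """Group log lines into records; each record starts at a header line."""
    record = []
    for line in lines:
        if HEADER.match(line) and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def triage(lines, noisy_source="jenkins.slaves.JnlpSlaveHandshake"):
    """Return (records from other sources, rejection count per agent)."""
    others, rejections = [], Counter()
    for record in split_records(lines):
        header = HEADER.match(record[0])
        if header and header.group("source") == noisy_source:
            found = REJECTED.search(" ".join(record))
            if found:
                rejections[found.group("agent")] += 1
        else:
            others.append(record)
    return others, rejections
```

            Passing the log lines (with trailing newlines stripped) to triage() gives back the records that did not come from JnlpSlaveHandshake, plus a per-agent count of rejected connections, which also shows which agents are reconnecting most often.
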
            oleg_nenashev Oleg Nenashev added a comment -

            > Dec 15, 2014 8:40:10 AM jenkins.slaves.JnlpSlaveHandshake error
            > WARNING: TCP slave agent connection handler #150398 with /10.33.21.14:62740 is aborted: generic_AESL-JENKINS07 is already connected to this master. Rejecting this connection.

            It seems to be unrelated. Such an issue usually happens when two jenkins-slave processes are running. On Windows machines it can also happen, though rarely, after improper service termination, etc. You can also configure the Jenkins slave to use a longer reconnect attempt interval.

            oleg_nenashev Oleg Nenashev added a comment - - edited

            The Windows service issue should be fixed by JENKINS-39231. I do not see anything else we can diagnose here.

            oleg_nenashev Oleg Nenashev added a comment -

            I see no way to proceed with this issue without more info. There are also no other commenters or voters, so I'm closing it as Incomplete. Feel free to reopen it if you have additional info.

            People

              Assignee: Unassigned
              Reporter: ki82 Christian Bremer
              Votes: 0
              Watchers: 2
