
Pipelines wait indefinitely for kubernetes slaves to come back online

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • kubernetes-plugin
    • None

      Our situation

      We run Jenkins Master and ephemeral slaves in Kubernetes clusters and love it. However, we regularly run into stalled pipelines that wait indefinitely for an ephemeral Kubernetes slave to come back online, which will never happen. I have no reliable reproduction steps because this does not happen every time; on average it affects about one slave in 15. A slave will inexplicably disconnect, crash, or be decommissioned, and the Jenkins pipeline using it thinks, "Ok, durable task means I wait for the slave that just went offline to come back online... forever." And so it does. :/

      Someone else has opened a ticket with the Jenkins Master team requesting an option to flag a job/task as not durable; however, they make clear that this behavior needs to be addressed by the plugin as well.

      The Story "Requirement"

      This may be a bug or a feature/improvement request, depending on how your team has intended the plugin to function. I have flagged this as a bug because I do not believe this behavior is intended.

      A Kubernetes slave should fail any pipeline step/task running on it whenever the slave goes offline.
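
To make the requested behavior concrete, here is a minimal sketch in Python of the decision I have in mind. It is not the plugin's code: `AgentState`, `step_outcome`, and the 300-second default grace period are all hypothetical names and values chosen for illustration. The idea is that a step waits only for a bounded grace period after its slave goes offline, then fails instead of waiting forever.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentState:
    """Hypothetical snapshot of an ephemeral slave as the master sees it."""
    online: bool
    # Time (in seconds) at which the slave went offline; None while online
    # or when the disconnect time is unknown.
    offline_since: Optional[float] = None


def step_outcome(agent: AgentState, now: float, grace_seconds: float = 300.0) -> str:
    """Decide what to do with a step running on `agent`.

    Returns "running" while the slave is online, "waiting" during the grace
    period after a disconnect, and "failed" once the grace period has elapsed.
    The grace period value is an assumption, not a plugin setting.
    """
    if agent.online:
        return "running"
    if agent.offline_since is None:
        # Offline but no timestamp recorded: treat as just disconnected.
        return "waiting"
    if now - agent.offline_since >= grace_seconds:
        return "failed"
    return "waiting"
```

With a short grace period an ephemeral slave that never returns would fail its step quickly, while a brief network blip would still be tolerated.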

          [JENKINS-47561] Pipelines wait indefinitely for kubernetes slaves to come back online

          Sam Beckwith III created issue -

          Sam Beckwith III added a comment - This may be a duplicate. I searched many different ways through all the jiras on this plugin (regardless of status) and did not find this particular issue. We very much like the kubernetes plugin, so thank you very much for making it publicly available.
          Carlos Sanchez made changes -
          Link New: This issue duplicates JENKINS-47476 [ JENKINS-47476 ]

          Carlos Sanchez added a comment - Possibly the cause is a duplicate of JENKINS-47476.

          Carlos Sanchez added a comment - I don't see this problem with the latest versions of the plugin. Reopen if that is not the case.
          Carlos Sanchez made changes -
          Resolution New: Cannot Reproduce [ 5 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

          Rishi Thakkar added a comment -

          I still see this issue in the newest version of the plugin.

          The pod runs out of ephemeral storage and the JNLP agent dies. Then, the build step gets stuck indefinitely.

          Rishi Thakkar made changes -
          Resolution Original: Cannot Reproduce [ 5 ]
          Status Original: Resolved [ 5 ] New: Reopened [ 4 ]

          Arun Kaushik added a comment -

          We are using version 1.14.9 of kubernetes-plugin and still face this issue. We pass CPU/memory limits in the container template, and when those limits are reached the container is killed, leaving the build stalled forever. It seems the Jenkins master never learns that the slave was decommissioned intentionally and that it has to move on and fail the build.
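
Both failure modes reported in this thread (eviction for ephemeral storage, a container killed at its limit) are visible in the pod's status, so in principle the master could tell that the slave is not coming back. The Python sketch below illustrates that check on a dict shaped like the `status` block of a Kubernetes Pod object. The function name `pod_is_gone_for_good` is hypothetical, the sketch assumes agent pods are never restarted, and it is not how the plugin actually detects this.

```python
def pod_is_gone_for_good(pod_status: dict) -> bool:
    """Return True if the pod status shows the slave cannot come back.

    `pod_status` is assumed to mirror the `status` field of a Pod object.
    Assumption: agent pods are not restarted, so a terminated container
    means the slave is lost.
    """
    # An evicted pod (for example, out of ephemeral storage) ends up in
    # phase "Failed", typically with reason "Evicted".
    if pod_status.get("phase") == "Failed":
        return True

    # A container killed for exceeding its memory limit is reported as
    # terminated with reason "OOMKilled", even if the pod phase is still
    # "Running" because other containers are alive.
    for container in pod_status.get("containerStatuses") or []:
        terminated = (container.get("state") or {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            return True

    return False
```

A build whose pod matches either condition could be failed right away rather than left waiting.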

          Vincent Latombe made changes -
          Assignee Original: Carlos Sanchez [ csanchez ]

            Assignee: Unassigned
            Reporter: Sam Beckwith III