
Provisioned slaves cannot reconnect following a Jenkins restart

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: kubernetes-plugin
    • Environment: kubernetes-plugin 1.1, Jenkins 2.73.2

      If a Jenkins master is restarted while a build is running on a provisioned slave, the existing slave cannot reconnect once the master is back online.

      I used the file src/test/resources/org/csanchez/jenkins/plugins/kubernetes/pipeline/runInPodWithRestart.groovy for my pipeline.

      Attached is the pod log from the jnlp container and a snippet from the master's log.

          [JENKINS-47476] Provisioned slaves cannot reconnect following a Jenkins restart

          Scott Hebert created issue -
          Scott Hebert made changes -
          Attachment New: master-snippet-log.txt [ 40046 ]

          Scott Hebert added a comment -

          0x89, have you seen this before?

          Martin Sander made changes -
          Description: the pipeline file path now links to https://github.com/jenkinsci/kubernetes-plugin/blob/ecb8779f09438aef37490ed5cbac58fab7d2ee25/src/test/resources/org/csanchez/jenkins/plugins/kubernetes/pipeline/runInPodWithRestart.groovy

          Martin Sander added a comment -

          scoheb: no, I personally haven't seen this before.

          If I remember correctly, there used to be a disclaimer in README.md that resuming pipelines after a restart still has problems with container executions, but maybe I am wrong about that.


          Chance Zibolski added a comment -

          I'm getting this currently. As far as I can tell, when Jenkins restarts, the agents are no longer "computers/nodes" in Jenkins' eyes, so it starts rejecting their JNLP connections, and the container eventually errors.

          I can see errors like "WARNING: safe-exit thread for pod-test-x123b-rknm5 terminated"

          Chance Zibolski added a comment - edited

          Also, I found that the pod templates still exist after this, and the plugin tries to reprovision these pods but gets errors that they already exist. Perhaps upon restart, the plugin should look at all its pod templates, check whether those pods still exist, and re-create them as Jenkins nodes so the pods can connect.

          Here's a full example of log messages containing a particular pod's name, before and after Jenkins restarts:

          https://gist.github.com/chancez/0a3e0e4798d4bb70280d136b7c12f8ec
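
The reconciliation suggested above might look roughly like the sketch below. It is not the plugin's code and uses no Jenkins or Kubernetes API: the class `PodReconciler`, the method `podsToReRegister`, and the second pod name are hypothetical, and the two sets stand in for "pods still running in the cluster" and "nodes Jenkins knows about after the restart". It only shows the core step, which is finding pods that still exist but have no matching Jenkins node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only; not part of kubernetes-plugin.
final class PodReconciler {

    // Returns, in sorted order, the names of pods that are still running
    // but that Jenkins no longer has registered as nodes.
    static List<String> podsToReRegister(Set<String> runningPods, Set<String> knownNodes) {
        Set<String> missing = new TreeSet<>(runningPods);
        missing.removeAll(knownNodes);
        return new ArrayList<>(missing);
    }

    public static void main(String[] args) {
        // Assumed state after a restart: two pods survive, one node is known.
        Set<String> running = Set.of("pod-test-x123b-rknm5", "pod-test-x123b-aaaaa");
        Set<String> known = Set.of("pod-test-x123b-aaaaa");
        System.out.println(podsToReRegister(running, known)); // prints [pod-test-x123b-rknm5]
    }
}
```

Each name returned would then need to be re-created as a Jenkins node so that the pod's JNLP connection is accepted instead of rejected; how to do that inside the plugin is left open here.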

          Carlos Sanchez made changes -
          Link New: This issue is duplicated by JENKINS-47561 [ JENKINS-47561 ]

            Assignee: scoheb Scott Hebert
            Reporter: scoheb Scott Hebert
            Votes: 1
            Watchers: 5
