Jenkins / JENKINS-48146

Jenkins slaves won't reconnect to master if the master is killed and restarted.

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • kubernetes-plugin
    • None
    • - Operating system: Oracle Linux 7.3
      - JDK: java-1.8.0-openjdk-devel-1:1.8.0
      - Jenkins 2.73.3 and Remoting 3.10.2
      - Kubernetes plugin 1.1
      - Jenkins master as a k8s deployment and slaves running as k8s pods in k8s 1.7.3

      If the master pod is killed while a job is running, a new master pod comes up, but the slaves cannot reconnect to it and finish the job. Instead, the job times out after 300s and reports that the executor is assumed to be never coming back. According to the Jenkins master logs, during initialization the master tries to provision the slave but fails because it already exists (from the provisioning source code at this link (line 221), it looks like the master then deletes the slave from its list of executor nodes). When the slave tries to reconnect once the master is back up, the master rejects the connection and the slave pod goes into an error state.

      To reproduce: start a Jenkins job that sleeps for 300 seconds, then sleeps for another 300 seconds. During the first sleep, delete the master pod. The job never reaches the second sleep; it times out and ends in FAILURE.
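
      The reproduction job could be sketched as a shell build step along these lines. This is only an illustration, assuming the job runs a shell step on the slave pod; the function name repro_job and the echoed messages are hypothetical and do not come from the attached logs.

```shell
#!/bin/sh
# Hypothetical helper for a reproduction job; the name repro_job is illustrative.
repro_job() {
    # Sleep duration in seconds; defaults to the 300 used in this report.
    secs="${1:-300}"
    echo "first sleep: ${secs}s"    # delete the master pod while this sleep runs
    sleep "$secs"
    echo "second sleep: ${secs}s"   # per this report, not reached after the master restarts
    sleep "$secs"
    echo "second sleep finished"
}
```

      Calling repro_job with no argument in the build step gives the two 300-second sleeps. During the first one, the master pod can be deleted with kubectl delete pod, using whatever name the master pod has in the cluster.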

      An interesting note: if the Jenkins master is restarted as part of a rolling upgrade of the master deployment, any slaves created by kubernetes plugin 0.11 that are still running while the master updates to the version with plugin 1.1 reconnect successfully after the restart.

      This issue does not occur when using kubernetes plugin 1.0.

        1. jenkins-console-output.txt
          1 kB
          Laura Owczarski
        2. jenkins-master-log.txt
          11 kB
          Laura Owczarski

            Assignee: csanchez Carlos Sanchez
            Reporter: lowcars Laura Owczarski