Jenkins / JENKINS-60476

Jenkins agent quits when the Jenkins controller Docker container is restarted, killing the long-running task


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: core

      In our setup, we run the Jenkins master in Docker and the Jenkins slaves in VMs, with Nginx as a frontend in front of the master.

      When running a long task (e.g. a declarative pipeline running `sleep infinity`) in MAX_SURVIVABILITY durability mode, the slave spawns a shell that waits for the result file to be created while running `touch` on the log file in the `durable` folder every three seconds.
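      The heartbeat-and-result-file pattern described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the durable-task plugin's actual code: the file names `output.log` and `result.txt`, the function names, and the 15-second staleness threshold are all hypothetical. A wrapper runs the command, refreshes the log's modification time while it runs, and atomically writes the exit code when it ends; a separate poller reads the result, or reports the wrapper as dead once the heartbeat stops.

```python
import os
import subprocess
import sys
import tempfile
import threading
import time
from pathlib import Path

# Hypothetical file names; the real durable-task control directory may differ.
LOG_NAME = "output.log"
RESULT_NAME = "result.txt"


def run_with_heartbeat(cmd, control_dir, interval=3.0):
    """Run cmd, touching the log every `interval` seconds while it runs,
    then record its exit code in the result file."""
    control = Path(control_dir)
    log = control / LOG_NAME
    result = control / RESULT_NAME
    with open(log, "ab") as out:
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        while True:
            try:
                rc = proc.wait(timeout=interval)
                break
            except subprocess.TimeoutExpired:
                log.touch()  # heartbeat: refresh the log's mtime
    # Write to a temp file and rename, so a reader never sees a partial result.
    tmp = result.with_suffix(".tmp")
    tmp.write_text(str(rc))
    os.replace(tmp, result)
    return rc


def poll_result(control_dir, stale_after=15.0):
    """Return the exit code if finished, None if still running, or raise
    if the heartbeat is stale (the wrapper is presumed dead)."""
    control = Path(control_dir)
    result = control / RESULT_NAME
    if result.exists():
        return int(result.read_text())
    log = control / LOG_NAME
    if log.exists() and time.time() - log.stat().st_mtime > stale_after:
        raise RuntimeError("heartbeat stale: wrapper appears to have died")
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # The wrapper runs in a thread here only to keep the demo in one file.
        worker = threading.Thread(
            target=run_with_heartbeat,
            args=([sys.executable, "-c", "import time; time.sleep(1)"], d, 0.2),
        )
        worker.start()
        while (rc := poll_result(d)) is None:
            time.sleep(0.1)
        worker.join()
        print("exit code:", rc)
```

      The pattern only keeps a task alive across a master restart if the wrapper process itself survives; in the case reported here, the wrapper was killed together with the slave process.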

      When the Jenkins master Docker container is restarted, the slave tries to reconnect but exits after 5 minutes of master unavailability, killing the whole spawned process tree, including the actual task. This may have to do with the slave receiving `503 Service Unavailable` from Nginx while the master is down.

      This may be caused by internal slave.jar logic that expects the master's IP to become unreachable during a restart. In our setup, the IP stays reachable, but nothing answers JNLP requests and Nginx returns 503 instead.
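      To check which of the two situations above a slave VM actually sees during a master restart (no HTTP answer at all versus a 503 from Nginx), a small probe could help. This is a hedged Python sketch using only the standard library; it does not reproduce slave.jar's reconnect logic, and the function names and the URL in the usage note are placeholders.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Classify what a client gets back when it contacts `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"http {resp.status}"
    except urllib.error.HTTPError as e:
        # Something (e.g. the Nginx frontend) answered, but with an error status.
        return f"http {e.code}"
    except urllib.error.URLError as e:
        # No HTTP answer: connection refused, DNS failure, connect timeout, ...
        return f"unreachable ({e.reason})"
    except OSError as e:
        # Socket-level errors not wrapped in URLError, e.g. a read timeout.
        return f"unreachable ({e})"


def watch(url, interval=10.0, count=60):
    """Print a timestamped probe result `count` times, `interval` seconds apart."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), probe(url), flush=True)
        time.sleep(interval)
```

      Calling `watch("https://jenkins.example.com/")` (placeholder URL) on a slave VM while the master container restarts would show, for the whole 5-minute window, whether the outage looks like `unreachable (...)` or a run of `http 503` lines.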

      The slave is later restarted by its service (or by supervisord on some slaves), but by then the task is already gone, so the promised durability is never delivered.

      The log is attached. The Jenkins master restart happened at around 21:39, and at 21:45 the slave process (PID 9827) quit, killing the job with it.

      Also, I couldn't find a Component for slave/agent; please re-route.

            Assignee: Unassigned
            Reporter: eplodn1 (efo plo)
            Votes: 0
            Watchers: 1
