  1. Jenkins
  2. JENKINS-34448

Slave's offlineCause is rewritten during a rare condition



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: core


      SlaveComputer._connect() can overwrite an already-set offlineCause in a rare race condition.

      Connecting and disconnecting a slave run in separate threads (so that the UI operation does not block). It is therefore possible that, while a slave is still connecting, something else asks to disconnect it. SlaveComputer._connect() assumes that a null channel means something went wrong during launch even though no exception was thrown. It then stores OfflineCause.LaunchFailed as the reason, even if a cause has already been set for another reason (e.g. disconnect by CLI). The problematic part of the code is the null-channel check: if channel is null, the LaunchFailed cause is always stored, even when the situation is the result of another action (disconnect, temporary offline etc.).

      I believe this part of the code should be executed only when offlineCause is still empty (i.e. nothing has recorded a reason for going offline yet). Also, when an offlineCause is already set, all listeners have already been notified (closed channel, terminated channel...).
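
      The proposed guard can be illustrated with a minimal, self-contained sketch. It does not use Jenkins classes: ComputerSketch, its Cause enum and the afterLaunchUnguarded/afterLaunchGuarded methods are hypothetical stand-ins for SlaveComputer, OfflineCause and the null-channel check in _connect(), and the race is replayed sequentially (disconnect first, then the launch thread observing a null channel) so that the outcome is deterministic. It is only an illustration of the idea, not the actual fix.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a node's computer; NOT Jenkins' real SlaveComputer.
final class ComputerSketch {
    // Simplified cause types, named after the causes mentioned in this report.
    enum Cause { BY_CLI, LAUNCH_FAILED }

    private final AtomicReference<Cause> offlineCause = new AtomicReference<>();

    // A disconnect request (e.g. from the CLI) records why the node went offline.
    void disconnect(Cause cause) {
        offlineCause.set(cause);
    }

    // Behaviour described in the report: a null channel always stores LAUNCH_FAILED.
    void afterLaunchUnguarded(Object channel) {
        if (channel == null) {
            offlineCause.set(Cause.LAUNCH_FAILED);
        }
    }

    // Proposed guard: store LAUNCH_FAILED only if no cause has been recorded yet.
    void afterLaunchGuarded(Object channel) {
        if (channel == null) {
            offlineCause.compareAndSet(null, Cause.LAUNCH_FAILED);
        }
    }

    Cause getOfflineCause() {
        return offlineCause.get();
    }

    public static void main(String[] args) {
        // Disconnect arrives first, then the launch thread sees a null channel.
        ComputerSketch unguarded = new ComputerSketch();
        unguarded.disconnect(Cause.BY_CLI);
        unguarded.afterLaunchUnguarded(null);
        System.out.println(unguarded.getOfflineCause()); // prints LAUNCH_FAILED

        ComputerSketch guarded = new ComputerSketch();
        guarded.disconnect(Cause.BY_CLI);
        guarded.afterLaunchGuarded(null);
        System.out.println(guarded.getOfflineCause()); // prints BY_CLI
    }
}
```

      With the guard, a genuine launch failure (null channel and no cause recorded) still ends up as LAUNCH_FAILED, while a cause stored by a concurrent disconnect is preserved.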

      I ran into this issue while writing test coverage for the disconnect-node CLI command, using a simple scenario:

      DumbSlave slave = j.createSlave("aNode", "", null);
      assertThat(slave.toComputer().isOnline(), equalTo(true));
      assertThat(slave.toComputer().getOfflineCause(), equalTo(null));
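      CLICommandInvoker.Result result = command
          .authorizedTo(Computer.DISCONNECT, Jenkins.READ)
          .invokeWithArgs("aNode"); // assumed invocation: run disconnect-node against the node created above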
      assertThat(slave.toComputer().isOffline(), equalTo(true));
      assertThat(slave.toComputer().getOfflineCause() instanceof OfflineCause.ByCLI, equalTo(true));

      where the last assertion failed randomly (in 10 to 40 percent of executions) because offlineCause was not ByCLI but LaunchFailed, although the slave had started correctly beforehand and was able to execute a job.


          Issue Links


            pajasoft Pavel Janoušek created issue -
            pajasoft Pavel Janoušek made changes -
              Field       Original Value        New Value
              Link                              This issue is blocking JENKINS-34328 [ JENKINS-34328 ]
            pajasoft Pavel Janoušek made changes -
              Field       Original Value        New Value
              Resolution                        Fixed [ 1 ]
              Status      Open [ 1 ]            Resolved [ 5 ]
            rtyler R. Tyler Croy made changes -
              Field       Original Value        New Value
              Workflow    JNJira [ 170564 ]     JNJira + In-Review [ 198890 ]


              Assignee: pajasoft Pavel Janoušek
              Reporter: pajasoft Pavel Janoušek