  1. Jenkins
  2. JENKINS-64590

Nodes lose reference to computer upon reloading config from disk / plugin install


    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • Component/s: core
    • Jenkins: 2.263.1
      OS: EC2 Linux
      Java:
      openjdk version "1.8.0_265"
      OpenJDK Runtime Environment (build 1.8.0_265-b01)
      OpenJDK 64-Bit Server VM (build 25.265-b01, mixed mode)

      Hello,

      I help develop a plugin for Jenkins.

      We have noticed that, at least when clicking "Reload Configuration From Disk", all nodes that are connected and idle at that time lose the reference to their corresponding `Computer` object. Connected nodes that are busy at the time of the reload continue processing their builds/jobs, and then promptly lose the reference to their `Computer` objects as well.

      Upon inspecting `Jenkins.get().getNode("NodeName").getComputer()`, I see `null`. However, assuming the corresponding computer is at index `1`, `Arrays.asList(Jenkins.get().getComputers()).get(1).getNode()` returns the expected `Node` for the given `Computer` object.
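The asymmetry described above (the computer resolves its node, but the node does not resolve its computer) can be checked for all agents at once. The following is only a minimal, self-contained sketch: `DetachedNodeCheck` and `findDetached` are hypothetical names, and the lookups are passed in as functions so that the example runs without Jenkins on the classpath. In a real instance one might pass `Arrays.asList(Jenkins.get().getComputers())`, `Computer::getNode`, `Node::toComputer` and `Node::getNodeName`, but that wiring is an assumption and has not been verified against this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DetachedNodeCheck {

    /**
     * Returns the names of nodes that a computer can resolve (computer -> node)
     * but that cannot resolve a computer themselves (node -> computer is null).
     * All three lookups are supplied by the caller; nothing here is Jenkins API.
     */
    static <N, C> List<String> findDetached(
            Collection<C> computers,
            Function<C, N> nodeOf,
            Function<N, C> computerOf,
            Function<N, String> nameOf) {
        List<String> detached = new ArrayList<>();
        for (C computer : computers) {
            N node = nodeOf.apply(computer);
            // Only flag the one-directional case reported in this issue.
            if (node != null && computerOf.apply(node) == null) {
                detached.add(nameOf.apply(node));
            }
        }
        return detached;
    }

    public static void main(String[] args) {
        // Stand-in data: both computers know their node ...
        Map<String, String> computerToNode = new HashMap<>();
        computerToNode.put("computer-a", "node-a");
        computerToNode.put("computer-b", "node-b");

        // ... but only node-a can find its computer again.
        Map<String, String> nodeToComputer = new HashMap<>();
        nodeToComputer.put("node-a", "computer-a");

        List<String> detached = findDetached(
                Arrays.asList("computer-a", "computer-b"),
                computerToNode::get,
                nodeToComputer::get,
                name -> name);

        System.out.println(detached); // prints [node-b]
    }
}
```

A check like this could be run after a reload to list the affected agents instead of waiting for their builds to fail.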

      All nodes appear as online, but they will gradually start failing their builds instantly with a message like this:

       

      13:26:00 ERROR: i-xxx is offline
      13:26:00 ERROR: Issue with creating launcher for agent i-xxx. Computer has been disconnected
      

      This has a devastating effect, as Jenkins will keep sending jobs to this non-functioning agent. The agent (as explained) isn't aware of its computer, even though the computer is aware of its agent.

       

      If Jenkins is then restarted, say with `sudo service jenkins restart`, all malfunctioning nodes start working again: calling `get/toComputer()` on them returns the corresponding `Computer` object, and their builds pass. However, we'd rather not rely on manually restarting after "Reload Configuration From Disk" is hit (we have reason to believe this isn't the only case where this detaching behavior happens).

      Maybe this is expected behavior, but I have very little reason to believe so.

       

      How to reproduce:

      1. Launch JNLP/SSH agents, preferably more than 10 for a better indication
      2. Start filling up queue
      3. At one point, some nodes will be busy, and some will be idle
      4. Go to Jenkins settings, hit "Reload Configuration from Disk"
      5. Nodes that were idle will immediately start failing builds
      6. Nodes that were busy will finish the build they were running at the time of the reload, then start failing every build after that
      7. All nodes will be marked as online, Jenkins will keep sending jobs to those nodes.

       

       

       

       

            Assignee: Unassigned
            Reporter: SHxKM