• Bug
    • Resolution: Fixed
    • Critical
    • core
    • Jenkins 2.245 or 2.235.3
      Java 1.8.0_171 or 1.8.0_172
    • 2.253

      We are seeing the bug originally described in JENKINS-62181: remote agents hang on launch. This is with a supposedly patched release.

      In our experience, the agent launches correctly the first time after Jenkins is rebooted. Then, after the agent times out (in-demand delay 1, idle delay 5) and goes down, it cannot be restarted; it deadlocks on launch.

      It hangs here:

      <===[JENKINS REMOTING CAPACITY]===>channel started
      Remoting version: 4.3
      This is a Unix agent

      I don't know how to put in a test for Java deadlocks. Please advise. This is blocking usage.
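On the question of how to test for Java deadlocks: one option is the JVM's own management API. The following is a minimal sketch, assuming the check runs inside the JVM under suspicion; the class name `DeadlockCheck` and its methods are hypothetical and not part of Jenkins or Remoting. It relies only on the standard `java.lang.management.ThreadMXBean`, and it reports lock cycles only, so a hang with another cause would not show up here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of Jenkins or Remoting.
public class DeadlockCheck {

    /** Number of threads the JVM currently reports as deadlocked (0 if none). */
    public static int countDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Covers monitor locks and java.util.concurrent ownable synchronizers;
        // returns null when no deadlock is detected.
        long[] ids = threads.findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    /** Description of each deadlocked thread, or an empty string if there are none. */
    public static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        // Ask for locked monitors and synchronizers so the report shows who holds what.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) { // defensive: an entry is null if the thread is gone
                report.append(info);
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
        System.out.print(describeDeadlocks());
    }
}
```

A test could call `countDeadlockedThreads()` after the agent launch is expected to have finished and fail if the result is non-zero, attaching the output of `describeDeadlocks()` to the failure message.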

          [JENKINS-63082] Deadlock launching remote agent

          Jesse Glick added a comment - https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/main/jenkins-war/2.251-rc30290.4454a28de2c1/jenkins-war-2.251-rc30290.4454a28de2c1.war is available for testing.

          Marcus Philip added a comment -

          I have tested the patch and it seems to fix the problem.

          I have relaunched several agents after starting Jenkins master with this patch and it works fine.

          Thanks for the quick response. Hoping to see this soon in a LTS patch.

          Jesse Glick added a comment -

          marcus_phi thanks for testing!

          Julianus Pfeuffer added a comment -

          Is this in 2.251? It is not in the Changelog.

          In any case, it is a huge blocker for us if it is not in any LTS. When is this coming? After an update we can no longer connect to half of our slaves, with exactly this error.

          Jesse Glick added a comment -

          jpfeuffer the fix is still under review. You can help by testing the binary linked above and verifying whether it makes the issue go away for you.

          Julianus Pfeuffer added a comment -

          Yes, it seems to work now. At first, I still had some hiccups (on far fewer slaves) because some hung Java processes were still running on the slaves, but after restarting those, all of them are up again.

          Vegar Andersen added a comment -

          I have upgraded to version 2.253 and we are still experiencing this exact issue. After upgrading and restarting, more than half of our nodes deadlock in the manner described above. I have not been able to identify a procedure that reliably brings them up again. Is anyone else having better luck with this?

          Jesse Glick added a comment -

          vegar_andersen please attach a thread dump.
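For anyone unsure how to produce a thread dump: the JDK's `jstack <pid>` tool prints one for a running JVM. The same information can also be collected from inside the process, as in the sketch below. This is an illustrative example only; `ThreadDumpSketch` is a hypothetical class, not something Jenkins provides, and `ThreadInfo.toString()` may truncate long stack traces, so `jstack` output is usually more complete.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of Jenkins or Remoting.
public class ThreadDumpSketch {

    /** Returns a textual dump of all live threads in the current JVM. */
    public static String capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        // true, true: include locked monitors and locked ownable synchronizers,
        // which is the information needed to diagnose a deadlock.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            dump.append(info);
        }
        return dump.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture());
    }
}
```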

          Fredrik de Vibe added a comment - edited

          jglick I just attached one, it's from the same Jenkins instance that vegar_andersen refers to.

          Jesse Glick added a comment -

          f5k vegar_andersen thanks, I have filed your deadlock as JENKINS-63458.

            jglick Jesse Glick
            gehlhaar Dan Gehlhaar
            Votes: 3
            Watchers: 9
