  Jenkins / JENKINS-70334

When TcpSlaveAgentListener dies it is not restarted

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • core
    • core:2.375.1
    • 2.388

      When the TCP Agent listener crashes, it prints:

      2022-12-22 01:29:20.541+0000 [id=632]	WARNING	hudson.TcpSlaveAgentListener$1#run: Connection handler failed, restarting listener
      

      However, it never seems to be restarted.

      How to Reproduce

      (thanks to duemir for providing these steps)

      It can be reproduced with a debugger (at least with IntelliJ IDEA's):

      • Start a 2.375.1 Jenkins instance with debugging enabled
      • Prepare the IDE:
          • Check out tag jenkins-2.375.1 from the jenkinsci/jenkins repo
          • Open hudson.TcpSlaveAgentListener
          • Set a breakpoint somewhere in the run method of the ConnectionHandler (line 279)
      • Enable the TCP port
      • Set up an inbound agent and test that it connects
      • Connect the debugger to the controller
      • Use the debugger's "Throw Exception" action to throw an exception that run does not handle, e.g. new IllegalStateException("BOOM")

      As a result, lines similar to the following should be printed in the controller logs:

      2022-12-22 01:29:20.540+0000 [id=632]	SEVERE	h.TcpSlaveAgentListener$ConnectionHandler#lambda$new$0: Uncaught exception in TcpSlaveAgentListener ConnectionHandler Thread[TCP agent connection handler #6 with /127.0.0.1:61392,5,main]
      java.lang.IllegalStateException: BOOM
      	at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:280)
      2022-12-22 01:29:20.541+0000 [id=632]	WARNING	hudson.TcpSlaveAgentListener$1#run: Connection handler failed, restarting listener
      java.lang.IllegalStateException: BOOM
      	at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:280)
      

      You may need to fiddle with the breakpoint a bit. It did not work the first time for me: I ended up with a breakpoint that suspended only the thread, and I had to perform the "Throw Exception" action twice.

      Expected: as the log says, the TCP Agent listener is restarted after the crash.
      Actual: it is not. The port is not up, and agents cannot connect.

      The workaround is to disable the port and then enable it again.
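Both log lines in the reproduction carry the same thread id, [id=632], and the SEVERE line names the "TCP agent connection handler" thread. That fits a general Java rule: a Thread.UncaughtExceptionHandler is invoked on the thread that is terminating. This suggests the "restarting listener" message is emitted from the dying connection handler thread, not from the listener thread; whether that explains why the restart never takes effect is not established here. The standalone sketch below only demonstrates the Java rule; the class and thread names are made up and it is not Jenkins code.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Standalone demo (not Jenkins code): on which thread does an UncaughtExceptionHandler run? */
public class UncaughtHandlerDemo {

    /** Starts a worker that throws and returns the name of the thread the handler ran on. */
    public static String handlerThreadName() {
        AtomicReference<String> seenOn = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            // Stands in for an exception that a connection handler's run() does not catch.
            throw new IllegalStateException("BOOM");
        }, "demo connection handler");
        // The JVM invokes this handler on the terminating thread itself.
        worker.setUncaughtExceptionHandler((t, e) -> seenOn.set(Thread.currentThread().getName()));
        worker.start();
        try {
            worker.join(); // the handler completes before the worker terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seenOn.get();
    }

    public static void main(String[] args) {
        // Prints "Handler ran on: demo connection handler"
        System.out.println("Handler ran on: " + handlerThreadName());
    }
}
```

Anything a handler like this wants to "restart" therefore has to be started from a thread that is about to exit.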



          Allan BURDAJEWICZ added a comment -

          Right. I am not referring to the exception used to reproduce the problem. The reproduction steps worked initially because anything that was not an IOException or InterruptedException went uncaught. Now we catch Throwable, so we should not expect to reach the uncaught-exception handler.
          If the thread died, there should be logs and/or a stack trace mentioning TcpSlaveAgentListener in the log.
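A minimal sketch of the point above, using a hypothetical handler thread (not the actual TcpSlaveAgentListener code): when run() wraps its work in catch (Throwable), a failure such as the isSendOpen UnsupportedOperationException is handled in place and the thread's UncaughtExceptionHandler never fires. The sketch also shows one Java caveat: an exception thrown from inside the catch block itself is not covered by that catch and still reaches the uncaught-exception handler. The onFailure hook and the counters are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical handler thread whose run() catches Throwable (illustrative, not Jenkins code). */
public class CatchThrowableDemo {

    static Thread handlerThread(Runnable work, Runnable onFailure,
                                AtomicInteger handled, AtomicInteger uncaught) {
        Thread t = new Thread(() -> {
            try {
                work.run();
            } catch (Throwable e) {
                handled.incrementAndGet(); // stands in for "log it and close the socket"
                onFailure.run();           // anything thrown here escapes the catch block
            }
        });
        t.setUncaughtExceptionHandler((thread, e) -> uncaught.incrementAndGet());
        return t;
    }

    static void runAndWait(Thread t) {
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Runs both scenarios and returns the counters as "handled=H uncaught=U". */
    public static String summary() {
        AtomicInteger handled = new AtomicInteger();
        AtomicInteger uncaught = new AtomicInteger();
        Runnable boom = () -> { throw new UnsupportedOperationException("isSendOpen"); };
        // 1. The failure is caught inside run(): the uncaught handler is not invoked.
        runAndWait(handlerThread(boom, () -> { }, handled, uncaught));
        // 2. The catch block itself throws: this exception does reach the uncaught handler.
        runAndWait(handlerThread(boom, () -> { throw new IllegalStateException("cleanup failed"); },
                handled, uncaught));
        return "handled=" + handled.get() + " uncaught=" + uncaught.get();
    }

    public static void main(String[] args) {
        System.out.println(summary()); // prints "handled=2 uncaught=1"
    }
}
```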


          Joerg Schwaerzler added a comment -

          I believe that the stack trace is essentially the same as in JENKINS-59910. Nevertheless, I have copied the stack trace here.
          Please note that this stack trace is taken from Jenkins 2.375.3.

          Uncaught exception in TcpSlaveAgentListener ConnectionHandler Thread[TCP agent connection handler #2795266 with /10.162.132.16:60750,5,main]
          java.lang.UnsupportedOperationException: Network layer is not supposed to call isSendOpen
          	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.isSendOpen(ProtocolStack.java:739)
          	at org.jenkinsci.remoting.protocol.FilterLayer.isSendOpen(FilterLayer.java:343)
          	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.isSendOpen(ProtocolStack.java:747)
          	at org.jenkinsci.remoting.protocol.FilterLayer.isSendOpen(FilterLayer.java:343)
          	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.isSendOpen(SSLEngineFilterLayer.java:233)
          	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doSend(ProtocolStack.java:699)
          	at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.doSend(ConnectionHeadersFilterLayer.java:474)
          	at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.start(ConnectionHeadersFilterLayer.java:138)
          	at org.jenkinsci.remoting.protocol.ProtocolStack.init(ProtocolStack.java:209)
          	at org.jenkinsci.remoting.protocol.ProtocolStack$Builder.build(ProtocolStack.java:563)
          	at org.jenkinsci.remoting.engine.JnlpProtocol4Handler.handle(JnlpProtocol4Handler.java:156)
          	at jenkins.slaves.JnlpSlaveAgentProtocol4.handle(JnlpSlaveAgentProtocol4.java:196)
          	at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:282)

          I am not sure what has been fixed in the PR. I would have expected the thread to be restarted no matter which exception killed it.


          Allan BURDAJEWICZ added a comment -

          We really need to see what happens in 2.387.1+. The isSendOpen exception may still happen, but it will not be an "Uncaught exception", and the TCP Agent Listener socket will be restarted.
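Joerg's expectation (restart no matter which exception killed the thread) can be pictured as a generic supervisor loop. The sketch below is an assumption-laden illustration, not how Jenkins implements the restart: the ListenerBody interface, the attempt counter and maxRestarts are all hypothetical. It catches any Throwable that ends the listener body and starts the body again, up to a limit.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical supervisor (not Jenkins code): restart a listener body whatever Throwable ends it. */
public class RestartingSupervisor {

    /** One run of the listener, e.g. "open the socket and accept until something fails". */
    public interface ListenerBody {
        void run(int attempt) throws Exception;
    }

    /** Returns the log lines produced; stops when the body returns normally or restarts run out. */
    public static List<String> supervise(ListenerBody body, int maxRestarts) {
        List<String> log = new ArrayList<>();
        for (int attempt = 0; ; attempt++) {
            try {
                body.run(attempt);
                log.add("attempt " + attempt + ": listener stopped normally");
                return log;
            } catch (Throwable t) {
                if (attempt >= maxRestarts) {
                    log.add("attempt " + attempt + ": " + t + ", giving up");
                    return log;
                }
                log.add("attempt " + attempt + ": " + t + ", restarting listener");
            }
        }
    }

    public static void main(String[] args) {
        // Fails twice with unrelated Throwable types, then runs to completion.
        List<String> log = supervise(attempt -> {
            if (attempt == 0) throw new IOException("accept failed");
            if (attempt == 1) throw new UnsupportedOperationException("isSendOpen");
        }, 5);
        log.forEach(System.out::println);
    }
}
```

Catching Throwable here also swallows Errors such as OutOfMemoryError; a production supervisor might choose to rethrow those instead, a trade-off this sketch deliberately ignores.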


          Joerg Schwaerzler added a comment -

          I see.
          However, I was wondering why we would not want to have the thread restarted in the case of an "Uncaught exception"?
          I have to admit that it was not easy for me to follow all the discussions in the PR.
          Does it mean that with 2.387.1+ there will be no more uncaught exceptions in the TcpSlaveAgentListener thread?

          I might have found a way to reproduce the isSendOpen exception; I will have to check. It could be related to a Java 8 JNLP Kubernetes client trying to connect to the Java 11 Jenkins master.


          Joerg Schwaerzler added a comment -

          allan_burdajewicz As I am currently not able to see the isSendOpen exception on 2.378.2: could it be that, because of the change, the exception no longer appears in the logs because it is caught earlier?


          Allan BURDAJEWICZ added a comment -

          > Does it mean that with 2.387.1+ there will be no more uncaught exceptions in the TcpSlaveAgentListener thread?

          Kind of. Given that we catch Throwable, we don't expect to face uncaught exceptions. I am not sure in what scenario in Java we would still get there; basil may have an answer to that.

          > As I am currently not able to see the isSendOpen exception on 2.378.2: Could it be that because of the change the exception may no longer appear in the logs as it has been caught earlier?

          It should still be logged in the catch blocks. The exception was usually happening in the ConnectionHandler:
          https://github.com/jenkinsci/jenkins/blob/jenkins-2.387.2/core/src/main/java/hudson/TcpSlaveAgentListener.java#L287
          https://github.com/jenkinsci/jenkins/blob/jenkins-2.387.2/core/src/main/java/hudson/TcpSlaveAgentListener.java#L294-L298

          In the rare case where it happens in the Agent Listener thread itself:
          https://github.com/jenkinsci/jenkins/blob/jenkins-2.387.2/core/src/main/java/hudson/TcpSlaveAgentListener.java#L192-L201

          There is one case where it would not be logged: when the Agent Listener is being shut down (which only happens when Jenkins is shutting down or when you change the port through the UI):
          https://github.com/jenkinsci/jenkins/blob/jenkins-2.387.2/core/src/main/java/hudson/TcpSlaveAgentListener.java#L191
          https://github.com/jenkinsci/jenkins/blob/jenkins-2.387.2/core/src/main/java/jenkins/model/Jenkins.java#L1333-L1334
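Why the shutdown case goes unlogged can be sketched with plain java.net: closing a ServerSocket from another thread makes a blocked accept() throw a SocketException, so an accept loop has to tell an intentional shutdown apart from a real failure. The sketch below is hypothetical and does not reproduce the linked Jenkins code; the shuttingDown flag, the failure counter and the class name are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical accept loop (not Jenkins code) that stays quiet on intentional shutdown. */
public class AcceptLoopShutdownDemo {

    private final ServerSocket serverSocket;
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final AtomicInteger loggedFailures = new AtomicInteger();

    private AcceptLoopShutdownDemo() throws IOException {
        // Port 0 asks the OS for any free port on the loopback interface.
        serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    private void acceptLoop() {
        while (true) {
            try (Socket s = serverSocket.accept()) {
                // A real listener would hand the socket to a connection handler thread here.
            } catch (IOException e) {
                if (shuttingDown.get()) {
                    return; // expected: close() during shutdown unblocks accept(); not logged
                }
                loggedFailures.incrementAndGet(); // unexpected: a real listener would log and restart
                return;
            }
        }
    }

    private void shutdown() throws IOException {
        shuttingDown.set(true); // set the flag before closing so the loop sees it
        serverSocket.close();
    }

    /** Runs the loop on its own thread, closes the socket, and returns how many failures were logged. */
    public static int failuresLogged(boolean intentionalShutdown) {
        try {
            AcceptLoopShutdownDemo listener = new AcceptLoopShutdownDemo();
            Thread t = new Thread(listener::acceptLoop, "demo agent listener");
            t.start();
            if (intentionalShutdown) {
                listener.shutdown();
            } else {
                listener.serverSocket.close(); // simulates the socket failing underneath the loop
            }
            t.join();
            return listener.loggedFailures.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("intentional shutdown -> logged failures: " + failuresLogged(true));
        System.out.println("unexpected close     -> logged failures: " + failuresLogged(false));
    }
}
```

If close() happens before accept() is even called, accept() still throws a SocketException because the socket is already closed, so the sketch does not depend on timing.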


          Joerg Schwaerzler added a comment -

          Thanks for the explanation.
          In that case, I will try to downgrade our test instance to see whether we can reproduce the issue then.

          FYI: on the production instance it really looks like the issue is caused by Java 8 JNLP images. I will post that in the linked ticket, too.


          Allan BURDAJEWICZ added a comment -

          macdrega any update?

          Rahali added a comment -

          macdrega, any update on this issue, please?


          Joerg Schwaerzler added a comment (edited) -

          We have fully migrated to Java 11 and no longer see this issue. We are currently running 2.401.3.
          Sorry for the late response.


            allan_burdajewicz Allan BURDAJEWICZ
            allan_burdajewicz Allan BURDAJEWICZ
            Votes: 1
            Watchers: 6
