Jenkins / JENKINS-64598

Jenkins agent disconnects on k8s with SIGHUP / ClosedChannelException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Defect
    • Labels:
    • Environment:
      jenkins instance:
      jenkins core 2.263.1
      CentOS Linux 7 (Core)
      kubernetes plugin 1.28.4

      jenkins agent remoting VERSION=4.6
      -websocket flag passed to jenkins agent

      Description

      I get intermittent agent disconnects while a build is running. I'll try to provide as much info as I can; let me know what else I can check.

       

      • The Jenkins master runs Java 11 (java-11-openjdk-11.0.5.10) and is started with hudson.slaves.ChannelPinger.pingIntervalSeconds set to 30 in order to avoid disconnects
      • An Nginx reverse proxy is in use with an SSL timeout of 5 minutes, which was too close to the default hudson.slaves.ChannelPinger.pingIntervalSeconds. Reducing the ping interval to 30 seconds gave good results and cut the number of disconnects per day (those earlier disconnects had a different stack trace and did not show a SIGHUP)
      • jenkins masters are on premise
      • jenkins agents are in GKE GCP kubernetes version 1.16.5
      • jenkins agent container image has default java -version
        openjdk version "1.8.0_232"
        OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-1~deb9u1-b09)
        OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
      • remoting VERSION=4.6
      • The -websocket flag is passed to the jenkins agent via the k8s plugin's extra cli command. I noticed afterwards that there is a checkbox for websocket in the kubernetes plugin config, but couldn't find docs to go with it. Should I switch to using that?
      • In terms of sizing, we peak at about 400 jenkins agents / pods connected at a time; the limit is set to 500 in the jenkins kubernetes plugin configuration
      • The issue happens even when load is low
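
The reasoning in the Nginx bullet above (a ping interval roughly equal to the proxy timeout leaves no margin for a delayed ping) can be sketched as a small check. This is only an illustration: the function names and the `min_pings` threshold are hypothetical, and the 300-second default is an assumption taken from the description, which says the default was too close to the 5-minute proxy timeout.

```python
# Illustrative sketch only. The helper names and the min_pings threshold are
# assumptions made for this example; they are not part of Jenkins or Nginx.


def pings_per_idle_window(ping_interval_s: float, proxy_timeout_s: float) -> float:
    """Return how many keepalive pings fit inside one proxy idle-timeout window."""
    if ping_interval_s <= 0 or proxy_timeout_s <= 0:
        raise ValueError("both intervals must be positive")
    return proxy_timeout_s / ping_interval_s


def ping_interval_is_safe(ping_interval_s: float,
                          proxy_timeout_s: float,
                          min_pings: float = 2.0) -> bool:
    """True when at least `min_pings` pings fit in the idle window,
    so that one late or lost ping does not let the proxy drop the connection."""
    return pings_per_idle_window(ping_interval_s, proxy_timeout_s) >= min_pings


PROXY_TIMEOUT_S = 5 * 60       # Nginx timeout given in the description
ASSUMED_DEFAULT_PING_S = 300   # assumed default ping interval (about 5 minutes)
TUNED_PING_S = 30              # value the reporter switched to

print(ping_interval_is_safe(ASSUMED_DEFAULT_PING_S, PROXY_TIMEOUT_S))  # False
print(ping_interval_is_safe(TUNED_PING_S, PROXY_TIMEOUT_S))            # True
```

With the tuned value, ten pings fit inside each proxy timeout window, which is consistent with the drop in proxy-related disconnects described above.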

      The connection is established fine, but intermittently gets disconnected. Let me know what else I can look at.

       

      Stack trace:

       

      SignalException: SIGHUP
      FATAL: command execution failed
      java.nio.channels.ClosedChannelException
      	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141)
      	at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91)
      	at com.sun.proxy.$Proxy105.onWebSocketClose(Unknown Source)
      	at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onClose(JettyListenerEventDriver.java:149)
      	at org.eclipse.jetty.websocket.common.WebSocketSession.callApplicationOnClose(WebSocketSession.java:394)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.disconnect(AbstractWebSocketConnection.java:316)
      	at org.eclipse.jetty.websocket.common.io.DisconnectCallback.succeeded(DisconnectCallback.java:42)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection$CallbackBridge.writeSuccess(AbstractWebSocketConnection.java:86)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.notifyCallbackSuccess(FrameFlusher.java:359)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeedEntries(FrameFlusher.java:288)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeeded(FrameFlusher.java:280)
      	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:293)
      	at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.flush(FrameFlusher.java:264)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.process(FrameFlusher.java:193)
      	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
      	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.outgoingFrame(AbstractWebSocketConnection.java:581)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.close(AbstractWebSocketConnection.java:181)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:510)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440)
      	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
      	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
      	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
      	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
      	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
      	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
      	at java.base/java.lang.Thread.run(Thread.java:834)

            Activity

            jglick Jesse Glick added a comment -

            I cannot think of any reason offhand why you would get SIGHUP.

            Note that if you suspect problems with WebSocket, you can check the server behavior using variants of

            websocat -vv -t wss://user:apitoken@jenkins/wsecho/
            
            sbeaulie Samuel Beaulieu added a comment -

            Thanks for the tip. At this point I do not think the issue is specific to websockets. That just happens to be our setup; the issue also shows up when using the JNLP port.
            Our next step is to run the jenkins servers in kubernetes, so that servers and agents are close to each other, and see whether the issue is still present.

            sbeaulie Samuel Beaulieu added a comment -

            We found out that the k8s nodes were being removed from the cluster because they were preemptible instances, so the agent pods running on them were terminated mid-build.
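
One way to check for this condition is to look for preemptible nodes in the cluster's node list. The sketch below assumes that GKE marks such nodes with the label `cloud.google.com/gke-preemptible=true` and that the input has the shape of `kubectl get nodes -o json` output; the function name and the sample data are hypothetical, so verify the label on your own cluster before relying on it.

```python
import json
from typing import List

# Label assumed to be set by GKE on preemptible nodes; verify on your cluster.
PREEMPTIBLE_LABEL = "cloud.google.com/gke-preemptible"


def preemptible_node_names(nodes_json: str) -> List[str]:
    """Return the names of nodes carrying the preemptible label.

    `nodes_json` is expected to look like `kubectl get nodes -o json` output:
    an object with an "items" list of node objects.
    """
    doc = json.loads(nodes_json)
    names = []
    for item in doc.get("items", []):
        metadata = item.get("metadata", {})
        labels = metadata.get("labels", {})
        if labels.get(PREEMPTIBLE_LABEL) == "true":
            names.append(metadata.get("name", "<unnamed>"))
    return names


# Hypothetical sample data, standing in for real kubectl output.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "node-a", "labels": {}}},
        {"metadata": {"name": "node-b",
                      "labels": {PREEMPTIBLE_LABEL: "true"}}},
    ]
})

print(preemptible_node_names(SAMPLE))  # ['node-b']
```

If agent pods are scheduled onto any of the nodes this reports, intermittent disconnects like the ones in the description are to be expected whenever a node is reclaimed.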

            jglick Jesse Glick added a comment -

            Hmm. An issue for kubernetes-plugin perhaps, to add appropriate labels or something?

            jglick Jesse Glick added a comment -

            And then there is the diagnosis aspect. I wonder if https://www.jenkins.io/projects/gsoc/2021/project-ideas/remoting-monitoring/ would help make it more apparent what is going on.


              People

              Assignee:
              jthompson Jeff Thompson
              Reporter:
              sbeaulie Samuel Beaulieu