JENKINS-61851

Repeated build failures after agent disconnection

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • Component/s: remoting
    • Labels: None
    • master = Jenkins 2.229, in a Docker image (Linux) behind an nginx reverse proxy, openjdk 1.8.0_242
      agent = remoting 4.3 (with -webSocket connection), windows10, openjdk 1.9.0_202

      We've been seeing a couple of build failures caused by agent disconnections for no apparent reason. The failures all follow the same pattern:

      The build logs say (time of error):

      9:45:50 FATAL: command execution failed
      java.nio.channels.ClosedChannelException
      	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141)
      	at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91)
      ...
      Caused: java.io.IOException: Backing channel 'our-windows10-agent' is disconnected.
      	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
      

      On the agent, the logs say:

      Apr 09, 2020 9:45:50 AM hudson.Launcher$RemoteLaunchCallable$1 join
      INFO: Failed to synchronize IO streams on the channel hudson.remoting.Channel@11010e5e:our-windows10-agent
      hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@11010e5e:our-windows10-agent": Remote call on our-windows10-agent failed. The channel is closing down or has closed down
              at hudson.remoting.Channel.call(Channel.java:991)
              at hudson.remoting.Channel.syncIO(Channel.java:1730)
      ...
      Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@11010e5e:our-windows10-agent": channel is already closed
              at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251)
      
      Apr 09, 2020 9:45:50 AM hudson.remoting.UserRequest perform
      WARNING: LinkageError while performing UserRequest:UserRPCRequest:hudson.Launcher$RemoteProcess.join[](4)
      java.lang.LinkageError: Failed to load hudson.util.ProcessTree
              at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:391)
      ...
      Caused by: java.lang.NoClassDefFoundError: hudson/util/ProcessTreeRemoting$IProcessTree
              at java.lang.ClassLoader.defineClass1(Native Method)
      ...
      Caused by: java.lang.ClassNotFoundException: hudson.util.ProcessTreeRemoting$IProcessTree
              at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
      

      On the master, the agent logs say (file slave.log.1, mtime = Apr 9 09:45):

      Inbound agent connected from x.x.x.x
      Remoting version: 4.3
      This is a Windows agent
      onOnline: class org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl reported an exception: java.net.MalformedURLException: no protocol: jnlpJars/slave.jar
      Agent successfully connected and online
      ERROR: Connection terminated
      java.nio.channels.ClosedChannelException
      	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141)
      	at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91)
      	at jenkins.websocket.WebSockets$$Lambda$282/00000000940934C0.invoke(Unknown Source)
      	at com.sun.proxy.$Proxy74.onWebSocketClose(Unknown Source)
      	at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onClose(JettyListenerEventDriver.java:119)
      	at org.eclipse.jetty.websocket.common.WebSocketSession.callApplicationOnClose(WebSocketSession.java:389)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.disconnect(AbstractWebSocketConnection.java:317)
      	at org.eclipse.jetty.websocket.common.io.DisconnectCallback.succeeded(DisconnectCallback.java:42)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection$CallbackBridge.writeSuccess(AbstractWebSocketConnection.java:86)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.notifyCallbackSuccess(FrameFlusher.java:359)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeedEntries(FrameFlusher.java:288)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeeded(FrameFlusher.java:280)
      	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:293)
      	at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.flush(FrameFlusher.java:264)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.process(FrameFlusher.java:193)
      	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
      	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.outgoingFrame(AbstractWebSocketConnection.java:584)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.close(AbstractWebSocketConnection.java:181)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:511)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:441)
      	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
      	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
      	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
      	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
      	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
      	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
      	at java.lang.Thread.run(Unknown Source)
      

      I see no error in the nginx (reverse proxy) logs. I only see the GET /wsagent/ in the access logs.

      Oddly, the last GET I see in the access logs on the /wsagent/ endpoint does not line up with the error. It looks as if the agent did not reconnect. The agent log on the master says "Connection terminated" (see above), yet the agent is still connected (the mtime of the latest agent log file, slave.log, on the master is Apr 9 09:46).
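
      One way to check how the /wsagent/ requests relate to the failure is to list the access-log entries that fall within a few minutes of it. The following is only a minimal sketch: it assumes nginx writes the default "combined" access log format, and the function name, sample lines and 5-minute window are hypothetical choices for illustration, not taken from our setup.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: default nginx "combined" format, e.g.
# 203.0.113.7 - - [09/Apr/2020:09:44:10 +0000] "GET /wsagent/ HTTP/1.1" 101 0 "-" "-"
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def wsagent_requests_near(lines, failure_time, window=timedelta(minutes=5)):
    """Return (timestamp, status) for /wsagent requests within `window` of failure_time.

    `failure_time` must be timezone-aware, because the parsed log
    timestamps carry the UTC offset written by nginx.
    """
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m or not m.group("path").startswith("/wsagent"):
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if abs(ts - failure_time) <= window:
            hits.append((ts, int(m.group("status"))))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, read them from the access log file.
    sample = [
        '203.0.113.7 - - [09/Apr/2020:08:00:00 +0000] "GET /wsagent/ HTTP/1.1" 101 0 "-" "-"',
        '203.0.113.7 - - [09/Apr/2020:09:44:10 +0000] "GET /wsagent/ HTTP/1.1" 101 0 "-" "-"',
    ]
    failure = datetime(2020, 4, 9, 9, 45, 50, tzinfo=timezone.utc)
    for ts, status in wsagent_requests_near(sample, failure):
        print(ts.isoformat(), status)
```

      As far as I understand, nginx writes an access-log entry when a request finishes, so a long-lived WebSocket connection may only show up in the log once it has closed. That should be kept in mind when comparing these timestamps with the time of the error.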

      We used to run this agent with a direct TCP connection (with Jenkins 2.204.5 and remoting 3.36.1) and did not have this issue. We need to switch to a WebSocket connection to avoid having to stream the TCP connection through the nginx reverse proxy.

      Is there anything I can do to better analyze what's going on?

            jthompson Jeff Thompson
            mbarbero Mikaël Barbero