Jenkins / JENKINS-55251

Builds randomly get stuck and fail with a lost-connection error although the slave is actually online

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels:
      None
    • Environment:
      Master: Jenkins ver. 2.138.2 on Ubuntu 16.04.3 LTS (Xenial Xerus), Java 1.8.0_191
      Slave: Unix slave, version 3.25 on CentOS 7.5, Java 1.8.0_144

      Description

      The master is installed with an offline installation package.
      Tasks are all run on physical machines and are written using declarative pipeline.
      The browser is Google Chrome 70


      I tried connecting the slave node to the master node both over SSH (Unix slave) and over JNLP. The problem occurs with both methods, but the symptoms differ slightly.
      With SSH, the node is still online after the build is interrupted. With JNLP, the node is offline after the build is interrupted (the agent.jar process hangs).

      The build reported the following error:

       

      wrapper script does not seem to be touching the log file in /var/lib/jenkins_node/workspace/FDS/FDS-FdsWebServer-deploy-test@tmp/durable-5927b140 (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
      [Pipeline] stash
      Error when executing always post condition:
      hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from 10.0.100.106/10.0.100.106:52594 failed. The channel is closing down or has closed down
          at hudson.remoting.Channel.call(Channel.java:948)
          at hudson.FilePath.act(FilePath.java:1071)
          at hudson.FilePath.act(FilePath.java:1060)
          at hudson.FilePath.archive(FilePath.java:484)
          at org.jenkinsci.plugins.workflow.flow.StashManager.stash(StashManager.java:128)
          at org.jenkinsci.plugins.workflow.support.steps.stash.StashStep$Execution.run(StashStep.java:115)
          at org.jenkinsci.plugins.workflow.support.steps.stash.StashStep$Execution.run(StashStep.java:103)
          at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:50)
          at hudson.security.ACL.impersonate(ACL.java:290)
          at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:47)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: java.nio.channels.ClosedChannelException
          at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
          at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:142)
          at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
          ... 3 more

       

      The node log contains only the following:

      Dec 18, 2018 10:41:49 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: 69:b4:62:1d:b1:5a:f4:9d:11:16:2c:26:8d:e8:a7:74
      Dec 18, 2018 10:41:51 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Dec 18, 2018 10:41:52 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
      WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.envinject.EnvInjectComputerListener$2; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
      Dec 19, 2018 2:43:01 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
      WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.Git$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
      Dec 19, 2018 2:43:04 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
      WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/

      This problem never occurred over the past few months, then suddenly appeared in recent weeks.

      The Jenkins version has not changed. Some plugins have been upgraded and some newly installed, but I am not sure which ones changed, so I am posting all installed plugins and their versions (see attachment).
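
      If an older plugin list can be recovered (for example from a backup), comparing it with the current one would narrow down which plugins changed. The following is only a minimal sketch: it assumes both lists are plain text with one `plugin-id:version` entry per line, which is not necessarily the format of the attachment, and the plugin versions in the sample data are hypothetical.

```python
def parse_plugin_list(text):
    """Parse lines of the assumed form 'plugin-id:version' into a dict."""
    plugins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition(":")
        plugins[name.strip()] = version.strip()
    return plugins


def diff_plugins(old, new):
    """Return (added, removed, changed) between two plugin dicts."""
    added = {n: v for n, v in new.items() if n not in old}
    removed = {n: v for n, v in old.items() if n not in new}
    changed = {
        n: (old[n], new[n])
        for n in old.keys() & new.keys()
        if old[n] != new[n]
    }
    return added, removed, changed


if __name__ == "__main__":
    # Hypothetical sample data; real lists would be read from files.
    before = parse_plugin_list("durable-task:1.25\ngit-client:2.7.3\n")
    after = parse_plugin_list(
        "durable-task:1.28\ngit-client:2.7.3\nenvinject:2.1.6\n"
    )
    added, removed, changed = diff_plugins(before, after)
    print("added:  ", added)
    print("removed:", removed)
    print("changed:", changed)
```

      Plugins that show up as added or changed would be the first candidates to roll back when looking for the cause.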

          Activity

          jthompson Jeff Thompson added a comment -

          Unfortunately, these failures tend to be system, network, or environment related. The "Attempt to (de-)serialize anonymous class" messages shouldn't be significant. The ChannelClosedException can occur for a number of different reasons. Since you explain that these failures suddenly started occurring recently without an update of the Jenkins version, I recommend you investigate what might have changed. If you haven't updated the Jenkins or Remoting versions, it's unlikely they are the cause.

          Another possibility is to try the Remoting Kafka Plugin. We have some hopes that it will be more resilient to environmental conditions.

          Good luck investigating what is going on.
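
          As one possible starting point for that investigation, a small probe run on the agent machine can record whether the master's agent port stays reachable over time. This is only a sketch under assumptions: the host name and port in the usage example are hypothetical placeholders, and a successful TCP connect does not prove that a long-lived Remoting channel would survive, so it can only help rule out gross network outages.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Return the TCP connect time in seconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return None


def monitor(host, port, attempts, interval=0.0, timeout=3.0):
    """Probe repeatedly; return a list of (timestamp, result) tuples."""
    results = []
    for _ in range(attempts):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        results.append((stamp, probe(host, port, timeout)))
        time.sleep(interval)
    return results


if __name__ == "__main__":
    # Hypothetical master host and agent port; substitute real values.
    for stamp, elapsed in monitor("jenkins-master.example.com", 50000,
                                  attempts=5, interval=10.0):
        status = "unreachable" if elapsed is None else f"{elapsed:.3f}s"
        print(stamp, status)
```

          Timestamps of failed probes can then be compared with the times at which builds lost their connection.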


            People

            Assignee:
            jthompson Jeff Thompson
            Reporter:
            arlen Arlen Rick
            Votes:
            2
            Watchers:
            4
