  1. Jenkins
  2. JENKINS-51959

Linux slave intermittently loses connection to master

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Defect
    • Component/s: ssh-slaves-plugin
    • Labels:
      None
    • Environment:
      CentOS 7.4
      Jenkins version: 2.89.4

      Description

      A Jenkins Linux slave intermittently loses its connection to the master during build execution, and the build fails with the error below:

      ERROR: Build step failed with exception
      Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to $slave_name
      at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
      at hudson.remoting.UserResponse.retrieve(UserRequest.java:310)
      at hudson.remoting.Channel.call(Channel.java:908)
      at hudson.FilePath.act(FilePath.java:986)
      at hudson.FilePath.act(FilePath.java:975)
      at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:243)
      at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:81)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
      at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
      at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:690)
      at hudson.model.Build$BuildExecution.post2(Build.java:186)
      at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:635)
      at hudson.model.Run.execute(Run.java:1749)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      at hudson.model.ResourceController.execute(ResourceController.java:97)
      at hudson.model.Executor.run(Executor.java:429)

      What is the solution to this problem?
      It occurs intermittently, and many of our builds on different slaves are failing with this error.
      Can someone help look into this?
      Our slaves connect to the master through SSH.

          Activity

          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          Do you upload many artifacts at the same time to the master?

          Do you upload many artifacts at the same time from the same agent?

          Long-running uploads to a master could be a problem, but that is not related to the SSH Slaves plugin; it is more of a remoting issue. Also, 0.570 MB/s is a slow, error-prone connection. Try to compress the artifacts before uploading them, or improve the connection speed.
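
The compression advice above could be applied as a build step that runs before the archiving step. The following is only a minimal sketch, assuming Python is available on the agent; the function name `bundle_artifacts` and any paths passed to it are hypothetical and not taken from this issue.

```python
import tarfile
from pathlib import Path


def bundle_artifacts(src_dir: str, archive_path: str) -> int:
    """Pack every file under src_dir into one gzip-compressed tar archive.

    Returns the number of files added. Names and paths are illustrative only.
    """
    src = Path(src_dir)
    target = Path(archive_path).resolve()
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in src_dir.
            if not path.is_file() or path.resolve() == target:
                continue
            # Store paths relative to src_dir so the archive unpacks cleanly.
            tar.add(str(path), arcname=str(path.relative_to(src)))
            count += 1
    return count
```

The job would then archive the single `.tar.gz` file instead of many individual files. How much this helps depends on how compressible the artifacts are; already-compressed files gain little.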

          priyankapanda348 Priyanka Panda added a comment -

          Yes, we upload many artifacts at the same time from the same agent to the master.
          But only some builds fail when the slave loses its connection (maybe once a day); all other times the builds pass.
          If the issue is uploading many artifacts simultaneously, then shouldn't all upload jobs fail?

          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          >If the issue is uploading many artifacts simultaneously, then shouldn't all upload jobs fail?

          No, but you have poor bandwidth and probably network outages too; two hours spent uploading a file is time enough for something to go wrong.
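
To put the figures quoted in this thread together: at 0.570 MB/s, a two-hour upload moves roughly 4.1 GB. The sketch below only does that arithmetic, assuming MB means megabytes and the rate is constant; the compression ratio in the last lines is a made-up illustration, not a measurement.

```python
def transferred_mb(rate_mb_per_s: float, seconds: float) -> float:
    """Amount of data moved at a constant rate over the given time."""
    return rate_mb_per_s * seconds


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time needed to move size_mb at a constant rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_per_s


RATE_MB_PER_S = 0.570        # rate quoted in the thread
TWO_HOURS_S = 2 * 60 * 60    # upload duration quoted in the thread

size_mb = transferred_mb(RATE_MB_PER_S, TWO_HOURS_S)
print(f"data moved in two hours: about {size_mb:.0f} MB")

# Hypothetical: if compression shrank the artifacts to half their size,
# the upload window (and the exposure to outages) would halve as well.
ASSUMED_RATIO = 0.5
minutes = transfer_seconds(size_mb * ASSUMED_RATIO, RATE_MB_PER_S) / 60
print(f"upload time at ratio {ASSUMED_RATIO}: about {minutes:.0f} minutes")
```

The shorter the upload window, the less likely it is to overlap with a network interruption, which is why both compression and a faster link were suggested.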

          priyankapanda348 Priyanka Panda added a comment -

          I think I found the error here:
          Connection terminated
          ERROR: Failed to install restarter
          ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
          Command close created at
          at hudson.remoting.Command.<init>(Command.java:60)
          at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1219)
          at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1217)
          at hudson.remoting.Channel.close(Channel.java:1391)
          at hudson.remoting.Channel.close(Channel.java:1358)
          at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1224)
          Caused: hudson.remoting.Channel$OrderlyShutdown
          Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to huluwa5_smoke
          at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
          at hudson.remoting.Request.call(Request.java:192)
          at hudson.remoting.Channel.call(Channel.java:907)
          at org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl.onOnline(ComputerListenerImpl.java:32)
          at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:611)
          at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:413)
          at hudson.slaves.CommandLauncher.launch(CommandLauncher.java:153)
          at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:285)
          at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
          Caused: hudson.remoting.RequestAbortedException
          at hudson.remoting.Request.abort(Request.java:329)
          at hudson.remoting.Channel.terminate(Channel.java:992)
          at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1225)
          at hudson.remoting.Channel$1.handle(Channel.java:560)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:87)
          Command close created at
          at hudson.remoting.Command.<init>(Command.java:60)
          at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1219)
          at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1217)
          at hudson.remoting.Channel.close(Channel.java:1391)
          at hudson.remoting.Channel.close(Channel.java:1358)
          at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1224)
          Caused: hudson.remoting.Channel$OrderlyShutdown
          Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to huluwa5_smoke
          at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
          at hudson.remoting.Request.call(Request.java:192)
          at hudson.remoting.Channel.call(Channel.java:907)
          at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller.install(JnlpSlaveRestarterInstaller.java:53)
          at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller.access$000(JnlpSlaveRestarterInstaller.java:34)
          at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$1.call(JnlpSlaveRestarterInstaller.java:40)
          at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$1.call(JnlpSlaveRestarterInstaller.java:37)
          at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
          Caused: hudson.remoting.RequestAbortedException
          at hudson.remoting.Request.abort(Request.java:329)
          at hudson.remoting.Channel.terminate(Channel.java:992)
          at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1225)
          at hudson.remoting.Channel$1.handle(Channel.java:560)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:87)

          I see this in the slave logs. The slave fails to connect with this error, and after throwing the error a second time it starts up again.
          Any thoughts on this?

          ifernandezcalvo Ivan Fernandez Calvo added a comment -

          The connection is terminated and cannot be reopened; it looks like a network outage between the agent and the master. Also, I think this is the trace from a JNLP agent, so it is not related to the SSH Slaves plugin at all.
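
One way to check the network-outage theory is to probe the master from the agent at regular intervals and compare the timestamps of failed probes with the times of failed builds. The following is a rough sketch rather than a Jenkins feature: the host and port are placeholders to be filled in for the actual setup, and a successful TCP connect only shows that the port is reachable, not that the remoting channel is healthy.

```python
import socket
from datetime import datetime


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


def probe_line(host: str, port: int, timeout: float = 5.0) -> str:
    """Build one timestamped log line describing the probe result."""
    status = "OK" if is_reachable(host, port, timeout) else "UNREACHABLE"
    return f"{datetime.now().isoformat(timespec='seconds')} {host}:{port} {status}"
```

Calling `probe_line` from a scheduled job (for example once a minute) and appending the result to a file would give a simple timeline to compare against the build failures.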


            People

            Assignee:
            ifernandezcalvo Ivan Fernandez Calvo
            Reporter:
            priyankapanda348 Priyanka Panda
            Votes:
            0
            Watchers:
            2
