
Jobs get aborted with "Command close created" as cause in the stacktrace

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • core
    • None
    • 1.429

      We randomly see jobs getting aborted with the stack traces below.
      We see this for long running jobs, which execute a large set of JUnit tests.

      Windows 2008 machine (x64)

      FATAL: Unable to delete script file C:\Windows\TEMP\hudson7878511792654741097.bat
      hudson.util.IOException2: remote file operation failed: C:\Windows\TEMP\hudson7878511792654741097.bat at hudson.remoting.Channel@5a171ae3:srv-nl-crd05
      at hudson.FilePath.act(FilePath.java:754)
      at hudson.FilePath.act(FilePath.java:740)
      at hudson.FilePath.delete(FilePath.java:995)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:693)
      at hudson.model.Build$RunnerImpl.build(Build.java:178)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:459)
      at hudson.model.Run.run(Run.java:1376)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:230)
      Caused by: hudson.remoting.ChannelClosedException: channel is already closed
      at hudson.remoting.Channel.send(Channel.java:492)
      at hudson.remoting.Request.call(Request.java:110)
      at hudson.remoting.Channel.call(Channel.java:674)
      at hudson.FilePath.act(FilePath.java:747)
      ... 13 more
      Caused by: hudson.remoting.Channel$OrderlyShutdown
      at hudson.remoting.Channel$CloseCommand.execute(Channel.java:819)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1071)
      Caused by: Command close created at
      at hudson.remoting.Command.<init>(Command.java:62)
      at hudson.remoting.Command.<init>(Command.java:47)
      at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:815)
      at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:815)
      at hudson.remoting.Channel.close(Channel.java:860)
      at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:112)
      at hudson.remoting.PingThread.ping(PingThread.java:107)
      at hudson.remoting.PingThread.run(PingThread.java:81)
      FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      at hudson.remoting.Request.call(Request.java:149)
      at hudson.remoting.Channel.call(Channel.java:674)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy35.join(Unknown Source)
      at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:850)
      at hudson.Launcher$ProcStarter.join(Launcher.java:336)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:693)
      at hudson.model.Build$RunnerImpl.build(Build.java:178)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:459)
      at hudson.model.Run.run(Run.java:1376)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:230)
      Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      at hudson.remoting.Request.abort(Request.java:273)
      at hudson.remoting.Channel.terminate(Channel.java:725)
      at hudson.remoting.Channel$CloseCommand.execute(Channel.java:819)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1071)
      Caused by: hudson.remoting.Channel$OrderlyShutdown
      ... 2 more
      Caused by: Command close created at
      at hudson.remoting.Command.<init>(Command.java:62)
      at hudson.remoting.Command.<init>(Command.java:47)
      at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:815)
      at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:815)
      at hudson.remoting.Channel.close(Channel.java:860)
      at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:112)
      at hudson.remoting.PingThread.ping(PingThread.java:107)
      at hudson.remoting.PingThread.run(PingThread.java:81)

      Linux (CentOS x64)
      FATAL: Unable to delete script file /tmp/hudson8819356115500333751.sh
      hudson.util.IOException2: remote file operation failed: /tmp/hudson8819356115500333751.sh at hudson.remoting.Channel@6d12070a:srv-nl-crd12
      at hudson.FilePath.act(FilePath.java:754)
      at hudson.FilePath.act(FilePath.java:740)
      at hudson.FilePath.delete(FilePath.java:995)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:693)
      at hudson.model.Build$RunnerImpl.build(Build.java:178)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:459)
      at hudson.model.Run.run(Run.java:1376)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:230)
      Caused by: hudson.remoting.ChannelClosedException: channel is already closed
      at hudson.remoting.Channel.send(Channel.java:492)
      at hudson.remoting.Request.call(Request.java:110)
      at hudson.remoting.Channel.call(Channel.java:674)
      at hudson.FilePath.act(FilePath.java:747)
      ... 13 more
      Caused by: hudson.remoting.Channel$OrderlyShutdown
      at hudson.remoting.Channel$CloseCommand.execute(Channel.java:819)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1071)
      Caused by: Command close created at
      at hudson.remoting.Command.<init>(Command.java:62)
      at hudson.remoting.Command.<init>(Command.java:47)
      at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:815)
      at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:815)
      at hudson.remoting.Channel.close(Channel.java:860)
      at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:112)
      at hudson.remoting.PingThread.ping(PingThread.java:107)
      at hudson.remoting.PingThread.run(PingThread.java:81)
      FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      at hudson.remoting.Request.call(Request.java:149)
      at hudson.remoting.Channel.call(Channel.java:674)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy35.join(Unknown Source)
      at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:850)
      at hudson.Launcher$ProcStarter.join(Launcher.java:336)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:693)
      at hudson.model.Build$RunnerImpl.build(Build.java:178)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:459)
      at hudson.model.Run.run(Run.java:1376)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:230)
      Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
      at hudson.remoting.Request.abort(Request.java:273)
      at hudson.remoting.Channel.terminate(Channel.java:725)
      at hudson.remoting.Channel$CloseCommand.execute(Channel.java:819)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1071)
      Caused by: hudson.remoting.Channel$OrderlyShutdown
      ... 2 more
      Caused by: Command close created at
      at hudson.remoting.Command.<init>(Command.java:62)
      at hudson.remoting.Command.<init>(Command.java:47)
      at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:815)
      at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:815)
      at hudson.remoting.Channel.close(Channel.java:860)
      at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:112)
      at hudson.remoting.PingThread.ping(PingThread.java:107)
      at hudson.remoting.PingThread.run(PingThread.java:81)

      Our master is running on CentOS x64.

          [JENKINS-11097] Jobs get aborted with "Command close created" as cause in the stacktrace

          Kohsuke Kawaguchi added a comment -

          The channel was apparently terminated by the ping thread. It did not get any response back within 4 minutes, so it decided that something went wrong.

          You can disable the ping with the system property -Dhudson.slaves.ChannelPinger.pingInterval=0 (but this has the downside that Jenkins will fail to notice an abnormal communication termination, so use it at your own risk).

          I wonder if this is caused by a spurious wakeup in threads. Even under heavy load, I find it hard to believe that a command round trip between a master and a slave takes 4 minutes. You might also want to check for a long GC pause in the master. In the past I've heard of stops of more than a minute caused by GC (and back then that was enough to alert the pinger).
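          The suggestion to check the master for long GC pauses can be approached in several ways. The following is only a minimal sketch using the standard java.lang.management API; the class and method names (GcTimeSketch, totalGcMillis) are hypothetical, and the code would have to run inside the master's JVM to say anything about the master. It reports cumulative collection time per collector, not the length of any single pause, so it can only hint that GC is a plausible suspect.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Jenkins or the remoting library.
class GcTimeSketch {

    /** Sums the cumulative collection time (in ms) over all collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Print one line per collector, then the overall total.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalGcMillis() + " ms");
    }
}
```

          Sampling this total shortly before and after an aborted build would show whether a large amount of GC time accumulated in that window. To see individual pause lengths, GC logging (for example the -verbose:gc JVM option) is the more direct tool.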

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          src/main/java/hudson/remoting/PingThread.java
          http://jenkins-ci.org/commit/remoting/d94bb55ed83e2616409369a35306278234af802f
          Log:
          JENKINS-11097 defending against possible spurious wakeup.

          Although looking at how channel.callAsync is implemented, I think this Future implementation doesn't have that problem.

          Compare: https://github.com/jenkinsci/remoting/compare/01a6df7...d94bb55
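          For readers unfamiliar with the idea, defending a timed wait against a spurious wakeup usually means re-checking the clock in a loop instead of trusting a single timed call. The following is a minimal sketch of that pattern, not the code from the commit above; the names PingWaitSketch and awaitReply are hypothetical, and the Future stands in for whatever channel.callAsync returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration, not the PingThread implementation.
class PingWaitSketch {

    /**
     * Waits up to timeoutMillis for the reply. Returns true only if the future
     * completed normally before the deadline. An early return from the timed
     * wait is not trusted: the loop checks the clock again before giving up.
     */
    static boolean awaitReply(Future<?> reply, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            try {
                reply.get(remaining, TimeUnit.NANOSECONDS);
                return true; // a reply arrived in time
            } catch (TimeoutException e) {
                // Possibly woke up early; the loop condition decides whether to wait again.
            } catch (ExecutionException e) {
                return false; // the call itself failed, so this is not a healthy reply
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false; // the full timeout really elapsed
    }

    public static void main(String[] args) {
        System.out.println(awaitReply(CompletableFuture.completedFuture("pong"), 1000));
        System.out.println(awaitReply(new CompletableFuture<String>(), 50));
    }
}
```

          Only after the loop exits would a caller conclude that the peer is unresponsive and close the channel.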

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          core/src/main/java/hudson/model/FullDuplexHttpChannel.java
          core/src/main/java/hudson/slaves/ChannelPinger.java
          pom.xml
          http://jenkins-ci.org/commit/jenkins/a4440f9b8911c58feec6fc56c20c187fcc3c2e3b
          Log:
          JENKINS-11097

          Added more diagnostics about how ping terminated a connection.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #1164
          JENKINS-11097

          Kohsuke Kawaguchi : a4440f9b8911c58feec6fc56c20c187fcc3c2e3b
          Files :

          • pom.xml
          • core/src/main/java/hudson/model/FullDuplexHttpChannel.java
          • changelog.html
          • core/src/main/java/hudson/slaves/ChannelPinger.java


          Cees Bos added a comment -

          Thanks, we will update to the latest version (when it is available) and come back to this if we see the issue again.


          Richard Mortimer added a comment -

          For reference, the remoting changes above (at https://github.com/jenkinsci/remoting/compare/01a6df7...d94bb55 ) have contributed to JENKINS-12037.

          My proposed fix is at https://github.com/jenkinsci/remoting/pull/3

          Albin Joy added a comment -

          We are also facing the same issue, with the same kind of log.
          We are using Jenkins 1.458.

          Can anybody tell me whether any recent Jenkins version has fixed this problem?
          I suspect this issue is caused by a network interruption during build execution.

          Is there any way to retry the connection during build execution?
          If somebody can help me, I can implement the solution.


          evernat added a comment -

          Can this issue be reproduced with a recent Jenkins version?


          Daniel Beck added a comment -

          No response to comment asking for updated information in several months, so resolving as Cannot Reproduce.

          If this still (or again) occurs in a Jenkins version no older than eight weeks, please file a new issue. Thanks.


          wbauer added a comment -

          This issue is fairly new and looks similar:
          https://issues.jenkins-ci.org/browse/JENKINS-24761

          I see this sporadically with a 1.572 master (RHEL 6.5) and our Windows slaves; Linux and Mac slaves are fine. I have not yet been able to pinpoint how, when, or why.


            Assignee: Unassigned
            Reporter: Cees Bos (cbos)
            Votes: 6
            Watchers: 11
