Jenkins / JENKINS-51878

Random Remoting issues - ChannelClosed

    Details

      Description

      Seeing ChannelClosed or "Workspace not available" errors, seemingly at random, when using ECS containers as build agents.

       

      Considerations:

      • Cluster load does not appear to be a factor
      • The Jenkins server and the cluster are in the same VPC
      • pingIntervalSeconds is at its default of 300 seconds
      • pingTimeoutSeconds is at its default of 240 seconds
      • Unable to reproduce the issue reliably
      • No other network issues seen at similar times with nodes in EC2
      • Length of the build/task does not appear to be a factor
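
      For reference, the ping interval and timeout listed above are set via system properties on the JVM. A minimal sketch of how they could be tuned, assuming the controller is started directly from the war (the values here are illustrative, not a recommended fix):

      {code}
      # Hypothetical tuning of the remoting ping thread. The
      # hudson.slaves.ChannelPinger.* properties are the ones behind the
      # defaults (300 / 240) noted above; 60 / 30 below are example values.
      java -Dhudson.slaves.ChannelPinger.pingIntervalSeconds=60 \
           -Dhudson.slaves.ChannelPinger.pingTimeoutSeconds=30 \
           -jar jenkins.war
      {code}

      A shorter interval detects a dead channel sooner but will not by itself prevent the channel from being closed.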

       

      I know this is a horribly vague bug to resolve, so I'm quite happy to try any reconfiguration or similar suggested.

       

      {code}

      FATAL: java.nio.channels.ClosedChannelException
      java.nio.channels.ClosedChannelException
      Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from ip-10-1-0-187.ec2.internal/10.1.0.187:43840
      at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
      at hudson.remoting.Request.call(Request.java:202)
      at hudson.remoting.Channel.call(Channel.java:954)
      at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1078)
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:510)
      at com.tikal.jenkins.plugins.multijob.MultiJobBuild$MultiJobRunnerImpl.run(MultiJobBuild.java:148)
      at hudson.model.Run.execute(Run.java:1794)
      at com.tikal.jenkins.plugins.multijob.MultiJobBuild.run(MultiJobBuild.java:76)
      at hudson.model.ResourceController.execute(ResourceController.java:97)
      at hudson.model.Executor.run(Executor.java:429)
      Caused: hudson.remoting.RequestAbortedException
      at hudson.remoting.Request.abort(Request.java:340)
      at hudson.remoting.Channel.terminate(Channel.java:1038)
      at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209)
      at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
      at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
      at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
      at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:172)
      at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
      at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
      at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:142)
      at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:789)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      {code}

      {code}
      INFO: No logs found
      Jun 11, 2018 11:34:32 PM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
      WARNING: IOHub#1: Worker[channel:java.nio.channels.SocketChannel[connected local=/10.1.0.85:7300 remote=ip-10-1-0-187.ec2.internal/10.1.0.187:43840]] / Computer.threadPoolForRemoting 1796 for ecs-rdkcmf-12377045296774 terminated
      java.nio.channels.ClosedChannelException
      at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
      at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:142)
      at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:789)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)

      Jun 11, 2018 11:34:33 PM jenkins.plugins.slack.StandardSlackService publish
      {code}

        Attachments

          Activity

          emconnors Eva Connors added a comment - edited

          This happens to us too sometimes – our ECS cluster will downscale while a job is still actively running on the instance, and instead of waiting and draining, it just kills the job. We've been trying to investigate ways to more intelligently downscale, but so far no luck. Does your cluster autoscale based on CPU usage?
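
          If scale-in really is killing in-flight jobs, one possible mitigation (sketched here with the standard AWS CLI; cluster name and container-instance ARN are placeholders) is to put the instance into DRAINING before it is terminated, so ECS stops scheduling onto it and lets running tasks finish:

          {code}
          # Hypothetical sketch: mark an ECS container instance as DRAINING so
          # running tasks can complete before the instance is removed from the
          # cluster. Replace the cluster name and ARN with real values.
          aws ecs update-container-instances-state \
              --cluster my-jenkins-cluster \
              --container-instances arn:aws:ecs:us-east-1:123456789012:container-instance/abcdef12 \
              --status DRAINING
          {code}

          An autoscaling lifecycle hook that runs this before instance termination would avoid the hard kill described above.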

          evidex David Hayes added a comment -

          Eva Connors, no, it doesn't. We downscale based on memory reservation being 0 (we tend to have overnight periods where no jobs are running). I've ruled out termination of the EC2 host instance as a cause in this case, though it would certainly lead to the same effect as you've seen.

          stpierre Chris St. Pierre added a comment -

          I'm seeing this as well, although on Fargate.


            People

            Assignee:
            roehrijn2 Jan Roehrich
            Reporter:
            evidex David Hayes
            Votes:
            0
            Watchers:
            5
