
Auto retry for elastic agents after channel closure

      While my pipeline was running, the node that was executing logic terminated. I see this at the bottom of my console output:

      Cannot contact ip-172-31-242-8.us-west-2.compute.internal: java.io.IOException: remote file operation failed: /ebs/jenkins/workspace/common-pipelines-nodeploy at hudson.remoting.Channel@48503f20:ip-172-31-242-8.us-west-2.compute.internal: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ip-172-31-242-8.us-west-2.compute.internal failed. The channel is closing down or has closed down
      

      There's a spinning arrow below it.

      I have a cron script that uses the Jenkins master CLI to remove nodes that have stopped responding. When I examine this node's page in the Jenkins web UI, the node still appears to be running that job, and I see an orange label that says "Feb 22, 2018 5:16:02 PM Node is being removed".

      I'm wondering what would be a better way to say "If the channel closes down, retry the work on another node with the same label"?

      Things seem stuck. Please advise.
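
      To make the requested behavior concrete, here is a minimal, language-neutral sketch of "retry on another node with the same label". It is not Jenkins or Pipeline code: `ChannelClosedError`, `run_with_retry`, the `nodes_by_label` mapping, and the agent names are all hypothetical stand-ins invented for illustration, and a real fix would have to live in the Pipeline or remoting layer.

```python
class ChannelClosedError(Exception):
    """Hypothetical stand-in for hudson.remoting.ChannelClosedException."""


def run_with_retry(task, nodes_by_label, label, max_attempts=3):
    """Run `task` on nodes carrying `label`, moving on when a channel closes.

    Returns (result, tried), where `tried` lists the nodes attempted in order.
    Raises RuntimeError if no node with the label completes the task.
    """
    tried = []
    last_error = None
    for node in nodes_by_label.get(label, []):
        if len(tried) >= max_attempts:
            break
        tried.append(node)
        try:
            return task(node), tried
        except ChannelClosedError as err:
            # The agent went away mid-build: remember why, then try the next one.
            last_error = err
    raise RuntimeError(
        f"no node with label {label!r} completed the task; tried {tried}"
    ) from last_error


if __name__ == "__main__":
    def build(node):
        # Simulate the first agent being terminated while the build runs.
        if node == "agent-1":
            raise ChannelClosedError("channel is closing down or has closed down")
        return f"built on {node}"

    print(run_with_retry(build, {"linux": ["agent-1", "agent-2"]}, "linux"))
```

      The key point is that only the channel-closure error triggers a move to another node; any other failure still fails the build as usual, so a genuinely broken build is not retried.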

        1. threadDump.txt
          98 kB
        2. jenkins.log
          984 kB
        3. support_2018-07-04_07.35.22.zip
          956 kB
        4. JavaMelodyGrubHeapDump_4_07_18.pdf
          220 kB
        5. NetworkAndMachineStats.png
          224 kB
        6. Thread dump [Jenkins].html
          219 kB
        7. grubSystemInformation.html
          67 kB
        8. slaveLogInMaster.grub.zip
          8 kB
        9. JavaMelodyNodeGrubThreads_4_07_18.pdf
          9 kB
        10. MonitoringJavaelodyOnNodes.html
          44 kB
        11. grub.remoting.logs.zip
          3 kB
        12. jobConsoleOutput.txt
          12 kB
        13. jobConsoleOutput.txt
          12 kB
        14. jenkins_support_2018-06-29_01.14.18.zip
          1.26 MB
        15. jenkins_agents_Thread_dump.html
          172 kB
        16. jenkins_Agent_devbuild9_System_Information.html
          66 kB
        17. jenkins_agent_devbuild9_remoting_logs.zip
          4 kB
        18. image-2018-02-22-17-28-03-053.png
          30 kB
        19. image-2018-02-22-17-27-31-541.png
          56 kB

          [JENKINS-49707] Auto retry for elastic agents after channel closure

          Jon B created issue -
          Oleg Nenashev made changes -
          Component/s New: pipeline [ 21692 ]
          Component/s Original: core [ 15593 ]
          Andrew Bayer made changes -
          Component/s New: workflow-durable-task-step-plugin [ 21715 ]
          Component/s Original: pipeline [ 21692 ]
          Jon B made changes -
          Summary Original: Pipeline stuck: "The channel is closing down or has closed down" New: Pipeline hangs: "The channel is closing down or has closed down"
          Jon B made changes -
          Component/s New: remoting [ 15489 ]
          Component/s Original: workflow-durable-task-step-plugin [ 21715 ]
          Oleg Nenashev made changes -
          Component/s New: _unsorted [ 19622 ]
          Alex Slaughter made changes -
          Priority Original: Minor [ 4 ] New: Major [ 3 ]
          Federico Naum made changes -
          Assignee New: Federico Naum [ fnaum ]
          Federico Naum made changes -
          Attachment New: jenkins_agent_devbuild9_remoting_logs.zip [ 43244 ]
          Attachment New: jenkins_agents_Thread_dump.html [ 43245 ]
          Attachment New: jenkins_Agent_devbuild9_System_Information.html [ 43246 ]
          Attachment New: jenkins_support_2018-06-29_01.14.18.zip [ 43247 ]

            jglick Jesse Glick
            piratejohnny Jon B
            Votes: 37
            Watchers: 54
