JENKINS-49707

Auto retry for elastic agents after channel closure

      While my pipeline was running, the node that was executing logic terminated. I see this at the bottom of my console output:

      Cannot contact ip-172-31-242-8.us-west-2.compute.internal: java.io.IOException: remote file operation failed: /ebs/jenkins/workspace/common-pipelines-nodeploy at hudson.remoting.Channel@48503f20:ip-172-31-242-8.us-west-2.compute.internal: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ip-172-31-242-8.us-west-2.compute.internal failed. The channel is closing down or has closed down
      

      There's a spinning arrow below it.

      I have a cron script that uses the Jenkins master CLI to remove nodes that have stopped responding. When I examine this node's page in the Jenkins web UI, the node still appears to be running that job, and I see an orange label that says "Feb 22, 2018 5:16:02 PM Node is being removed".

      I'm wondering what would be a better way to say "If the channel closes down, retry the work on another node with the same label"?

      Things seem stuck. Please advise.
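
The behaviour requested above can be pictured with a small model. This is only a sketch of the desired semantics in Python, not Jenkins, Pipeline, or Remoting code: `Agent`, `ChannelClosedError` (a stand-in for `hudson.remoting.ChannelClosedException`), and `run_with_agent_retry` are hypothetical names introduced for illustration, and the assumption is that agent loss can be told apart from an ordinary build failure.

```python
from dataclasses import dataclass


class ChannelClosedError(OSError):
    """Hypothetical stand-in for hudson.remoting.ChannelClosedException."""


@dataclass(frozen=True)
class Agent:
    # Hypothetical model of a build agent; not a Jenkins class.
    name: str
    labels: frozenset


def run_with_agent_retry(agents, label, work, max_attempts=3):
    """Run work(agent) on agents carrying `label`, moving on when a channel closes."""
    candidates = [a for a in agents if label in a.labels]
    last_error = None
    for agent in candidates[:max_attempts]:
        try:
            return work(agent)
        except ChannelClosedError as err:
            # Only agent loss triggers a retry; any other exception propagates
            # so that genuine build failures are not silently re-run.
            last_error = err
    raise RuntimeError(
        f"no agent with label {label!r} completed the work"
    ) from last_error
```

The design choice worth noting is that the retry is keyed on the kind of failure (the channel closing), not on failure in general, so a failing test suite would still fail the build on the first attempt.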

        1. threadDump.txt (98 kB)
        2. jenkins.log (984 kB)
        3. support_2018-07-04_07.35.22.zip (956 kB)
        4. JavaMelodyGrubHeapDump_4_07_18.pdf (220 kB)
        5. NetworkAndMachineStats.png (224 kB)
        6. Thread dump [Jenkins].html (219 kB)
        7. grubSystemInformation.html (67 kB)
        8. slaveLogInMaster.grub.zip (8 kB)
        9. JavaMelodyNodeGrubThreads_4_07_18.pdf (9 kB)
        10. MonitoringJavaelodyOnNodes.html (44 kB)
        11. grub.remoting.logs.zip (3 kB)
        12. jobConsoleOutput.txt (12 kB)
        13. jobConsoleOutput.txt (12 kB)
        14. jenkins_support_2018-06-29_01.14.18.zip (1.26 MB)
        15. jenkins_agents_Thread_dump.html (172 kB)
        16. jenkins_Agent_devbuild9_System_Information.html (66 kB)
        17. jenkins_agent_devbuild9_remoting_logs.zip (4 kB)
        18. image-2018-02-22-17-28-03-053.png (30 kB)
        19. image-2018-02-22-17-27-31-541.png (56 kB)


          Jon B created issue -

          Oleg Nenashev added a comment -

          I'm not sure what the request is here. You are getting a system message from Remoting; it is not related to Pipeline or jobs in general. If you want to implement retry features or document the suggestions, IMHO that belongs on the Pipeline side.

          Oleg Nenashev made changes -
          Component/s New: pipeline [ 21692 ]
          Component/s Original: core [ 15593 ]
          Andrew Bayer made changes -
          Component/s New: workflow-durable-task-step-plugin [ 21715 ]
          Component/s Original: pipeline [ 21692 ]

          Jon B added a comment -

          oleg_nenashev I'm not sure how to handle this situation. The problem I need to overcome is that my pipeline hangs with the error message shown in the screenshots above. I would much prefer that it error out and fail. Unfortunately, the pipeline keeps running indefinitely.

          Can this instead be configured to throw a catchable exception?
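
To make the question concrete, here is a minimal sketch of what "throw a catchable exception instead of hanging" could look like, written as a standalone Python model and not as Jenkins or Remoting code. The names `AgentLostError` and `wait_for_step`, and the assumption that the step can be polled for a result, are hypothetical and introduced only for illustration.

```python
import time


class AgentLostError(Exception):
    """Hypothetical exception raised when an agent stays unreachable too long."""


def wait_for_step(poll, timeout_s, interval_s=0.1,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll a running step; raise AgentLostError once the deadline passes.

    `poll` returns None while the step is still running (or the agent cannot
    be contacted) and the step's result once it has finished.
    """
    deadline = clock() + timeout_s
    while True:
        result = poll()
        if result is not None:
            return result
        if clock() >= deadline:
            # Give up with an exception the caller can catch, log, or retry on,
            # rather than waiting on a closed channel forever.
            raise AgentLostError(f"no result after {timeout_s} seconds")
        sleep(interval_s)
```

A caller could wrap `wait_for_step` in `try`/`except AgentLostError` and then either fail the build or reschedule the work elsewhere.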

          Jon B made changes -
          Summary Original: Pipeline stuck: "The channel is closing down or has closed down" New: Pipeline hangs: "The channel is closing down or has closed down"

          Jon B added a comment -

          oleg_nenashev Should this be redesignated as a Remoting bug? I'm not sure how to unblock the pipelines that are hanging because of this issue.

          Jon B made changes -
          Component/s New: remoting [ 15489 ]
          Component/s Original: workflow-durable-task-step-plugin [ 21715 ]

            jglick Jesse Glick
            piratejohnny Jon B
            Votes: 37
            Watchers: 54
