Auto retry for elastic agents after channel closure


    While my pipeline was running, the node that was executing logic terminated. I see this at the bottom of my console output:

    Cannot contact ip-172-31-242-8.us-west-2.compute.internal: java.io.IOException: remote file operation failed: /ebs/jenkins/workspace/common-pipelines-nodeploy at hudson.remoting.Channel@48503f20:ip-172-31-242-8.us-west-2.compute.internal: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ip-172-31-242-8.us-west-2.compute.internal failed. The channel is closing down or has closed down
    

    There's a spinning arrow below it.

    I have a cron script that uses the Jenkins master CLI to remove nodes that have stopped responding. When I examine this node's page in the Jenkins web UI, it looks like the node is still running that job, and I see an orange label that says "Feb 22, 2018 5:16:02 PM Node is being removed".

    I'm wondering what would be a better way to say "If the channel closes down, retry the work on another node with the same label"?

    Things seem stuck. Please advise.
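
The retry behavior asked about above can be sketched roughly as follows. This is only an illustration of the desired semantics in plain Python, not Jenkins code: `ChannelClosedError`, `run_with_agent_retry`, and the node dictionaries are hypothetical names invented for the sketch, with `ChannelClosedError` standing in for `hudson.remoting.ChannelClosedException`. The assumption is that only a closed channel triggers a retry on a different node with the same label, while ordinary build failures still fail the build.

```python
class ChannelClosedError(Exception):
    """Hypothetical stand-in for hudson.remoting.ChannelClosedException."""


def run_with_agent_retry(nodes, label, work, max_attempts=3):
    """Run work(node) on a node carrying `label`.

    If the channel to that node closes, try the next node with the same
    label, up to `max_attempts` nodes. Any other exception propagates
    unchanged, so a genuine build failure is not retried.
    """
    candidates = [n for n in nodes if label in n["labels"]]
    lost = []
    for node in candidates[:max_attempts]:
        try:
            return work(node)
        except ChannelClosedError:
            # The agent went away mid-build; remember it and move on.
            lost.append(node["name"])
    raise RuntimeError(
        f"no agent with label {label!r} completed the work; lost agents: {lost}"
    )


# Usage: the first agent loses its channel, so the work lands on the second.
nodes = [
    {"name": "agent-a", "labels": {"linux"}},
    {"name": "agent-b", "labels": {"linux"}},
    {"name": "agent-c", "labels": {"windows"}},
]


def build(node):
    if node["name"] == "agent-a":
        raise ChannelClosedError("channel is closing down or has closed down")
    return f"built on {node['name']}"


result = run_with_agent_retry(nodes, "linux", build)
```

Keeping the retry limited to the channel-closed case is deliberate in this sketch: retrying on every failure would hide real build errors behind repeated attempts.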

      1. grub.remoting.logs.zip
        3 kB
      2. grubSystemInformation.html
        67 kB
      3. image-2018-02-22-17-27-31-541.png
        56 kB
      4. image-2018-02-22-17-28-03-053.png
        30 kB
      5. JavaMelodyGrubHeapDump_4_07_18.pdf
        220 kB
      6. JavaMelodyNodeGrubThreads_4_07_18.pdf
        9 kB
      7. jenkins_agent_devbuild9_remoting_logs.zip
        4 kB
      8. jenkins_Agent_devbuild9_System_Information.html
        66 kB
      9. jenkins_agents_Thread_dump.html
        172 kB
      10. jenkins_support_2018-06-29_01.14.18.zip
        1.26 MB
      11. jenkins.log
        984 kB
      12. jobConsoleOutput.txt
        12 kB
      13. jobConsoleOutput.txt
        12 kB
      14. MonitoringJavaelodyOnNodes.html
        44 kB
      15. NetworkAndMachineStats.png
        224 kB
      16. slaveLogInMaster.grub.zip
        8 kB
      17. support_2018-07-04_07.35.22.zip
        956 kB
      18. threadDump.txt
        98 kB
      19. Thread dump [Jenkins].html
        219 kB

          Assignee:
          Jesse Glick
          Reporter:
          Jon B
          Votes:
          37
          Watchers:
          54
