JENKINS-56673: Better handling of ChannelClosedException in Declarative pipeline

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Minor
    • Component: kubernetes-plugin
    • Labels: None
    • Environment: Jenkins 2.150.2, k8s plugin version 1.14.3

      When pods get deleted for any reason, there is a log/exception like this:

      hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from .... failed. The channel is closing down or has closed down 

      The job then appears to hang indefinitely until a timeout is reached or it's stopped manually.

      In our use case (Kubernetes on preemptible VMs) we actually expect pods to be deleted mid-build, and we want to be able to handle pod deletion with a retry.

      I have not been able to find a way to handle this in declarative syntax.
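
      Even in scripted syntax, the obvious thing to try is wrapping the node allocation in a retry step, roughly like the sketch below (the pod label and retry count are placeholders I made up), though I'm not sure the channel exception surfaces in a way that retry can actually catch:

          // Hypothetical scripted-pipeline sketch, not something confirmed to work:
          // 'preemptible-pod' is a placeholder pod template label.
          retry(3) {
              node('preemptible-pod') {
                  container('jnlp') {
                      // If the pod is preempted here, the hope is that retry re-runs the whole node block.
                      sh 'echo running on a pod that may be preempted mid-build'
                  }
              }
          }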

      For testing, I'm using a very simple declarative example:

          stages {
              stage('Try test') {
                  steps {
                      container('jnlp') {
                          sh """
                          echo Kill the pod now
                          sleep 5m
                          """
                      }
                  }
                  post {
                      failure {
                          echo "Failuuure"
                      }
                  }
              }
          }

      But the exception does not actually trigger the failure block when the pod is killed.

      Is there currently any best practice to handle the deletion of a pod? Are there any timeout parameters that would be useful in this case?
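
      One thing that would presumably help is a stage-level timeout in the options block, so the hang at least fails after a bounded time instead of indefinitely (the duration below is just an example value):

          options {
              // Example value only: bounds the hang if the agent channel closes mid-build.
              timeout(time: 10, unit: 'MINUTES')
          }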

      I'm happy to add a PR to the README once I learn the answer.


          Carlos Sanchez added a comment -

          I think this is just another version of JENKINS-55392.
          You can't catch these exceptions as they are underlying infra issues.

          Bruno Meneguello added a comment -

          csanchez I don't think this is the same case.
          I've opened a ticket with the same problem (sorry).
          What I've tracked is that when my pods are killed (usually by the OOM killer) the job doesn't get aborted; it hangs indefinitely. The node gets marked "offline" and the node log shows the message from the original post.
          If I click the abort button, the job is aborted immediately. So why doesn't this happen when the node is detected to be offline?

          Jesse Glick added a comment -

          I believe the patch for JENKINS-49707 addresses this.


            Assignee: Unassigned
            Reporter: Collin Lefeber (cfebs)
            Votes: 1
            Watchers: 4
