timeout step should include more logging/diagnostics information

      While trying to diagnose why the timeout step failed to terminate a process (see process tree; the Pipeline was awaiting completion of pid 1031), jglick and I determined that the timeout step does not include enough information in the logs or the Pipeline Thread Dump to be useful when things go awry.

      See the following chat log:

      16:39 < jglick> rtyler: so that was a normal termination via `SIGTERM` (143 = 128 + 15; cf. `kill -l`). Not sure why `timeout` did not do it. No Jenkins restart that I can see. Could try to add more status information to `timeout` indicating in virtual thread dump (a) whether it ever delivered a cancellation, (b) whether its scheduled task is still there.
      16:40 <@rtyler> jglick: should adding more verbiage to timeout be among the tickets I should file?
      16:40 < jglick> rtyler: that would be in `workflow-basic-steps-plugin`; `sh` is in `workflow-durable-task-step-plugin`
      16:41 < jglick> rtyler: verbiage in the log, but also information in virtual thread dump (currently it does not override `getStatus`)
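
For anyone reading the exit code in the chat above: shells conventionally report a process killed by signal N as exit status 128 + N, which is how 143 maps to `SIGTERM` (15). A minimal sketch of that decoding follows; `ExitStatus` and its methods are hypothetical names for illustration and are not part of Jenkins or either plugin.

```java
// Hypothetical helper (not part of Jenkins): decodes a shell-style exit status.
final class ExitStatus {
    private ExitStatus() {}

    /** Returns the signal number if the status follows the 128 + N convention, else -1. */
    static int signalNumber(int exitStatus) {
        return (exitStatus > 128 && exitStatus < 256) ? exitStatus - 128 : -1;
    }

    /** Human-readable description of an exit status. */
    static String describe(int exitStatus) {
        int signal = signalNumber(exitStatus);
        if (signal < 0) {
            return "exited with status " + exitStatus;
        }
        // Only a few common signal numbers are named here; see `kill -l` for the full list.
        String name;
        switch (signal) {
            case 2:  name = "SIGINT";  break;
            case 9:  name = "SIGKILL"; break;
            case 15: name = "SIGTERM"; break;
            default: name = "signal " + signal; break;
        }
        return "terminated by " + name + " (" + exitStatus + " = 128 + " + signal + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(143)); // prints "terminated by SIGTERM (143 = 128 + 15)"
    }
}
```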
      

      The more information the timeout step can include in both the Console Output and the Thread Dump, the better.
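
To make the request concrete, here is a rough sketch of the kind of status line that would help in the thread dump, covering the two points from the chat: (a) whether a cancellation was ever delivered and (b) whether the scheduled task is still there. It is plain Java and does not use the Jenkins API; `TimeoutStatusSketch`, its fields, and the message wording are all assumptions for illustration, not the plugin's actual code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a timeout execution; not the plugin's actual class.
final class TimeoutStatusSketch {
    private final AtomicBoolean cancellationDelivered = new AtomicBoolean();
    private final ScheduledFuture<?> killer;

    TimeoutStatusSketch(ScheduledExecutorService timer, long timeout, TimeUnit unit) {
        // A real step would cancel its body here; this sketch only records the attempt.
        killer = timer.schedule(() -> cancellationDelivered.set(true), timeout, unit);
    }

    /** Called when the body finishes in time and the timer is no longer needed. */
    void cancelTimer() {
        killer.cancel(false);
    }

    /** One-line summary suitable for a thread dump entry. */
    String getStatus() {
        if (cancellationDelivered.get()) {
            return "cancellation delivered; body has not yet exited";
        }
        if (killer.isCancelled()) {
            return "scheduled task was cancelled before firing";
        }
        return "scheduled task pending; fires in "
                + killer.getDelay(TimeUnit.MILLISECONDS) + " ms";
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        TimeoutStatusSketch step = new TimeoutStatusSketch(timer, 50, TimeUnit.MILLISECONDS);
        System.out.println(step.getStatus()); // before the timer fires
        Thread.sleep(200);
        System.out.println(step.getStatus()); // after the timer has fired
        timer.shutdown();
    }
}
```

A status of "cancellation delivered" combined with a body that is still running would have pointed straight at the step that ignored the interrupt, which is the information that was missing here.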

      Potentially related to JENKINS-34637

          [JENKINS-39072] timeout step should include more logging/diagnostics information

          R. Tyler Croy created issue -
          Jesse Glick made changes -
          Link New: This issue relates to JENKINS-34637 [ JENKINS-34637 ]
          Jesse Glick made changes -
          Component/s New: workflow-basic-steps-plugin [ 21712 ]
          Component/s Original: pipeline [ 21692 ]

          Jesse Glick added a comment -

          As part of JENKINS-34637 I am adding a message to the log, but something in getStatus is also in order.

          And I suppose if canceling the body does not work for whatever reason, it should set a (say) 1m timer and then declare itself dead so the rest of the build will abort.

          Independently, it might be appropriate for DurableTaskStep.Execution.stop to forcibly end after a (say) 10s grace period.
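
          A rough sketch of the grace-period idea, assuming nothing about how DurableTaskStep is actually implemented: ask politely, wait a bounded time, then force. The `Stoppable` interface and `GracefulStopper` class are hypothetical names introduced for illustration; only the `java.lang.Process` methods used in the adapter (`destroy`, `destroyForcibly`, `waitFor`) are real JDK calls.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical abstraction over "something that can be asked to stop"; not a Jenkins API.
interface Stoppable {
    void requestStop();   // polite request, e.g. SIGTERM
    void forceStop();     // last resort, e.g. SIGKILL
    boolean awaitExit(long time, TimeUnit unit) throws InterruptedException;
}

final class GracefulStopper {
    private GracefulStopper() {}

    /** Returns true if the target exited within the grace period, false if force was used. */
    static boolean stop(Stoppable target, long grace, TimeUnit unit) {
        target.requestStop();
        try {
            if (target.awaitExit(grace, unit)) {
                return true;
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller and fall through to the forced stop.
            Thread.currentThread().interrupt();
        }
        target.forceStop();
        return false;
    }

    /** Adapter for a local process, using only standard JDK methods. */
    static Stoppable forProcess(final Process process) {
        return new Stoppable() {
            @Override public void requestStop() { process.destroy(); }
            @Override public void forceStop() { process.destroyForcibly(); }
            @Override public boolean awaitExit(long time, TimeUnit unit) throws InterruptedException {
                return process.waitFor(time, unit);
            }
        };
    }
}
```

          With the 10s figure suggested above, a call would look like `GracefulStopper.stop(GracefulStopper.forProcess(p), 10, TimeUnit.SECONDS)`. The same shape would fit the 1m timer for the timeout step itself, with "declare itself dead" in place of the forced stop.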

          Jesse Glick made changes -
          Labels New: diagnostics robustness
          Jesse Glick made changes -
          Priority Original: Minor [ 4 ] New: Major [ 3 ]

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java
          http://jenkins-ci.org/commit/workflow-basic-steps-plugin/f36a5c382fbdfc7bc2b0d98430f34ace03375b40
          Log:
          JENKINS-39072 Print log messages about what is happening with timeouts.

          Jesse Glick made changes -
          Assignee Original: CloudBees Inc. [ cloudbees ] New: Jesse Glick [ jglick ]
          Jesse Glick made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Jesse Glick made changes -
          Link New: This issue is related to JENKINS-25504 [ JENKINS-25504 ]

            Assignee: Jesse Glick (jglick)
            Reporter: R. Tyler Croy (rtyler)