• Jenkins 2.199

      1. Windows
      2. Jenkins 2.176.1
      3. Create pipeline:
        node() {
          bat "ping 127.0.0.1 -n 100000"
        }
        
      4. Run pipeline
      5. Abort pipeline
      6. View build log

      Expected: pipeline aborts fast and without any issues

      Actual (reproducibility is less than 100%):

      1. The pipeline takes 20s to abort
      2. The build log contains "Click here to forcibly terminate running steps" and "After 20s process did not stop", indicating that Jenkins has trouble stopping the pipeline
      3. The "Click here to forcibly terminate running steps" link is still visible even after the build has finished
      4. Sometimes the ping processes are NOT terminated even after the build has aborted.

      Issue analysis:

      1. There is a race condition between the 2-minute timer in hudson.util.ProcessTree.WindowsOSProcess#killSoftly (introduced for JENKINS-17116 by PR#3414) and the 20s timer in org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.Execution#stop. DurableTaskStep can report the step as cancelled while in fact the process is still running. As a result, Jenkins can be tricked into thinking that the build has finished while in fact processes are still running in the workspace and potentially locking files there (this happens to us in practice).
      2. org.jvnet.winp.WinProcess#sendCtrlC, which is used in hudson.util.ProcessTree.WindowsOSProcess#killSoftly, is NOT a proper way to terminate processes. Many apps do not interpret CTRL+C as a shutdown signal. cmd.exe is the most important one here, because running bat in a pipeline involves TWO cmd.exe processes: one running jenkins-wrapper.bat and a second running jenkins-main.bat. Why not use the TerminateProcess function from WinAPI?
      3. There is a race condition between gathering the process list in the hudson.util.ProcessTree.Windows#Windows constructor and killing the processes: in between, the build can spawn new processes that Jenkins will never attempt to kill.
      4. Using JENKINS_NODE_COOKIE to find which processes to kill is unreliable because 1) processes are free to alter their environment, 2) CreateProcessA allows passing custom environment variables, 3) it has an unpredictable order, 4) it does not match Jenkins behavior on Linux.
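
The timer race from item 1 can be illustrated with a toy model. The sketch below is not Jenkins code: FakeProcess, reportedStoppedWhileAlive and the scaled-down timeouts are hypothetical names and values chosen for illustration. It assumes a process that ignores CTRL+C, so only the force kill at the end of the soft-kill timer ends it, while the step gives up waiting on its own, independent timer.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of two independent abort timers; an illustration, not Jenkins code. */
final class AbortRaceSketch {

    /** Hypothetical stand-in for a process that ignores CTRL+C. */
    static final class FakeProcess {
        private final CompletableFuture<Void> exited = new CompletableFuture<>();

        boolean isAlive() { return !exited.isDone(); }

        void forceKill() { exited.complete(null); }

        CompletableFuture<Void> onExit() { return exited; }
    }

    /**
     * Returns true if the step is reported as stopped while the process is still alive.
     *
     * @param softKillMs    how long the soft kill waits before force-killing
     * @param stopTimeoutMs how long the step waits before reporting itself stopped
     */
    static boolean reportedStoppedWhileAlive(long softKillMs, long stopTimeoutMs) throws Exception {
        FakeProcess process = new FakeProcess();

        // Timer 1: the soft kill waits for a graceful exit, then force-kills.
        CompletableFuture<Void> killer = CompletableFuture.runAsync(() -> {
            try {
                process.onExit().get(softKillMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                process.forceKill();
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        });

        // Timer 2: the step waits on its own timeout, then gives up and reports "stopped".
        try {
            process.onExit().get(stopTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Gave up waiting; the step is reported as stopped regardless.
        }
        boolean aliveWhenReported = process.isAlive();

        killer.get(); // let the soft-kill timer finish before returning
        return aliveWhenReported;
    }

    public static void main(String[] args) throws Exception {
        // Scaled-down analogue of "2 minutes vs 20s": soft kill 600 ms, step timeout 100 ms.
        System.out.println("stopped while alive: " + reportedStoppedWhileAlive(600, 100));
        // Reversed timers: the force kill lands before the step gives up.
        System.out.println("stopped while alive: " + reportedStoppedWhileAlive(100, 600));
    }
}
```

In this model the outcome depends only on which timer fires first, which is why shortening one of the timeouts changes how often the problem shows up without removing it.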

          [JENKINS-59152] Jenkins fails to properly abort "bat" step

          Marat Radchenko added a comment - All that I described in this issue can be reproduced by running org.jenkinsci.plugins.workflow.steps.durable_task.ShellStepTest#abort test on Windows. Sometimes it quickly passes. Sometimes it idles with 20s timeout. Sometimes it fails to kill ping process.

          Marat Radchenko added a comment - See comments to PR#4216 for additional technical analysis.

          Oleg Nenashev added a comment - The fix was released in Jenkins 2.199

          Marat Radchenko added a comment - I do not agree that PR#4225 fully fixed this issue. The race conditions between multiple timers are still there. Shortening the soft-kill timeout makes the issue less frequent, but it is still possible.
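
One way to avoid depending on which timer fires first is to report a stop only after the process exit has actually been observed. The following is a minimal sketch of that idea using the standard java.lang.ProcessHandle API (Java 9+), not the Jenkins implementation; StopAndConfirmSketch and stopAndConfirm are hypothetical names. To my knowledge the JDK implements destroyForcibly on Windows via TerminateProcess, but treat that as an assumption. The sketch still takes a single snapshot of the process tree, so it does not address processes spawned after the snapshot.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: force-kill a process tree and report success only on observed exit. */
final class StopAndConfirmSketch {

    /**
     * Force-kills the process and its current descendants, then waits for all of
     * them to exit. Returns true only if every exit was observed within the timeout.
     */
    static boolean stopAndConfirm(Process process, long timeoutMs) throws Exception {
        // Snapshot the root and its descendants BEFORE killing anything.
        // Processes spawned after this point are missed by this sketch.
        List<ProcessHandle> tree = Stream
                .concat(Stream.of(process.toHandle()), process.descendants())
                .collect(Collectors.toList());

        // Forcible termination instead of a CTRL+C-style request.
        tree.forEach(ProcessHandle::destroyForcibly);

        // One future per process; each completes when that process has exited.
        CompletableFuture<?>[] exits = tree.stream()
                .map(ProcessHandle::onExit)
                .toArray(CompletableFuture[]::new);
        try {
            CompletableFuture.allOf(exits).get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;  // every exit was observed
        } catch (TimeoutException e) {
            return false; // something is still running: do not claim the step stopped
        }
    }

    public static void main(String[] args) throws Exception {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        // Long-running child, mirroring the ping from the reproduction steps.
        ProcessBuilder builder = windows
                ? new ProcessBuilder("ping", "127.0.0.1", "-n", "100000")
                : new ProcessBuilder("sleep", "100000");
        Process child = builder.redirectOutput(ProcessBuilder.Redirect.DISCARD).start();
        System.out.println("exit confirmed: " + stopAndConfirm(child, 5_000));
    }
}
```

The design choice here is that a timeout yields "not stopped" rather than "stopped", so a caller cannot mistake a still-running process for a finished one.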

          Oliver Gondža added a comment - Given the fix is disputed (and far from trivial, IMO), I am postponing the backport to 2.190.3 at least.

            Assignee: Unassigned
            Reporter: Marat Radchenko (slonopotamusorama)