JENKINS-59152

Jenkins fails to properly abort "bat" step

    • Jenkins 2.199

      1. Windows
      2. Jenkins 2.176.1
      3. Create pipeline:
        node() {
          bat "ping 127.0.0.1 -n 100000"
        }
        
      4. Run pipeline
      5. Abort pipeline
      6. View build log

      Expected: the pipeline aborts quickly and without any issues.

      Actual (reproducibility is less than 100%):

      1. The pipeline takes 20s to abort.
      2. The build log contains "Click here to forcibly terminate running steps" and "After 20s process did not stop", indicating that Jenkins has trouble stopping the pipeline.
      3. The "Click here to forcibly terminate running steps" link is still visible even after the build has finished.
      4. Sometimes the ping processes are NOT terminated even though the build has been aborted.

      Issue analysis:

      1. There is a race condition between the 2 minute timer in hudson.util.ProcessTree.WindowsOSProcess#killSoftly (introduced for JENKINS-17116 by PR#3414) and the 20s timer in org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.Execution#stop. DurableTaskStep can report the step as cancelled while in fact the process is still running. As a result, Jenkins can be tricked into treating the build as finished while processes are still running in the workspace and potentially locking files there (this happens to us in practice).
      2. org.jvnet.winp.WinProcess#sendCtrlC, which is used in hudson.util.ProcessTree.WindowsOSProcess#killSoftly, is NOT a proper way to terminate processes. Many apps do not interpret CTRL+C as a shutdown signal. cmd.exe is the most important one here, because running bat in a pipeline involves TWO cmd.exe processes: one running jenkins-wrapper.bat and a second running jenkins-main.bat. Why not use the TerminateProcess function from WinAPI?
      3. There is a race condition between gathering the process list in the hudson.util.ProcessTree.Windows#Windows constructor and killing the processes: in between, the build can spawn new processes that Jenkins will never attempt to kill.
      4. Using JENKINS_NODE_COOKIE to find which processes to kill is unreliable because 1) processes are free to alter their environment, 2) CreateProcessA allows passing custom environment variables, 3) it has unpredictable order, and 4) it doesn't match Jenkins behavior on Linux.
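
      To make the first point concrete, here is a minimal Java model of the two timers. The 2 minute and 20s values come from the analysis above. The class name AbortRaceModel, the method orphanWindowMs, and the assumption that the process is gone at the latest when the soft-kill wait expires are illustrative simplifications, not the actual Jenkins code.

```java
public class AbortRaceModel {
    // Timer values quoted in the report; everything else is a simplified model.
    static final long SOFT_KILL_WAIT_MS = 120_000; // 2 minute wait in killSoftly
    static final long STEP_STOP_WAIT_MS = 20_000;  // 20s wait in DurableTaskStep stop

    /**
     * Computes how long the step is already reported as stopped while the
     * process is still alive.
     *
     * @param exitAfterCtrlCMs time the process needs to exit after CTRL+C,
     *                         or Long.MAX_VALUE if it ignores CTRL+C entirely
     * @return length of the gap in milliseconds (0 means no gap)
     */
    static long orphanWindowMs(long exitAfterCtrlCMs) {
        // Assumption: once the soft-kill wait expires, the process is
        // terminated by some other means, so it never outlives that wait.
        long processGoneAt = Math.min(exitAfterCtrlCMs, SOFT_KILL_WAIT_MS);
        // The step stops waiting after 20s, or earlier if the process exits.
        long stepReportedStoppedAt = Math.min(processGoneAt, STEP_STOP_WAIT_MS);
        return processGoneAt - stepReportedStoppedAt;
    }

    public static void main(String[] args) {
        System.out.println(orphanWindowMs(5_000));          // 0
        System.out.println(orphanWindowMs(60_000));         // 40000
        System.out.println(orphanWindowMs(Long.MAX_VALUE)); // 100000
    }
}
```

      In this model, a process that ignores CTRL+C stays alive for up to 100s after the step has been reported as stopped, which is enough time for it to keep files in the workspace locked. The model does not cover the case from the "Actual" list where the process is never terminated at all.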

            Assignee: Unassigned
            Reporter: Marat Radchenko (slonopotamusorama)