Jenkins / JENKINS-48485

Aborting a job running on Windows terminates the process immediately with no chance to run build cleanup code, leaving build-related lock files hanging on the slave.

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: core
    • Labels: None

      I have Jenkins set up with many build and test jobs running on Windows and Linux.

      I am noticing an issue with aborting/cancelling jobs running on Windows hosts. Aborting terminates the job but leaves lock files in various places, depending on the build phase in which the job was aborted, which affects later jobs that land in the same workspace.

      On Linux it works fine, as I have a build wrapper that detects the SIGTERM signal received on abort and terminates the build gracefully by clearing all the locks etc.
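
      A wrapper of the kind described above could look roughly like the following Python sketch. It is an illustration only: the names `remove_locks` and `run_with_cleanup` and the list of lock files are hypothetical and not taken from the reporter's actual wrapper. It forwards SIGTERM to the build command and removes the lock files once the command has exited.

```python
import os
import signal
import subprocess
import sys


def remove_locks(paths):
    """Delete each lock file that still exists; return the paths removed."""
    removed = []
    for path in paths:
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass  # already gone, nothing to clean up
    return removed


def run_with_cleanup(command, lock_files):
    """Run the build command and always remove lock files afterwards.

    Assumption: the abort arrives as SIGTERM, as it does on Linux agents.
    """
    child = subprocess.Popen(command)

    def on_sigterm(signum, frame):
        # Pass the abort on to the build so it can stop on its own terms.
        child.terminate()

    signal.signal(signal.SIGTERM, on_sigterm)
    try:
        return child.wait()
    finally:
        remove_locks(lock_files)
```

      A job would then call something like `run_with_cleanup(["make", "all"], ["/path/to/build.lock"])`, where both the command and the lock path are placeholders.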

      But I am unable to do the same on Windows.

      I learned from https://wiki.jenkins.io/display/JENKINS/Aborting+a+build that on Linux a job is aborted through java.lang.UnixProcess.destroyProcess, which sends SIGTERM on Sun's JREs, while on Windows this is done through the TerminateProcess API.

      If a process is terminated by TerminateProcess, all threads of the process are terminated immediately with no chance to run additional code. This means that the thread does not execute code in termination handler blocks. In addition, no attached DLLs are notified that the process is detaching. (source: https://msdn.microsoft.com/en-us/library/windows/desktop/ms686722(v=vs.85).aspx)
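
      The effect can be reproduced outside Jenkins with a small experiment. The sketch below is only an illustration in Python; the `CHILD` script and the `lock_survives` helper are hypothetical names. The child creates a lock file and installs a SIGTERM handler that removes it. On POSIX, `Popen.terminate()` sends SIGTERM and `Popen.kill()` sends SIGKILL; on Windows, Python implements both through TerminateProcess, so the handler never gets to run.

```python
import os
import subprocess
import sys
import tempfile

# Child script: create a lock file, clean it up on SIGTERM, then idle.
CHILD = r"""
import os, signal, sys, time
lock = sys.argv[1]
open(lock, "w").close()

def cleanup(signum, frame):
    os.remove(lock)
    sys.exit(0)

signal.signal(signal.SIGTERM, cleanup)
print("ready", flush=True)
time.sleep(60)
"""


def lock_survives(hard_kill):
    """Return True if the child's lock file is left behind after stopping it."""
    with tempfile.TemporaryDirectory() as tmp:
        lock = os.path.join(tmp, "build.lock")
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, lock], stdout=subprocess.PIPE
        )
        proc.stdout.readline()  # wait until the handler is installed
        if hard_kill:
            proc.kill()       # SIGKILL on POSIX, TerminateProcess on Windows
        else:
            proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
        proc.wait()
        proc.stdout.close()
        return os.path.exists(lock)
```

      On a POSIX host the lock file should remain only in the hard-kill case. On Windows it should remain in both cases, which matches the behaviour reported here.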

      From the above, it looks like Jenkins on Windows uses a termination mechanism that is abrupt and cannot be handled gracefully by the build process running on the Windows host.

      Can we have a procedure for terminating jobs on Windows similar to the one used for Linux?

       


          Oleg Nenashev added a comment -

          Please provide...

          1) Your version of Jenkins
          2) Version of Remoting you use on the agent
          3) Version of Java on the agent (which version? 32 or 64 bit?)
          4) Output of http://file-leak-detector.kohsuke.org/


          Sharad Upadhyaya added a comment -

          Here's the information required:

          1) Your version of Jenkins - 2.60.3

          2) Version of Remoting you use on the agent - 3.7

          3) Version of Java on the agent (which version? 32 or 64 bit?) - 1.8.0_144; some agents run 64-bit Java and some run 32-bit.

          4) Output of http://file-leak-detector.kohsuke.org/

          Oleg Nenashev added a comment -

          If you are running a 32-bit Java on a 64-bit system, Jenkins won't be able to abort processes correctly: https://github.com/kohsuke/winp#platform-support . I'm not sure whether it leads to file leaks; I need the File Leak Detector output to say for sure.


          Sharad Upadhyaya added a comment -

          I have seen this issue on both 64-bit and 32-bit slaves.

          We have code in our build scripts to handle the termination signal and run the build cleanup, since our build uses lock files in multiple phases and also mounts temporary drives to shorten build paths on Windows.

          On Windows agents, it seems the process is terminated immediately via the TerminateProcess API, with no chance for it to execute any build cleanup code, which leaves the lock files and mounted drives hanging.

          It works fine on Linux agents, as the build process detects the SIGTERM signal received on abort and executes the build cleanup code.

          I think the issue here is that the way Jenkins terminates a job on Windows agents differs from the way it does so on Linux.

          Oleg Nenashev added a comment -

          Yes, Windows process termination flow is different. There are known issues like JENKINS-19156 which do not allow doing graceful process tree termination like in Unix.

          Switching to ExitProcess in https://github.com/kohsuke/winp/blob/4bdec1e8d28d4f5fcf2cf309074284eef1813736/native/winp.cpp#L35 could be reasonable, but I feel that this approach is not sufficient. IMHO more complex logic is required to ensure that the operation does not hang, etc.
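
          As one illustration of what "more complex logic" might involve, here is a minimal Python sketch of a stop routine that first asks the process to exit, waits for a bounded grace period, and only then falls back to a hard kill so the abort cannot hang. It is not what winp or Jenkins does. The helper names `start_build` and `stop_gracefully` and the 10-second default are assumptions. On Windows it relies on Python's documented ability to send `CTRL_BREAK_EVENT` to a child started with `CREATE_NEW_PROCESS_GROUP`.

```python
import signal
import subprocess
import sys

IS_WINDOWS = sys.platform == "win32"


def start_build(command):
    """Start the build; on Windows, give it its own process group so that
    CTRL_BREAK_EVENT can be delivered to it later."""
    flags = subprocess.CREATE_NEW_PROCESS_GROUP if IS_WINDOWS else 0
    return subprocess.Popen(command, creationflags=flags)


def stop_gracefully(proc, grace_seconds=10.0):
    """Ask proc to exit, then force it after grace_seconds.

    Returns True if the process exited within the grace period,
    False if it had to be killed.
    """
    if proc.poll() is not None:
        return True  # already finished
    if IS_WINDOWS:
        proc.send_signal(signal.CTRL_BREAK_EVENT)  # catchable by the child
    else:
        proc.terminate()  # SIGTERM
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # TerminateProcess on Windows, SIGKILL on POSIX
        proc.wait()
        return False
```

          The bounded wait is the part that addresses the hanging concern: a build that ignores the polite request is still terminated, just later and as a last resort.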


            Assignee: Unassigned
            Reporter: Sharad Upadhyaya (shaupa01)
            Votes: 2
            Watchers: 4