Jenkins 2.7.4 seems to leave behind Java processes (on Windows agent) if the build is aborted/agent loses connection

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component: core
    • Environment: Jenkins 2.7.4, Windows 7-10, Java 8

      We have a build step that runs a TestNG suite, with the command looking something like this:

      java -jar -Done-jar.main.class=org.testng.TestNG the-jar.jar TheTest.xml
      

      If the process is aborted in any way (manual intervention, Jenkins build timeout, etc.) or if the agent loses its connection to the master for long enough to fail the build, a Java process is left behind.

      This is particularly damaging to us, as we load a DLL in the Java process, locking the file handle. If we attempt the job again, we cannot load the DLL again, meaning that all future builds will fail without manual intervention (killing the leftover process manually).

      It is possible to reproduce with ANY java process executed on the Windows agent.

      This bug seems similar to JENKINS-26048, but I could not tell from its title and description whether it is the same problem or just similar symptoms. Feel free to close this as a duplicate if it is.
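
As a stopgap for the manual cleanup described in the report, the leftover process could be tracked with a PID file and killed before the next run. The following is only a sketch under stated assumptions, not a Jenkins feature or a fix for the underlying bug: the class name `PidFileCleaner`, the PID file, and its "pid start-time" format are all hypothetical, and the `ProcessHandle` API it relies on needs Java 9 or later (the reported environment is Java 8). The start time is stored next to the PID so that a reused PID is not killed by mistake.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper: not part of Jenkins, Remoting, or WinSW.
public class PidFileCleaner {

    // Start time of a process in epoch milliseconds, or -1 if the OS does not report it.
    public static long startMillis(ProcessHandle process) {
        return process.info().startInstant().map(Instant::toEpochMilli).orElse(-1L);
    }

    // Writes "<pid> <startMillis>" to the PID file (assumed format).
    public static void record(Path pidFile, ProcessHandle process) {
        String line = process.pid() + " " + startMillis(process);
        try {
            Files.write(pidFile, line.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the PID file back as { pid, startMillis }.
    public static long[] read(Path pidFile) {
        try {
            String text = new String(Files.readAllBytes(pidFile), StandardCharsets.UTF_8);
            String[] parts = text.trim().split(" ");
            return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Kills the recorded process and its descendants if it is still the same process,
    // then removes the PID file. Returns true only if a kill was requested successfully.
    public static boolean killRecorded(Path pidFile) {
        if (!Files.exists(pidFile)) {
            return false; // nothing was recorded by a previous run
        }
        long[] recorded = read(pidFile);
        long self = ProcessHandle.current().pid();
        boolean killed = false;
        Optional<ProcessHandle> handle = ProcessHandle.of(recorded[0]);
        // Skip our own JVM, and skip a PID that now belongs to a different process.
        if (handle.isPresent() && recorded[0] != self
                && startMillis(handle.get()) == recorded[1]) {
            ProcessHandle leftover = handle.get();
            leftover.descendants()
                    .filter(d -> d.pid() != self)
                    .forEach(ProcessHandle::destroyForcibly);
            killed = leftover.destroyForcibly();
        }
        try {
            Files.deleteIfExists(pidFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return killed;
    }

    // Usage: java PidFileCleaner <pid-file> [command to launch...]
    public static void main(String[] args) throws Exception {
        Path pidFile = Paths.get(args[0]);
        if (killRecorded(pidFile)) {
            System.out.println("Killed leftover process from a previous build");
        }
        if (args.length > 1) {
            String[] command = Arrays.copyOfRange(args, 1, args.length);
            Process child = new ProcessBuilder(command).inheritIO().start();
            record(pidFile, child.toHandle());
            int exit = child.waitFor();
            Files.deleteIfExists(pidFile); // clean exit: nothing left to kill
            System.exit(exit);
        }
    }
}
```

Used as a wrapper around the test command, the launcher first kills whatever an aborted build left behind and then records the new child. If the wrapper itself is killed when the build is aborted, the PID file stays on disk, so the next build can still find and terminate the orphaned child.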

          [JENKINS-38807] Jenkins 2.7.4 seems to leave behind Java processes (on Windows agent) if the build is aborted/agent loses connection

          Greg Fraley created issue
          Greg Fraley made changes
          Link: This issue is related to JENKINS-26048

          Oleg Nenashev added a comment -

          It does. Jenkins has complex process-termination logic (the ProcessKiller and ProcessKillingVeto extension points), which requires a connection to the master in order to be invoked properly. From a user perspective, I agree it's a serious UX bug.

          Oleg Nenashev made changes -
          Labels Original: regression New: regression ux
          Oleg Nenashev made changes -
          Labels Original: regression ux New: ux

          Oleg Nenashev added a comment -

          Not a regression


          mark mann added a comment -

          Jenkins master is on 2.32.1
          Master and slaves running Win2012

          The symptoms sound very similar to a problem we see when a Jenkins slave is up and we then reboot the Windows server (slave).
          When the server returns and the slave is automatically started, it hangs around for about 30 seconds, then terminates the connection, which kills our job.

          We've also seen the hosting Windows service, WinSW 1.17 (which auto-upgrades to 1.18), bomb out but leave the Java process running.
          The Java process then keeps the slave active to the master for an indeterminate amount of time (anywhere from 20 seconds to 2 hours) before eventually dying of its own accord, with no fresh jobs sent and no interaction with the Windows service.


          Oleg Nenashev added a comment -

          markjmanning Sounds like a different issue to me. Please file it and attach logs from both the master and the slave for the moment of failure. The Windows event log would also be useful. And CC me on the ticket. Both Remoting and WinSW are supposed to be maintained by me now, so it seems I am the person who has to triangulate it.


          mark mann added a comment -

          I have just noticed that WinSW is now on 2.0.1
          I will upgrade and see if the problem still exists.. if it does, I will raise a separate bug
          thx!


          Oleg Nenashev added a comment -

          krogan any updates?


            Assignee: Unassigned
            Reporter: Greg Fraley (gsfraley)
            Votes: 2
            Watchers: 6