ProcessLiveness, along with PID tracking, was introduced to ensure that an sh step would terminate if the controller process died, for example if the computer were rebooted; otherwise the step would just sit there indefinitely, waiting for output or an exit status that would never come.

      In practice this code has proven to be a major source of reliability issues. Prior to Java 9 there is no standard API for checking whether a given process exists, so the code uses JNA. Or tries to: it has a hard time being sure whether getpgid is actually supported, so it tries to detect that on every new node and cache the answer. These calls also do not work inside withDockerContainer, since the container may remap process IDs (the $$ seen from the wrapper script is not necessarily meaningful from the agent JVM). So the code also has to detect decorated Launcher implementations and fall back to a different version based on command-line ps calls, which is not entirely portable and has also had trouble responding cleanly to laggy or hung remoting channels.

      Better to throw out this approach and start over. It seems to work to just have the wrapper script itself indicate that it is still alive, for example by touching the log file even when there is no new output. Then the agent JVM need do nothing more exotic than a file timestamp check.
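
      As a rough illustration of the agent-side half of this idea, the following is a minimal sketch, not the plugin's actual code. The class name HeartbeatCheck, the method looksAlive, and the five-minute threshold are all hypothetical, and the sketch assumes the wrapper script touches the log file more often than that threshold.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

// Illustrative only: the names here are hypothetical, not the plugin's API.
class HeartbeatCheck {

    /**
     * Returns true if the log file was last modified no more than
     * staleAfter before now. A missing log file counts as not alive.
     */
    static boolean looksAlive(Path log, Instant now, Duration staleAfter) throws IOException {
        if (!Files.exists(log)) {
            return false;
        }
        Instant lastTouched = Files.getLastModifiedTime(log).toInstant();
        return Duration.between(lastTouched, now).compareTo(staleAfter) <= 0;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("durable-task", ".log");
        try {
            // Pretend the wrapper script last touched the log at a fixed instant.
            Instant touched = Instant.parse("2020-01-01T00:00:00Z");
            Files.setLastModifiedTime(log, FileTime.from(touched));
            Duration staleAfter = Duration.ofMinutes(5); // assumed threshold

            // One minute after the last touch: still considered alive.
            System.out.println(looksAlive(log, touched.plusSeconds(60), staleAfter));
            // Ten minutes after the last touch: considered dead.
            System.out.println(looksAlive(log, touched.plusSeconds(600), staleAfter));
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

      Passing the current time in as a parameter keeps the check easy to test. A check like this needs neither JNA nor ps, and does not care whether process IDs were remapped by a container, since it only looks at a file the agent can already read.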

          [JENKINS-47791] Eliminate ProcessLiveness

          Code changed in jenkins
          User: Jesse Glick
          Path:
          pom.xml
          src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java
          src/main/java/org/jenkinsci/plugins/durabletask/FileMonitoringTask.java
          src/main/java/org/jenkinsci/plugins/durabletask/ProcessLiveness.java
          src/test/java/org/jenkinsci/plugins/durabletask/BourneShellScriptTest.java
          src/test/java/org/jenkinsci/plugins/durabletask/CentOSFixture.java
          src/test/resources/org/jenkinsci/plugins/durabletask/CentOSFixture/Dockerfile
          http://jenkins-ci.org/commit/durable-task-plugin/a818ef883ff5200a64ca33f4d700d1ecf17b3211
          Log:
          Merge pull request #49 from jglick/heartbeat

          JENKINS-47791 Remove ProcessLiveness and PID tracking in favor of a simple timestamp check on the log file

          Compare: https://github.com/jenkinsci/durable-task-plugin/compare/edef6839f947...a818ef883ff5


            jglick Jesse Glick