- Improvement
- Resolution: Fixed
- Major
ProcessLiveness, along with PID tracking, was introduced to ensure that a sh step would terminate if the controller process died, for example if the computer were rebooted; otherwise the step would sit there indefinitely, waiting for output or an exit status that would never come.
In practice this code has proven to be a major source of reliability issues. Prior to Java 9 there is no standard API for checking whether a given process exists, so the code uses JNA. Or rather it tries to: it has a hard time being sure whether getpgid is actually supported, so it probes for that on every new node and caches the answer. These calls also do not work inside withDockerContainer, since the container may remap process IDs (the $$ seen from the wrapper script is not necessarily meaningful from the agent JVM). So the code additionally has to detect decorated Launcher implementations and fall back to a different implementation based on command-line ps calls, which is not entirely portable and has also had trouble responding cleanly to laggy or hung remoting channels.
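For comparison, the standard API that arrived in Java 9 is ProcessHandle. The following is a minimal sketch of a PID existence check built on it; the class and method names (PidCheck, isRunning) are made up for illustration, and this is not code from the plugin. It would not help in the withDockerContainer case either, since the PID still has to be meaningful to the JVM doing the lookup.

```java
public class PidCheck {
    /**
     * Returns true if this JVM can see a live process with the given PID.
     * Requires Java 9 or later. Hypothetical helper, for illustration only.
     */
    static boolean isRunning(long pid) {
        // ProcessHandle.of returns an empty Optional when no such process is visible.
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) {
        // The current JVM is a convenient process that is known to exist.
        long self = ProcessHandle.current().pid();
        System.out.println("PID " + self + " running: " + isRunning(self));
    }
}
```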
Better to throw out this approach and start over. It seems sufficient to have the wrapper script itself indicate that it is still alive, for example by touching the log file even when there is no new output. Then the agent JVM need do nothing more exotic than a file timestamp check.
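The agent-side half of that idea could look roughly like the sketch below. It assumes the wrapper script touches the log file at some regular interval; the class name, method name, and the 30-second threshold are all hypothetical choices for illustration, not taken from the actual fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

public class HeartbeatCheck {
    /**
     * Treats the wrapper script as alive if the log file was modified
     * (written to or merely touched) within maxSilence of now.
     * Hypothetical helper; the threshold is an assumption.
     */
    static boolean looksAlive(Path logFile, Instant now, Duration maxSilence) throws IOException {
        try {
            Instant lastTouched = Files.getLastModifiedTime(logFile).toInstant();
            return Duration.between(lastTouched, now).compareTo(maxSilence) <= 0;
        } catch (NoSuchFileException e) {
            // No log file at all: nothing is keeping it fresh.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("durable-task", ".log");
        try {
            Instant now = Instant.now();
            Duration maxSilence = Duration.ofSeconds(30);

            // Simulate a heartbeat five seconds ago.
            Files.setLastModifiedTime(log, FileTime.from(now.minusSeconds(5)));
            System.out.println("recently touched: " + looksAlive(log, now, maxSilence));

            // Simulate a wrapper that stopped touching the file five minutes ago.
            Files.setLastModifiedTime(log, FileTime.from(now.minusSeconds(300)));
            System.out.println("stale: " + looksAlive(log, now, maxSilence));
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

The check needs no PID and no native calls, so it does not depend on the process ID seen inside the wrapper script being meaningful to the agent JVM.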
- is blocking
  - JENKINS-47822 docker pipeline finish beforehand when tcp socket is used (Closed)
- is duplicated by
  - JENKINS-35370 Workflow shell step ERROR: script returned exit code -1 (Reopened)
  - JENKINS-39307 pipeline docker execution aborts without reason (In Review)
  - JENKINS-38682 Pipeline plugin sh script returned exit code -1 on windows node after a short while (Resolved)
  - JENKINS-37720 Virtual thread dump hangs waiting for ProcessLiveness (Resolved)
  - JENKINS-42166 ProcessLiveness.workingLaunchers heuristic is flaky (Resolved)
  - JENKINS-42405 “Could not initialize class ProcessLiveness$LibC” running sh on Windows (Resolved)
  - JENKINS-46651 container step "script returned exit code -1" (Resolved)
- relates to
  - JENKINS-25503 Use setsid instead of nohup (Resolved)
  - JENKINS-45294 Remoting should reject RPCRequests over the closing channel (Resolved)
  - JENKINS-44785 Add Built-in Request timeout support in Remoting (In Review)
  - JENKINS-48300 Pipeline shell step aborts prematurely with ERROR: script returned exit code -1 (Resolved)
  - JENKINS-50892 Pipeline jobs stuck after restart (Closed)