[JENKINS-47791] Eliminate ProcessLiveness - Jenkins Jira

Type: Improvement
Resolution: Fixed
Priority: Major
Component/s: durable-task-plugin
Labels:
- robustness


ProcessLiveness, along with PID tracking, was introduced to ensure that a sh step would terminate if the controller process died, for example if the computer were rebooted; otherwise the step would just sit there indefinitely, waiting for output or an exit status that would never come.

In practice this code has proven to be a major source of reliability issues. Prior to Java 9 there is no standard API for checking whether a given process exists, so the code uses JNA. Or rather it tries to: it has a hard time being sure whether getpgid is actually supported, so it tries to detect that on every new node and cache the answer. Moreover, these calls do not work inside withDockerContainer, since the container may remap process IDs (the $$ seen from the wrapper script is not necessarily meaningful from the agent JVM). So the code also has to detect decorated Launcher implementations and fall back to a different implementation based on command-line ps calls, which is not entirely portable and has also had trouble responding cleanly to laggy or hung remoting channels.
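For comparison, the standard API that arrived in Java 9 is ProcessHandle. The following is only a minimal sketch of a PID existence check with it; the class and method names are hypothetical and nothing here is taken from the plugin. A check like this would presumably still be misleading inside withDockerContainer, since it can only see PIDs as they appear to the calling JVM.

```java
import java.util.Optional;

// Hypothetical illustration; not code from durable-task-plugin.
class PidCheck {
    /** Returns true if a process with this PID is visible to the current JVM and still running. */
    static boolean isAlive(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle.map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) {
        // The JVM's own PID is always present, so this is a trivial smoke test.
        long self = ProcessHandle.current().pid();
        System.out.println("self alive: " + isAlive(self));
    }
}
```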

Better to throw out this approach and start over. It seems sufficient to have the wrapper script itself indicate that it is still alive, for example by touching the log file even when there is no new output. Then the agent JVM need do nothing more exotic than a file timestamp check.
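The timestamp check described above might look roughly like the sketch below. It is an illustration under assumptions, not the implementation from the linked PR: the class name, the method names, and the 30-second threshold are all hypothetical, and the touch method merely stands in for what the wrapper shell script would do on its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration; names and thresholds are assumptions.
class LogHeartbeat {
    /** Stand-in for the wrapper script: bump the log's modification time without writing output. */
    static void touch(Path log, Instant when) throws IOException {
        Files.setLastModifiedTime(log, FileTime.from(when));
    }

    /** Agent side: presume the process is alive if the log was modified recently enough. */
    static boolean looksAlive(Path log, Instant now, Duration maxSilence) throws IOException {
        Instant lastTouched = Files.getLastModifiedTime(log).toInstant();
        return Duration.between(lastTouched, now).compareTo(maxSilence) <= 0;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("durable-task", ".log");
        try {
            Instant now = Instant.now();
            Duration maxSilence = Duration.ofSeconds(30); // assumed threshold

            touch(log, now.minusSeconds(5));   // recent heartbeat
            System.out.println(looksAlive(log, now, maxSilence));

            touch(log, now.minusSeconds(600)); // heartbeat stopped ten minutes ago
            System.out.println(looksAlive(log, now, maxSilence));
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

The appeal of this design is that the agent needs no PID, no native calls, and no knowledge of how the Launcher was decorated: only a file it already reads.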

is blocking

JENKINS-47822 docker pipeline finish beforehand when tcp socket is used

Closed

is duplicated by

JENKINS-35370 Workflow shell step ERROR: script returned exit code -1

Reopened

JENKINS-39307 pipeline docker execution aborts without reason

In Review

JENKINS-38682 Pipeline plugin sh script returned exit code -1 on windows node after a short while

Resolved

JENKINS-37720 Virtual thread dump hangs waiting for ProcessLiveness

Resolved

JENKINS-42166 ProcessLiveness.workingLaunchers heuristic is flaky

Resolved

JENKINS-42405 “Could not initialize class ProcessLiveness$LibC” running sh on Windows

Resolved

JENKINS-46651 container step "script returned exit code -1"

Resolved

relates to

JENKINS-25503 Use setsid instead of nohup

Resolved

JENKINS-45294 Remoting should reject RPCRequests over the closing channel

Resolved

JENKINS-44785 Add Built-in Request timeout support in Remoting

Open

JENKINS-48300 Pipeline shell step aborts prematurely with ERROR: script returned exit code -1

Resolved

JENKINS-50892 Pipeline jobs stuck after restart

Closed

links to

PR 49



Assignee: Jesse Glick

Reporter: Jesse Glick

Votes: 0

Watchers: 3

Created: 2017-11-02 15:30

Updated: 2018-06-27 01:36

Resolved: 2017-11-15 03:14
