[JENKINS-41854] Contextualize a fresh FilePath after an agent reconnection - Jenkins Jira

Type: Bug
Resolution: Fixed
Priority: Major
Component/s: workflow-durable-task-step-plugin
Labels:
- robustness
- triaged-2018-11

Epic Link: Pipeline Durability
Released As: workflow-durable-task-step 2.31, workflow-basic-steps 2.18

PlaceholderExecutable does not listen for a closed connection as such; it simply lets any nested step that is actually using the agent fail if the connection dies midway. Usually this is fine, but there is one corner case where it is probably wrong (unconfirmed): if an agent is disconnected and reconnected during the first sh in

node {
  sh 'sleep 999'
  sh 'sleep 999'
}

then the second sh could fail, since it would still be using the old Channel, even when the first sh succeeds because DurableTaskStep.Execution recomputes the FilePath after the ChannelClosedException. (If Jenkins restarted during the first sh after the reconnection, however, the second sh should work, since the FilePath would be reconstructed from a FilePathPickle.)

The fix may be tricky since FilePath.channel is effectively final, so currently whatever workspace is passed from PlaceholderExecutable will be used for the duration of the block. Perhaps BodyInvoker.withContexts should support offering a Provider of contextual objects: in this case, something that caches a FilePath so long as it is valid (!Channel.outClosed?), and otherwise falls back to FilePathUtils.find, as FilePathPickle does.
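
To make the proposed "cache while valid, otherwise look up again" idea concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Jenkins API and is not the fix that was released: FakeChannel, FakePath, FakeNodeRegistry and CachingPathProvider are all hypothetical stand-ins, with FakeNodeRegistry.find playing the role the description assigns to FilePathUtils.find and FakePath.isValid standing in for the suggested channel check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Illustration only: every type here is a hypothetical stand-in, not a Jenkins class. */
final class FreshWorkspaceSketch {

    /** Stand-in for a remoting channel; it only records whether it was closed. */
    static final class FakeChannel {
        private boolean closed;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Stand-in for a FilePath: the channel is fixed at construction time. */
    static final class FakePath {
        final FakeChannel channel;
        final String remote;
        FakePath(FakeChannel channel, String remote) {
            this.channel = channel;
            this.remote = remote;
        }
        boolean isValid() { return !channel.isClosed(); }
    }

    /** Stand-in for a node-name lookup that knows each agent's current channel. */
    static final class FakeNodeRegistry {
        private final Map<String, FakeChannel> channels = new HashMap<>();
        void connect(String node, FakeChannel channel) { channels.put(node, channel); }
        /** Returns a path on the node's current channel, or null if it is offline. */
        FakePath find(String node, String remote) {
            FakeChannel channel = channels.get(node);
            return (channel == null || channel.isClosed()) ? null : new FakePath(channel, remote);
        }
    }

    /** Hands out the cached path while its channel is open; otherwise looks it up afresh. */
    static final class CachingPathProvider implements Supplier<FakePath> {
        private final FakeNodeRegistry registry;
        private final String node;
        private final String remote;
        private FakePath cached;

        CachingPathProvider(FakeNodeRegistry registry, String node, String remote) {
            this.registry = registry;
            this.node = node;
            this.remote = remote;
        }

        @Override
        public synchronized FakePath get() {
            if (cached == null || !cached.isValid()) {
                cached = registry.find(node, remote); // null while the agent is offline
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        FakeNodeRegistry registry = new FakeNodeRegistry();
        FakeChannel first = new FakeChannel();
        registry.connect("agent-1", first);
        CachingPathProvider provider = new CachingPathProvider(registry, "agent-1", "/ws/job");

        FakePath before = provider.get();    // what the first sh would see
        first.close();                       // agent disconnects
        FakeChannel second = new FakeChannel();
        registry.connect("agent-1", second); // agent reconnects
        FakePath after = provider.get();     // what the second sh would see

        System.out.println(before.isValid());        // false: the old handle is stale
        System.out.println(after.channel == second); // true: bound to the new channel
    }
}
```

The point of the sketch is that steps ask a provider for the workspace each time instead of holding a single object for the whole block, so a step that starts after a reconnection gets a handle bound to the new channel. How such a provider would be plugged into BodyInvoker.withContexts is left open here, as it is in the description.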

is duplicated by
- JENKINS-47868 Pipeline durability hang when slave node disconnected (Reopened)
- JENKINS-50504 Jenkins is handing out workspaces that are already in use to new jobs (Resolved)

is related to
- JENKINS-49707 Auto retry for elastic agents after channel closure (Resolved)

relates to
- JENKINS-46067 Pipeline task scheduled on uninitialized node (Open)
- JENKINS-41791 Build cannot be resumed if parallel was used with Kubernetes plugin (Resolved)
- JENKINS-40613 DurableTaskStep.Execution.getWorkspace() should be subject to timeout (Resolved)
- JENKINS-54643 A connection interruption causes the pipeline to fail when USE_WATCHING=true (Resolved)
- JENKINS-58900 Agent disconnections can cause MissingContextVariableException in Pipelines (Resolved)
- JENKINS-49651 Extend plugin/update center metadata with known incompatibilities (Open)

links to
- PR 101
- workflow-basic-steps #86


Assignee: Jesse Glick
Reporter: Jesse Glick
Votes: 4
Watchers: 9
Created: 2017-02-08 18:09
Updated: 2019-08-16 14:17
Resolved: 2019-06-03 21:03
