A hard kill from JENKINS-25550 works around it. Still, it should behave better: it should show what it is trying to resume, and a regular interruption should stop that. Currently you are given no information and have to escalate from a regular interrupt, to a step termination, to a hard kill, and then also separately kill the unschedulable queue item.
Because of how pickle resolution is designed, we unfortunately cannot really recover the build. During the initial design of Workflow I argued that the node step's state should include just the slave name and workspace path, and that its onResume should be responsible for trying to get that workspace back (so that this step could handle stop requests gracefully, for example by throwing an exception that the script could catch and handle). Kohsuke Kawaguchi overrode me, insisting that the serialized program state should include a representation of the FilePath and that script execution shall not resume until that pickle is successfully rehydrated (even if other branches were able to proceed, etc.).
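A minimal sketch of the alternative described above, in plain Java with no Jenkins dependencies: the step's persisted state holds only two strings, so deserializing it can never block, and the work of getting the workspace back happens in onResume, which can fail with an exception a script could catch. NodeStepState, NodeLookup, WorkspaceUnavailableException and the returned "node:path" string are hypothetical illustrations, not Workflow or Jenkins APIs.

```java
import java.io.Serializable;

// Hypothetical: thrown when the node or workspace cannot be recovered on resume.
class WorkspaceUnavailableException extends Exception {
    WorkspaceUnavailableException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for looking up nodes by name; not a Jenkins API.
interface NodeLookup {
    /** Returns true if a node with this name is currently online. */
    boolean isOnline(String nodeName);
}

// Hypothetical node-step state: only plain strings are serialized,
// so restoring the program state never waits for a node to reconnect.
class NodeStepState implements Serializable {
    private static final long serialVersionUID = 1L;

    final String nodeName;
    final String workspacePath;

    NodeStepState(String nodeName, String workspacePath) {
        this.nodeName = nodeName;
        this.workspacePath = workspacePath;
    }

    // Called on resume: try to get the workspace back, or fail in a way
    // the calling script could catch, instead of blocking the whole program.
    String onResume(NodeLookup nodes) throws WorkspaceUnavailableException {
        if (!nodes.isOnline(nodeName)) {
            throw new WorkspaceUnavailableException(
                "Node " + nodeName + " is offline; cannot resume in " + workspacePath);
        }
        return nodeName + ":" + workspacePath;
    }
}
```

Because NodeLookup has a single abstract method, a lambda such as name -> name.equals("linux-1") can stand in for it in a quick test.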
So the best we can do is display more clearly what is wrong and offer a hard kill right away. With respect to the queue item, PlaceholderTask.run can tell via StepContext.isReady that the context is still being unpickled, but it cannot use that to distinguish this case from a normal startup in which we are waiting for a sluggish slave to come back online; it cannot even find the Run to tell whether it was already aborted, since it cannot call StepContext.get yet. It will probably need to persist a Run.externalizableId to implement run, and also use that instead of accessControlled. If the Run turns out to be finished, it could use Queue.getItems(Task) to cancel itself, so that cleaning up the whole process reduces to pressing the stop button once, either on the console page or on the flyweight executor in the executor widget.
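A rough model of that proposal, again in plain Java rather than real Jenkins code: the placeholder task persists only a run id, looks the run up by that id without needing the StepContext, and if the run is missing or already finished it removes its own queue items instead of waiting forever. RunRegistry, QueueModel, PlaceholderTaskSketch and checkStillWanted are hypothetical stand-ins for looking up a Run by its externalizable id, for Queue.getItems(Task) plus cancellation, and for PlaceholderTask respectively.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for finding builds by a persisted id string.
class RunRegistry {
    private final Map<String, Boolean> finishedById = new HashMap<>();

    void record(String runId, boolean finished) {
        finishedById.put(runId, finished);
    }

    /** Returns null if no run with that id is known. */
    Boolean isFinished(String runId) {
        return finishedById.get(runId);
    }
}

// Hypothetical stand-in for the build queue: items are the waiting tasks.
class QueueModel {
    final List<PlaceholderTaskSketch> items = new ArrayList<>();

    /** Removes every queued item belonging to the given task. */
    void cancelAll(PlaceholderTaskSketch task) {
        items.removeIf(t -> t == task);
    }
}

// Hypothetical sketch: the task persists only the run's id, so it can check
// the run's state even while the step context is still being unpickled.
class PlaceholderTaskSketch {
    final String runId;

    PlaceholderTaskSketch(String runId) {
        this.runId = runId;
    }

    /** Returns true if the task should keep waiting, false if it cancelled itself. */
    boolean checkStillWanted(RunRegistry runs, QueueModel queue) {
        Boolean finished = runs.isFinished(runId);
        if (finished == null || finished) {
            // Build is gone or already aborted/completed: drop our own queue items.
            queue.cancelAll(this);
            return false;
        }
        return true;
    }
}
```

Treating an unknown id the same as a finished run means a deleted build also cleans up its queue item; that policy is an assumption of this sketch, not something the comment above specifies.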