- Bug
- Resolution: Unresolved
- Major
- None
I'm using Jenkins for continuous testing, where each build is a test. Build 100 (test 100) runs on node A and takes 3 hours (or even 2 days). In parallel, I run build 101 (test 101) on node B. When build 101 fails, it gets 'stuck' with:
Take node offline on failure is waiting for a checkpoint on test #100
I'm not sure which checkpoint it is waiting for, but the failed job never finishes.
As far as I know, there are no dependencies between the two builds.
- is related to JENKINS-17507 Text finder plugin has BuildStepMonitor.BUILD for getRequiredMonitorService (Resolved)
-
I've seen a similar issue where one stuck slave causes all other builds of the same job to back up. For example, I had a hundred builds that looked finished but were still occupying the slaves, with this message in the console output:
00:32:03.591 Editable Email Notification is waiting for a checkpoint on build_hhvm_fbcode #13105
Build #13105 had the following message:
01:26:19.644 Looks like the node went offline during the build. Check the slave log for the details.
That machine had in fact been rebooted, but none of the other builds should have been waiting on it. Once I cancelled build 13105, all the other builds completed.
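
For anyone trying to picture the behavior, here is a toy Python model of what the console messages suggest is happening. It is only a sketch under assumptions, not Jenkins code: the `Build` class, `run_post_build_step`, the string values "BUILD" and "NONE" (loosely named after the `BuildStepMonitor` constants mentioned in JENKINS-17507), and the `timeout` parameter are all hypothetical and exist only so the example can terminate. The idea is that a post-build step that synchronizes on the previous build cannot proceed until that build reaches its final checkpoint, while a step that requests no synchronization runs right away.

```python
import threading


class Build:
    """Toy stand-in for a build; not the real Jenkins API."""

    def __init__(self, number):
        self.number = number
        # Set once the build has reached its final checkpoint (finished or cancelled).
        self.completed = threading.Event()


def run_post_build_step(step_name, previous, monitor, timeout=None):
    """Return True if the step ran, False if it gave up waiting.

    monitor="BUILD": wait until the previous build has completed.
    monitor="NONE":  run immediately, with no cross-build synchronization.
    The timeout is an assumption of this sketch so the demo cannot hang.
    """
    if monitor == "BUILD" and previous is not None:
        print(f"{step_name} is waiting for a checkpoint on test #{previous.number}")
        if not previous.completed.wait(timeout):
            return False  # previous build never completed within the timeout
    return True


# Build 100 is still running (or its node went offline), so it never completes.
build_100 = Build(100)

stuck = run_post_build_step("Take node offline on failure", build_100, "BUILD", timeout=0.1)
free = run_post_build_step("Take node offline on failure", build_100, "NONE")

# Cancelling or finishing build 100 releases the waiting step.
build_100.completed.set()
released = run_post_build_step("Take node offline on failure", build_100, "BUILD", timeout=0.1)
```

In this model the step for build 101 stays blocked until build 100 is marked completed, which matches what I saw: cancelling the stuck build released everything queued behind it.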