-
Bug
-
Resolution: Not A Defect
-
Major
-
None
-
Jenkins 1.625.3
Gearman plugin 0.1.3 + https://review.openstack.org/#/c/271543/ "Update to Jenkins LTS 1.625.3 and fix function registration"
I have:
- Zuul
- Gearman plugin 0.1.3 (with https://review.openstack.org/#/c/271543/ )
- Jenkins 1.625.3
- Nodepool 0.1.1 (yeah it is old)
Today I added a new job that runs a test suite. On build completion it has a few publishers:
- Archive the artifacts ( logs/* ). Note that the build produces no logs, but the archiver is set not to fail
- PostBuild, to trigger another project (named castor-save).
The archiver fails because the node went offline while it was executing:
{{
✓ retrieve en.wp main page via mobile-sections (364ms)
✓ retrieve lead section of en.wp main page via mobile-sections-lead (306ms)
FATAL: no longer a configured node for ci-jessie-wikimedia-33866
java.lang.IllegalStateException: no longer a configured node for ci-jessie-wikimedia-33866
at hudson.model.AbstractBuild$AbstractBuildExecution.getCurrentNode(AbstractBuild.java:456)
at hudson.model.AbstractBuild$AbstractBuildExecution.reportBrokenChannel(AbstractBuild.java:813)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:788)
at hudson.model.Build$BuildExecution.build(Build.java:205)
at hudson.model.Build$BuildExecution.doRun(Build.java:162)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
at hudson.model.Run.execute(Run.java:1741)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:408)
ERROR: Step ‘Archive the artifacts’ failed: no workspace for mobileapps-deploy-npm-node-4.3 #1
[PostBuildScript] - Execution post build scripts.
[PostBuildScript] Build is not success : do not execute script
Finished: FAILURE
}}
I am apparently not the only one impacted. From a recent IRC log at http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-01-18.log.html
> Thelo greghaynes: once in a while I get this error : FATAL: no longer a configured node for d-p-c-local_01-769 in my job's console
JENKINS-26665 "Complete lack of correct synchronization or concern for thread safety in mansion cloud plugin" has a similar stack trace.
Job page: https://integration.wikimedia.org/ci/job/mobileapps-deploy-npm-node-4.3/1/ (hopefully Jenkins will keep it). I have attached the XML configuration. It ran on node ci-jessie-wikimedia-33866.
The job failure occurred on Feb 15th 2016 at 17:39:02.
In my case, two different jobs ran on the same node, one after the other. The Nodepool log goes something like this:
{{
2016-02-15 17:31:37,287 INFO nodepool.NodeLauncher: Node id: 33866 is ready
2016-02-15 17:31:41,056 INFO nodepool.NodeLauncher: Node id: 33866 added to jenkins
2016-02-15 17:37:21,325 DEBUG nodepool.NodeUpdateListener: Received: onStarted {"name":"integration-config-tox-py27-jessie" ... "node_name":"ci-jessie-wikimedia-33866"
2016-02-15 17:38:01,808 DEBUG nodepool.NodeUpdateListener: Received: onFinalized
}}
About half a minute later, a different job is assigned to the same node:
{{
2016-02-15 17:38:33,867 DEBUG nodepool.NodeUpdateListener: Received: onStarted {"name":"mobileapps-deploy-npm-node-4.3" ... "node_name":"ci-jessie-wikimedia-33866"
2016-02-15 17:38:33,871 INFO nodepool.NodeUpdateListener: Setting node id: 33866 to USED
2016-02-15 17:39:01,875 DEBUG nodepool.NodePool: Deleting node id: 33866 which has been in used state for 0.00802109248108 hours
2016-02-15 17:39:02,942 DEBUG nodepool.NodeUpdateListener: Received: onCompleted
,"node_name":"ci-jessie-wikimedia-33866" FAILURE
2016-02-15 17:39:06,763 INFO nodepool.NodePool: Deleted jenkins node id: 33866
}}
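To make the timing in the second excerpt easier to read, here is a minimal Python sketch that computes the gaps between the three key events. The log lines are copied from the excerpt above; the helper names (`parse_timestamp`, `seconds_between`) are made up for illustration and are not part of Nodepool.

```python
from datetime import datetime

# Three lines copied verbatim from the Nodepool excerpt above.
LOG = """\
2016-02-15 17:38:33,871 INFO nodepool.NodeUpdateListener: Setting node id: 33866 to USED
2016-02-15 17:39:01,875 DEBUG nodepool.NodePool: Deleting node id: 33866 which has been in used state for 0.00802109248108 hours
2016-02-15 17:39:02,942 DEBUG nodepool.NodeUpdateListener: Received: onCompleted
"""


def parse_timestamp(line):
    # Hypothetical helper: the first 23 characters hold "YYYY-MM-DD HH:MM:SS,mmm".
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def seconds_between(first, second):
    # Hypothetical helper: elapsed seconds between two log lines.
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()


used, deleting, completed = LOG.splitlines()

# Time from "Setting node ... to USED" to "Deleting node".
used_to_delete = seconds_between(used, deleting)

# Time from "Deleting node" to the onCompleted event of the second job.
delete_to_completed = seconds_between(deleting, completed)

# The "used state" duration Nodepool itself reports, converted from hours.
reported_used_seconds = float(deleting.split(" for ")[1].split()[0]) * 3600

print("USED -> delete started: %.3f s" % used_to_delete)
print("delete started -> onCompleted: %.3f s" % delete_to_completed)
print("used state reported by Nodepool: %.3f s" % reported_used_seconds)
```

Going by these timestamps alone, the deletion of node 33866 started roughly 28 seconds after it was set to USED and about one second before onCompleted was received for mobileapps-deploy-npm-node-4.3, which lines up with the failure at 17:39:02.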