- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Labels: None
'pipeline-build-step':
version: "2.5"
'pipeline-github-lib':
version: "1.0"
'pipeline-graph-analysis':
version: "1.4"
'pipeline-input-step':
version: "2.7"
'pipeline-milestone-step':
version: "1.3.1"
'pipeline-model-api':
version: "1.1.7"
'pipeline-model-declarative-agent':
version: "1.1.1"
'pipeline-model-definition':
version: "1.1.7"
'pipeline-model-extensions':
version: "1.1.7"
'pipeline-rest-api':
version: "2.8"
'pipeline-stage-step':
version: "2.2"
'pipeline-stage-tags-metadata':
version: "1.1.7"
'pipeline-stage-view':
version: "2.8"
'workflow-aggregator':
version: "2.5"
'workflow-api':
version: "2.18"
'workflow-basic-steps':
version: "2.6"
'workflow-cps-global-lib':
version: "2.8"
'workflow-cps':
version: "2.36"
'workflow-durable-task-step':
version: "2.12"
'workflow-job':
version: "2.12.1"
'workflow-multibranch':
version: "2.16"
'workflow-remote-loader':
version: "1.4"
'workflow-scm-step':
version: "2.6"
'workflow-step-api':
version: "2.12"
'workflow-support':
version: "2.14"
Jenkins 2.68 with JNLP4 launcher
Slave node is Windows 2008,
Master is CentOS 6.8 with java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64
One of our nodes went offline due to a channel disconnect. Meanwhile a pipeline job was running, and as soon as the node reconnected, Jenkins tried to schedule a task on it, which failed with the error:
ERROR: Issue with creating launcher for agent <nodename>. The agent has not been fully initialized yet
ERROR: Issue with creating launcher for agent <nodename>. The agent has not been fully initialized yet
I have attached the console output from the agent.
- slave.log.1
- 8 kB
- 2xlarge-738_node.log
- 78 kB
- catalina_738.log
- 65 kB
- agentconsole.log
- 7 kB
is related to:
- JENKINS-45023 Channel#call() should reject requests if the channel is being closed (Resolved)
relates to:
- JENKINS-23305 NPE in Slave.createLauncher() for Matrix and Pipeline jobs (Fixed but Unreleased)
- JENKINS-41854 Contextualize a fresh FilePath after an agent reconnection (Resolved)
[JENKINS-46067] Pipeline task scheduled on uninitialized node
We upgraded to Jenkins LTS 2.89.1, which is supposed to include Remoting 3.14, and we are seeing some builds fail with the same error message. These are elastic OpenStack build nodes. We are also using the Slave Setup plugin's "Setup Script After Copy" script, if that matters.
ERROR: Issue with creating launcher for agent tph-build-c1.2xlarge-8428. The agent has not been fully initialized yet
ERROR: Issue with creating launcher for agent tph-build-c1.2xlarge-8428. The agent has not been fully initialized yet
remote file operation failed: /opt/jenkins/workspace/fh_guide_master-DAUIRXWDCG4RH3OVIMVYS3W2QDGGAIJIWFEPCAIQEFWE7YG5KZDQ at hudson.remoting.Channel@681ae622:tph-build-c1.2xlarge-8428: hudson.remoting.ChannelClosedException: Remote call on tph-build-c1.2xlarge-8428 failed. The channel is closing down or has closed down
johnlengeling, would you be able to attach:
1) Jenkins master logs
2) Agent logs
3) OpenStack VM provisioning/termination logs
4) Agent configuration and the OpenStack plugin versions (if any)
Oleg,
We are running Jenkins 2.89.1 with OpenStack plugin 2.29. We are running a declarative pipeline job with a parallel section which runs 7 different build steps on 7 different nodes that are spun up elastically on OpenStack Ocata. We see this failure intermittently.
We see the following error in the console output:
[beautify none-noarch] Still waiting to schedule task
[beautify none-noarch] All nodes of label ‘openstack’ are offline
[beautify none-noarch] Running on team-ph-build-c1.2xlarge-738 in /opt/jenkins/workspace/h_foo_master-DAUIRXWDCG4RH3OVIMVYS3W2QDGGAIJIWFEPCAIQEFWE7YG5KZDQ
[Pipeline] [beautify none-noarch] {
[Pipeline] [beautify none-noarch] checkout
[beautify none-noarch] ERROR: Issue with creating launcher for agent team-ph-build-c1.2xlarge-738. The agent has not been fully initialized yet
[beautify none-noarch] ERROR: Issue with creating launcher for agent team-ph-build-c1.2xlarge-738. The agent has not been fully initialized yet
And the following stack trace at the end of the console log:
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2675)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3150)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:859)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:355)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: hudson.remoting.ChannelClosedException: Remote call on team-ph-build-c1.2xlarge-738 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:901)
	at hudson.FilePath.act(FilePath.java:986)
Caused: java.io.IOException: remote file operation failed: /opt/jenkins/workspace/h_foo_master-DAUIRXWDCG4RH3OVIMVYS3W2QDGGAIJIWFEPCAIQEFWE7YG5KZDQ at hudson.remoting.Channel@2d334044:team-ph-build-c1.2xlarge-738
	at hudson.FilePath.act(FilePath.java:993)
	at hudson.FilePath.act(FilePath.java:975)
	at hudson.FilePath.mkdirs(FilePath.java:1158)
	at hudson.plugins.git.GitSCM.createClient(GitSCM.java:747)
	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1117)
	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:113)
	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:85)
	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:75)
	at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
	at hudson.security.ACL.impersonate(ACL.java:260)
	at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Finished: FAILURE
Attached is an excerpt of the catalina.out and also the node log showing the Agent Setup script running.
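For reference, the overall shape of the job is roughly the following (a simplified sketch, not the actual Jenkinsfile; stage names and build commands are placeholders):

pipeline {
    agent none
    stages {
        stage('Build') {
            parallel {
                stage('beautify none-noarch') {
                    agent { label 'openstack' }  // each parallel stage requests its own elastic OpenStack node
                    steps {
                        checkout scm             // the first remote-touching step, where the failure surfaces
                        sh './build.sh beautify' // placeholder command
                    }
                }
                // ...six more stages of the same shape, each on its own 'openstack' node
            }
        }
    }
}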
We have the same problem on Jenkins 2.60.3, host system is Debian 9, with KVM/QEMU and the libvirt plugin. OS on the VMs is Windows 7 with the Jenkins service installed. We have configured the VMs to shutdown after each job and to revert to a certain snapshot when shutting down.
We have noticed that the machines do not shut down every time, especially under heavy load (lots of jobs in the queue) or if the job failed during compilation. If they do not shut down, this error appears.
As a workaround, does anyone know how to set up the retry so that it will try to use a different node? I can set up some retries around functions executing on the node, e.g.:
jenkinsObject.retry(3) {
    jenkinsObject.doTheThing() // Do a clever thing on this node
}
However, I'm not sure how to set up a retry around a failed node. Infrastructure fails sometimes; fact of life. I know there's got to be a way to compensate for that so that my Jenkins jobs are a little more robust, but I'm not seeing it...
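For illustration, one shape that might work is to put the retry outside the node block (a sketch, untested; runWithNodeRetry is a hypothetical Scripted Pipeline helper, and 'openstack' is a placeholder label):

// Hypothetical helper: retry the node allocation itself, not just the body,
// so a dead or half-initialized agent can be replaced by another agent
// carrying the same label on the next attempt.
def runWithNodeRetry(String label, int attempts, Closure body) {
    retry(attempts) {
        node(label) {
            body()
        }
    }
}

runWithNodeRetry('openstack', 3) {
    jenkinsObject.doTheThing() // Do a clever thing on this node
}

Note that in the Pipeline versions discussed here, retry may still rethrow some channel-loss errors rather than retrying them, so this is a mitigation rather than a guarantee.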
Somehow this is happening more often in our instance, Jenkins 2.116 with the latest pipeline plugins. (Note: we do not have any OpenStack* plugin installed.)
I have attached the master and slave logs:
- slave.log.1
- jenkins.log
At this stage it's the same for me: I'm happy with any workaround.
We just got this exact error for the first time in one of our pipeline jobs running many things in parallel. It stinks because there is no recovery whatsoever, and even the "post { always {}}" block did not execute. This worries me because it means there is potential for jobs not cleaning up after themselves.
I am trying to determine whether this is supposed to be fixed, but I can't tell. Can't the underlying code just wait a couple of seconds for the slave to become fully available, rather than crashing the whole job? The referenced ticket (JENKINS-45023) seems to somewhat address that, but it does not seem to take effect in my installation.
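As a stop-gap at the Pipeline level, one could poll the agent's channel before running any remote-touching steps. A rough sketch (label and timeout are placeholders; Jenkins.get() and Computer.isOnline() are core APIs, but this needs script approval in a sandboxed job, and it works around the race rather than fixing it):

node('mylabel') {
    // Hypothetical guard: block until this node's computer reports online,
    // instead of letting the first sh/checkout step crash the whole build.
    timeout(time: 2, unit: 'MINUTES') {
        waitUntil {
            def c = jenkins.model.Jenkins.get().getComputer(env.NODE_NAME)
            return c != null && c.isOnline()
        }
    }
    sh 'make' // placeholder
}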
We are using Jenkins version 2.130 with relatively recent pipeline plugins (pipeline:api is v2.28, pipeline:job is v2.22, pipeline:multibranch is v2.19, pipeline:utility is v2.1.0). Nothing was upgraded recently, and this same job has worked many times in the past.
One possible solution I can see is that our Jenkins nodes are too overloaded, causing them to shut down in the first place. So maybe we should add more nodes?
ajferrigno, it would be wise to make sure none of the nodes are overloaded. All sorts of unusual symptoms can occur when systems are overloaded. An overloaded system can cause a node or a connection to fail. Perhaps something needs to be more robust to these conditions, but reducing the load is a good step.
It sounds like there are possibly a few different, though interacting problems going on here.
Agents / nodes fail or otherwise lose their connection at times. I don't see enough information here to suggest any causes, other than system issues such as overload or other lack of resources.
Some portions of the system could be more resilient in the face of such failures. Pipeline, for one example, could better handle these failures or provide more tools for handling them.
There may be some issue with sequencing of nodes starting up and other things interacting with them, such as pipeline. Again, there isn't sufficient information to determine what the problematic sequencing might be.
Possibly other issues, also.
Just to mention that in the past I did monitor the CPU, memory, I/O, and network on the nodes to check for an overload, and found that this issue happens even when there is almost no load on the node.
Noticed this issue started occurring after upgrading to 2.138.1 and the latest plugins a couple weeks ago. Reliability has gone down significantly after the update. Voting for this issue.
johnlengeling’s issue sounds like a garden-variety Remoting channel outage, which could have any of a host of root causes. Other reports sound a bit like JENKINS-41854, though there is not enough detail to tell. I was not familiar with that particular error message; from full-text search I can find this (plus an apparently copied usage in a much more obscure plugin).
If reporter/commenters are running the Docker Slaves plugin, that may be to blame. I do not think this plugin is maintained. CC ndeloof
We also saw this kind of issue with the AWS EC2 plugin when creating a new EC2 instance/slave on the fly. Most of the time we got this error on Windows machines where the slave is started via WinRM; on Unix machines where the slave is started via SSH we never saw this issue.
We saw this but are not using the Docker Slaves or AWS EC2 plugin. We are using Windows machines connected via JNLP.
rg, how did you start the slave? Every time we saw this issue it was with automatic, newly created slaves, never with the slaves we create manually and start via Java Web Start or via cmd. The slaves created on the fly via the Swarm plugin didn't cause this issue either, only the slaves created on the fly by the EC2 plugin.
I don't know for sure whether it was Java Web Start or some other way, but it was definitely a permanent node. Is it possible that it happens more with automatically created agents, as they are added/removed more frequently? It's not something we see frequently, so we could have just been unlucky!
Never mind, I missed a usage of this message pattern in Jenkins core in Slave.reportLauncherCreateError.
Tracking to oleg_nenashev’s changes for JENKINS-38527, I found the older JENKINS-23305 which sounds like it is tracking a similar problem. I have no clue as to the root cause here. It should go without saying that if anyone knows how to reproduce the error from scratch they should speak up. Otherwise I am not sure whether there is any way to proceed.
The stack traces of the form
java.lang.IllegalStateException: No remoting channel to the agent OR it has not been fully initialized yet
	at hudson.model.Slave.reportLauncherCreateError(Slave.java:524)
	at hudson.model.Slave.createLauncher(Slave.java:496)
	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.makeLauncher(DefaultStepContext.java:112)
	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:68)
	at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:258)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:206)
	at …
suggest that the problem arises when starting a sh, checkout, or similar step inside a node block. That would imply that this is a duplicate of JENKINS-41854. No idea why the frequency of that issue might have changed recently.
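Schematically, the failing pattern is nothing more exotic than a remote-touching step at the top of a node block (a sketch; the label is a placeholder):

node('linux') {
    // If the agent's channel dropped and reconnected after the executor was
    // assigned but before this step started, the step is contextualized with
    // the stale pre-reconnection channel and fails as above (JENKINS-41854).
    checkout scm
    sh 'make' // placeholder
}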
I am seeing this issue consistently when trying to use the post.aborted.container['slave'] below.
post {
    aborted {
        echo "testaborted"
        container('slave') {
            sh "ls"
        }
    }
    failure {
        echo "testfail"
        container('slave') {
            sh "ls"
        }
    }
}
What is interesting is I can use the post.failure.container['slave'] above all day long and I can spin up a container and exit cleanly.
Jenkins 2.147 Kubernetes Plugin 1.13.6
[Pipeline] withCredentials
ERROR: Issue with creating launcher for agent multi-image-1l98r-vc3rk. The agent has not been fully initialized yet
[Pipeline] {
[Pipeline] sh
ERROR: Issue with creating launcher for agent multi-image-1l98r-vc3rk. The agent has not been fully initialized yet
ERROR: Issue with creating launcher for agent multi-image-1l98r-vc3rk. The agent has not been fully initialized yet
[Pipeline] }
ERROR: Issue with creating launcher for agent multi-image-1l98r-vc3rk. The agent has not been fully initialized yet
ERROR: Issue with creating launcher for agent multi-image-1l98r-vc3rk. The agent has not been fully initialized yet
ERROR: Issue with creating launcher for agent multi-image-1l98r-vc3rk. The agent has not been fully initialized yet
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // container
[Pipeline] }
[Pipeline] // waitUntil
[Pipeline] }
[Pipeline] // timeout
Error when executing aborted post condition: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from 10.255.1.162/10.255.1.162:11312 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:948)
	at hudson.FilePath.act(FilePath.java:1070)
	at hudson.FilePath.act(FilePath.java:1059)
	at hudson.FilePath.mkdirs(FilePath.java:1244)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.<init>(FileMonitoringTask.java:171)
	at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:197)
	at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.<init>(BourneShellScript.java:189)
	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:110)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:98)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:264)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:270)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:178)
	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
	at sun.reflect.GeneratedMethodAccessor969.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:157)
	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:155)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:155)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:159)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:129)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:129)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:129)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:129)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:129)
	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
	at WorkflowScript.run(WorkflowScript:198)
	at ___cps.transform___(Native Method)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:57)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:109)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:82)
	at sun.reflect.GeneratedMethodAccessor516.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$101(SandboxContinuable.java:34)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.lambda$run0$0(SandboxContinuable.java:59)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:58)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:182)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:332)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:83)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:244)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:232)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:142)
	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:795)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
	... 3 more
We are relatively updated now, and hit this again when restarting our agents. Jenkins 2.154 / Remoting 3.27. The node had just connected via JNLP.
For us I think it would be pretty easy to reproduce if we made a task that continually restarts agents while a job is scheduled via Pipeline to run on them.
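Something like the following, run from the master's Groovy script console, might serve as that harness (a sketch with a placeholder agent name; untested against this bug):

import hudson.slaves.OfflineCause
import jenkins.model.Jenkins

// Repeatedly bounce an agent's channel while a Pipeline job that targets it
// is queued or running, to try to hit the reconnection race.
def computer = Jenkins.get().getComputer('some-agent') // placeholder name
20.times {
    computer.disconnect(new OfflineCause.ByCLI('repro test'))
    sleep 5000                   // give the channel time to close
    computer.connect(true).get() // force a reconnect and wait for it
    sleep 5000
}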
Cannot contact <nodename>: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from <node>/<ip>:56650 failed. The channel is closing down or has closed down
I encountered a similar issue on a test pipeline doing 13 test stages with only a clean workspace and a NAS mount. Something like this:
stage('test_disconnexion') {
    agent {
        node {
            label 'srdspicpops01||srdspicpops03'
        }
    }
    steps {
        script {
            try {
                //sleep 90 // Test with a wait before start
                echo "Running on node: ${env.NODE_NAME} with workspace: ${env.WORKSPACE}"
                cleanWs()
                batFunctionToMountCIFSVolume() {
                    echo "DEBUG test_disconnexion"
                }
            } catch (exc) {
                def sw = new StringWriter()
                def pw = new PrintWriter(sw)
                exc.printStackTrace(pw)
                echo sw.toString()
                unstable(message: "${STAGE_NAME} is unstable")
                stage_failed."${STAGE_NAME}" = true
            }
        }
    }
    post {
        always {
            revertFunction([node: "${env.NODE_NAME}"])
        }
    }
}
I was running this test on a Linux Jenkins master (2.401.3) and two Windows JNLP WebSocket slaves (with the same OpenJDK version as the master and the correct agent.jar file).
As I didn't find any reason why this happens, I moved to a Windows master (2.414.1) and ran the same slaves against it... Same behaviour.
Disconnection happens only 1.2% of the time on average, so this is hard to reproduce.
Last log from Linux master; error when executing the cleanWs():
[2023-09-12T09:33:28.611Z] hudson.remoting.RequestAbortedException: java.nio.channels.ClosedChannelException
	at hudson.remoting.Request.abort(Request.java:346)
	at hudson.remoting.Channel.terminate(Channel.java:1080)
	at hudson.remoting.Channel$1.terminate(Channel.java:620)
	at hudson.remoting.AbstractByteBufferCommandTransport.terminate(AbstractByteBufferCommandTransport.java:356)
	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:158)
	at jenkins.websocket.WebSockets$1.onWebSocketClose(WebSockets.java:88)
	at jenkins.websocket.WebSockets$1.onWebSocketError(WebSockets.java:94)
	at jenkins.websocket.Jetty10Provider$2.onWebSocketError(Jetty10Provider.java:174)
	at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onError(JettyWebSocketFrameHandler.java:260)
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$2(WebSocketCoreSession.java:284)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1468)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1487)
	at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.closeConnection(WebSocketCoreSession.java:284)
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.onEof(WebSocketCoreSession.java:254)
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.fillAndParse(WebSocketConnection.java:482)
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.onFillable(WebSocketConnection.java:340)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:314)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:416)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:385)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:272)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:140)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:934)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1078)
	at java.base/java.lang.Thread.run(Thread.java:829)
	Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to srdspicpops03
		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1784)
		at hudson.remoting.Request.call(Request.java:199)
		at hudson.remoting.Channel.call(Channel.java:999)
		at hudson.FilePath.act(FilePath.java:1192)
		at hudson.FilePath.act(FilePath.java:1181)
		at hudson.FilePath.mkdirs(FilePath.java:1372)
		at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:97)
		at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:71)
		at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
		at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
		at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
		at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
		at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
		... 1 more
	Suppressed: org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: e4a5a559-6c82-43f5-b66a-3056dd450409
Caused by: java.nio.channels.ClosedChannelException
	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:157)
	... 23 more
Last log from the Windows master; error when executing the cleanWs():
14:10:55 ERROR: Issue with creating launcher for agent srdspicpops01. The agent is being disconnected
[Pipeline] echo
hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@533ae8:srdspicpops01": Remote call on srdspicpops01 failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:993)
	at hudson.FilePath.act(FilePath.java:1192)
	at hudson.FilePath.act(FilePath.java:1181)
	at hudson.FilePath.mkdirs(FilePath.java:1372)
	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:97)
	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:71)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
	Suppressed: org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: 2f0912fb-c33e-43ea-826c-41331758b7ec
Caused by: java.nio.channels.ClosedChannelException
	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:157)
	at jenkins.websocket.WebSockets$1.onWebSocketClose(WebSockets.java:88)
	at jenkins.websocket.Jetty10Provider$2.onWebSocketClose(Jetty10Provider.java:164)
	at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.notifyOnClose(JettyWebSocketFrameHandler.java:308)
	at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onClosed(JettyWebSocketFrameHandler.java:292)
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$0(WebSocketCoreSession.java:272)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1451)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1488)
	at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$1(WebSocketCoreSession.java:272)
	at org.eclipse.jetty.util.Callback$4.completed(Callback.java:184)
	at org.eclipse.jetty.util.Callback$Completing.succeeded(Callback.java:344)
	at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onError(JettyWebSocketFrameHandler.java:268)
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$2(WebSocketCoreSession.java:284)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1469)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1488)
	at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.closeConnection(WebSocketCoreSession.java:284)
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.processConnectionError(WebSocketCoreSession.java:339)
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection$Flusher.onCompleteFailure(WebSocketConnection.java:654)
	at org.eclipse.jetty.util.IteratingCallback.failed(IteratingCallback.java:417)
	at org.eclipse.jetty.util.Callback$Nested.failed(Callback.java:405)
	at org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:402)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$3.run(SelectableChannelEndPoint.java:87)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:416)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:385)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:272)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:140)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:411)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
	... 1 more
This Windows slave is a standard node that I revert after each test stage (post > always). It seems that this revert may have an impact on test stability: on a slave that I don't revert (build/installer), I do not see that kind of disconnection.
Most of the time, the disconnection occurs when starting the cleanWs(), so at the very beginning of the stage.
It will likely be fixed by JENKINS-45023. Or at least you will get a failure earlier.