-
Bug
-
Resolution: Unresolved
-
Blocker
-
None
-
Jenkins LTS 2.479.3
swarm-plugin 3.49
Sometimes, after a connectivity hiccup between community members' computers running Swarm agents and the FOSS project's Jenkins controller in the cloud, or during a restart of that controller or its VM, some agents do not come back online until community members restart their services/containers/VMs. I am marking this as a blocker because only manual intervention fixes the run-time issue. For example:
$ tail -F /var/log/jenkins-swarm-nutci.log
java.util.concurrent.TimeoutException: Ping started at 1737925236886 hasn't completed by 1737925476910
    at hudson.remoting.PingThread.ping(PingThread.java:135)
    at hudson.remoting.PingThread.run(PingThread.java:87)
Jan 26, 2025 10:04:36 PM hudson.remoting.Launcher$CuiListener status
INFO: Terminated
Jan 26, 2025 10:04:36 PM hudson.plugins.swarm.Client run
INFO: Retrying in 10 seconds
Jan 26, 2025 10:04:47 PM hudson.plugins.swarm.Client run
INFO: Attempting to connect to https://ci.ourproject.org/

$ date
Mon Jan 27 08:13:48 CET 2025
As seen above, the call that follows the "Attempting to connect" message (logged from client/src/main/java/hudson/plugins/swarm/Client.java::run()) never returned, even more than 10 hours after it started.
The code in client/src/main/java/hudson/plugins/swarm/SwarmClient.java::createSwarmAgent() appears to call createHttpClient() and then HttpRequest.newBuilder(uri).POST(...). I suppose these may behave in a "generally undefined manner" while the network and/or the controller itself are not fully available, so they should be constrained by some timeout.
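To illustrate the kind of constraint meant here, below is a minimal sketch and not the plugin's actual code. It assumes the client uses the JDK's java.net.http.HttpClient (which the HttpRequest.newBuilder(uri).POST(...) call suggests). The class name TimeoutSketch, the helper sendWithDeadline(), the empty request body, and both timeout values are hypothetical and chosen only for the example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

public class TimeoutSketch {
    // Hypothetical values; a real fix would likely make these configurable.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(30);
    static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(60);

    // Bounds how long establishing a connection may take.
    public static HttpClient createHttpClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // Bounds how long the client waits for a response to this request.
    // The empty body is a placeholder, not what the plugin actually sends.
    public static HttpRequest createAgentRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .timeout(REQUEST_TIMEOUT)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Hypothetical helper: an overall deadline on the whole exchange, so the
    // caller is unblocked even if the request stalls. On expiry, join() throws
    // a CompletionException whose cause is a TimeoutException. This sketch does
    // not guarantee that the underlying exchange itself is aborted.
    public static HttpResponse<String> sendWithDeadline(
            HttpClient client, HttpRequest request, Duration deadline) {
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .orTimeout(deadline.toMillis(), TimeUnit.MILLISECONDS)
                .join();
    }

    public static void main(String[] args) {
        // No network traffic here: only show the configured limits.
        HttpClient client = createHttpClient();
        HttpRequest request = createAgentRequest(URI.create("https://ci.example.invalid/"));
        System.out.println("connect timeout: " + client.connectTimeout().orElseThrow());
        System.out.println("request timeout: " + request.timeout().orElseThrow());
    }
}
```

With limits like these in place, a stalled request would surface as an exception that the existing "Retrying in 10 seconds" loop in run() could presumably handle, instead of blocking forever.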
Agents in that setup use the LabelFileWatcher. The info-level log message about it is emitted by literally the next line in that run() method, and it is not logged, so the agent must be blocked somewhere inside the createSwarmAgent() code.
A healthy startup, e.g. when restarting the same client, looks like this:
+ exec java -jar /home/abuild/jenkins-swarm/swarm-client-3.49.jar -config jenkins-swarm.yml
Jan 27, 2025 8:14:25 AM hudson.plugins.swarm.Client logArguments
INFO: Client invoked with: -config jenkins-swarm.yml
Jan 27, 2025 8:14:25 AM hudson.plugins.swarm.Client main
INFO: Load configuration from jenkins-swarm.yml
Jan 27, 2025 8:14:25 AM hudson.plugins.swarm.SwarmClient <init>
INFO: Loading labels from jenkins-swarm.labels...
Jan 27, 2025 8:14:25 AM hudson.plugins.swarm.SwarmClient <init>
INFO: Labels found in file: nut-builder ...
Jan 27, 2025 8:14:25 AM hudson.plugins.swarm.SwarmClient <init>
INFO: Effective label list: [nut-builder, ...]
### The healthy attempt to connect...
Jan 27, 2025 8:14:25 AM hudson.plugins.swarm.Client run
INFO: Connecting to Jenkins controller
Jan 27, 2025 8:14:25 AM hudson.plugins.swarm.Client run
INFO: Attempting to connect to https://ci.ourproject.org/
Jan 27, 2025 8:14:27 AM hudson.plugins.swarm.Client run
INFO: Setting up LabelFileWatcher
### ...happens quickly
Jan 27, 2025 8:14:27 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using ./remoting as a remoting work directory
Jan 27, 2025 8:14:27 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to ./remoting
Jan 27, 2025 8:14:27 AM hudson.remoting.Launcher createEngine
INFO: Setting up agent: nutci-freebsd12-amd64
Jan 27, 2025 8:14:27 AM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 3283.v92c105e0f819
Jan 27, 2025 8:14:27 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using ./remoting as a remoting work directory
Jan 27, 2025 8:14:27 AM hudson.remoting.Engine startEngine
INFO: Using custom JAR Cache: FileSystem JAR Cache: path=/home/abuild/jenkins-swarm/../.jarcache, touch=true
Jan 27, 2025 8:14:27 AM hudson.remoting.Launcher$CuiListener status
INFO: Locating server among [https://ci.ourproject.org/]
Jan 27, 2025 8:14:48 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Jan 27, 2025 8:14:48 AM hudson.remoting.Launcher$CuiListener status
INFO: Agent discovery successful
  Agent address: ci.ourproject.org
  Agent port: 22033
  Identity: 38:9c:a5:6b:6e:17:b6:b3:3b:6a:15:a7:52:4d:12:34
Jan 27, 2025 8:14:48 AM hudson.remoting.Launcher$CuiListener status
INFO: Handshaking
Jan 27, 2025 8:14:48 AM hudson.remoting.Launcher$CuiListener status
INFO: Connecting to ci.ourproject.org:22033
Jan 27, 2025 8:14:48 AM hudson.remoting.Launcher$CuiListener status
INFO: Server reports protocol JNLP4-connect-proxy not supported, skipping
Jan 27, 2025 8:14:48 AM hudson.remoting.Launcher$CuiListener status
INFO: Trying protocol: JNLP4-connect
Jan 27, 2025 8:14:48 AM org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader run
INFO: Waiting for ProtocolStack to start.
Jan 27, 2025 8:14:48 AM hudson.remoting.Launcher$CuiListener status
INFO: Remote identity confirmed: 38:9c:a5:6b:6e:17:b6:b3:3b:6a:15:a7:52:4d:12:34
Jan 27, 2025 8:14:48 AM hudson.remoting.Launcher$CuiListener status
INFO: Connected
This may or may not be related to JENKINS-59817.