Bug
Resolution: Fixed
Blocker
None
Windows Jenkins master version 2.361.3, Windows Jenkins slave Swarm client 3.37
After upgrading to Swarm client 3.37, the Jenkins slave can no longer connect to the Jenkins master running version 2.361.3. The error message I get is:
Nov 03, 2022 3:54:12 PM hudson.plugins.swarm.SwarmClient getCsrfCrumb
SEVERE: Could not obtain CSRF crumb. Response code: 400
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
Nov 03, 2022 3:54:12 PM hudson.plugins.swarm.Client run
SEVERE: An error occurred
hudson.plugins.swarm.RetryException: Failed to create a Swarm agent on Jenkins. Response code: 400
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
at hudson.plugins.swarm.SwarmClient.createSwarmAgent(SwarmClient.java:405)
at hudson.plugins.swarm.Client.run(Client.java:216)
at hudson.plugins.swarm.Client.main(Client.java:68)
The Java executable in use is C:\Program Files\AdoptOpenJDK\jdk-11.0.11.9-openj9\bin\java.exe
The Jenkins slave is started with these commands:
"C:\Program Files (x86)\GnuWin32\bin\wget" -v --no-proxy -P . http://%MASTER%%MASTERPORT%/swarm/swarm-client.jar -O swarm%MASTER%.jar
java -jar swarm%MASTER%.jar -executors 1 -disableClientsUniqueId -deleteExistingClients -fsroot %JENKINS_WORKAREA% -labels "%COMPUTERNAME% %SUPPORTED_LABELS%" -master http://%MASTER%%MASTERPORT% -username %USERNAME% -password %PASSWD% -name %COMPUTERNAME% -description "%COMPUTERNAME% runs %SUPPORTED_LABELS%"
Attachments:
- OK_Ethernet2.txt (3 kB)
- 80.txt (7 kB)
- def.txt (7 kB)
- FAILED.txt (2.12 MB)
- OK.txt (4 kB)
[JENKINS-70007] Could not obtain CSRF crumb. Response code: 400
First, let me point out that when I update the Jenkins master to Swarm version 3.37, restart the master, and let the slaves run Swarm client 3.36, all slaves connect without any issue.
We run Swarm client with:
java -jar swarmcp-www527.gos.oce.net.jar -executors 1 -disableClientsUniqueId -deleteExistingClients -fsroot c:\oce\jenkins -labels "SIL000131 do_not_use" -master http://cp-www527.gos.oce.net:80 -username ovl-svc-embedded-ba -password ****** -name SIL000131 -description "SIL000131 runs do_not_use"
I just tested it with a newer Java:
c:\oce\Jenkins>java --version
openjdk 17.0.5 2022-10-18
OpenJDK Runtime Environment Temurin-17.0.5+8 (build 17.0.5+8)
OpenJDK 64-Bit Server VM Temurin-17.0.5+8 (build 17.0.5+8, mixed mode, sharing)
Same issue occurred:
INFO: Attempting to connect to http://cp-www527.gos.oce.net:80/
Nov 07, 2022 11:52:57 AM hudson.plugins.swarm.SwarmClient getCsrfCrumb
SEVERE: Could not obtain CSRF crumb. Response code: 400
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
Nov 07, 2022 11:52:57 AM hudson.plugins.swarm.Client run
SEVERE: An error occurred
hudson.plugins.swarm.RetryException: Failed to create a Swarm agent on Jenkins. Response code: 400
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
at hudson.plugins.swarm.SwarmClient.createSwarmAgent(SwarmClient.java:405)
at hudson.plugins.swarm.Client.run(Client.java:216)
at hudson.plugins.swarm.Client.main(Client.java:68)
Nov 07, 2022 11:52:57 AM hudson.plugins.swarm.Client run
The Java command is now:
C:\Temp\jdk-17.0.5+8\bin\java.exe -Dorg.jenkinsci.plugins.gitclient.CliGitAPIImpl.useSETSID=true -jar swarm%MASTER%.jar -executors 1 -disableClientsUniqueId -deleteExistingClients -fsroot %JENKINS_WORKAREA% -labels "%COMPUTERNAME% %SUPPORTED_LABELS%" -master http://%MASTER%%MASTERPORT% -username %USERNAME% -password %PASSWD% -name %COMPUTERNAME% -description "%COMPUTERNAME% runs %SUPPORTED_LABELS%"
I do not see any additional logging.
I would guess that there is some networking problem between the swarm agent and the controller. I don't know what that problem might be.
Community.jenkins.io sometimes provides the following ideas to users that are seeing CSRF crumb issues:
Standard fixes to No valid crumb:
- Check jenkins error log
- Update all plugins
- Confirm hostname in /manage (or config.xml) matches hostname you are accessing jenkins with
- Confirm that your load balancer/reverse proxy/etc. sets X-Forwarded-Host, X-Forwarded-Proto and maybe X-Forwarded-Port correctly
Tried the following:
- Checked the Jenkins log: nothing is logged on the master; the only log is on the slave, and it was added to this ticket.
- Updated all plugins: in fact I upgraded the Jenkins master to the latest LTS, then upgraded all plugins after a Jenkins restart.
  Then I rebooted a slave to test whether everything still worked. Here I found that the Swarm client gives the above error.
  I set ONLY the Swarm client back to 3.36 and restarted the Jenkins master, then rebooted the slave, and it connected without any problems.
  So all plugins are up to date; only Swarm 3.37 gives this issue.
- Checked the hostname in /manage and config.xml: I confirm that it matches the hostname I am accessing Jenkins with.
- We do not use a reverse proxy; however, I checked the X-Forwarded settings and all are OK.
Reproducing this issue is quite easy. The steps are:
- Update the Swarm plugin on the master and restart
- Log in on the slave and stop the Jenkins service
- Download the swarm jar file from the master (<master URL>/swarm/swarm-client.jar)
- Start the Jenkins service with the newly downloaded swarm-client.jar
Unfortunately, I can't reproduce the issue. My steps are:
- Update to swarm plugin 3.37 on controller and restart (using Java 11.0.16.1)
- Stop Jenkins swarm agent on Debian Linux 10 (using Java 11.0.16.1)
- Download swarm jar file from controller (<controller URL>/swarm/swarm-client.jar)
- Start Jenkins swarm agent from Debian Linux 10 with newly downloaded swarm-client.jar
I've run Jenkins 2.361.3 in that configuration recently and am now running Jenkins 2.375 in that configuration.
Please note that we use Windows 10 Pro on the slave and Windows Server 2012 on the master.
When I run a crumb GET request against the API in a cmd window, I get this:
$ curl -v -X GET http://cp-www527.gos.oce.net:8080/crumbIssuer/api/json --user <user>:<password>
Note: Unnecessary use of -X or --request, GET is already inferred.
* Trying 10.95.5.43:8080...
* Connected to cp-www527.gos.oce.net (10.95.5.43) port 8080 (#0)
* Server auth using Basic with user 'ovl-svc-embedded-ba'
> GET /crumbIssuer/api/json HTTP/1.1
> Host: cp-www527.gos.oce.net:8080
> Authorization: Basic ****************************==
> User-Agent: curl/7.83.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 404 Not Found
< Content-Type: text/plain;charset=UTF-8
< X-RBT-CLI: Name=ovl-steelhead-mgt; Ver=9.9.3a;
< Date: Tue, 08 Nov 2022 07:43:13 GMT
< Connection: close
<
Not Found
* Closing connection 0
Do you have anything else I can use or test?
I also set "Enable proxy compatibility" in "Configure Global Security" under "CSRF Protection"; unfortunately, the result is the same.
Response code 400 normally means Bad Request.
Is there a way to see or log which request and URL are sent?
My curl output is quite different from yours:
$ curl -v GET http://mark-pc2.markwaite.net:8080/crumbIssuer/api/json
* Trying 127.0.1.1:8080...
* Connected to mark-pc2.markwaite.net (127.0.1.1) port 8080 (#0)
> GET /crumbIssuer/api/json HTTP/1.1
> Host: mark-pc2.markwaite.net:8080
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Tue, 08 Nov 2022 11:09:07 GMT
< X-Content-Type-Options: nosniff
< X-Jenkins: 2.375
< X-Jenkins-Session: dd627f5a
< X-Frame-Options: deny
< Content-Type: application/json;charset=utf-8
< Set-Cookie: JSESSIONID.5508e9d8=node01tbi6ys74pdgu1gi3rpt5bicza23.node0; Path=/; HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Content-Length: 163
< Server: Jetty(10.0.12)
<
* Connection #0 to host mark-pc2.markwaite.net left intact
{"_class":"hudson.security.csrf.DefaultCrumbIssuer","crumb":"38b5a0ba5e32986a50998bf62ddb4e307bceb0b13a0185721269898a06b2a950","crumbRequestField":"Jenkins-Crumb"}
I'm sure there are ways to log details of the requests and responses, though I don't know how you would do that. You might check the global configuration of the Jenkins instance and compare it with a test instance that you install on some other machine. Maybe there is some setting in your production instance that is causing the behavior?
I compared your output with mine. The first difference is Linux versus Windows, which led me to the following.
Using your form of the command changes the curl call to:
$ curl -v GET http://cp-www527.gos.oce.net:8080/crumbIssuer/api/json --user xxx:ppp
* Could not resolve host: GET
* Closing connection 0
This gives me the following new command:
$ curl -v http://cp-www527.gos.oce.net:8080/crumbIssuer/api/json --user xxx:ppp
* Trying 10.95.5.43:8080...
* Connected to cp-www527.gos.oce.net (10.95.5.43) port 8080 (#0)
* Server auth using Basic with user 'xxx'
> GET /crumbIssuer/api/json HTTP/1.1
> Host: cp-www527.gos.oce.net:8080
> Authorization: Basic*****************==
> User-Agent: curl/7.83.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 404 Not Found
< Content-Type: text/plain;charset=UTF-8
< X-RBT-CLI: Name=ovl-steelhead-mgt; Ver=9.9.3a;
< Date: Tue, 08 Nov 2022 12:23:46 GMT
< Connection: close
<
Not Found
* Closing connection 0
However, when I change the port from 8080 to 80, I get this response (which is quite similar to yours):
$ curl -v http://cp-www527.gos.oce.net:80/crumbIssuer/api/json --user xxx:ppp
* Trying 10.95.5.43:80...
* Connected to cp-www527.gos.oce.net (10.95.5.43) port 80 (#0)
* Server auth using Basic with user 'xxxx'
> GET /crumbIssuer/api/json HTTP/1.1
> Host: cp-www527.gos.oce.net
> Authorization: Basic*****************==
> User-Agent: curl/7.83.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Tue, 08 Nov 2022 12:25:40 GMT
< X-Content-Type-Options: nosniff
< X-Jenkins: 2.361.3
< X-Jenkins-Session: d535f55c
< X-Frame-Options: deny
< Content-Type: application/json;charset=utf-8
< Set-Cookie: JSESSIONID.6782108b=node04t2l0b5dikkg1rhq96ynrg7ux9.node0; Path=/; HttpOnly
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Server: Jetty(10.0.11)
< X-RBT-CLI: Name=ovl-steelhead-mgt; Ver=9.9.3a;
< Connection: keep-alive
< Content-Length: 163
<
* Connection #0 to host cp-www527.gos.oce.net left intact
Could this mean that the port is wrong here?
I checked the port usage and found that the port set in the slave script is 80.
This means I have tested with both port 80 and port 8080:
- port 80 gives: Response code: 400
- port 8080 gives: Response code: 404
See below....
$ java -jar swarmcp-www527.gos.oce.net.jar -executors 1 -disableClientsUniqueId -deleteExistingClients -fsroot c:\oce\jenkins -labels "SIL000131 do_not_use" -master http://cp-www527.gos.oce.net:8080 -username uuuuu -password pppppp -name SIL000131 -description "SIL000131 runs do_not_use"
Nov 08, 2022 1:38:59 PM hudson.plugins.swarm.Client logArguments
INFO: Client invoked with: -deleteExistingClients true -description SIL000131 runs do_not_use -disableClientsUniqueId true -executors 1 -fsroot c:\oce\jenkins -labels [SIL000131 do_not_use] -name SIL000131 -password ***** -url http://cp-www527.gos.oce.net:8080 -username *****
Nov 08, 2022 1:38:59 PM hudson.plugins.swarm.Client run
INFO: Connecting to Jenkins controller
Nov 08, 2022 1:38:59 PM hudson.plugins.swarm.Client run
INFO: Attempting to connect to http://cp-www527.gos.oce.net:8080/
Nov 08, 2022 1:38:59 PM hudson.plugins.swarm.SwarmClient getCsrfCrumb
SEVERE: Could not obtain CSRF crumb. Response code: 404
Not Found
$ java -jar swarmcp-www527.gos.oce.net.jar -executors 1 -disableClientsUniqueId -deleteExistingClients -fsroot c:\oce\jenkins -labels "SIL000131 do_not_use" -master http://cp-www527.gos.oce.net:80 -username uuuuu -password pppppp -name SIL000131 -description "SIL000131 runs do_not_use"
Nov 08, 2022 1:39:57 PM hudson.plugins.swarm.Client logArguments
INFO: Client invoked with: -deleteExistingClients true -description SIL000131 runs do_not_use -disableClientsUniqueId true -executors 1 -fsroot c:\oce\jenkins -labels [SIL000131 do_not_use] -name SIL000131 -password ***** -url http://cp-www527.gos.oce.net:80 -username *****
Nov 08, 2022 1:39:57 PM hudson.plugins.swarm.Client run
INFO: Connecting to Jenkins controller
Nov 08, 2022 1:39:57 PM hudson.plugins.swarm.Client run
INFO: Attempting to connect to http://cp-www527.gos.oce.net:80/
Nov 08, 2022 1:39:57 PM hudson.plugins.swarm.SwarmClient getCsrfCrumb
SEVERE: Could not obtain CSRF crumb. Response code: 400
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
Nov 08, 2022 1:39:57 PM hudson.plugins.swarm.Client run
SEVERE: An error occurred
hudson.plugins.swarm.RetryException: Failed to create a Swarm agent on Jenkins. Response code: 400
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
at hudson.plugins.swarm.SwarmClient.createSwarmAgent(SwarmClient.java:405)
at hudson.plugins.swarm.Client.run(Client.java:216)
at hudson.plugins.swarm.Client.main(Client.java:68)
Nov 08, 2022 1:39:57 PM hudson.plugins.swarm.Client run
INFO: Retrying in 10 seconds
Port 80 is a privileged port on Linux machines. User programs aren't allowed to listen on that port without special permissions. If some other program is listening on port 80 on your Windows computer, that might cause erratic behavior. I don't know whether Windows allows multiple processes to listen on a single network port.
It may be worth checking the configuration of the controller to ensure that the controller URL is configured correctly in "Manage Jenkins / System Configuration".
Here are the settings:
"Manage Jenkins / System Configuration"
Jenkins URL = http://cp-www527.gos.oce.net:80/
"Manage Jenkins / Global Security"
TCP port for inbound agents: Fixed to 8080
The slave tries to connect to:
SET MASTER=cp-www527.gos.oce.net
SET MASTERPORT=:80
-master http://%MASTER%%MASTERPORT%
So this is: http://cp-www527.gos.oce.net:80
This seems to be OK; I have no clue what to test next...
Note that we have 7 Windows-based Jenkins masters, and all of them have the same issue: after updating the Swarm client from 3.36 to 3.37 and restarting the master and slaves, all slaves give the same error:
Nov 03, 2022 3:54:12 PM hudson.plugins.swarm.Client run
SEVERE: An error occurred
hudson.plugins.swarm.RetryException: Failed to create a Swarm agent on Jenkins. Response code: 400
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
at hudson.plugins.swarm.SwarmClient.createSwarmAgent(SwarmClient.java:405)
at hudson.plugins.swarm.Client.run(Client.java:216)
at hudson.plugins.swarm.Client.main(Client.java:68)
Unfortunately, I don't have any other ideas to offer that you might explore. Port 80 is a reasonable port to use. Port 8080 is a little bit surprising as the inbound agent port, but it should work.
The same issue occurs with Swarm client 3.38 and 3.39; no change.
I could not reproduce any problems locally. I would recommend you step through the code in a debugger, or with WireShark, and compare the HTTP requests and responses (including headers and status codes) between 3.36 and 3.37 to see what has changed.
I added the options -Xdebug -agentlib:jdwp=transport=dt_socket,server=y,address=5005,suspend=n to debug the jar file, but I still get this output:
Dec 12, 2022 9:02:41 AM hudson.plugins.swarm.SwarmClient getCsrfCrumb
SEVERE: Could not obtain CSRF crumb. Response code: 400
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
Dec 12, 2022 9:02:41 AM hudson.plugins.swarm.Client run
SEVERE: An error occurred
hudson.plugins.swarm.RetryException: Failed to create a Swarm agent on Jenkins. Response code: 400
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
at hudson.plugins.swarm.SwarmClient.createSwarmAgent(SwarmClient.java:405)
at hudson.plugins.swarm.Client.run(Client.java:216)
at hudson.plugins.swarm.Client.main(Client.java:68)
No change. I do not know how to debug the jar file or get other information from it; please advise.
Please note: when trying to reproduce, use a Windows 10 Jenkins master and a Windows 10 Jenkins slave!
I created WireShark captures from Ethernet (Default), Ethernet and Loopback for both the failed (3.39) and OK (3.36) runs.
This is how the data was captured:
- Start capture
- Enable Swarm slave
- Download wget
- Download swarm-client 3.39 from master
- Run Swarm jar
- Wait for error
- Stop Swarm slave
- End capture.
You would need to gather Wireshark data for the HTTP protocol before upgrade (passing scenario) and after upgrade (failing scenario) and compare the two to see what has changed. Fundamentally this is just making HTTP requests (URL, request headers, and request body) and getting HTTP responses (URL, response headers, and response body) in a sequence, so identifying the change is key to understanding the problem. The same information (request headers, request body, response headers, and response body for each URL) could also be gleaned from a Java debugger by setting a breakpoint at the internal methods of the HTTP client library.
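The comparison described above can be mechanized once both exchanges have been exported from Wireshark as text. The following is a minimal Python sketch, not part of Swarm or Jenkins; the function names `parse_headers` and `diff_headers` are hypothetical, and it assumes each capture is the raw head of a single HTTP request (request line followed by header lines).

```python
def parse_headers(raw):
    """Parse the head of a raw HTTP message into {lower-case name: value}."""
    headers = {}
    # Skip the first line (request line or status line); stop at the blank line.
    for line in raw.strip().splitlines()[1:]:
        if not line.strip():
            break
        name, _, value = line.partition(":")
        # A repeated header name overwrites the earlier value in this sketch.
        headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(ok_raw, failed_raw):
    """Return the headers whose presence or value differs between two captures."""
    ok = parse_headers(ok_raw)
    failed = parse_headers(failed_raw)
    return {
        name: (ok.get(name), failed.get(name))
        for name in sorted(ok.keys() | failed.keys())
        if ok.get(name) != failed.get(name)
    }
```

Calling `diff_headers(ok_text, failed_text)` returns a dictionary that maps each differing header name to its (passing, failing) pair, with `None` marking a header that is absent on one side.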
The difference between OK and FAILED is in the crumb issuer request:
OK:
GET /crumbIssuer/api/xml?xpath=concat%28%2F%2FcrumbRequestField%2C%22%3A%22%2C%2F%2Fcrumb%29 HTTP/1.1
Host: 80
Accept-Encoding: gzip, x-gzip, deflate
Authorization: Basic <xxx>
Connection: keep-alive
Cookie: JSESSIONID.aae89c1b=node0adu21rnr8f0a5wqcu6ydpt5o7.node0
User-Agent: Apache-HttpClient/5.1.3 (Java/11.0.11)
HTTP/1.1 200 OK
Content-Length: 89
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/plain;charset=utf-8
Date:04 GMT
Server: Jetty(10.0.11)
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-Jenkins: 2.361.4
X-Jenkins-Session: 6bd9c682
X-Rbt-Cli: Name=ovl-steelhead-mgt; Ver=9.9.3a;
Jenkins-Crumb:dff39dec0991d889e101fb64062bdf8c1bf2140fae88f75d6915c0ee96ede584
FAILED:
GET /crumbIssuer/api/xml?xpath=concat%28%2F%2FcrumbRequestField%2C%22%3A%22%2C%2F%2Fcrumb%29 HTTP/1.1
Host: cp-www527.gos.oce.net
Authorization: Basic <xxxx>
Connection: Upgrade, HTTP2-Settings
Content-Length: 0
Http2-Settings: AAEAAEAAAAIAAAABAAMAAABkAAQBAAAAAAUAAEAA
Upgrade: h2c
User-Agent: Java-http-client/11.0.11
HTTP/1.1 400 Bad Request
Connection: close
Content-Length: 54
Content-Type: text/html;charset=iso-8859-1
Date:24 GMT
Server: Jetty(10.0.11)
X-Rbt-Cli: Name=ovl-steelhead-mgt; Ver=9.9.3a;
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
Jenkins master is started on windows with these options:
-Xrs -Xmx1024m -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -jar "E:\Jenkins\jenkins.war" --httpPort=80 --webroot="%LocalAppData%\Jenkins\war
JENKINS_HOME=E:\Jenkins\.jenkins
Java executable=E:\Java\jre-11.0.8+10\bin\java.exe
Was that the only HTTP request that was made, or were there others? If there are others, please include each HTTP sequence (request URL, request headers, response status code, response headers) in order, even if they are identical, so that we can see the sequence leading up to the failure.
OK request with Swarm 3.36
🇳🇱 134.188.170.175:57959 10.95.5.43:80 (GET)
GET /computer/SIL000131/slave-agent.jnlp HTTP/1.1
Host: cp-www527.gos.oce.net
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Authorization: <---------------->
Connection: keep-alive
User-Agent: Java/11.0.11
HTTP/1.1 200 OK
Content-Length: 243
Connection: keep-alive
Content-Type: application/x-java-jnlp-file
Date:57 GMT
Server: Jetty(10.0.11)
X-Content-Type-Options: nosniff
X-Rbt-Cli: Name=ovl-steelhead-mgt; Ver=9.9.3a;
<jnlp><application-desc><argument>f0c12d6de5c3708a739cea6385fa8e19aefdab0a4c7486a5205b477131182dbc</argument><argument>SIL000131</argument><argument>-url</argument><argument>http://cp-www527.gos.oce.net:80/</argument></application-desc></jnlp>
GET /tcpSlaveAgentListener/ HTTP/1.1
Host: cp-www527.gos.oce.net
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Authorization: Basic <---------------->
Connection: keep-alive
User-Agent: Java/11.0.11
HTTP/1.1 200 OK
Content-Length: 12
Connection: keep-alive
Content-Type: text/plain;charset=utf-8
Date:57 GMT
Server: Jetty(10.0.11)
X-Content-Type-Options: nosniff
X-Hudson-Jnlp-Port: 8080
X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAj9BVSxIaFhvM+iySInXgUxew3WWQC4J0gW0UKHpL70YP2/iywgo1J+mwLi0T4dypDQng39Nq5FHU4Y7vgLhisIZAciROiXsqMFaSKbPq0W+psoL1HiFefJ24o/nMZEG7Sd/PFhK6D4GYRF4Hi51WUxQ1WiHN5noCUPIkSkSK+QjRhpVf+xuk1uprkdqzfPvx18hMvAP/on/vrER5pUG3P5Hch9sObkoBJTLWp8X/TzZfzGseT7kf9JoBlOQiyJ7g74oPVhGYQtOC3sp7VO6MijersTqLUuTLvBZrfEQp7UF9eCeRNxWKs8EeR+cHHlKUt6dQDm8jP3qgGbX2ZVh/HQIDAQAB
X-Jenkins-Agent-Protocols: JNLP4-connect, Ping
X-Jenkins-Jnlp-Port: 8080
X-Rbt-Cli: Name=ovl-steelhead-mgt; Ver=9.9.3a;
X-Remoting-Minimum-Version: 4.2.1
Jenkins
🇳🇱 134.188.170.175:57958 10.95.5.43:80 (GET, POST)
GET /crumbIssuer/api/xml?xpath=concat%28%2F%2FcrumbRequestField%2C%22%3A%22%2C%2F%2Fcrumb%29 HTTP/1.1
Host: 80
Accept-Encoding: gzip, x-gzip, deflate
Authorization: Basic <---------------->
Connection: keep-alive
Cookie: JSESSIONID.f0735c1f=node06yjtxbv6el2buh4vtc6flc7b4.node0
User-Agent: Apache-HttpClient/5.1.3 (Java/11.0.11)
HTTP/1.1 200 OK
Content-Length: 89
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/plain;charset=utf-8
Date:57 GMT
Server: Jetty(10.0.11)
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-Jenkins: 2.361.4
X-Jenkins-Session: 0d1b6de9
X-Rbt-Cli: Name=ovl-steelhead-mgt; Ver=9.9.3a;
Jenkins-Crumb:7099eff409730c887b2f9f7ae66097fbaa6cee804b8126bb9b0ca0b88adca0e8
POST /plugin/swarm/createSlave?name=SIL000131&executors=1&remoteFsRoot=c%3A%5Coce%5Cjenkins&description=SIL000131+runs+do_not_use&labels=SIL000131+do_not_use&mode=NORMAL&hash=&deleteExistingClients=true&keepDisconnectedClients=false HTTP/1.1
Host: 80
Connection: close
Accept-Encoding: gzip, x-gzip, deflate
Authorization: Basic <---------------->
Connection: close
Content-Length: 0
Content-Type: text/plain; charset=ISO-8859-1
Cookie: JSESSIONID.f0735c1f=node06yjtxbv6el2buh4vtc6flc7b4.node0
Jenkins-Crumb: 7099eff409730c887b2f9f7ae66097fbaa6cee804b8126bb9b0ca0b88adca0e8
User-Agent: Apache-HttpClient/5.1.3 (Java/11.0.11)
HTTP/1.1 200 OK
Connection: close
Transfer-Encoding: chunked
Content-Encoding: gzip
Content-Type: text/plain;charset=iso-8859-1
Date:57 GMT
Server: Jetty(10.0.11)
X-Content-Type-Options: nosniff
X-Rbt-Cli: Name=ovl-steelhead-mgt; Ver=9.9.3a;
X-Rbt-Optimized-By: ovl-steelhead-mgt (RiOS 9.9.3a) IK
#
#Wed Dec 14 11:05:57 CET 2022
name=SIL000131
FAILED request with Swarm 3.39
🇳🇱 134.188.170.175:57760 10.95.5.43:80 (GET)
GET /crumbIssuer/api/xml?xpath=concat%28%2F%2FcrumbRequestField%2C%22%3A%22%2C%2F%2Fcrumb%29 HTTP/1.1
Host: cp-www527.gos.oce.net
Authorization: Basic <---------------->
Connection: Upgrade, HTTP2-Settings
Content-Length: 0
Http2-Settings: AAEAAEAAAAIAAAABAAMAAABkAAQBAAAAAAUAAEAA
Upgrade: h2c
User-Agent: Java-http-client/11.0.11
HTTP/1.1 400 Bad Request
Connection: close
Content-Length: 54
Content-Type: text/html;charset=iso-8859-1
Date:39 GMT
Server: Jetty(10.0.11)
X-Rbt-Cli: Name=ovl-steelhead-mgt; Ver=9.9.3a;
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
🇳🇱 134.188.170.175:57761 10.95.5.43:80 (POST)
POST /plugin/swarm/createSlave?name=SIL000131&executors=1&remoteFsRoot=c%3A%5Coce%5Cjenkins&description=SIL000131+runs+do_not_use&labels=SIL000131+do_not_use&mode=NORMAL&hash=&deleteExistingClients=true&keepDisconnectedClients=false HTTP/1.1
Host: cp-www527.gos.oce.net
Authorization: Basic <---------------->
Connection: Upgrade, HTTP2-Settings
Content-Length: 0
Http2-Settings: AAEAAEAAAAIAAAABAAMAAABkAAQBAAAAAAUAAEAA
Upgrade: h2c
User-Agent: Java-http-client/11.0.11
HTTP/1.1 400 Bad Request
Connection: close
Content-Length: 54
Content-Type: text/html;charset=iso-8859-1
Date:40 GMT
Server: Jetty(10.0.11)
X-Rbt-Cli: Name=ovl-steelhead-mgt; Ver=9.9.3a;
<h1>Bad Message 400</h1><pre>reason: Bad Request</pre>
The difference is clear. In the OK request:
Connection: keep-alive
User-Agent: Java/11.0.11
HTTP/1.1 200 OK
Content-Length: 243
Connection: keep-alive
Content-Type: application/x-java-jnlp-file
In the FAILED request:
Connection: Upgrade, HTTP2-Settings
Content-Length: 0
Http2-Settings: AAEAAEAAAAIAAAABAAMAAABkAAQBAAAAAAUAAEAA
Upgrade: h2c
User-Agent: Java-http-client/11.0.11
So the Connection header and the User-Agent are different.
I added the txt file captures.
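For context on the `Http2-Settings` value in the failing capture: in the HTTP/2 cleartext upgrade mechanism (RFC 7540, section 3.2), a client that sends `Upgrade: h2c` also sends an `HTTP2-Settings` header carrying a base64url-encoded SETTINGS payload, which is a sequence of 6-byte entries (a 16-bit identifier followed by a 32-bit value). The sketch below decodes such a value in Python. The function name `decode_http2_settings` is hypothetical, and the code is illustrative only; it is not part of Swarm or Jenkins.

```python
import base64
import struct

# Setting identifiers defined in RFC 7540, section 6.5.2.
SETTING_NAMES = {
    0x1: "HEADER_TABLE_SIZE",
    0x2: "ENABLE_PUSH",
    0x3: "MAX_CONCURRENT_STREAMS",
    0x4: "INITIAL_WINDOW_SIZE",
    0x5: "MAX_FRAME_SIZE",
    0x6: "MAX_HEADER_LIST_SIZE",
}


def decode_http2_settings(value):
    """Decode an HTTP2-Settings header value into {setting name: integer}."""
    # The header omits base64 padding, so restore it before decoding.
    padded = value + "=" * (-len(value) % 4)
    payload = base64.urlsafe_b64decode(padded)
    settings = {}
    # Each entry is 6 bytes: big-endian 16-bit identifier, 32-bit value.
    for offset in range(0, len(payload) - len(payload) % 6, 6):
        ident, number = struct.unpack_from(">HI", payload, offset)
        settings[SETTING_NAMES.get(ident, hex(ident))] = number
    return settings
```

Applied to the value from the capture, `decode_http2_settings("AAEAAEAAAAIAAAABAAMAAABkAAQBAAAAAAUAAEAA")` yields HEADER_TABLE_SIZE 16384, ENABLE_PUSH 1, MAX_CONCURRENT_STREAMS 100, INITIAL_WINDOW_SIZE 16777216 and MAX_FRAME_SIZE 16384. In other words, the header is an ordinary HTTP/2 upgrade offer from the client rather than corrupted data.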
In the passing scenario you showed two GET requests, one to /computer/SIL000131/slave-agent.jnlp with Host: cp-www527.gos.oce.net and one to /crumbIssuer/api/xml with Host: 80.
In the failing scenario you showed a GET and a POST request, one GET to /crumbIssuer/api/xml with Host: cp-www527.gos.oce.net and one POST to /plugin/swarm/createSlave with Host: cp-www527.gos.oce.net.
I could see how Swarm might plausibly be sending different headers or Host values from one version to the next, but I do not see how it could be sending completely different GET/POST requests as your data seems to show. Therefore I suspect you are not providing all of the data in this ticket. Please double check that you are providing all of the GET and POST requests in both the passing and failing scenarios.
Additionally, the Host: 80 line in your passing scenario seems suspect. This value should not be a port number. I took a closer look at your passing scenario and saw you are passing -url http://cp-www527.gos.oce.net:80/. Does the passing scenario on 3.36 remain passing when you remove the ":80" from the URL? If not, then the problem is not with the Swarm HTTP client upgrade but with your configuration. You might try generating a crumb with wget or curl as described in https://docs.cloudbees.com/docs/cloudbees-ci-kb/latest/client-and-managed-masters/csrf-protection-explained (with and without the ":80") - if it does not work there, then it certainly cannot work in Swarm. Overall I suspect there is a misconfiguration on your side that just happened to work by accident with the older Swarm HTTP client.
Tried this on the host:
C:\Users\xxx>curl -u "<user>:<pwd>" http://cp-www527.gos.oce.net:80/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,":",//crumb) >80.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7003 100 7003 0 0 6513 0 --:--:-- --:--:-- --:--:-- 6514
C:\Users\xxx>curl -u "<user>:<pwd>" http://cp-www527.gos.oce.net/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,":",//crumb) >def.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7003 100 7003 0 0 7243 0 --:--:-- --:--:-- --:--:-- 7241
Both 80.txt and def.txt contain the crumb. When I use http://cp-www527.gos.oce.net:8080 I get: Not Found
It seems that both http://cp-www527.gos.oce.net:80/ and http://cp-www527.gos.oce.net/ work.
See attached txt files.
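One way to rule out shell quoting as a variable in tests like these is to percent-encode the XPath query exactly as it appears in the captured Swarm request, and pass the resulting URL to the HTTP client in quotes. A minimal Python sketch follows; the helper name `crumb_url` is hypothetical, and the XPath expression is the one visible in the curl commands above.

```python
from urllib.parse import quote

# XPath expression taken from the curl commands in this ticket.
XPATH = 'concat(//crumbRequestField,":",//crumb)'


def crumb_url(base_url, xpath=XPATH):
    """Build the crumb issuer URL with a fully percent-encoded xpath parameter."""
    # safe="" encodes every reserved character, including "/", "(" and '"'.
    return base_url.rstrip("/") + "/crumbIssuer/api/xml?xpath=" + quote(xpath, safe="")
```

The encoded query produced this way is `concat%28%2F%2FcrumbRequestField%2C%22%3A%22%2C%2F%2Fcrumb%29`, which matches the request line in the Wireshark captures.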
Have you read the third paragraph in my last reply? If you have, you have not addressed that point.
I checked and double-checked that I am adding both the GET and POST requests. Attached are the txt files for the 2nd run.
Your recent upload makes a lot more sense and shows that Swarm can't obtain a CSRF crumb from its initial GET to the crumb issuer API. Yet an older version of Swarm manages to (even with the wonky and incorrect "Host: 80" header) and so does curl. So my next recommendation for you would be to get curl to print out the headers it is using, and experiment a little bit more with making requests to the crumb issuer API with various combinations of headers until you manage to trigger the problem. Your goal should be to isolate which header is causing the server to return 400 Bad Request.
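The same experiment can be scripted instead of run by hand with curl. The sketch below is one possible approach in Python using only the standard library; the names `header_variants`, `probe` and `run_probes` are hypothetical, and the host, port, path and headers are supplied by the caller. It sends a baseline request and then one request per group of suspect headers, so that the group that turns a 200 into a 400 stands out.

```python
import http.client


def header_variants(baseline, suspect_groups):
    """Yield (label, headers): the baseline alone, then baseline plus one group at a time."""
    yield "baseline", dict(baseline)
    for label, extra in suspect_groups.items():
        # Later keys win, so a group may also override a baseline header.
        yield label, {**baseline, **extra}


def probe(host, port, path, headers, timeout=10):
    """Send one GET with the given headers and return the HTTP status code."""
    # http.client adds Host and Accept-Encoding itself unless they are supplied.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers)
        return conn.getresponse().status
    finally:
        conn.close()


def run_probes(host, port, path, baseline, suspect_groups):
    """Return {label: status code} for the baseline and each suspect group."""
    return {
        label: probe(host, port, path, headers)
        for label, headers in header_variants(baseline, suspect_groups)
    }
```

A reasonable starting point, taken from the failing capture, would be a baseline containing only the `Authorization` header and three suspect groups: the upgrade trio (`Connection: Upgrade, HTTP2-Settings`, `Upgrade: h2c`, `HTTP2-Settings: ...`), `Content-Length: 0`, and `User-Agent: Java-http-client/11.0.11`.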
When running the curl command with -s -D - -o <file>.txt I get the output below; it does not matter whether I use :80 or leave it out:
> curl -u "<user>:<pass>" -s -D - -o <file>.txt http://cp-www527.gos.oce.net:80/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,":",//crumb)
HTTP/1.1 500 Server Error
Date: Mon, 19 Dec 2022 08:00:47 GMT
X-Content-Type-Options: nosniff
X-Jenkins: 2.361.4
X-Jenkins-Session: 0d1b6de9
X-Frame-Options: deny
Set-Cookie: JSESSIONID.f0735c1f=node0gcucvm4deh8i1diyxulc6nfwx56.node0; Path=/; HttpOnly
Expires: 0
Content-Type: text/html;charset=utf-8
Cache-Control: no-cache,no-store,must-revalidate
X-Hudson-Theme: default
Referrer-Policy: same-origin
Cross-Origin-Opener-Policy: same-origin
X-Hudson: 1.395
X-Frame-Options: sameorigin
X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAj9BVSxIaFhvM+iySInXgUxew3WWQC4J0gW0UKHpL70YP2/iywgo1J+mwLi0T4dypDQng39Nq5FHU4Y7vgLhisIZAciROiXsqMFaSKbPq0W+psoL1HiFefJ24o/nMZEG7Sd/PFhK6D4GYRF4Hi51WUxQ1WiHN5noCUPIkSkSK+QjRhpVf+xuk1uprkdqzfPvx18hMvAP/on/vrER5pUG3P5Hch9sObkoBJTLWp8X/TzZfzGseT7kf9JoBlOQiyJ7g74oPVhGYQtOC3sp7VO6MijersTqLUuTLvBZrfEQp7UF9eCeRNxWKs8EeR+cHHlKUt6dQDm8jP3qgGbX2ZVh/HQIDAQAB
Server: Jetty(10.0.11)
X-RBT-CLI: Name=ovl-steelhead-mgt; Ver=9.9.3a;
Connection: keep-alive
Content-Length: 7003
Since i have no knowledge about crumb i would like to know i can make requests to the crumb issuer API with various combinations of headers ?
Please advise !
Since i have no knowledge about crumb i would like to know [how] i can make requests to the crumb issuer API with various combinations of headers ?
Pick an HTTP client (e.g. curl) and then search its documentation for how to set headers when making a request.
I tried, with no luck; I always get the same result. When I specify -H "Connection: xxx" in curl, I get: unknown host xxx
I have no clue why, or how I can test it with curl.
I get the feeling that something in Swarm client has changed because in Swarm 3.36 i see always "Connection: keep-alive", in Swarm version 3.37 or higher i see "Connection: Upgrade, HTTP2-Settings" or "Connection: close"
I have this Windows curl version installed:
curl --version
curl 7.83.1 (Windows) libcurl/7.83.1 Schannel
Release-Date: 2022-05-13
I get the feeling that something in Swarm client has changed because in Swarm 3.36 i see always "Connection: keep-alive", in Swarm version 3.37 or higher i see "Connection: Upgrade, HTTP2-Settings" or "Connection: close"
That much is obvious. But what you have failed to identify is which header change causes the problem for you.
Asking someone with no knowledge of curl or its headers to do this is like asking somebody who is used to riding a camel to drive a car...
So any hint would be nice.
I appreciate that differential diagnosis of this form can be challenging for an end user (as opposed to a plugin developer). Believe me, if I could reproduce the problem locally, I would debug it and post a PR with the fix. But since I can't reproduce the problem locally, the debugging is going to need to take place on your side, where the problem is actually occurring. And unfortunately I would not be able to debug the problem for you even if you did give me access to your environment, at least not for free: that is getting into the territory of a Support or Professional Services engagement. Is there a Java developer at your company that you could enlist to help you debug this? I don't think it would be very difficult for someone with the right skillset. Unfortunately debugging "by telegram" is not only inefficient but usually ineffective: the debugging process requires exploration and on-the-fly reasoning, which is not simple to explain over long-form asynchronous communication. In the worst case you may have to wait for another user to hit the same problem and debug it. But look on the bright side: at least with open source software you have a chance at debugging the issue yourself: with proprietary and/or SaaS products the odds are much lower!
I know also for sure that this change caused the issue: https://github.com/jenkinsci/swarm-plugin/pull/493
That much is also obvious, pointing to the fact that if you are unable or unwilling to debug the issue you can always maintain a private fork with that commit reverted.
There is no sense in maintaining a private fork; we want to use the latest version, or at least keep up to date. This is why we are spending effort on trying to debug the issue. We have had no luck so far because we lack the knowledge, and help is appreciated.
I am still not able to debug the jar file; I tried Eclipse, but I cannot view the source.
However, while trying to set up a test environment on my personal laptop, connected through VPN at home, I found that my laptop can connect using swarm client v3.39. I need to test this at the office tomorrow.
I also noticed a difference in the logging between my laptop (on VPN at home) and a dedicated desktop/server in the office computer room: the laptop does not do a crumb request.
Here is the logging of my laptop:
INFO: Attempting to connect to http://<jenkins URL>:80/
jan. 25, 2023 11:08:17 A.M. hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: <LAPTOP NAME>
The logging of the dedicated desktop/server:
INFO: Attempting to connect to http://<jenkins URL>:80/
Jan 25, 2023 11:07:17 AM hudson.plugins.swarm.SwarmClient getCsrfCrumb
SEVERE: Could not obtain CSRF crumb. Response code: 400
It seems that the swarm code is running a different request for the same settings, but possibly I am wrong...
I connected the laptop to wired and Wi-Fi networks at the office, and now I see the error: Could not obtain CSRF crumb. Response code: 400
I tried setting -Dhttp.proxyHost and -Dhttp.proxyPort; then I get error 502.
Somehow it seems that the swarm-client uses random ports to communicate and not the port defined in Manage Jenkins->Security->TCP port for inbound agents.
We have set this to "Fixed: 8080", but a WireShark capture shows other ports being used for communication.
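A possible explanation for the "random" ports, offered as an assumption because the capture itself is not reproduced here: a fixed port setting only governs the port the server listens on, while the client side of every TCP connection normally gets an ephemeral source port chosen by the operating system. The earlier captures in this ticket show this pattern (for example source port 57959 connecting to destination port 80). The Python sketch below demonstrates the effect on the loopback interface; the function name `connect_and_report` is hypothetical.

```python
import socket


def connect_and_report(server_host="127.0.0.1"):
    """Open one loopback TCP connection and report the ports on each side."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 lets the OS pick a free listening port for this demo.
        server.bind((server_host, 0))
        server.listen(1)
        server_port = server.getsockname()[1]
        with socket.create_connection((server_host, server_port)) as client:
            conn, peer = server.accept()
            conn.close()
            return {
                "server_port": server_port,
                # Source port assigned to the client by the operating system.
                "client_port": client.getsockname()[1],
                "peer_port_seen_by_server": peer[1],
            }
```

In the returned dictionary, `client_port` differs from `server_port` and equals the port the server sees for its peer, which is the kind of value that shows up as the source port in a capture.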
Could the issue be caused by the optimization option of the firewall?
Our IT specialists whitelisted the subnets on which the Jenkins nodes are placed so that the optimization is not applied to them.
Now the issue seems to be solved. Issue closed.
The Jenkins project does not test Jenkins or its components with the OpenJ9 Java virtual machine. Does the same problem happen when using a recent Hotspot JDK like Adoptium 11.0.17? If not, then please switch to the hotspot virtual machine rather than the OpenJ9 virtual machine.
If it fails with the hotspot virtual machine, then please provide more details so that others can duplicate the problem. I frequently run a swarm 3.37 agent with Jenkins 2.361.3. The agent command line that I use is: