Type: Bug
Resolution: Fixed
Priority: Minor
Released As: 2.317
Testing CLI, agent connections with the new '-websocket' functionality added by
JEP-222.
Jetty access log shows:
172.18.0.3 - - [25/Feb/2020:01:48:06 +0000] "GET /cli/ws HTTP/1.1" 101 0 "-" "-"
CLI output:
javax.websocket.DeploymentException: Handshake response not received.
    at org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:694)
    at org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:712)
    ...
    at org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:866)
    at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)
    at org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:511)
    at org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:355)
    at hudson.cli.CLI.webSocketConnection(CLI.java:323)
    at hudson.cli.CLI._main(CLI.java:301)
    at hudson.cli.CLI.main(CLI.java:95)
When I attach a debugger to the Jenkins server it seems to get stuck here:
https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/hudson/cli/CLIAction.java#L255
is duplicated by:
- JENKINS-61253 Remoting using WebSocket fails with "Handshake response not received" Exception (Closed)
- JENKINS-66836 Handshake response not received with release 1.30.2 and We (Closed)
- JENKINS-63011 No slave connection with remoting webSocket (Closed)
- JENKINS-63313 WS agent can't connect to with jdk11, works with jdk8 (Closed)
- JENKINS-67517 Websocket agent connection fail (Closed)
[JENKINS-61212] CLI, Agent -websockets DeploymentException: Handshake response not received on jdk-11
Probably something specific to your environment. First try using JDK 8 on both client and server (11 is in general poorly tested in Jenkins). Then check your service setup. You are using the built-in Winstone/Jetty servlet container? How is HTTPS being terminated—by Jetty’s built-in options (never before tested with WebSocket AFAIK), or using some sort of reverse proxy (which)?
jglick - thanks for the prompt response.
You are probably right. I'll do testing with different combinations of jdk-8 and without HTTPS today.
It will take me a bit of time to come up with the smallest reproducible setup and test with jdk-8.
server-*.pem is a self-signed cert with :
localhost 127.0.0.1 jenkins-server
google chrome likes it when imported (Mac/Linux)
The client / agent are connecting using a different container over a bridge network.
server:
# winstone args
--httpsPort=${HTTPS_PORT-8443} \
--httpsPrivateKey=/var/jenkins_home/tls/server-rsa.pem \
--httpsCertificate=/var/jenkins_home/tls/server-cert.pem \
--extraLibFolder=/var/jenkins_home/extraLibs \
--accessLoggerClassName=winstone.accesslog.SimpleAccessLogger \
--simpleAccessLogger.format=combined \
--simpleAccessLogger.file=/var/jenkins_home/logs/access.log \
client:
java \
  -Djavax.net.ssl.trustStore=/var/lib/jenkins/truststore.jks \
  -Djavax.net.ssl.trustStorePassword=jenkins \
  -Djava.util.logging.config.file=/var/lib/jenkins/debug-logging.properties \
  -jar "../jenkins-cli.jar" \
  -s "https://jenkins-server:8443" \
  -webSocket \
  -logger FINE \
  -auth "$AGENT_AUTH" \
  <args>
Definitely I never tried to test against Winstone’s built-in --httpsPort, only against TLS termination done by reverse proxies (specifically Kubernetes ingress). In principle it ought to work to the extent that Jetty supports TLS + WebSocket.
Offhand I suspect that the system properties you are passing to the client are not sufficient for the Tyrus library to recognize the custom certificate. Do you get the same error if you simply omit those properties? If so, this may just be a matter of inadequate error logging.
jglick - I retested with jdk8 and the web-socket inbound cli/agent are working.
I'll update with more findings later today when I have more info - and you can decide if this is a bug or a jdk-11 compat issue to be addressed as described at jenkins-on-java-11
BTW - I was using HTTPS with jdk-8 also.
Interesting. Is it the client or the server that matters? There are smoke tests of WebSocket modes in the Jenkins core tree which get run on JDK 11 in CI, but this is not using Jetty-terminated TLS or custom certificates.
jglick - I'll try to distill it down to the simplest reproducible jdk-11 setup (without TLS), and test it with your smoke test and websocat.
jglick - hm... you are correct.
jdk-11 with HTTP is working.
TLS+websocket inbound cli/agent works with jdk-8, but NOT jdk-11.
Very nice. Using jdk-8 server / agent I can now use an AWS ALB for all Jenkins server ingress server-UI / CLI / Agents.
Still unclear from comments whether the issue is caused by using Java 11 for the client or the server (or both).
jglick - I just tested a bunch of combinations, results below.
https+websockets
- jdk-11 server - the CLI/agent hangs (jdk-8 and jdk-11 agent).
- jdk-8 server - the CLI/agent connect as expected (jdk-8 and jdk-11 agent)
http+websockets
- all combinations of server / CLI/agent jdk work as expected.
TLDR - only jdk-11 server w/HTTPS port SERVER_URL hangs (client jdk doesn't matter in this case).
Getting a self-signed cert that Google Chrome likes, while using Docker for Mac and being limited to a single server URL, is a bit tricky.
When I can I'll try and extract out the simplest setup to reproduce the behavior.
Has there been any movement on this? We can't run JDK-8 and must use TLS for the traffic between agent and server. Forcing people to use an old JDK or NOT use encryption doesn't seem like a minor bug. It's a security issue. It's a hard blocker for us.
peterloron - I haven't gotten around to creating a GitHub repo with a small reproducible case yet.
Once we have that - the Jenkins core contributors are fairly responsive.
I can take a stab at this later today.
I have no immediate plans to work on this—Java 11 support in Jenkins remains more or less experimental. I would recommend running on Java 8 unless you have a particular reason you must run on 11. (In which case you can continue to use TCP inbound agents.)
Java 11 support in Jenkins remains more or less experimental
More or less experimental than remoting over websocket?
FWIW we announced full Java 11 support more than a year ago, so even if there are some kinks due to very low usage numbers relative to Java 8, admins have a reasonable expectation of it working.
jglick, danielbeck, peterloron - Ok, I did put together a "minimal" Github repo to help reproduce the issue.
https://github.com/fred-vogt/jenkins-websockets-tester
However it doesn't exhibit the hanging issue. Hmm.
One config item missing in the sample here is global security.
I'll add that in and retest when I get a chance and report back.
This setup uses local HTTPS, with default host web browser.
Jenkins server, agent running in docker.
I should add that the motivating use case for JEP-222 is Jenkins running behind a reverse proxy (such as Kubernetes ingress), in which case TLS is typically terminated at the proxy. If the problem involves some sort of incompatibility between the TLS implementation in Tyrus (or whatever library it is using for that) on the client side vs. the TLS implementation in Jetty running on Java 11 on the server side, then it would presumably not affect this environment.
I've tried setting up a node with the websocket option and got the handshake error specified here. Both the node and the Jenkins host are using openJDK 11. The Jenkins host is behind Apache using mod_proxy. Apache is terminating the TLS connection. Switching off the websocket option and opening the JNLP port allows my node to connect to the host.
jpschewe - I haven't isolated the smallest reproducible case.
- https://github.com/fred-vogt/jenkins-websockets-tester - [no authentication] + [TLS] - no hang
- https://github.com/cicdenv/cicdenv/tree/master/jenkins - [Github OAuth] + [TLS] - hangs
You encountering this issue is at least another data point.
I'm starting my node on Linux with the following command inside a shell script. The shell script is launched as an unprivileged user from systemd.
nohup java -jar agent.jar -jnlpUrl https://${jenkins_host}/computer/${jenkins_node_name}/slave-agent.jnlp -secret ${jenkins_node_secret} -workDir "${HOME}" > "${HOME}"/jenkins-node.log 2>&1
I have the same error when I enable the websocket option for a node that is running on Windows 10 with Oracle Java 1.8.
UPD 2020-10-16: Caused by an incorrectly configured WebSocket setup on the Apache reverse proxy.
Jenkins Master on Ubuntu Server with OpenJDK 8 + Agent on Windows 10 with AdoptOpenJDK 8
Okt 01, 2020 1:01:04 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Handshake error.
io.jenkins.remoting.shaded.javax.websocket.DeploymentException: Handshake error.
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:674)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:712)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:866)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:511)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:355)
    at hudson.remoting.Engine.runWebSocket(Engine.java:628)
    at hudson.remoting.Engine.run(Engine.java:470)
Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 400.
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine.processResponse(TyrusClientEngine.java:320)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processRead(ClientFilter.java:189)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:134)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:136)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.handleRead(SslFilter.java:406)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.processRead(SslFilter.java:368)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:134)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:136)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:299)
    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:283)
    at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
    at sun.nio.ch.Invoker$2.run(Invoker.java:218)
    at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
benbrummer this issue is about a specific condition (Java 11 server, TLS terminated at Jenkins rather than a reverse proxy). “Handshake errors” under other conditions are likely unrelated problems, possibly something with your environment.
At least superficially sounds like https://github.com/eclipse-ee4j/tyrus/issues/613 but could probably be unrelated.
Ought to try upgrading Tyrus as a first step.
https://github.com/eclipse-ee4j/tyrus/issues/676 also sounds similar, though that is about the Java version of the client.
Please try https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/main/remoting/4.6-rc2798.e8ee0de5c872/remoting-4.6-rc2798.e8ee0de5c872.jar from https://github.com/jenkinsci/remoting/pull/403 to see if that makes a difference.
A quick note: in most cases that I have seen with the error "Response code was not 101: 400", it was due to a front load balancer not supporting WebSockets. A concrete example is an AWS Classic Load Balancer with Layer 7 listeners (HTTP / HTTPS) doing TLS termination: when terminating TLS, that particular load balancer just overrides headers and does not forward the WebSocket request headers (such as sec-websocket-key, sec-websocket-version, Upgrade: websocket, ...).
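One way to check whether a proxy or load balancer in front of Jenkins passes the upgrade through is to send the handshake headers by hand and look at the status code. The sketch below assumes `curl` is available; both function names are made up for illustration, and the endpoint may answer differently (for example with an authentication error) depending on the security setup.

```shell
#!/bin/sh
# Hypothetical helper: map the HTTP status of a WebSocket handshake attempt
# to a hint. Only 101 and 400 are discussed in this thread.
explain_handshake_status() {
    case "$1" in
        101) echo "upgrade accepted" ;;
        400) echo "upgrade rejected: check that the proxy forwards Upgrade and Sec-WebSocket-* headers" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Hypothetical helper: send a manual handshake to the URL in $1 and print
# the status code. --max-time bounds the wait, because a successful upgrade
# keeps the connection open. The key is the sample nonce from RFC 6455.
probe_websocket() {
    curl -s -o /dev/null -w '%{http_code}' --http1.1 --max-time 5 \
        -H 'Connection: Upgrade' \
        -H 'Upgrade: websocket' \
        -H 'Sec-WebSocket-Version: 13' \
        -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==' \
        "$1"
}
```

A possible invocation, using the `/cli/ws` path from the access log in the description, would be `explain_handshake_status "$(probe_websocket https://jenkins-server:8443/cli/ws)"`. With a self-signed certificate, curl would additionally need `--cacert` pointing at the certificate. Comparing the result through the load balancer with the result against Jenkins directly should show whether the headers are being dropped on the way.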
jglick I can confirm that the issue is still present with:
- core `v2.263` / remoting `4.5` + jdk11
- core `v2.263` / remoting `4.6-rc2798.e8ee0de5c872` + jdk11
I tested this both as `4.5` (server) with `4.6` (agent), and as `4.6` (server+agent).
I can confirm this issue is still present with AdoptOpenJDK 11.0.1+9 and Remoting 4.9.
I can also confirm that this bug is present in Remoting 4.9. Tested on both CentOS 8.2 (openjdk version "1.8.0_252") and macOS Big Sur (java 11.0.9 2020-10-20 LTS).
You can pass -Djdk.tls.client.protocols=TLSv1.2 to the agent JVM as a workaround.
The root cause is documented in https://github.com/eclipse-ee4j/tyrus/issues/676, and apparently fixed by https://github.com/eclipse-ee4j/tyrus/pull/707 but not in a release yet.
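As a rough sketch of that workaround, the agent launch command posted earlier in this thread could be wrapped so the TLS protocol override is always passed. The function name `agent_launch_cmd` and the example values are placeholders for illustration; only the `-Djdk.tls.client.protocols=TLSv1.2` property and the `agent.jar` arguments come from the comments here.

```shell
#!/bin/sh
# Hypothetical helper: prints the agent launch command with the TLS 1.2
# workaround applied. Host, node name, and secret are placeholders.
agent_launch_cmd() {
    jenkins_host=$1
    node_name=$2
    node_secret=$3
    # JVM system properties have to come before -jar; anything after the
    # JAR name is passed to the agent as a program argument instead.
    printf 'java -Djdk.tls.client.protocols=TLSv1.2 -jar agent.jar -jnlpUrl https://%s/computer/%s/slave-agent.jnlp -secret %s\n' \
        "$jenkins_host" "$node_name" "$node_secret"
}
```

Running `agent_launch_cmd jenkins.example.com my-node "$SECRET"` prints the command so it can be inspected before being pasted into a launch script. Once an agent JAR containing the Tyrus fix is in use, the property should no longer be needed.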
If https://github.com/jenkinsci/remoting/pull/478 gets a green build, https://repo.jenkins-ci.org/native/incrementals/org/jenkins-ci/main/remoting/4.11-rc2954.2fca0f20eb1d/remoting-4.11-rc2954.2fca0f20eb1d.jar should be published which would be an agent JAR that (if I understand correctly) would solve this issue. Anyone able to reproduce this issue who cares to try it?
Indeed. CI seems to be broken in trunk—some sort of environmental issue (lack of RAM?), pending some work from a maintainer. If you are set up to do so, you could run a Maven build locally:
mvn -Pquick-build package
using the resulting target/remoting-4.11-SNAPSHOT.jar. I could of course attach a binary here but you should not be in the habit of running binaries built from my laptop!
I faced this when I tried to use a jenkins/inbound-agent:4.10-2-alpine based Docker image for jnlp containers in Kubernetes pods but, in my case, switching to an Ubuntu 20.04 based image helped. Its configuration includes:
$ java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)
and a quite old agent.jar version:
$ java -jar /usr/share/jenkins/agent.jar -version
3.36
oxygenxo keep in mind that 3.36 lacks security fixes that have been published.
danielbeck markewaite - I noticed this was mentioned in the `v2.313` https://www.jenkins.io/changelog/.
I can update my original test environment to confirm the fix is in `v2.313`.
Thanks build_admiral. danielbeck I set this as "Closed" because it was mentioned in the 2.313 changelog and did not seem to have obvious other cases which would prompt deeper testing before marking it as "Fixed in 2.313".
Should I be doing more rigorous verification of fixes that are in Jenkins core weekly releases before they are listed as "Fixed in release x.yyy"?
I think this one is unusual, as even Jesse as PR author wrote he didn't test and we have no other confirmation yet. No need to build process around this.
build_admiral That would be very helpful.
This was certainly not fixed for agents; https://github.com/jenkinsci/remoting/pull/478 was just merged, and has not yet been released, much less incorporated into core and that released. It may have been fixed for the CLI.
You can try running https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/main/remoting/4.11-rc2954.2fca0f20eb1d/remoting-4.11-rc2954.2fca0f20eb1d.jar as an agent JAR.
I was able to successfully launch WebSocket JDK11 agents using self-signed CA with the custom agent.jar posted above, via https://github.com/felipecrs/jenkins-agent-dind/pull/34 which can be tested with:
docker pull ghcr.io/felipecrs/jenkins-agent-dind:pr-34
In case someone else wants to try.
launch WebSocket JDK11 agents
Remember that, as per comment of 2020-03-02, the crucial issue seems to be whether the controller was running 8 or 11. So please test that specifically, if you have not done so already.
My controller is using JDK11 always. But regardless, I previously tested the Agent with JDK11 (without the fix) and it was not working.
Good news—that means we can have some confidence in marking this fixed once the remoting change is integrated into a core weekly.
- The fix was released in Remoting 4.11.
- https://github.com/jenkinsci/jenkins/pull/5821 picks up the fix into the weekly release line. It should be released next week in 2.317.
CC jglick