[JENKINS-63014] Agent connection error using nginx proxy with WebSocket

    • Jenkins 2.248

      I'm using nginx as a reverse proxy: one public ingress in front of a single upstream Jenkins instance. The attached docker-compose file can be used to recreate the exact environment. My test Docker setup runs on Windows 10. For convenience, the domain "ingress" was added to the hosts file as "127.0.0.1 ingress".

      Basic Jenkins setup; the root URL is set to http://ingress/jenkins/

      The agent connects via JNLP with the WebSocket option enabled. The following error is reported:

      C:\workdir\jenkins\http\agent-1>java -jar agent.jar -jnlpUrl http://ingress/jenkins/computer/agent-1/slave-agent.jnlp -secret 4068cc653d7d0ca16f72404ac6ad62d5fe19f5798f5b3f0807c6ecf50fba4353 -workDir "c:\workdir\jenkins\http\agent-1"
      Jul 08, 2020 8:33:25 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using c:\workdir\jenkins\http\agent-1\remoting as a remoting work directory
      Jul 08, 2020 8:33:26 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
      INFO: Both error and output logs will be printed to c:\workdir\jenkins\http\agent-1\remoting
      JNLP file http://ingress/jenkins/computer/agent-1/slave-agent.jnlp?encrypt=true has invalid arguments: [4068cc653d7d0ca16f72404ac6ad62d5fe19f5798f5b3f0807c6ecf50fba4353, agent-1, -webSocket, -workDir, c:\workdir\jenkins\http\agent-1, -internalDir, remoting, -url, http://ingress/jenkins/, -url, http://jenkins:8080/jenkins/, -headless, -workDir, c:\workdir\jenkins\http\agent-1, -internalDir, remoting]
      Most likely a configuration error in the master
      -webSocket supports only a single -url
      

      The parameter list contains two -url arguments, and the agent rejects it.

      I cannot rule out an nginx configuration issue, but I followed all the guidelines.
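To make the rejection concrete, here is a minimal Python sketch of the consistency check the agent appears to apply, assuming it simply counts -url options whenever -webSocket is present. The function names are hypothetical; only the argument values come from the log above (with the secret replaced by a placeholder).

```python
# Hypothetical sketch of the agent-side check behind the message
# "-webSocket supports only a single -url"; not Remoting's actual code.
from typing import List, Optional


def count_urls(args: List[str]) -> int:
    """Count the '-url' options in a JNLP argument list."""
    return sum(1 for arg in args if arg == "-url")


def jnlp_args_error(args: List[str]) -> Optional[str]:
    """Return an error message if the arguments are inconsistent, else None."""
    if "-webSocket" in args and count_urls(args) > 1:
        return "-webSocket supports only a single -url"
    return None


# Arguments from the reported log, with the secret replaced by a placeholder.
REPORTED_ARGS = [
    "<secret>", "agent-1", "-webSocket",
    "-workDir", r"c:\workdir\jenkins\http\agent-1", "-internalDir", "remoting",
    "-url", "http://ingress/jenkins/",
    "-url", "http://jenkins:8080/jenkins/",
    "-headless",
    "-workDir", r"c:\workdir\jenkins\http\agent-1", "-internalDir", "remoting",
]

if __name__ == "__main__":
    print(jnlp_args_error(REPORTED_ARGS))
```

Because the master emitted two -url values (the configured root URL plus http://jenkins:8080/jenkins/, presumably the upstream address nginx forwarded to), the check fails before any connection is attempted.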

       


          Balazs Varnai added a comment -

           I tested the change and it works. I also agree with passing only the configured URL, not extra ones.

          However, even without having to configure the wsagents URIs, Jenkins still reports a proxy issue, namely "It appears that your reverse proxy set up is broken". But it's likely an nginx issue or something specific to the WebSocket proxy implementation. I'm not an nginx expert either.


          Jesse Glick added a comment -

          Quoting the previous comment: "Jenkins still thinks there is proxy issue aka 'It appears that your reverse proxy set up is broken'."

          This is a good sign: it means the ReverseProxySetupMonitor is doing its job. It also helps confirm that the root problem was the nginx configuration, not anything about WebSocket per se.

          FWIW I do deal with nginx reverse proxies to Jenkins all the time (in fact this was a key motivation for JEP-222), but only via the Kubernetes ingress controller, which normally does all the tricky work for you.


          Jesse Glick added a comment -

          At one point I did testing against Apache: https://github.com/jglick/jenkins-demo-reverse-proxy


          Jesse Glick added a comment -

          Marking as lts-candidate even though there is a workaround (really a fix for the underlying issues rather than a workaround), because I keep hearing of people running into this and being confused (I suppose after dismissing or ignoring the warning about the broken reverse proxy).


          Oleg Nenashev added a comment (edited) -

          We released it in Jenkins 2.248. https://github.com/jenkinsci/jenkins/pull/4839 basically removes the (unused?) feature from Jenkins core, so I am not sure it can be easily backported. Backporting to 2.235.3 is up for discussion.


          Jesse Glick added a comment -

          The non-trivial-lts-backporting label is misleading: the patch merely removes a couple of lines, so it should certainly be trivial to backport. Whether it should be backported is of course up for discussion, as would be true for any patch.

          I would not say that the deleted code was unused. If your reverse proxy configuration was wrong, or you were otherwise accessing Jenkins via a nonstandard URL for some reason, it would allow inbound TCP agents to connect using the nonstandard URL in case the standard URL were broken. The point is that this was a dubious decision when written, and became even less advisable after subsequent improvements in Jenkins to: guide you to define the root URL in the setup wizard; show an administrative monitor if you had not; and display an administrative monitor if it could be detected that the root URL was configured yet incorrect.
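The removed heuristic can be modeled roughly as follows. This is a hypothetical Python sketch, assuming the extra URL was derived from the incoming HTTP request whenever it differed from the configured root URL; the names jnlp_urls and legacy_heuristic are illustrative and are not Jenkins core APIs.

```python
# Hypothetical model of how the *.jnlp file's -url list was assembled
# before and after Jenkins 2.248 (JENKINS-63014). Not the actual core code.
from typing import List


def _normalize(url: str) -> str:
    """Ensure a trailing slash so equivalent URLs compare equal."""
    return url if url.endswith("/") else url + "/"


def jnlp_urls(root_url: str, request_url: str, legacy_heuristic: bool) -> List[str]:
    """Return the -url values offered to an inbound agent.

    root_url:         the configured Jenkins root URL.
    request_url:      the Jenkins URL as seen in the incoming HTTP request
                      (for example, the upstream address behind a proxy).
    legacy_heuristic: True models pre-2.248 behavior, False models 2.248+.
    """
    urls = [_normalize(root_url)]
    candidate = _normalize(request_url)
    if legacy_heuristic and candidate not in urls:
        # Pre-2.248: also offer the "nonstandard" URL as a fallback.
        urls.append(candidate)
    return urls
```

With root_url http://ingress/jenkins/ and request_url http://jenkins:8080/jenkins/, the legacy branch returns the same two -url values seen in the original report, while the 2.248 branch returns only the root URL.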


          Jonathan Ballet added a comment -

          I think this fix introduced a regression in our setup, and I'm not sure how to solve it properly.

          Our Jenkins master is currently exposed via two URLs:

          • One is "public": our users connect to it and authenticate against an authentication proxy before reaching Jenkins itself.
          • The other is "internal": only a subset of URLs is authorized there, so that our dynamically created Jenkins agents can connect back to the master using JNLP.

          In both cases, Jenkins sits behind an nginx proxy and is not reachable directly.

          Up until 2.248, the agents received both URLs for connecting back to the master:

          Picked up _JAVA_OPTIONS: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
          Jul 26, 2020 1:03:39 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up agent: jenkins-ops-docs-bpxmmgsmkz
          Jul 26, 2020 1:03:39 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 26, 2020 1:03:39 PM hudson.remoting.Engine startEngine
          INFO: Using Remoting version: 4.3
          Jul 26, 2020 1:03:39 PM hudson.remoting.Engine startEngine
          WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
          Jul 26, 2020 1:03:39 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [https://EXTERNAL/jenkins/, http://INTERNAL/jenkins/]
          Jul 26, 2020 1:03:39 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
          INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
          Jul 26, 2020 1:03:39 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver isPortVisible
          WARNING: Connection refused (Connection refused)
          Jul 26, 2020 1:03:39 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
          INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
          Jul 26, 2020 1:03:39 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Agent discovery successful
            Agent address: INTERNAL
            Agent port:    36921
            Identity:      58:e8:9a:bd:ce:d2:c3:7f:d4:33:e3:cc:35:7d:15:a4
          Jul 26, 2020 1:03:39 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 26, 2020 1:03:39 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to INTERNAL:36921
          Jul 26, 2020 1:03:39 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Trying protocol: JNLP4-connect
          Jul 26, 2020 1:03:39 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Remote identity confirmed: 58:e8:9a:bd:ce:d2:c3:7f:d4:33:e3:cc:35:7d:15:a4
          Jul 26, 2020 1:03:39 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          

          The first URL (EXTERNAL) fails with "WARNING: Connection refused (Connection refused)", but the agent then falls back to the second URL, which works.
          I believe the first one (EXTERNAL) fails because its domain name resolves to an IP address where the firewall does not open arbitrary ports (such as the agent port 36921). The INTERNAL domain resolves to a different address that does not have the same restrictions.

          After upgrading to 2.248, only the first address (EXTERNAL) is sent to the agent, so the connection always fails and there is no fallback anymore.
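The pre-2.248 behavior described above amounts to an ordered fallback. The following Python sketch assumes each candidate URL has already been resolved to a (host, port) pair and probes them in order with a plain TCP connect; the function names are hypothetical, and Remoting's real resolver does more (HTTP discovery, protocol negotiation). With a single candidate, as after 2.248, a refused connection leaves nothing to fall back to.

```python
# Hypothetical sketch of ordered fallback over candidate agent endpoints,
# in the spirit of "Locating server among [...]"; not Remoting's code.
import socket
from typing import Callable, Iterable, Optional, Tuple

Endpoint = Tuple[str, int]


def tcp_port_visible(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, unresolvable host names, etc.
        return False


def first_reachable(
    endpoints: Iterable[Endpoint],
    probe: Callable[[Endpoint], bool] = tcp_port_visible,
) -> Optional[Endpoint]:
    """Try each endpoint in order; return the first reachable one, or None."""
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    return None
```

The probe is injectable so the ordering logic can be exercised without a network, for example first_reachable([("EXTERNAL", 36921), ("INTERNAL", 36921)], probe=lambda e: e[0] == "INTERNAL").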

          I'd be happy to update our configuration for the new behavior introduced with this change, but I'm not sure how I should proceed:

          • AFAIK, the "root URL" sent to the agent is the "Jenkins URL" as configured in the main configuration panel. I think it is also used for other purposes, such as building URLs that are sent externally (build results reported to GitHub, for example?), so we don't really want to change it: we would like the "official" URL used by our users to remain EXTERNAL.
          • However, we would like our agents (and only them) to connect using the INTERNAL URL, and I'm not sure how (or whether) that can be configured.

          In any case, I'm not sure this should be backported as-is to an LTS version, since it may break some setups.


          Oleg Nenashev added a comment -

          According to JENKINS-63222, this feature is in fact used by at least one plugin as an additional option.


          Kanstantsin Shautsou added a comment (edited) -

          Having multiple URLs is one of the things needed for an HA scenario, even if the master is a singleton. After the first instance fails, a second one could be run on the same JENKINS_HOME (hello, CBE features?), and slaves should be able to reconnect to the master directly (no need to proxy via nginx etc. within internal, isolated infrastructure). In general, being unable to control which URLs are sent is a problem, but since the "first external" URL is usually filtered at the firewall with a fast TCP reject, it wasn't a problem in practice.


          Jesse Glick added a comment -

          Kubernetes handles failover automatically (and CloudBees CI uses that ability), but there is no need for multiple URLs—the Service routes requests to the active pod. You can set up the same manually. You can use an alternate cluster-internal URL to bypass ingress, but this does not mean multiple -url arguments, just a different one, as described in JENKINS-63222 w.r.t. the kubernetes plugin.
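Following this suggestion, an agent can be launched directly with an explicit URL instead of via the *.jnlp file. The Python sketch below assembles such a command line for hudson.remoting.jnlp.Main, the class used for inbound-agent launches from agent.jar; the helper name agent_command, the paths, the agent name and the URLs are placeholders. Passing one cluster-internal URL covers the case above; the list form also allows the multiple-URL TCP fallback described in the next paragraph.

```python
# Hypothetical helper that builds the lower-level inbound-agent command line
# (java -cp agent.jar hudson.remoting.jnlp.Main ...) with explicit -url values.
from typing import List, Sequence


def agent_command(
    agent_jar: str,
    urls: Sequence[str],
    secret: str,
    agent_name: str,
    work_dir: str,
) -> List[str]:
    """Return an argv list; each URL becomes a separate -url option."""
    if not urls:
        raise ValueError("at least one -url is required")
    cmd = ["java", "-cp", agent_jar, "hudson.remoting.jnlp.Main", "-headless"]
    for url in urls:
        cmd += ["-url", url]
    # Options first, then the positional secret and agent name.
    cmd += ["-workDir", work_dir, secret, agent_name]
    return cmd
```

The resulting list could be passed to subprocess.run(cmd, check=True), for example with urls=["http://INTERNAL/jenkins/"], so that the agent uses the internal address while the configured root URL stays EXTERNAL.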

          Again, if your architecture specifically requires multiple URLs with dynamic fallback, you can still do that with TCP agents, using the lower-level and more explicit launch mode. This change (JENKINS-63014) merely removes a heuristic from the higher-level *.jnlp launch mode. Many Jenkins web features will not work correctly if you actually access them via nonstandard URLs, but TCP agents are a special case in that the HTTP request is used solely to grab a host name and port from response headers.
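To illustrate why the HTTP request matters only for discovery, here is a Python sketch of the header-parsing step. It assumes, based on our reading of Remoting's JnlpAgentEndpointResolver, that the agent requests <root URL>tcpSlaveAgentListener/ and reads X-Jenkins-JNLP-Port, an optional X-Jenkins-JNLP-Host, and X-Jenkins-Agent-Protocols; check these names against the Remoting version you run. The function and type names are hypothetical.

```python
# Hypothetical sketch: derive the TCP agent endpoint purely from HTTP
# response headers. Header names are assumptions to verify against Remoting.
from typing import List, Mapping, NamedTuple, Optional
from urllib.parse import urlsplit


class TcpEndpoint(NamedTuple):
    host: str
    port: int
    protocols: List[str]


def endpoint_from_headers(jenkins_url: str, headers: Mapping[str, str]) -> TcpEndpoint:
    """Build the endpoint; without an explicit host header, use the URL's host."""
    # HTTP header names are case-insensitive; normalize them once.
    h = {name.lower(): value for name, value in headers.items()}
    port = h.get("x-jenkins-jnlp-port")
    if port is None:
        raise ValueError("no X-Jenkins-JNLP-Port header (TCP agent port disabled?)")
    # Note: urlsplit(...).hostname returns the host name in lowercase.
    host: Optional[str] = h.get("x-jenkins-jnlp-host") or urlsplit(jenkins_url).hostname
    if not host:
        raise ValueError("cannot determine the agent host")
    raw = h.get("x-jenkins-agent-protocols", "")
    protocols = [p.strip() for p in raw.split(",") if p.strip()]
    return TcpEndpoint(host, int(port), protocols)
```

In Jonathan Ballet's log, the advertised protocols were [JNLP4-connect, Ping] and the resulting address was INTERNAL:36921, which is consistent with the host falling back to the requested URL's own host name when no explicit host header is sent.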


            jglick Jesse Glick
            balazs_varnai Balazs Varnai