JENKINS-64842

Docker Swarm agents not connecting to jenkins with remoting 4.6

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels:
      None

      Description

      We recently set up a new Jenkins and Docker Swarm installation, which came with remoting 4.6.

      In the Docker Swarm cloud config, the Jenkins URL is set to http://10.XXX:8080/ and the swarm URL is set to http://10.XYZ:2376.

      Our agents are unable to connect and get stuck here:

      [6:35:30 PM] Creating Service with Name : agt-of_TestAgents_7-13
      [6:35:30 PM] ServiceSpec created with ID : fbpntizanolp4c1d2d7quyrca
      [6:35:30 PM] ServiceSpec request JSON : {"TaskTemplate":{"ContainerSpec":{"Image":"jenkins-docker-stage.artifactory.svcs.endurance.com/eigi-jenkins-agent-kubernetes","Command":["sh","-cx","curl --connect-timeout 20 --max-time 60 -o agent.jar $DOCKER_SWARM_PLUGIN_JENKINS_AGENT_JAR_URL && java -jar agent.jar -jnlpUrl $DOCKER_SWARM_PLUGIN_JENKINS_AGENT_JNLP_URL -secret $DOCKER_SWARM_PLUGIN_JENKINS_AGENT_SECRET -noReconnect -workDir /tmp"],"Env":["DOCKER_SWARM_PLUGIN_JENKINS_AGENT_SECRET=XXXX","DOCKER_SWARM_PLUGIN_JENKINS_AGENT_JAR_URL=http://10.XXX:8080/jnlpJars/agent.jar","DOCKER_SWARM_PLUGIN_JENKINS_AGENT_JNLP_URL=http://10.XXX:8080/computer/agt-of_TestAgents_7-13/slave-agent.jnlp","DOCKER_SWARM_PLUGIN_JENKINS_AGENT_NAME=agt-of_TestAgents_7-13"],"Dir":"/tmp","User":"jenkins","DNSConfig":{"Nameservers":[],"Search":[],"Options":[]},"Mounts":[{"Target":"/home/jenkins/.ssh","Source":"/var/lib/docker/volumes/JENKINS_SSH/_data","Type":"bind","VolumeOptions":null}],"Hosts":[],"Secrets":[],"Configs":[]},"RestartPolicy":{"Condition":"none","MaxAttempts":0},"Resources":{"Limits":{"NanoCPUs":0,"MemoryBytes":0},"Reservations":{"NanoCPUs":0,"MemoryBytes":0}},"Placement":{"Constraints":[]}},"EndpointSpec":{"Ports":[]},"Name":"agt-of_TestAgents_7-13","Labels":{"ROLE":"jenkins-agent"},"Networks":[]}
      + curl --connect-timeout 20 --max-time 60 -o agent.jar http://10.XXX:8080/jnlpJars/agent.jar
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
        0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
      100 1471k  100 1471k    0     0  85.8M      0 --:--:-- --:--:-- --:--:-- 89.8M
      + java -jar agent.jar -jnlpUrl http://XXX:8080/computer/agt-of_TestAgents_7-13/slave-agent.jnlp -secret Xxx -noReconnect -workDir /tmp
      Feb 09, 2021 6:35:32 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /tmp/remoting as a remoting work directory
      Feb 09, 2021 6:35:32 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
      INFO: Both error and output logs will be printed to /tmp/remoting
      Feb 09, 2021 6:35:32 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up agent: agt-of_TestAgents_7-13
      Feb 09, 2021 6:35:32 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Feb 09, 2021 6:35:32 PM hudson.remoting.Engine startEngine
      INFO: Using Remoting version: 4.6
      Feb 09, 2021 6:35:32 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /tmp/remoting as a remoting work directory
      Feb 09, 2021 6:35:32 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [PRIMARYJENKINSURLIN_GLOBALCONFIG]
      

      What stands out is that in the failing case we see only ONE URL in the log line `INFO: Locating server among [PRIMARYJENKINSURLIN_GLOBALCONFIG]`.

      We have another setup on a different box with the same version of the Docker Swarm plugin (2.9) but remoting 4.5, and there the candidateURLs list has two URLs: `INFO: Locating server among [DOCKERSWARMURLFROMDOCKERCONFIG, PRIMARYJENKINSURL_GLOBALCONFIG]`

       

      We are trying to understand:

      1. How do the candidateURLs get set for remoting? Where does it pull them from?

      2. Did something change in this area in remoting 4.6? I see some work on candidateURLs in the commit diff, but I am not sure it is related.

       

      We are not sure what else to check or try to get both URLs to show up. If we change the Jenkins URL in the global config to the 10.x URL, it works, but we do not want to do that.
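
One way to narrow down question 1 is sketched below. It rests on an assumption this ticket does not confirm: that the agent builds its candidate list from the `-url` arguments in the JNLP file that `-jnlpUrl` points at. If so, dumping those arguments on the working and the failing box shows whether the controller is already handing out a single URL. The helper name `list_jnlp_urls` is made up for this sketch.

```shell
#!/bin/sh
# Print every value that follows a "-url" <argument> element in a JNLP
# document read from stdin. Assumption: the agent's candidate URLs are
# taken from these arguments.
list_jnlp_urls() {
  grep -o '<argument>[^<]*</argument>' \
    | sed -e 's/^<argument>//' -e 's/<\/argument>$//' \
    | awk 'prev == "-url" { print } { prev = $0 }'
}

# Possible usage against a saved copy of the JNLP file (the file name is
# illustrative; fetching it from the controller may require credentials):
#   list_jnlp_urls < slave-agent.jnlp
```

If the two boxes print different URL lists for the same plugin configuration, the difference is on the controller side rather than in agent.jar.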

        Attachments

          Activity

          lorelei_mccollum Lorelei added a comment -

          We just tried updating remoting to 4.6 on the working box, which had 4.5 before, and it still works and shows two candidate URLs, so the cause may not be the remoting jar itself.

          We need to understand where the candidate URLs that get passed to remoting are set.

          lorelei_mccollum Lorelei added a comment -

          Found this ticket after some digging: https://issues.jenkins.io/browse/JENKINS-63222

          It helped me proceed. Their issue is very similar, although it doesn't exactly line up. Most likely something in core broke us: the working box was on 2.257 and the non-working one was on the latest version.

          I'm going to open a ticket on the Docker Swarm plugin page as well to let them know that their defaults no longer seem to work.

          Using this command works:

          sh
          -cx
          curl --connect-timeout 20 --max-time 60 -o agent.jar $DOCKER_SWARM_PLUGIN_JENKINS_AGENT_JAR_URL && java -classpath agent.jar hudson.remoting.jnlp.Main -headless -url http://172.17.0.1:8080/  -workDir /tmp $DOCKER_SWARM_PLUGIN_JENKINS_AGENT_SECRET $DOCKER_SWARM_PLUGIN_JENKINS_AGENT_NAME
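
The same workaround can be sketched as a small shell function so that the controller URL is not hard-coded. `JENKINS_DIRECT_URL` and `DRY_RUN` are hypothetical variables introduced only for this illustration; the plugin does not set them. The java arguments are the ones from the command above.

```shell
#!/bin/sh
# Launch the agent with an explicit -url instead of -jnlpUrl.
# JENKINS_DIRECT_URL (hypothetical) overrides the address used above;
# DRY_RUN (hypothetical) prints the command instead of running it.
run_agent() {
  url="${JENKINS_DIRECT_URL:-http://172.17.0.1:8080/}"
  set -- java -classpath agent.jar hudson.remoting.jnlp.Main -headless \
    -url "$url" -workDir /tmp \
    "$DOCKER_SWARM_PLUGIN_JENKINS_AGENT_SECRET" \
    "$DOCKER_SWARM_PLUGIN_JENKINS_AGENT_NAME"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    exec "$@"
  fi
}
```

Running it with `DRY_RUN=1` first makes it easy to confirm which URL the agent will be given before the container actually tries to connect.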
          
          lorelei_mccollum Lorelei added a comment -

          I opened this for cross reference: https://github.com/jenkinsci/docker-swarm-plugin/issues/105

          jthompson Jeff Thompson added a comment -

          Yes, it sounds like something similar to JENKINS-63222. Have you seen this change https://github.com/jenkinsci/jenkins/pull/5153 ? It provides a new mechanism to deal with some of those scenarios. I don't know if there's anything specific in the swarm-plugin that relates.

          lorelei_mccollum Lorelei added a comment -

          Yeah, we saw from Jesse's post that our options were basically to change the jar commands or to set that. We opted to change the jar commands for now. I opened a bug for the Swarm plugin too. I think core changed something that these consumers need to adapt to.


            People

            Assignee:
            jthompson Jeff Thompson
            Reporter:
            lorelei_mccollum Lorelei
            Votes:
            0
            Watchers:
            2
