Docker JNLP launcher should include -noReconnect option

    • Improvement
    • Resolution: Won't Fix
    • Minor
    • docker-plugin
    • None
    • Jenkins 2.23
      Docker plugin: 0.16.2
      Docker: 1.11.1 (Using swarm)

      For reasons I don't understand, my installation produces a number of orphaned Docker containers every day or so (although most of the time everything works). There are several variants; this issue concerns the following one:

      + java -jar /home/jenkins/slave.jar -jnlpUrl http://jenkins.url.hidden:8080//computer/Bigmemory-Swarm-6b70e1946dc5//slave-agent.jnlp -secret XXXXXXX
      Oct 03, 2016 4:18:33 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: Bigmemory-Swarm-6b70e1946dc5
      Oct 03, 2016 4:18:33 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Oct 03, 2016 4:18:33 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://jenkins.url.hidden:8080/]
      Oct 03, 2016 4:18:33 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Oct 03, 2016 4:18:33 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to jenkins.url.hidden:34232
      Oct 03, 2016 4:18:33 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Server reports protocol JNLP3-connect not supported, skipping
      Oct 03, 2016 4:18:33 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP2-connect
      Oct 03, 2016 4:18:33 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Failing to obtain http://jenkins.url.hidden:8080//computer/Bigmemory-Swarm-6b70e1946dc5//slave-agent.jnlp?encrypt=true
      java.io.IOException: Failed to load http://jenkins.url.hidden:8080//computer/Bigmemory-Swarm-6b70e1946dc5//slave-agent.jnlp?encrypt=true: 404 Not Found
              at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:280)
              at hudson.remoting.Launcher.run(Launcher.java:224)
              at hudson.remoting.Launcher.main(Launcher.java:197)
      Waiting 10 seconds before retry
      Failing to obtain http://jenkins.url.hidden:8080//computer/Bigmemory-Swarm-6b70e1946dc5//slave-agent.jnlp?encrypt=true
      java.io.IOException: Failed to load http://jenkins.url.hidden:8080//computer/Bigmemory-Swarm-6b70e1946dc5//slave-agent.jnlp?encrypt=true: 404 Not Found
              at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:280)
              at hudson.remoting.Launcher.run(Launcher.java:224)
              at hudson.remoting.Launcher.main(Launcher.java:197)
      Waiting 10 seconds before retry
      Failing to obtain http://jenkins.url.hidden:8080//computer/Bigmemory-Swarm-6b70e1946dc5//slave-agent.jnlp?encrypt=true
      java.io.IOException: Failed to load http://jenkins.url.hidden:8080//computer/Bigmemory-Swarm-6b70e1946dc5//slave-agent.jnlp?encrypt=true: 404 Not Found
      

      The retry continues forever, tying up the Docker swarm, and I have to kill and remove the containers manually. I haven't been able to capture the corresponding master logs yet, since this happens only occasionally and it's a busy installation. As a result, I don't know why the master doesn't know about the slave it is launching: perhaps the job was aborted before the slave was provisioned, or perhaps a job abort at any point can trigger this.

      I have the Cloud Connection Timeout and Read Timeout set to 10, and the individual template timeouts set, but of course these do not affect the operation of the JNLP jar. The only relevant option I see in the jar is -noReconnect, which would give up on the first 404 in this case.
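To make the difference concrete, here is a simplified model of the retry behaviour visible in the log above. It is a sketch under assumptions, not the remoting source: `load_jnlp`, `fetch`, and the `sleep` parameter are hypothetical names introduced for illustration, and only the 10-second wait and the two log messages are taken from the output shown in this issue.

```python
import time
from typing import Callable


def load_jnlp(fetch: Callable[[], str],
              no_reconnect: bool,
              retry_wait: float = 10.0,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until it succeeds.

    With no_reconnect=True the first failure is re-raised, so the
    process (and therefore its container) can exit instead of looping.
    """
    while True:
        try:
            return fetch()
        except IOError as err:
            if no_reconnect:
                # Give up on the first error, e.g. the 404 in the log.
                raise
            # Default behaviour: log, wait, and try again indefinitely.
            print(f"Failing to obtain JNLP file: {err}")
            print(f"Waiting {retry_wait:g} seconds before retry")
            sleep(retry_wait)
```

In this model, a `fetch` that always raises a 404 error loops forever by default, which matches the orphaned containers described here, and fails after a single attempt when `no_reconnect` is set.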

      I don't know with complete certainty that this would work for everyone, so perhaps it would be helpful to make it an option? Otherwise, I am left with having to rig up hacky monitoring and kill-old-containers scripts.
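For reference, the selection step of such a kill-old-containers script might look roughly like the sketch below. It deliberately does not talk to Docker: `ContainerInfo` and `find_stale_agents` are hypothetical names, and the records are assumed to come from whatever listing mechanism is available (for example the output of `docker ps` or a Docker API client). Matching on `slave.jar` in the command is also an assumption based on the launch line shown above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class ContainerInfo:
    """Minimal, hypothetical record for one running container."""
    container_id: str
    command: str
    created: datetime  # timezone-aware creation time


def find_stale_agents(containers: Iterable[ContainerInfo],
                      max_age: timedelta,
                      now: Optional[datetime] = None,
                      marker: str = "slave.jar") -> List[str]:
    """Return IDs of agent containers that are older than max_age.

    A container counts as an agent if its command contains `marker`.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [
        c.container_id
        for c in containers
        if marker in c.command and now - c.created > max_age
    ]
```

Age alone cannot distinguish an orphaned agent from a legitimately long-running build, so `max_age` would need to exceed the longest expected build time, and the returned IDs are candidates to inspect or stop rather than a definitive list of orphans.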

          [JENKINS-38704] Docker JNLP launcher should include -noReconnect option

          Nicolas De Loof added a comment - The agent's ability to reconnect to the master is required for pipeline / continuable task support (Jenkins may restart while the build is still running on the agent).

          Your issue demonstrates Jenkins not being able to remove a container after build completion. That failure should have been logged, but no further warning was sent to the administrator. I think docker-plugin could track such orphaned containers, maybe offer an Administrative Monitor to alert the administrator, or even (re)try to stop them on a regular basis as a background cleanup task.

          Nicolas De Loof added a comment - Orphaned containers are an issue to be addressed in a general way, not limited to JNLP slaves.

            ndeloof Nicolas De Loof
            akom Alexander Komarov
            Votes: 1
            Watchers: 3
