[JENKINS-25490] Slave agent run with javaws breaks with "Unable to launch the application" when master is stopped

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Component: core
    • Jenkins 1.580.1 on Ubuntu 14.04 with slaves connected via JNLP.

      Since the last major LTS release I have been seeing a serious problem with our Jenkins instance. Whenever the master is stopped (shut down) while slaves are connected, the JNLP client on each slave stops working with the message "Unable to launch the application". This was not the case before: the client application kept running and reconnected once the master was back online.

      With the current behavior I have to go through all of the more than 60 slave nodes and start the JNLP client manually.

      Steps:
      1. Download Jenkins 1.580.1 and run it
      2. Set up a dumb slave via Java Web Start
      3. Connect the slave from the same machine
      4. Stop the master

      After step 4 the JNLP client should stay open and wait for the master to come back online. Instead it fails with the above message and the following stack trace:

      Exception:
      
      CouldNotLoadArgumentException[ Could not load file/URL specified: /tmp/javawJ3YZLo]
      	at com.sun.javaws.Main.launchApp(Unknown Source)
      	at com.sun.javaws.Main.continueInSecureThread(Unknown Source)
      	at com.sun.javaws.Main.access$000(Unknown Source)
      	at com.sun.javaws.Main$1.run(Unknown Source)
      	at java.lang.Thread.run(Thread.java:744)
      
      Wrapped Exception:
      
      java.io.FileNotFoundException: /tmp/javawJ3YZLo (No such file or directory)
      	at java.io.FileInputStream.open(Native Method)
      	at java.io.FileInputStream.<init>(FileInputStream.java:146)
      	at java.io.FileInputStream.<init>(FileInputStream.java:101)
      	at com.sun.javaws.jnl.LaunchDescFactory.buildDescriptor(Unknown Source)
      	at com.sun.javaws.Main.launchApp(Unknown Source)
      	at com.sun.javaws.Main.continueInSecureThread(Unknown Source)
      	at com.sun.javaws.Main.access$000(Unknown Source)
      	at com.sun.javaws.Main$1.run(Unknown Source)
      	at java.lang.Thread.run(Thread.java:744)
      

          Henrik Skupin added a comment -

          That is correct, yes. We don't have it running as service via upstart.

          Daniel Beck added a comment -

          Please try the workaround of using the java ... command line call in a terminal on the slave instead.

          Henrik Skupin added a comment -

          But isn't that the headless mode? The tests we are running require a GUI, which is why we have used the JNLP method so far. Is that not necessary? If not, could we write our own little daemon (script) that ensures slave.jar is the current version and (re-)starts the slave if it is not running?

          I tested it with the `java` command and it works at least on Ubuntu. I will also have to test on Windows, but would like to know more about my above question first. Thanks!
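
The "little daemon" idea in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: `supervise` and its parameters are hypothetical names, the agent command line is passed in by the caller rather than spelled out here, and the loop is bounded by `max_restarts` so the example terminates. It does not check whether slave.jar is current.

```python
import subprocess
import time


def supervise(command, max_restarts=3, delay_seconds=0.0, run=subprocess.call):
    """Run `command` and relaunch it each time it exits.

    Returns the list of exit codes observed. The loop stops after
    `max_restarts` relaunches; a real watchdog would keep going.
    `run` is injectable so the logic can be exercised without
    spawning a process (hypothetical design choice for this sketch).
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Blocks until the child process exits, then records its status.
        exit_codes.append(run(command))
        if attempt < max_restarts:
            # Pause before relaunching so a crashing child does not spin.
            time.sleep(delay_seconds)
    return exit_codes
```

To keep GUI access, such a script would presumably have to be started from within the interactive desktop session, so that the child process inherits that environment.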

          Daniel Beck added a comment -

          Not sure. All my Linux slaves are headless, but on Windows, the java started from cmd gets UI access. It's just the launch method not requiring a UI.

          "but would like to know more about my above question first."

          How long can it possibly take to try and see what happens when a slave is launched this way? Please remember that there are many more users with questions than developers or users with the time to answer others' questions.

          I'd make sure to start it from a terminal in an interactive session though, so the process inherits the UI environment.

          Richard Mortimer added a comment -

          I suspect that the changes that made this happen are related to those that caused JENKINS-24272 but it is not the same failure. Namely between 1.554.3 and 1.565.1 Jenkins now restarts the whole slave process when the master restarts (to ensure that any memory leaks are gone) rather than just reconnecting to the master.

          Now it looks to me that JWS is deleting the downloaded file (slave.jar I would guess) and hence when the slave tries to restart it cannot find the executed file and hence the error seen occurs. This delete downloaded file behaviour is typical for what a web launched application would do to stop filling up the disk with unneeded junk. That might just be at odds with the whole restart JVM (as opposed to reconnect to master) approach.
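
The failure mode suspected in the comment above can be reproduced in miniature. The following is a toy model, not javaws code: all function names are hypothetical, and the temp file merely stands in for whatever file the launcher removes (the stack trace in the report points at a temporary file under /tmp). It shows a relaunch failing because the first launch cleaned up the file the relaunch needs.

```python
import os
import tempfile


def launch(descriptor_path):
    """Read the launch descriptor, as any (re)launch would have to."""
    with open(descriptor_path) as handle:
        return handle.read()


def first_launch_then_cleanup():
    """Write a temp descriptor, launch from it, then delete it."""
    fd, path = tempfile.mkstemp(prefix="javaw")
    with os.fdopen(fd, "w") as handle:
        handle.write("<jnlp/>")
    content = launch(path)
    # Cleanup typical of a web-launched application: the temp file goes away.
    os.remove(path)
    return path, content


def relaunch(path):
    """Try to launch again from the same path after the cleanup."""
    try:
        launch(path)
        return "relaunched"
    except FileNotFoundError as error:
        return "failed: " + error.strerror
```

Calling `relaunch` with the path returned by `first_launch_then_cleanup` takes the `FileNotFoundError` branch, which mirrors the wrapped `java.io.FileNotFoundException` in the report.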

          Henrik Skupin added a comment -

          I already tried connecting the slave via javaws from a terminal, but the client dies the same way when the master is shut down, so this is not a workaround. It only happens on Linux and OS X. I re-tested on Windows and cannot observe this behavior: there the client reconnects successfully once the master is back online.

          I'm not that happy with the 'java' workaround, given that it would require us to update all of our 70 machines by downloading the slave.jar file first. I'm also not sure how often we would need to re-download it when that file is updated. I'm working on a Puppet configuration for us, but that is not ready yet.

          Henrik Skupin added a comment -

          I also tried to install Jenkins as a service via the JNLP connection window, but that failed on OS X. So it looks like we really have to use the 'java' command for now on both affected platforms.

          Jesse Glick added a comment -

          The fix of JENKINS-24272 changes this behavior, and so far seems to fix it. Now when the master is stopped, the GUI window shows Terminated as expected, rather than closing. Then when the master comes back up, the window is closed, but now it reopens and connects. I am not sure if the javaws process is still being restarted, but if so, it somehow seems to work now.

          Richard Mortimer added a comment -

          I'm glad it is fixed. I wonder if the failure was occurring because JWS was trying to re-download the .jnlp file but couldn't because the master was restarting. The fix for JENKINS-24272 only restarts the slave when the master has restarted so the .jnlp file would be available then.
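
The ordering described in the comment above (relaunch only once the master is answering again) can be sketched as a generic poll-then-act loop. This is not the actual JENKINS-24272 change; `wait_for_master`, `relaunch_when_ready`, and the `is_master_up` probe are hypothetical names, and how the probe checks the master is left to the caller.

```python
import time


def wait_for_master(is_master_up, max_attempts=5, delay_seconds=0.0):
    """Poll `is_master_up` until it returns True.

    Returns the number of attempts used, or None if the master never
    answered within `max_attempts`.
    """
    for attempt in range(1, max_attempts + 1):
        if is_master_up():
            return attempt
        time.sleep(delay_seconds)
    return None


def relaunch_when_ready(is_master_up, relaunch, **poll_options):
    """Call `relaunch` only after the master answered; otherwise do nothing."""
    if wait_for_master(is_master_up, **poll_options) is None:
        return False
    relaunch()
    return True
```

With this ordering, anything the relaunch needs to fetch from the master is requested only after the master has responded at least once.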

          Jesse Glick added a comment -

          oldelvet that is a plausible explanation. If so, the fix is still pretty fragile: we do not really want to be relaunching javaws at all if we can help it.

            Assignee: Unassigned
            Reporter: Henrik Skupin (whimboo)