JENKINS-15199

Slaves turns to "Dead" state after connecting to remote machine

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Component/s: core, remoting
    • Environment: Linux 2.6.27.42-0.1-default #1 SMP 2010-01-06 16:07:25 +0100 x86_64 x86_64 x86_64 GNU/Linux
      Distributor ID: SUSE LINUX
      Description: SUSE Linux Enterprise Desktop 11 (x86_64)
      Release: 11
      Codename: n/a

      Hi,

      I'm trying to run a slave on a remote computer, but it isn't working. For some reason the slave fails after the link has been established between the master and remote nodes.

      The error message looks like this:
      java.lang.NoClassDefFoundError: hudson/model/Run$RunExecution
      at java.lang.Class.getDeclaredConstructors0(Native Method)
      at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
      at java.lang.Class.getConstructor0(Class.java:2699)
      at java.lang.Class.getConstructor(Class.java:1657)
      at hudson.model.AbstractProject.newBuild(AbstractProject.java:944)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1159)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:129)
      at hudson.model.Executor.run(Executor.java:214)
      Caused by: java.lang.ClassNotFoundException: hudson.model.Run$RunExecution
      at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
      at java.security.AccessController.doPrivileged(Native Method)
      at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
      ... 8 more

      Any idea what might be causing this malfunction?

      Best Regards,
      Carlos

        1. error.jpg (111 kB)
        2. slave-configuration.png (48 kB)
        3. slave-startup.txt (2 kB)


          Carlos André created issue -
          Carlos André made changes -
          Summary Original: Slaves turns to "Dead" state after conneting to remote machine New: Slaves turns to "Dead" state after connecting to remote machine

          Stewart Smith added a comment -

          I've gotten this too (on most of our SSH-launched slaves):

          java.lang.NullPointerException
          at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:220)
          at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:66)
          at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1197)
          at hudson.model.AbstractProject.createExecutable(AbstractProject.java:136)
          at hudson.model.Executor.run(Executor.java:211)

          more info


          Stewart Smith added a comment -

          Problem disappeared when downgrading to 1.482


          Carlos André added a comment -

          I'll try this as a workaround.

          Thanks a lot.


          Steven Aerts added a comment -

          Set assignee to Automatic so some of the core developers can take this up.

          Steven Aerts made changes -
          Assignee Original: Carlos André [ carlosandre ] New: Kohsuke Kawaguchi [ kohsuke ]
          Labels Original: fail jenkins slave startup New: fail jenkins matrix slave startup

          Steven Aerts added a comment - - edited

          We are seeing this problem on Jenkins version 1.486.
          We see exactly the same stack trace as Stewart.

          We see this only on nodes coupled with matrix jobs.
          Some of our matrix jobs can no longer be run. Whenever one of those matrix jobs is started, its node crashes with this NPE.

          It is unclear to us what distinguishes a faulty matrix job from a working one.

          Does anyone have an idea what could cause these matrix jobs to crash?


          Steven Aerts added a comment -

          I have found a reproduction scenario for this bug.

          1. Define a matrix job which runs a few jobs (each taking some time) on a specific node
          2. Run this job, which will spawn a few matrix jobs
          3. Do a safeRestart of Jenkins; this will persist the matrix jobs which are still waiting in the queue

          When Jenkins comes back up, it will try to start a matrix job from the persisted queue, and this will fail with the NPE above.

          A quick workaround is to delete the matrix jobs from the queue after Jenkins has restarted. This allows you to restart the dead clients again.
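
For anyone who would rather script the queue cleanup than click through the UI, here is a rough sketch in Python. It is not taken from this thread and has not been run against an affected installation. It assumes the queue is readable at `/queue/api/json`, that items can be cancelled by POSTing to `/queue/cancelItem?id=N`, and that matrix configuration tasks show up with a `_class` of `hudson.matrix.MatrixConfiguration` in the JSON (older versions such as 1.48x may not export `_class` at all, in which case the filter would need another criterion). The function names and the base URL are hypothetical; authentication and CSRF crumbs are left out.

```python
import json
import urllib.parse
import urllib.request

# Assumption: matrix configuration tasks carry this class name in the "_class"
# field of the queue JSON. Older Jenkins versions may not expose "_class".
MATRIX_CONFIGURATION_CLASS = "hudson.matrix.MatrixConfiguration"


def matrix_item_ids(queue_json):
    """Return the ids of queued items whose task is a matrix configuration."""
    ids = []
    for item in queue_json.get("items", []):
        task = item.get("task", {})
        if task.get("_class") == MATRIX_CONFIGURATION_CLASS:
            ids.append(item["id"])
    return ids


def cancel_matrix_items(base_url, opener=urllib.request.urlopen):
    """Fetch the build queue and ask Jenkins to cancel each matrix item.

    base_url is the Jenkins root URL without a trailing slash (hypothetical
    example: "http://jenkins.example.com"). Returns the ids that were submitted
    for cancellation. Authentication and CSRF crumbs are not handled here.
    """
    with opener(base_url + "/queue/api/json") as response:
        queue_json = json.load(response)

    submitted = []
    for item_id in matrix_item_ids(queue_json):
        query = urllib.parse.urlencode({"id": item_id})
        request = urllib.request.Request(
            base_url + "/queue/cancelItem?" + query, data=b"", method="POST"
        )
        opener(request).close()
        submitted.append(item_id)
    return submitted
```

The filtering is kept in its own function so it can be checked against a saved copy of the queue JSON before anything is actually cancelled.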


          Brian Murrell added a comment - - edited

          Still no resolution to this issue?

          It would be really nice to have this fixed for everyone who upgrades, runs into this issue, and has to apply the workaround in the previous comment.

          Even a mention of the issue and workaround in https://wiki.jenkins-ci.org/display/SECURITY/Jenkins+Security+Advisory+2013-01-04 would be useful, instead of people wasting time as I did trying to figure out what the problem is.


            Assignee: Kohsuke Kawaguchi (kohsuke)
            Reporter: Carlos André (carlosandre)
            Votes: 7
            Watchers: 14