
Possible race condition in RemoteClassLoader renders slave unusable

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Component/s: core
    • Environment: CentOS 5.3, Sun JDK 1.6.0_19 64-bit

      We restart Hudson every Sunday afternoon to work around memory leaks, and we have a couple of nightly builds that start at midnight. As a result, Hudson is fresh when multiple builds kick in: its remote class loader has not yet had a chance to load any classes. We have 3 executors defined. I suspect that the SCM poll action, which is sent in many build procedures, causes multiple concurrent requests to load classes for the SCM (we use a slightly hacked version of the CVS SCM). We are getting the following exception:
      java.lang.LinkageError: loader (instance of hudson/remoting/RemoteClassLoader): attempted duplicate class definition for name: "hudson/model/ModelObject"
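
For illustration, this kind of error is what the JVM raises when a single loader calls defineClass twice for the same class name. The following is a minimal sketch that reproduces it in isolation. DuplicateDefinitionDemo, NaiveLoader, Payload, and defineUnchecked are hypothetical names invented for this example; none of them come from Hudson or its remoting library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DuplicateDefinitionDemo {

    /** Hypothetical empty class whose bytecode is defined twice below. */
    public static class Payload {}

    /** Hypothetical loader that defines classes without checking for an earlier definition. */
    static class NaiveLoader extends ClassLoader {
        NaiveLoader() {
            super(null); // only the bootstrap loader as parent
        }

        Class<?> defineUnchecked(String name, byte[] bytes) {
            // No findLoadedClass check and no locking: a second call for the same name fails.
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Reads the compiled bytecode of a class from the classpath. */
    static byte[] classBytes(Class<?> c) throws IOException {
        String path = c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getClassLoader().getResourceAsStream(path)) {
            if (in == null) {
                throw new IOException("class file not found: " + path);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Returns true if the second definition of the same class is rejected with a LinkageError. */
    public static boolean secondDefineFails() throws IOException {
        NaiveLoader loader = new NaiveLoader();
        String name = Payload.class.getName();
        byte[] bytes = classBytes(Payload.class);
        loader.defineUnchecked(name, bytes); // first definition succeeds
        try {
            loader.defineUnchecked(name, bytes); // same name, same loader
            return false;
        } catch (LinkageError e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second define rejected: " + secondDefineFails());
    }
}
```

In this sketch the two definitions happen sequentially on one thread. In the scenario described in this report, the assumption is that two threads each reach defineClass for the same name before either has finished.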

      I have looked around on the web and found this (http://jira.codehaus.org/browse/JETTY-418), which led me to believe that a lack of synchronization while loading classes in the remote class loader is the cause.
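
To make the suspected failure mode concrete, here is a minimal sketch of a loader that serializes the "already defined?" check and the defineClass call, so that concurrent requests for the same class name all end up with one definition. This is an illustration under assumptions, not Hudson's RemoteClassLoader code: GuardedLoaderSketch, GuardedLoader, Payload, fetch, and loadConcurrently are hypothetical names, and the class bytes are read from the local classpath instead of being fetched over a remoting channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GuardedLoaderSketch {

    /** Hypothetical empty class that the sketch loader defines itself. */
    public static class Payload {}

    /** Hypothetical loader; stands in for a loader whose findClass can be entered concurrently. */
    static class GuardedLoader extends ClassLoader {
        GuardedLoader() {
            super(null); // only the bootstrap loader as parent, so Payload is never found by delegation
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // Hold one lock across the check and the definition so that two
            // threads cannot both decide the class is still undefined.
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded != null) {
                    return loaded; // another thread defined it first
                }
                byte[] bytes = fetch(name);
                return defineClass(name, bytes, 0, bytes.length);
            }
        }

        /** Assumption: bytes come from the local classpath rather than a remote peer. */
        private static byte[] fetch(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = GuardedLoaderSketch.class.getClassLoader().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Calls findClass from several threads at once and returns the number of distinct Class objects seen. */
    public static int loadConcurrently(int threads) throws Exception {
        final GuardedLoader loader = new GuardedLoader();
        final String name = Payload.class.getName();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Class<?>> task = () -> loader.findClass(name);
            List<Future<Class<?>>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(task));
            }
            Set<Class<?>> distinct = new HashSet<>();
            for (Future<Class<?>> f : results) {
                distinct.add(f.get()); // rethrows any LinkageError wrapped in an ExecutionException
            }
            return distinct.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("distinct Class objects: " + loadConcurrently(8));
    }
}
```

The sketch calls findClass directly to mimic an entry point that bypasses whatever locking loadClass provides. Whether that is how the real loader was reached in this report is an assumption.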

      Full stack trace:

      Started on May 24, 2010 12:00:54 AM
      FATAL: remote file operation failed: /home/hudson-slave/workspace/BPE_8.1SR at hudson.remoting.Channel@1219b8c:slave-81
      hudson.util.IOException2: remote file operation failed: /home/hudson-slave/workspace/BPE_8.1SR at hudson.remoting.Channel@1219b8c:slave-81
      	at hudson.FilePath.act(FilePath.java:743)
      	at hudson.FilePath.act(FilePath.java:729)
      	at com.syncron.hudson.cvs2.CVS2.isUpdatable(CVS2.java:813)
      	at com.syncron.hudson.cvs2.CVS2.pollChanges(CVS2.java:310)
      	at hudson.scm.SCM.poll(SCM.java:370)
      	at hudson.model.AbstractProject.poll(AbstractProject.java:1153)
      	at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:330)
      	at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:359)
      	at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:619)
      Caused by: java.io.IOException: Remote call on slave-81 failed
      	at hudson.remoting.Channel.call(Channel.java:560)
      	at hudson.FilePath.act(FilePath.java:736)
      	... 14 more
      Caused by: java.lang.LinkageError: loader (instance of  hudson/remoting/RemoteClassLoader): attempted  duplicate class definition for name: "hudson/model/ModelObject"
      	at java.lang.ClassLoader.defineClass1(Native Method)
      	at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
      	at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:151)
      	at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
      	at java.lang.ClassLoader.defineClass1(Native Method)
      	at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
      	at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:151)
      	at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
      	at java.lang.Class.getDeclaredFields0(Native Method)
      	at java.lang.Class.privateGetDeclaredFields(Class.java:2291)
      	at java.lang.Class.getDeclaredField(Class.java:1880)
      	at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1610)
      	at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52)
      	at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:425)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:413)
      	at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:310)
      	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:547)
      	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1583)
      	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1496)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1732)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
      	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
      	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
      	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
      	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      	at hudson.remoting.UserRequest.deserialize(UserRequest.java:178)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:98)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:270)
      	... 6 more
      Done. Took 63 ms
      No changes
      

      If we start a single job manually after a restart, it executes properly, and any subsequent jobs also run fine. However, once we get that exception, no other job that uses the class named in the exception (pretty much all of them) will run until the slave is restarted.

          [JENKINS-6604] Possible race condition in RemoteClassLoader renders slave unusable

          Erik Lovlie added a comment -

          Ah yes, it is the same stack trace as above. Didn't notice.


          Jesse Glick added a comment -

          https://github.com/jenkinsci/remoting/pull/8

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/hudson/remoting/RemoteClassLoader.java
          src/test/java/hudson/remoting/ClassRemotingTest.java
          src/test/java/hudson/remoting/DummyClassLoader.java
          src/test/java/hudson/remoting/DummyClassLoaderTest.java
          src/test/java/hudson/remoting/TestCallable.java
          http://jenkins-ci.org/commit/remoting/fdd0f4bb1bc92fb68cb2dd5d0f5a8f80e19c78d9
          Log:
          JENKINS-6604 Race condition in RemoteClassLoader.

          Compare: https://github.com/jenkinsci/remoting/compare/9ac2a9e49238...fdd0f4bb1bc9

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: olivier lamy
          Path:
          pom.xml
          http://jenkins-ci.org/commit/jenkins/76e31a3a5c039e317a84b4c3331e15c284d44435
          Log:
          use last remoting 2.19 fix for JENKINS-6604

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: olivier lamy
          Path:
          changelog.html
          http://jenkins-ci.org/commit/jenkins/b80789474cacc64be4954f0f4473759311e80580
          Log:
          changelog entry for JENKINS-6604

          dogfood added a comment -

          Integrated in jenkins_main_trunk #2114
          use last remoting 2.19 fix for JENKINS-6604 (Revision 76e31a3a5c039e317a84b4c3331e15c284d44435)
          changelog entry for JENKINS-6604 (Revision b80789474cacc64be4954f0f4473759311e80580)

          Result = SUCCESS
          Olivier Lamy : 76e31a3a5c039e317a84b4c3331e15c284d44435
          Files :

          • pom.xml

          Olivier Lamy : b80789474cacc64be4954f0f4473759311e80580
          Files :

          • changelog.html


          Jesse Glick added a comment -

          With @olamy’s integration, should now be fixed.


          Linards L added a comment - - edited

          It seems that the upgrade from v1.492 to v1.494 also fixed the ancient bug that made it impossible to install more than one slave on Windows 2k8 (R2) x64 (Datacenter) using the standard method. Still, before using the standard method I renamed the first slave service...


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: olivier lamy
          Path:
          pom.xml
          http://jenkins-ci.org/commit/jenkins/8cbe3f5ab6cc6289481cc13784952802392129e5
          Log:
          use last remoting 2.19 fix for JENKINS-6604
          (cherry picked from commit 76e31a3a5c039e317a84b4c3331e15c284d44435)

          Conflicts:
          pom.xml

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: olivier lamy
          Path:
          changelog.html
          http://jenkins-ci.org/commit/jenkins/4b1a73f16567d812a0e9af5ebb0e5bcb5e8c5b0d
          Log:
          changelog entry for JENKINS-6604
          (cherry picked from commit b80789474cacc64be4954f0f4473759311e80580)

          Conflicts:
          changelog.html

            jglick Jesse Glick
            michal_grzejszczak michal_grzejszczak
            Votes: 5
            Watchers: 13
