  1. Jenkins
  2. JENKINS-6604

Possible race condition in RemoteClassLoader renders slave unusable

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: core
    • Environment: CentOS 5.3, Sun JDK 1.6.0_19 64-bit

    Description

      We restart Hudson every Sunday afternoon to work around memory leaks, and we have a couple of nightly builds that kick in at midnight. Hudson is therefore fresh when multiple builds start at once; that is, its remote class loader has not had a chance to load any classes yet. We have 3 executors defined. I suppose that the SCM poll action, which is sent in many build procedures, causes multiple concurrent requests to load classes for the SCM (we use a slightly hacked version of the CVS SCM). We are getting the following exception:
      java.lang.LinkageError: loader (instance of hudson/remoting/RemoteClassLoader): attempted duplicate class definition for name: "hudson/model/ModelObject"

      I have looked around on the web and found this (http://jira.codehaus.org/browse/JETTY-418), which led me to believe that a lack of synchronization while loading classes in the remote class loader is the cause.
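
The suspected failure mode can be sketched in isolation. The following is a minimal illustration only: it is not the actual hudson.remoting.RemoteClassLoader code, and it is not necessarily the fix that was later shipped. It assumes a hypothetical SketchLoader with an unguarded path that calls defineClass unconditionally and a guarded path that checks findLoadedClass under the loader's lock first. The names DuplicateDefinitionSketch, SketchLoader, and Payload are invented for the example, and the sketch assumes the compiled class files are readable from the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DuplicateDefinitionSketch {

    /** Tiny class whose bytecode stands in for a class file fetched from the master. */
    static class Payload {}

    /** Hypothetical loader for illustration; NOT the real RemoteClassLoader. */
    static class SketchLoader extends ClassLoader {
        SketchLoader() {
            super(null); // bootstrap parent only, so Payload must be defined by this loader
        }

        /** Unguarded: defines unconditionally, as if findClass were reached twice for one name. */
        Class<?> defineUnguarded(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }

        /** Guarded: check-then-define under one lock, so later callers reuse the class. */
        synchronized Class<?> defineGuarded(String name, byte[] bytes) {
            Class<?> existing = findLoadedClass(name);
            if (existing != null) {
                return existing;
            }
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Reads Payload's class file from the classpath (assumed to be available there). */
    static byte[] payloadBytes() throws IOException {
        String resource = "/" + Payload.class.getName().replace('.', '/') + ".class";
        try (InputStream in = DuplicateDefinitionSketch.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Returns true if defining the same name twice without a guard raises LinkageError. */
    public static boolean unguardedSecondDefineFails() throws IOException {
        byte[] bytes = payloadBytes();
        String name = Payload.class.getName();
        SketchLoader loader = new SketchLoader();
        loader.defineUnguarded(name, bytes);
        try {
            loader.defineUnguarded(name, bytes);
            return false;
        } catch (LinkageError e) {
            return true;
        }
    }

    /** Runs the guarded path from several threads and counts the distinct classes returned. */
    public static int distinctClassesFromGuardedDefine(int threads) throws Exception {
        final byte[] bytes = payloadBytes();
        final String name = Payload.class.getName();
        final SketchLoader loader = new SketchLoader();
        Callable<Class<?>> task = () -> loader.defineGuarded(name, bytes);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Class<?>>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(task));
            }
            Set<Class<?>> distinct = new HashSet<>();
            for (Future<Class<?>> f : results) {
                distinct.add(f.get());
            }
            return distinct.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("unguarded second define fails: " + unguardedSecondDefineFails());
        System.out.println("distinct classes, guarded: " + distinctClassesFromGuardedDefine(4));
    }
}
```

Defining the same name twice on one loader raises the same kind of LinkageError as in the trace below, while the guarded variant hands every caller the single class that was already defined.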

      Full stack trace:

      Started on May 24, 2010 12:00:54 AM
      FATAL: remote file operation failed: /home/hudson-slave/workspace/BPE_8.1SR at hudson.remoting.Channel@1219b8c:slave-81
      hudson.util.IOException2: remote file operation failed: /home/hudson-slave/workspace/BPE_8.1SR at hudson.remoting.Channel@1219b8c:slave-81
      	at hudson.FilePath.act(FilePath.java:743)
      	at hudson.FilePath.act(FilePath.java:729)
      	at com.syncron.hudson.cvs2.CVS2.isUpdatable(CVS2.java:813)
      	at com.syncron.hudson.cvs2.CVS2.pollChanges(CVS2.java:310)
      	at hudson.scm.SCM.poll(SCM.java:370)
      	at hudson.model.AbstractProject.poll(AbstractProject.java:1153)
      	at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:330)
      	at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:359)
      	at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:619)
      Caused by: java.io.IOException: Remote call on slave-81 failed
      	at hudson.remoting.Channel.call(Channel.java:560)
      	at hudson.FilePath.act(FilePath.java:736)
      	... 14 more
      Caused by: java.lang.LinkageError: loader (instance of  hudson/remoting/RemoteClassLoader): attempted  duplicate class definition for name: "hudson/model/ModelObject"
      	at java.lang.ClassLoader.defineClass1(Native Method)
      	at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
      	at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:151)
      	at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
      	at java.lang.ClassLoader.defineClass1(Native Method)
      	at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
      	at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:151)
      	at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
      	at java.lang.Class.getDeclaredFields0(Native Method)
      	at java.lang.Class.privateGetDeclaredFields(Class.java:2291)
      	at java.lang.Class.getDeclaredField(Class.java:1880)
      	at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1610)
      	at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52)
      	at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:425)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:413)
      	at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:310)
      	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:547)
      	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1583)
      	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1496)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1732)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
      	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
      	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
      	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
      	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
      	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
      	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
      	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      	at hudson.remoting.UserRequest.deserialize(UserRequest.java:178)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:98)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:270)
      	... 6 more
      Done. Took 63 ms
      No changes
      

      If we start a single job manually after a restart, it executes properly, and any subsequent jobs also run fine. However, once we get that exception, no other job that uses the class mentioned in the exception (pretty much all of them) will execute until the slave is restarted.

          Activity

            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #2114
            use last remoting 2.19 fix for JENKINS-6604 (Revision 76e31a3a5c039e317a84b4c3331e15c284d44435)
            changelog entry for JENKINS-6604 (Revision b80789474cacc64be4954f0f4473759311e80580)

            Result = SUCCESS
            Olivier Lamy : 76e31a3a5c039e317a84b4c3331e15c284d44435
            Files :

            • pom.xml

            Olivier Lamy : b80789474cacc64be4954f0f4473759311e80580
            Files :

            • changelog.html
            jglick Jesse Glick added a comment -

            With @olamy’s integration, should now be fixed.

            hx_unbanned Linards L added a comment - - edited

            It seems the upgrade from v1.492 to v1.494 also fixed the ancient bug that made it impossible to install more than one slave on Windows 2k8 (R2) x64 (Datacenter) using the standard method. Still, before using the standard method I renamed the first slave service...


            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: olivier lamy
            Path:
            pom.xml
            http://jenkins-ci.org/commit/jenkins/8cbe3f5ab6cc6289481cc13784952802392129e5
            Log:
            use last remoting 2.19 fix for JENKINS-6604
            (cherry picked from commit 76e31a3a5c039e317a84b4c3331e15c284d44435)

            Conflicts:
            pom.xml

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: olivier lamy
            Path:
            changelog.html
            http://jenkins-ci.org/commit/jenkins/4b1a73f16567d812a0e9af5ebb0e5bcb5e8c5b0d
            Log:
            changelog entry for JENKINS-6604
            (cherry picked from commit b80789474cacc64be4954f0f4473759311e80580)

            Conflicts:
            changelog.html

            People

              jglick Jesse Glick
              michal_grzejszczak michal_grzejszczak
              Votes: 5
              Watchers: 13
