Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-23271

Intermittent Invalid Object ID in remoting module

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Blocker Blocker
    • remoting
    • 1.544.2, 1.563, 1.625.1 - Windows or Linux Master, various slaves

      For the last two weeks, I have been seeing an intermittent failure during the clearcase stage of our builds. It throws a Java exception along the lines of

      FATAL: Invalid object ID 18649 iuota=18470
      java.lang.IllegalStateException: Invalid object ID 18469 iota=18470
      at hudson.remoting.ExportTable.diagnoseInvalidId(ExportTable.java:277)
      etc...

      It always happens during the Clearcase segment of the build, but on a different step each time. Sometimes after lsview, sometimes after lshistory, sometimes after mkview ...

      It happens a very low percentage of the time, and it happens on a different project each time. I have around 300 projects running between a number of matrix jobs- we are seeing one or two failures on each runthrough of this, always on different jobs and on different slaves. It started a couple weeks ago- our clearcase server crashed, and jenkins was having some trouble interacting with the server. I upgraded to 1.563 to try and solve that problem (I wound up having to manually delete all the views that had been created in the period between the backup and the crash). Once that issue had been resolved though, we started seeing this problem.

      I upgraded the clearcase plugin to 1.5, and then tried downgrading Jenkins to 1.554.2. I cannot swear to it, but I thought I was on 1.558 before the first problem popped up, and had been very stable for a long time with all build failures being attributed to bad code, so suddenly having Jenkins fail some is kind of jarring.

      At this point, I am not sure what I changed to cause the problem. I remember I had to upgrade slave.jar to version 2.41 from whatever it was before when I did the first upgrade to 1.563. Other then that, nothing else substantial has changed. The Jenkins is on a restricted network with no internet access- I am going to type the stack trace below, so forgive any typos.

      FATAL: Invalid object ID 18649 iuota=18470
      java.lang.IllegalStateException: Invalid object ID 18469 iota=18470
      at hudson.remoting.ExportTable.diagnoseInvalidId(ExportTable.java:277)
      at hudson.remoting.ExportTable.get(ExportTable.java:261)
      at hudson.remoting.Channel.getExportedObject(Channel.java:596)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:292
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:281)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:240)
      at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      at hudson.remoting.Request$2.run(Request.java:328)
      at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(UNKNOWN SOURCE)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at hudson.remoting.Engine$1$1.run(Engine.java:63)
      at java.lang.Thread.run(Unknown Source)
      Cased by: java.lang.Exception: Object was recently deallocated
      #18649 (ref.0) : hudson.Launcher$RemoteLaunchCallable$1
      created at Fri May 30 19:27:39 EDT 2014
      at hudson.remoting.ExportTable$Entry.<init>(ExportTable.java:86)
      at hudson.remoting.ExportTable.export(ExportTable.java:239)
      at hudson.remoting.Channel.export(Channel.java:592)
      at hudson.remoting.Channel.export(Channel.java:583)
      at hudson.remoting.Channel.export(Channel.java:553)
      at hudson.Launcher$RemoteLaunchCallable.call(Launcher.java:1025)
      at hudson.Launcher$RemoteLaunchCallable.call(Launcher.java:990)
      at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      at hudson.remoting.Request$2.run(Request.java:328)
      at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at hudson.remoting.Engine$1$1.run(Engine.java:63)
      at java.lang.Thread.run(Unknown Source)
      Released at Fri May 30 19:27:39 EDT 2014
      at hudson.remoting.ExportTable$Entry.release(ExportTable.java:115)
      at hudson.remoting.ExportTable.unexportByOid(ExportTable.java:303)
      at hudson.remoting.Channel.unexport(Channel.java:600)
      at hudson.remoting.UnexportCommand.execute(UnexportCommand.java:38)
      at hudson.remoting.Channel$2.handle(Channel.java:475)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)

      at hudson.remoting.ExportTable.diagnoseInvalidId(ExportTable.java:270)
      ... 15 more
      Caused by: Released at Fri May 30 19:27:39 EDT 2014
      at hudson.remoting.ExportTable$Entry.release(ExportTable.java:115)
      at hudson.remoting.ExportTable.unexportByOid(ExportTable.java:303)
      at hudson.remoting.Channel.unexport(Channel.java:600)
      at hudson.remoting.UnexportCommand.execute(UnexportCommand.java:38)
      at hudson.remoting.Channel$2.handle(Channel.java:475)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)

          [JENKINS-23271] Intermittent Invalid Object ID in remoting module

          Oleg Nenashev added a comment -

          bruce Give me several hours. I will create a pull request with a wider solution. Then you will be able to take a PoC from the CI just to verify if it helps

          Oleg Nenashev added a comment - bruce Give me several hours. I will create a pull request with a wider solution. Then you will be able to take a PoC from the CI just to verify if it helps

          Oleg Nenashev added a comment -

          Oleg Nenashev added a comment - Created https://github.com/jenkinsci/jenkins/pull/2653

          We have been seeing this error periodically as well.

          Just out of curiosity, is there any indication on which version(s) of Jenkins (ie: v1 / v2) the fix will be applied once one has been approved?

          Kevin Phillips added a comment - We have been seeing this error periodically as well. Just out of curiosity, is there any indication on which version(s) of Jenkins (ie: v1 / v2) the fix will be applied once one has been approved?

          Oleg Nenashev added a comment -

          Jenkins 1.x is not supported by the community anymore (though I'm not a fan of the current release model).
          The fix will be integrated into 2.35 and likely then backported to 2.32.1

          Oleg Nenashev added a comment - Jenkins 1.x is not supported by the community anymore (though I'm not a fan of the current release model). The fix will be integrated into 2.35 and likely then backported to 2.32.1

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/src/main/java/hudson/Launcher.java
          core/src/main/java/hudson/Proc.java
          http://jenkins-ci.org/commit/jenkins/29893354a52efac20efcb4e0e91723f3bd889059
          Log:
          JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods (#2653)

          • JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods
          • JENKINS-23271 - Also prevent the issue when the kill() command is the last call in the usage sequence

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/src/main/java/hudson/Launcher.java core/src/main/java/hudson/Proc.java http://jenkins-ci.org/commit/jenkins/29893354a52efac20efcb4e0e91723f3bd889059 Log: JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods (#2653) JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods JENKINS-23271 - Also prevent the issue when the kill() command is the last call in the usage sequence

          Andrew Gray added a comment -

          Should this now be resolved with release of 2.35?

          Andrew Gray added a comment - Should this now be resolved with release of 2.35?

          Oleg Nenashev added a comment -

          At least more failure cases should be solved

          Oleg Nenashev added a comment - At least more failure cases should be solved

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/src/main/java/hudson/Launcher.java
          http://jenkins-ci.org/commit/jenkins/fcfd271ad5ad5947075052d37177249c760f6184
          Log:
          JENKINS-23271 - Prevent early deallocation of the Proc instance by GC in ProcStarter#join() (#2635)

          • JENKINS-23271 - Prevent the prelimimary deallocation of the Proc instance by GC

          It is a hackish way, which likely prevents a preliminary deallocation of the spawned RemoteProc instance, which we see in JENKINS-23271. Proc instance was not actually required in the original code since we were creating and using RemoteInvocationHandler wrapper only, and the theory discussed with @stephenc was that object gets removed by Java8 garbage collector before we get into join().

          This fix enforces the persistency of ProcStarter#start() result by adding logging and the enforced volatile field (maybe the last one is not really required, but JIT compiler in Java implementations may be smart enough to skip unused loggers)

          This is a pretty old fix from August, which has been soak tested on my instance for several weeks (mid-August => Jenkins World). On the reference instance (just a small Jenkins instance with 4 agents and very frequent builds with CommandInterpreter steps) I saw 2 failures over the period. On the fixed instance - 0. It does not proof anything, but at least the fix was soak tested a bit

          • JENKINS-23271 - Also put the check into the finally statement as @stephenc proposed
          • Remove assert

          (cherry picked from commit fd6c6aff929be9818f4eb4b84ed6b4593356853f)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/src/main/java/hudson/Launcher.java http://jenkins-ci.org/commit/jenkins/fcfd271ad5ad5947075052d37177249c760f6184 Log: JENKINS-23271 - Prevent early deallocation of the Proc instance by GC in ProcStarter#join() (#2635) JENKINS-23271 - Prevent the prelimimary deallocation of the Proc instance by GC It is a hackish way, which likely prevents a preliminary deallocation of the spawned RemoteProc instance, which we see in JENKINS-23271 . Proc instance was not actually required in the original code since we were creating and using RemoteInvocationHandler wrapper only, and the theory discussed with @stephenc was that object gets removed by Java8 garbage collector before we get into join(). This fix enforces the persistency of ProcStarter#start() result by adding logging and the enforced volatile field (maybe the last one is not really required, but JIT compiler in Java implementations may be smart enough to skip unused loggers) This is a pretty old fix from August, which has been soak tested on my instance for several weeks (mid-August => Jenkins World). On the reference instance (just a small Jenkins instance with 4 agents and very frequent builds with CommandInterpreter steps) I saw 2 failures over the period. On the fixed instance - 0. It does not proof anything, but at least the fix was soak tested a bit JENKINS-23271 - Get rid of the procHolderForJoin field JENKINS-23271 - Also put the check into the finally statement as @stephenc proposed Remove assert (cherry picked from commit fd6c6aff929be9818f4eb4b84ed6b4593356853f)

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/src/main/java/hudson/Launcher.java
          core/src/main/java/hudson/Proc.java
          http://jenkins-ci.org/commit/jenkins/5500e5c00bc19e6c421c9aaa88f6dbb57c666db3
          Log:
          JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods (#2653)

          • JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods
          • JENKINS-23271 - Also prevent the issue when the kill() command is the last call in the usage sequence

          (cherry picked from commit 29893354a52efac20efcb4e0e91723f3bd889059)

          Compare: https://github.com/jenkinsci/jenkins/compare/dacbe03e0d4e...5500e5c00bc1

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/src/main/java/hudson/Launcher.java core/src/main/java/hudson/Proc.java http://jenkins-ci.org/commit/jenkins/5500e5c00bc19e6c421c9aaa88f6dbb57c666db3 Log: JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods (#2653) JENKINS-23271 - Process statuses of Remote process join() operations directly inside methods JENKINS-23271 - Also prevent the issue when the kill() command is the last call in the usage sequence (cherry picked from commit 29893354a52efac20efcb4e0e91723f3bd889059) Compare: https://github.com/jenkinsci/jenkins/compare/dacbe03e0d4e...5500e5c00bc1

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          docs/configuration.md
          src/main/java/hudson/remoting/Base64.java
          src/main/java/hudson/remoting/Channel.java
          src/main/java/hudson/remoting/ErrorPropagatingOutputStream.java
          src/main/java/hudson/remoting/ExportTable.java
          src/main/java/hudson/remoting/LocalChannel.java
          src/main/java/hudson/remoting/RemoteInvocationHandler.java
          src/main/java/hudson/remoting/VirtualChannel.java
          http://jenkins-ci.org/commit/remoting/27bb51dd14db44952348539d15ee004c90c248b2
          Log:
          Document and annotate object Export/Unexport logic

          During last releases we have fixed several issues related to the object export/unexport operations (JENKINS-23271, JENKINS-41852). I decided that it makes sense to improve the method documentation just to make Javadocs more explicit.

          The change also adds missing documentation of hudson.remoting.ExportTable.unexportLogSize

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: docs/configuration.md src/main/java/hudson/remoting/Base64.java src/main/java/hudson/remoting/Channel.java src/main/java/hudson/remoting/ErrorPropagatingOutputStream.java src/main/java/hudson/remoting/ExportTable.java src/main/java/hudson/remoting/LocalChannel.java src/main/java/hudson/remoting/RemoteInvocationHandler.java src/main/java/hudson/remoting/VirtualChannel.java http://jenkins-ci.org/commit/remoting/27bb51dd14db44952348539d15ee004c90c248b2 Log: Document and annotate object Export/Unexport logic During last releases we have fixed several issues related to the object export/unexport operations ( JENKINS-23271 , JENKINS-41852 ). I decided that it makes sense to improve the method documentation just to make Javadocs more explicit. The change also adds missing documentation of hudson.remoting.ExportTable.unexportLogSize

            oleg_nenashev Oleg Nenashev
            pbecotte Paul Becotte
            Votes:
            83 Vote for this issue
            Watchers:
            102 Start watching this issue

              Created:
              Updated:
              Resolved: