Bug
Resolution: Done
Major
None
Platform: All, OS: All
As reported in this thread on the dev list, we have been seeing some odd
intermittent behavior since the Channel.unexport fix from issue 4045 was
released in Hudson 1.317:
http://www.nabble.com/Error-in-SubversionSCM-"Unable-to-call-getCredential"-td24799653.html
java.lang.IllegalStateException: Unable to call getCredential. Invalid object ID 476
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:259)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:246)
    at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:206)
    at hudson.remoting.UserRequest.perform(UserRequest.java:92)
    at hudson.remoting.UserRequest.perform(UserRequest.java:46)
    at hudson.remoting.Request$2.run(Request.java:236)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
The issue is related to the fact that RemotableSVNAuthenticationProviderImpl is
a singleton that is exported multiple times during overlapping builds on the
same node. Here is roughly what happens in a two-executor scenario with two
SVN-based builds set up to trigger at the same time:
1. Executor #1 calls Entry.addRef as part of the first call to FilePath.act in
SubversionSCM (call A; reference count is now 1; proxy A is exported to slave)
2. Executor #0 calls Entry.addRef as part of the first call to FilePath.act in
SubversionSCM (call B; reference count is now 2; proxy B is exported to slave)
3. Executor #0 calls Entry.release as part of the first call to FilePath.act in
SubversionSCM (call B; reference count is now 1; proxy B is unexported, but
still uncollected on slave)
4. Executor #0 calls Entry.addRef as part of the second call to FilePath.act in
SubversionSCM (call C; reference count is now 2; proxy C is exported to slave)
5. Executor #1 calls Entry.release as part of the first call to FilePath.act in
SubversionSCM (call A; reference count is now 1; proxy A is unexported, but
still uncollected on slave)
6. Executor #1 calls Entry.addRef as part of the second call to FilePath.act in
SubversionSCM (call D; reference count is now 2; proxy D is exported to slave)
7. GC is triggered on the slave, causing proxy A and proxy B to be collected and
triggering UnexportCommand to be sent for each from RemoteInvocationHandler.finalize
8. Channel reader calls Entry.release in response to the first UnexportCommand
(reference count is now 1)
9. Channel reader calls Entry.release in response to the second UnexportCommand
(reference count is now 0; Entry is removed from the ExportTable, but still in
the ExportLists for calls C and D)
10. Slave Executor #0 invokes getCredential, causing an RPCRequest to be sent
back to the master with the oid that was just removed, resulting in exception
11. Slave Executor #1 invokes getCredential, causing an RPCRequest to be sent
back to the master with the oid that was just removed, resulting in exception
12. Executor #0 calls Entry.release as part of the second call to FilePath.act
in SubversionSCM (call C; reference count is now -1)
13. Executor #1 calls Entry.release as part of the second call to FilePath.act
in SubversionSCM (call D; reference count is now -2)
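The sequence above can be replayed with a small stand-alone model. This is only a sketch: RefCountReplay, its Entry class and the replay method are hypothetical names made up for illustration, not the hudson.remoting source, and oid 476 is simply borrowed from the stack trace. It assumes that release removes the entry from the table exactly when the count reaches 0, and that an unexport for an oid no longer in the table is ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the master-side export table.
// The names follow the description above; this is NOT the hudson.remoting code.
class RefCountReplay {
    static final class Entry {
        final int oid;
        int refCount;
        Entry(int oid) { this.oid = oid; }
    }

    private final Map<Integer, Entry> table = new HashMap<Integer, Entry>();

    // Called when a FilePath.act call exports the object.
    void addRef(Entry e) {
        e.refCount++;
        table.put(e.oid, e);
    }

    // Called when a FilePath.act call finishes, or via unexport.
    void release(Entry e) {
        if (--e.refCount == 0) {
            table.remove(e.oid);
        }
    }

    // Models the UnexportCommand sent after a proxy is collected on the slave.
    void unexport(int oid) {
        Entry e = table.get(oid);
        if (e != null) {
            release(e); // an unknown oid is silently ignored
        }
    }

    boolean isExported(int oid) {
        return table.containsKey(oid);
    }

    // Replays steps 1-13 and returns {failed getCredential lookups, final refCount}.
    static int[] replay(boolean gcBeforeCallsFinish) {
        RefCountReplay t = new RefCountReplay();
        Entry auth = new Entry(476); // the shared singleton's export entry
        int failed = 0;

        t.addRef(auth);   // step 1: call A, count 1
        t.addRef(auth);   // step 2: call B, count 2
        t.release(auth);  // step 3: call B done, count 1
        t.addRef(auth);   // step 4: call C, count 2
        t.release(auth);  // step 5: call A done, count 1
        t.addRef(auth);   // step 6: call D, count 2

        if (gcBeforeCallsFinish) {
            t.unexport(476); // step 8: proxy A collected, count 1
            t.unexport(476); // step 9: proxy B collected, count 0, entry removed
        }
        if (!t.isExported(476)) failed++; // step 10: getCredential from call C
        if (!t.isExported(476)) failed++; // step 11: getCredential from call D

        t.release(auth);  // step 12: call C done
        t.release(auth);  // step 13: call D done

        if (!gcBeforeCallsFinish) {
            t.unexport(476); // late GC: entry already gone, ignored
            t.unexport(476);
        }
        return new int[] { failed, auth.refCount };
    }

    public static void main(String[] args) {
        int[] early = replay(true);
        int[] late = replay(false);
        System.out.println("early GC: failed=" + early[0] + " refCount=" + early[1]);
        System.out.println("late GC:  failed=" + late[0] + " refCount=" + late[1]);
    }
}
```

With the GC before calls C and D finish, both lookups fail and the count ends at -2, matching steps 10 through 13. With the GC afterwards, nothing fails and the count ends at 0.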
The reason this is intermittent is that the GC that cleans up the proxies is not
predictable. If it doesn't happen until after call C and call D complete, the
UnexportCommands are simply ignored. I was lucky to be able to reproduce it: I
noticed that it would happen for the first build right after I disconnected and
reconnected the slave. I believe this made it reliably reproducible because the
slave VM starts off with a low default heap size, so the chances of it needing
to GC immediately when the two builds hit it at the same time are much higher.
Also, the reason this can't happen in a single executor setup is that the two
FilePath.act calls are serialized and the Proxy for the auth provider has a
different oid each time. It is only in the case of these closely overlapping
calls with an intervening GC on the slave that you see the spurious ref-count
decrement happen.
I've successfully fixed this in our internal copy of Hudson by changing the
RemotableSVNAuthenticationProviderImpl from a true singleton to a per-thread
singleton tied to each executor thread. I have a patch for this and will provide
it shortly.
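Until the patch is attached, here is a minimal sketch of what a per-thread singleton can look like, using java.lang.ThreadLocal. The class name PerThreadProvider is hypothetical and stands in for the auth provider. This is not the actual patch; it only illustrates the general shape of the idea, assuming each executor runs on its own thread.

```java
// Hypothetical stand-in for the auth provider; NOT the actual patch.
class PerThreadProvider {
    // One instance per thread, created lazily on first access from that thread.
    private static final ThreadLocal<PerThreadProvider> INSTANCE =
        new ThreadLocal<PerThreadProvider>() {
            @Override
            protected PerThreadProvider initialValue() {
                return new PerThreadProvider();
            }
        };

    private PerThreadProvider() {}

    // Each executor thread gets its own object, so it is exported under its
    // own oid and overlapping builds no longer share one reference count.
    static PerThreadProvider get() {
        return INSTANCE.get();
    }

    public static void main(String[] args) throws InterruptedException {
        final PerThreadProvider mine = get();
        System.out.println("same thread, same instance: " + (mine == get()));
        Thread other = new Thread(new Runnable() {
            public void run() {
                System.out.println("other thread, same instance: " + (mine == get()));
            }
        });
        other.start();
        other.join();
    }
}
```

The trade-off is that each executor thread keeps its own instance for the life of the thread, which is acceptable here only if the provider holds no state that must be shared across executors.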
Also, this issue is not going to be limited to
RemotableSVNAuthenticationProviderImpl. Any place where a singleton is multiply
exported from overlapping calls could see the same problem. The reason we didn't
see this before I fixed issue 4045 was that the Channel.unexport(int) command
used to be broken and didn't have an effect on the reference count. This meant
that the UnexportCommands sent in step #7 above were effectively ignored by the
master.