JENKINS-24965

Personal builds can encounter deadlock when fetching source

Details

    Description

      Sometimes, when running personal builds from RTC, the job will hang indefinitely. The console output for the job shows that the job is in the Team Concert plugin and is fetching source from RTC.

      Looking at the output from jenkins/ThreadDump, it would seem there is a deadlock situation occurring. Here is some example output:

      Executor #1 for master : executing Mainline personal build #458

      "Executor #1 for master : executing Mainline personal build #458" Id=945 Group=main TIMED_WAITING on java.util.ArrayList@5663b111
        at java.lang.Object.wait(Native Method)
        - waiting on java.util.ArrayList@5663b111
        at com.ibm.team.filesystem.client.internal.copyfileareas.BatchingLock.acquire(BatchingLock.java:446)
        at com.ibm.team.filesystem.client.internal.copyfileareas.CopyFileAreaManager.deregister(CopyFileAreaManager.java:350)
        at com.ibm.team.filesystem.client.internal.SharingManager.deregister(SharingManager.java:1358)
        at com.ibm.team.filesystem.client.internal.SharingManager.deregister(SharingManager.java:1316)
        at com.ibm.team.build.internal.hjplugin.rtc.RepositoryConnection.checkout(RepositoryConnection.java:457)
        at com.ibm.team.build.internal.hjplugin.rtc.RTCFacade.checkout(RTCFacade.java:390)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at com.ibm.team.build.internal.hjplugin.RTCFacadeFactory$RTCFacadeWrapper.invoke(RTCFacadeFactory.java:115)
        at com.ibm.team.build.internal.hjplugin.RTCCheckoutTask.invoke(RTCCheckoutTask.java:166)
        at com.ibm.team.build.internal.hjplugin.RTCCheckoutTask.invoke(RTCCheckoutTask.java:32)
        at hudson.FilePath.act(FilePath.java:920)
        at hudson.FilePath.act(FilePath.java:893)
        at com.ibm.team.build.internal.hjplugin.RTCScm.checkout(RTCScm.java:1079)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1252)
        at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:615)
        at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:524)
        at hudson.model.Run.execute(Run.java:1706)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:232)

      I spoke with Scott Cowan over Sametime and he suggested that it was trying to access a lock on a file on the master filesystem and there might be some shared state between builds. I've had a quick look over our builds and I can't see anything obvious that would cause the problem but I will continue to investigate.
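To illustrate why the dump above shows the executor as TIMED_WAITING rather than as a classic JVM-detected deadlock, here is a minimal sketch of a lock that is acquired by polling with a timed `Object.wait` on a monitor object. This is not the RTC toolkit's code: the internals of `BatchingLock` are not shown in this report, and every name below (`TimedWaitSketch`, `monitor`, `locked`, `acquire`, `observeWaiterState`) is hypothetical. It only assumes that the lock is held by some other party that never releases it.

```java
import java.util.ArrayList;
import java.util.List;

public class TimedWaitSketch {
    // Hypothetical monitor object, chosen to mirror the ArrayList seen in the dump.
    static final List<Object> monitor = new ArrayList<>();
    // Assumption: the lock is already "held" by another build that never releases it.
    static volatile boolean locked = true;

    // Polls for the lock with a timed wait, re-checking the flag after each timeout.
    static void acquire() throws InterruptedException {
        synchronized (monitor) {
            while (locked) {
                monitor.wait(1000);
            }
            locked = true;
        }
    }

    // Starts a waiter thread and returns the state a thread dump would report for it.
    static Thread.State observeWaiterState() {
        Thread waiter = new Thread(() -> {
            try {
                acquire();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // give up when interrupted
            }
        }, "sketch-executor");
        waiter.setDaemon(true);
        waiter.start();

        // Poll for up to five seconds until the waiter is parked inside wait().
        long deadline = System.nanoTime() + 5_000_000_000L;
        Thread.State state = waiter.getState();
        while (state != Thread.State.TIMED_WAITING && System.nanoTime() < deadline) {
            Thread.yield();
            state = waiter.getState();
        }
        waiter.interrupt(); // stop the sketch thread
        return state;
    }

    public static void main(String[] args) {
        System.out.println(observeWaiterState());
    }
}
```

A thread stuck in a loop like this is waiting for a notification or timeout, not blocked trying to enter a monitor, so `ThreadMXBean.findDeadlockedThreads()` would not be expected to report it. Finding out who holds the lock then means reading the full dump by hand.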

      It should be noted that this only seems to occur for personal builds where a particular user's workspace is being used. Our normal builds have not seen this issue.

      Thanks,
      James

        Activity

          jcdclark James Clark added a comment -

          Hi Heather,

          I've attached a screenshot of the activities tab for the build result as viewed in RTC to the jazz.net work item that you posted.

          I unfortunately can't tell if all the source code was loaded as subsequent personal builds have been done which have re-used the workspace.

          The thread dump is no longer available for that build; however, I did look at it, and it was waiting for a lock at the same place.

          I can't say for this build in particular but generally, we have 2 executors running and they will both be RTC builds. Therefore it is likely that there was another build running at the same time that would have checked out source from RTC etc.

          The previous personal build before that had failed for a legitimate reason. If you're talking about a build for any job that was running at the same time as the one that hung, I'm not sure what build that was, as the timing info for the hanging build was lost after killing the Jenkins process.

          Thanks,
          James

          hfraserdube Heather Fraser-Dube added a comment -

          Hi James,

          I think a thread dump of all the threads would help next time.

          By previous build, I mean the Jenkins build for the same job that ran immediately before it. For example, it looks like TRUNKP was the job that was run with the personal build workspace. If that was build #73, then how did TRUNKP build #72 end? Was it a legitimate failure (a test failure, as opposed to a failure during the load step or a cancellation)? I'm just trying to exclude the idea of a previous build impacting the next one.

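For reference, a full dump of every thread can be taken from outside the process with the JDK's `jstack <pid>`, or from inside the JVM with the standard `java.lang.management` API. The following is a minimal sketch of the in-JVM route. The class and method names (`FullThreadDump`, `dumpAllThreads`) are made up for illustration, and the output format is only loosely modelled on the dump quoted in the description.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class FullThreadDump {
    // Builds a text dump of all live threads, including the lock each one waits on.
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();

        // Only request lock details the running JVM says it can provide.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());

        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" Id=").append(info.getThreadId())
               .append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Seeing every thread matters here because the executor's own stack only shows where it is waiting. Whichever thread is holding the lock, if there is one, only appears in the complete dump.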
          jcdclark James Clark added a comment -

          Hi Heather,

          The build had failed before that. The load step had passed and it was a legitimate failure during the build.

          I'll try and grab all thread data next time.

          Thanks,
          James

          jcdclark James Clark added a comment -

          Hi,

          Just updated the jazz work item with the full threadDump as reported from Jenkins at the time of another personal build hanging.

          Thanks,
          James

          pjdarton pjdarton added a comment - edited

          I've had this happen on my setup (Jenkins 2.7.4, RTC-Jenkins plugin version 1.2.0.1) too, but I'm not using "Personal builds", just the impersonal kind.
          Judging from the comments in https://jazz.net/jazz/web/projects/Rational%20Team%20Concert#action=com.ibm.team.workitem.viewWorkItem&id=336647 it looks like this was caused by a fault in the RTC build toolkit (which the Jenkins-RTC plugin uses) rather than the Jenkins-RTC plugin itself, and that the cure is to replace the build toolkit with a bugfixed one.
          So that "just" leaves the challenge of obtaining a bugfixed toolkit for the version of RTC you're using. One needs to use a toolkit that's the same version as the server, or at most one major version behind the server, so that'll require a backport of the bugfix to the RTC4 and RTC5 toolkits...

          jcdclark, did you ever raise a PMR and obtain a backport? If so, are the bugfixed versions available anywhere (internally or externally)?
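The toolkit/server version constraint described above can be written down as a small check. This is only a sketch of the rule as stated in this comment, not an official RTC compatibility matrix. The class and method names are hypothetical, the version strings are illustrative, and treating any two versions with the same major number as "the same version" is an assumption made here for simplicity.

```java
public class ToolkitVersionCheck {
    // Extracts the leading major number from a dotted version string such as "5.0.2".
    static int majorOf(String version) {
        int dot = version.indexOf('.');
        return Integer.parseInt(dot < 0 ? version : version.substring(0, dot));
    }

    // Rule as stated in the comment: toolkit matches the server,
    // or is at most one major version behind it.
    // Assumption: equal major numbers count as "the same version".
    static boolean isCompatible(String toolkitVersion, String serverVersion) {
        int gap = majorOf(serverVersion) - majorOf(toolkitVersion);
        return gap == 0 || gap == 1;
    }

    public static void main(String[] args) {
        // Illustrative version strings only.
        System.out.println(isCompatible("5.0.2", "6.0.2"));
        System.out.println(isCompatible("4.0.7", "6.0.2"));
    }
}
```

Under this reading, a fix shipped only in the newest toolkit does not help a site whose server is two or more major versions older, which is why a backport would be needed.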

          People

            Assignee: Unassigned
            Reporter: James Clark (jcdclark)