
Personal builds can encounter deadlock when fetching source

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • teamconcert-plugin
    • Jenkins 1.565.1
      TeamConcert 1.1.8
      Host: Red Hat Enterprise Linux Server 6.5
      RTC 4.0.3
      Jenkins running with 2 executors

      Sometimes, when running personal builds from RTC, the job will hang indefinitely. The console output for the job shows that the job is in the Team Concert plugin and is fetching source from RTC.

      Looking at the output from jenkins/ThreadDump, it would seem there is a deadlock situation occurring. Here is some example output:

      Executor #1 for master : executing Mainline personal build #458

      "Executor #1 for master : executing Mainline personal build #458" Id=945 Group=main TIMED_WAITING on java.util.ArrayList@5663b111
        at java.lang.Object.wait(Native Method)
        - waiting on java.util.ArrayList@5663b111
        at com.ibm.team.filesystem.client.internal.copyfileareas.BatchingLock.acquire(BatchingLock.java:446)
        at com.ibm.team.filesystem.client.internal.copyfileareas.CopyFileAreaManager.deregister(CopyFileAreaManager.java:350)
        at com.ibm.team.filesystem.client.internal.SharingManager.deregister(SharingManager.java:1358)
        at com.ibm.team.filesystem.client.internal.SharingManager.deregister(SharingManager.java:1316)
        at com.ibm.team.build.internal.hjplugin.rtc.RepositoryConnection.checkout(RepositoryConnection.java:457)
        at com.ibm.team.build.internal.hjplugin.rtc.RTCFacade.checkout(RTCFacade.java:390)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at com.ibm.team.build.internal.hjplugin.RTCFacadeFactory$RTCFacadeWrapper.invoke(RTCFacadeFactory.java:115)
        at com.ibm.team.build.internal.hjplugin.RTCCheckoutTask.invoke(RTCCheckoutTask.java:166)
        at com.ibm.team.build.internal.hjplugin.RTCCheckoutTask.invoke(RTCCheckoutTask.java:32)
        at hudson.FilePath.act(FilePath.java:920)
        at hudson.FilePath.act(FilePath.java:893)
        at com.ibm.team.build.internal.hjplugin.RTCScm.checkout(RTCScm.java:1079)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1252)
        at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:615)
        at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:524)
        at hudson.model.Run.execute(Run.java:1706)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:232)
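
A minimal sketch of why a thread can show up as TIMED_WAITING on a plain object like the ArrayList in the trace above: a lock that waits in a loop of timed `wait` calls keeps re-parking its caller for as long as the lock is never released. Everything here (`WaitLoopDemo`, `MONITOR`, `held`, the 1000 ms timeout) is hypothetical and only illustrates the pattern; it is not the actual BatchingLock code, which is not shown in this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class WaitLoopDemo {
    // Hypothetical monitor object; the real BatchingLock internals are not shown in this issue.
    private static final List<Object> MONITOR = new ArrayList<>();
    // Pretend some other build took the lock and never released it.
    private static boolean held = true;

    /** Waits for the lock in a loop of timed waits, re-checking the flag each time. */
    static void acquire() throws InterruptedException {
        synchronized (MONITOR) {
            while (held) {
                MONITOR.wait(1000); // each pass parks the thread in TIMED_WAITING
            }
            held = true;
        }
    }

    /** Starts a waiter thread and returns the first parked state observed for it. */
    static Thread.State observeWaiterState() {
        Thread waiter = new Thread(() -> {
            try {
                acquire();
            } catch (InterruptedException e) {
                // Interrupted by the demo: give up waiting and let the thread end.
            }
        }, "Executor #1 (demo)");
        waiter.setDaemon(true);
        waiter.start();

        long deadline = System.nanoTime() + 5_000_000_000L; // poll for at most 5 seconds
        Thread.State state = waiter.getState();
        try {
            while (state != Thread.State.TIMED_WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = waiter.getState();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        waiter.interrupt(); // stop the demo thread
        return state;
    }

    public static void main(String[] args) {
        System.out.println(observeWaiterState());
    }
}
```

The demo thread exits when interrupted. The report says that aborting the real build had no effect, and this sketch does not try to reproduce that part.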

      I spoke with Scott Cowan over Sametime and he suggested that it was trying to access a lock on a file on the master filesystem and there might be some shared state between builds. I've had a quick look over our builds and I can't see anything obvious that would cause the problem but I will continue to investigate.

      It should be noted that this only seems to occur for personal builds where a particular user's workspace is being used. Our normal builds have not seen this issue.

      Thanks,
      James

          [JENKINS-24965] Personal builds can encounter deadlock when fetching source

          Heather Fraser-Dube added a comment -

          Hi James,

          Sorry for the delay. Things have been a bit busy here and unfortunately I am away next week. I've looked over our code in this area but don't see anything obvious. The problem is at a lower level than us, but I don't quite understand what would trigger it.

          So a bunch of questions...

          Does this only happen for 1 user's personal build or is it randomly affecting all personal builds?
          If it affects just 1 user (or a given set of users), can you check their workspace and see if it's visible to the user that the Jenkins job is configured to use? Also see if there are missing or extra components compared with what is normally in the build workspace. If there are extra components, are they visible to the user that the Jenkins job is configured to use? I don't think this is the problem, but I want to rule it out too.

          The sandbox is where the RTC repository workspace is to be loaded. It's defined on the Source Control tab of the build definition as the Load directory. If you set the path to ".", it will end up being in the Jenkins job's workspace.

          I am interested in knowing if you use any special features with respect to loading (e.g. in the build definition, do you have "Delete directory before loading" checked? Is "Create folders for components" checked?)

          When it happens, can you look at the job's log? I am interested in all the messages that are related to the checkout (or possibly corrupted metadata). It will also help to understand how far it got during the load. I think that they would follow the "RTC Checkout : Source control setup" message.

          Does the build result have an activity for "Fetching files"? If so, was it completed?

          Does trying to cancel the build help when this happens?

          James Clark added a comment -

          Hi Heather, no worries - late replying myself.

          I've seen it happen to a handful of different people, including myself. My workspace is public (as we encourage all of our team to do) so visibility shouldn't be an issue. We only have a couple of components that rarely change so I imagine it's unlikely any of the hanging builds had something different, component-wise.

          The load directory for the build definition is set to 'src' so that it checks out to {JOB_WORKSPACE}/src.

          There are no special actions checked nor any excludes set for the load actions or accept options. I guess these are default values but I can send you exactly what option has what value if you need (assuming some might not be blank by default).

          There haven't been many personal builds lately so I unfortunately don't have the output log to hand. Next time I see a hanging build, I will update this ticket.

          Not quite sure what you mean by the build result having an activity for fetching files, but the build doesn't get to the point of actually building anything. It tries to check out the source code and never comes back from there.

          Trying to abort the build via Jenkins has no effect. The only way to stop it is to kill the master Jenkins process.

          Hope that helps, let me know if you need anything else!

          Thanks,
          James

          James Clark added a comment -

          Hi Heather,

          Had another hanging build today. Here is the Jenkins console output:

          Started by user anonymous
          Building on master in workspace /ccbdata/build/TRUNKP
          RTC : checkout...
          RTC : Build initiated by request from RTC
          RTC Checkout : Source control setup
          RTC Checkout : Fetching files to fetch destination "/ccbdata/build/TRUNKP/src" ...

          It got stuck there and just had the loading symbol.

          Thanks,
          James

          Heather Fraser-Dube added a comment -

          Hi James,

          Can you look at the RTC build result for this build that was started in RTC? When you open the build result there are tabs with different pieces of info. Is there an Activities tab? Does the Activities tab have any entries? Specifically, I am interested in one that says "Fetching files" and whether it was complete or not.

          Can you tell if the source code was all loaded or not?
          Was the stack trace showing it was stuck in the same place again?
          Were there other threads (or the other executor) running and were any of them running RTC or Plugin code?

          Did the build prior to this one finish normally or was it cancelled?

          Heather Fraser-Dube added a comment - I have raised work item 336647 ("Personal builds can encounter deadlock when fetching source") for this on jazz.net.

          James Clark added a comment -

          Hi Heather,

          I've attached a screenshot of the activities tab for the build result as viewed in RTC to the jazz.net work item that you posted.

          I unfortunately can't tell if all the source code was loaded as subsequent personal builds have been done which have re-used the workspace.

          The thread dump is no longer available for that build, however I did look at it and it was waiting for a lock at the same place.

          I can't say for this build in particular but generally, we have 2 executors running and they will both be RTC builds. Therefore it is likely that there was another build running at the same time that would have checked out source from RTC etc.

          The previous personal build before that had failed for a legitimate reason. If you're talking about a build for any job that was running at the same time as the one that hung, I'm not sure what build that was, as the timing info for the hanging build was lost after killing the Jenkins process.

          Thanks,
          James

          Heather Fraser-Dube added a comment -

          Hi James,

          I think the thread dump (of all the threads) would help (for next time).

          By previous build, I mean the Jenkins build for the same job that ran just before it. For example, it looks like TRUNKP was the job that was run with the personal build workspace. If that was build #73, then how did TRUNKP build #72 end? Was it a legitimate failure (a test failure, as opposed to a failure during the load step or a cancelled build)? Just trying to exclude the idea of a previous build impacting the next one.
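
For reference, a dump of all threads (as requested above) can also be produced from inside a JVM with the standard `java.lang.management` API. This is only a sketch: the class and method names (`FullThreadDump`, `dumpAllThreads`, `findDeadlocks`) are made up for illustration, and it dumps the JVM it runs in, so for a running Jenkins master the built-in thread dump page or the JDK's `jstack` tool is the more practical route.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class FullThreadDump {
    /** Returns a text dump of every live thread in the current JVM. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for lock details the JVM says it can provide.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" Id=").append(info.getThreadId())
               .append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Returns ids of threads in a monitor or synchronizer deadlock (empty if none). */
    static long[] findDeadlocks() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
        System.out.println("Deadlocked thread count: " + findDeadlocks().length);
    }
}
```

Note that `findDeadlockedThreads` only reports cycles of threads blocked on monitors or ownable synchronizers. A thread looping in `Object.wait`, like the executor in the original trace, would not necessarily be flagged, which is why the full dump is the more useful artifact here.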

          James Clark added a comment -

          Hi Heather,

          The build had failed before that. The load step had passed and it was a legitimate failure during the build.

          I'll try and grab all thread data next time.

          Thanks,
          James

          James Clark added a comment -

          Hi,

          Just updated the jazz work item with the full threadDump as reported from Jenkins at the time of another personal build hanging.

          Thanks,
          James

          pjdarton added a comment (edited) -

          I've had this happen on my setup (Jenkins 2.7.4, RTC-Jenkins plugin version 1.2.0.1) too, but I'm not using "Personal builds", just the impersonal kind.
          Judging from the comments in https://jazz.net/jazz/web/projects/Rational%20Team%20Concert#action=com.ibm.team.workitem.viewWorkItem&id=336647 it looks like this was caused by a fault in the RTC build toolkit (which the Jenkins-RTC plugin uses) rather than the Jenkins-RTC plugin itself, and that the cure is to replace the build toolkit with a bugfixed one.
          So that "just" leaves the challenge of obtaining a bugfixed toolkit for the version of RTC you're using. One needs to use a toolkit that's the same version as the server, or at most one major version behind it, so that will require a backport of the bugfix to the RTC4 and RTC5 toolkits...

          @jcdclark Did you ever raise a PMR and obtain a backport? If so, are the bugfixed versions available anywhere (internally or externally)?
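
The toolkit/server version rule described in this comment can be written down as a small check. This is a sketch under a stated assumption: it compares major version numbers only, which is one reading of "same version, or at most one major version behind". The class and method names are hypothetical, and the version strings other than 4.0.3 are illustrative values, not taken from this issue.

```java
public class ToolkitVersionCheck {
    /** Extracts the leading major number from a dotted version such as "4.0.3". */
    static int major(String version) {
        return Integer.parseInt(version.trim().split("\\.")[0]);
    }

    /**
     * Assumption: a toolkit is usable when its major version equals the
     * server's or is exactly one behind it. Minor versions are ignored.
     */
    static boolean isToolkitUsable(String toolkitVersion, String serverVersion) {
        int toolkit = major(toolkitVersion);
        int server = major(serverVersion);
        return toolkit == server || toolkit == server - 1;
    }

    public static void main(String[] args) {
        System.out.println(isToolkitUsable("4.0.3", "4.0.3")); // same major version
        System.out.println(isToolkitUsable("4.0.3", "5.0.2")); // one major version behind
        System.out.println(isToolkitUsable("4.0.3", "6.0.2")); // two major versions behind
    }
}
```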

            Assignee: Unassigned
            Reporter: James Clark (jcdclark)
            Votes: 0
            Watchers: 4
