
Personal builds can encounter deadlock when fetching source

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: teamconcert-plugin
    • Environment:
      Jenkins 1.565.1
      TeamConcert 1.1.8
      Host: Red Hat Enterprise Linux Server 6.5
      RTC 4.0.3
      Jenkins running with 2 executors

      Sometimes, when running personal builds from RTC, the job hangs indefinitely. The console output for the job shows that it is in the Team Concert plugin, fetching source from RTC.

      Judging by the output from jenkins/ThreadDump, there appears to be a deadlock. Here is some example output:

      Executor #1 for master : executing Mainline personal build #458

      "Executor #1 for master : executing Mainline personal build #458" Id=945 Group=main TIMED_WAITING on java.util.ArrayList@5663b111
        at java.lang.Object.wait(Native Method)
        - waiting on java.util.ArrayList@5663b111
        at com.ibm.team.filesystem.client.internal.copyfileareas.BatchingLock.acquire(BatchingLock.java:446)
        at com.ibm.team.filesystem.client.internal.copyfileareas.CopyFileAreaManager.deregister(CopyFileAreaManager.java:350)
        at com.ibm.team.filesystem.client.internal.SharingManager.deregister(SharingManager.java:1358)
        at com.ibm.team.filesystem.client.internal.SharingManager.deregister(SharingManager.java:1316)
        at com.ibm.team.build.internal.hjplugin.rtc.RepositoryConnection.checkout(RepositoryConnection.java:457)
        at com.ibm.team.build.internal.hjplugin.rtc.RTCFacade.checkout(RTCFacade.java:390)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at com.ibm.team.build.internal.hjplugin.RTCFacadeFactory$RTCFacadeWrapper.invoke(RTCFacadeFactory.java:115)
        at com.ibm.team.build.internal.hjplugin.RTCCheckoutTask.invoke(RTCCheckoutTask.java:166)
        at com.ibm.team.build.internal.hjplugin.RTCCheckoutTask.invoke(RTCCheckoutTask.java:32)
        at hudson.FilePath.act(FilePath.java:920)
        at hudson.FilePath.act(FilePath.java:893)
        at com.ibm.team.build.internal.hjplugin.RTCScm.checkout(RTCScm.java:1079)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1252)
        at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:615)
        at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:524)
        at hudson.model.Run.execute(Run.java:1706)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:232)
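A note on reading the dump above: the executor is TIMED_WAITING inside Object.wait rather than BLOCKED on a monitor, which suggests that BatchingLock.acquire polls with a timed wait. If that is the case, the JVM's built-in monitor deadlock detection would probably not flag the hang. The following is a minimal Java sketch of that situation, not the RTC code. The class TimedWaitSketch and its methods are hypothetical names, and the wait loop is only an assumption about how such a lock might be written.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class TimedWaitSketch {
    /**
     * Starts a daemon thread that waits on an ArrayList monitor in a timed loop
     * (hypothetical stand-in for a lock that is never released) and returns the
     * thread's ThreadInfo once it has been observed in TIMED_WAITING.
     */
    static ThreadInfo observeWaiter() {
        final List<Object> lock = new ArrayList<>();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                while (true) {
                    try {
                        lock.wait(60_000); // nobody ever notifies or releases
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }, "waiter-sketch");
        waiter.setDaemon(true);
        waiter.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5_000;
        ThreadInfo info = mx.getThreadInfo(waiter.getId(), Integer.MAX_VALUE);
        // Poll until the waiter has actually reached Object.wait (or give up).
        while ((info == null || info.getThreadState() != Thread.State.TIMED_WAITING)
                && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            info = mx.getThreadInfo(waiter.getId(), Integer.MAX_VALUE);
        }
        return info;
    }

    /** True only if the JVM itself reports threads deadlocked on object monitors. */
    static boolean jvmReportsDeadlock() {
        return ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads() != null;
    }

    public static void main(String[] args) {
        ThreadInfo info = observeWaiter();
        System.out.println(info.getThreadState() + " on " + info.getLockName());
        System.out.println("JVM reports deadlock: " + jvmReportsDeadlock());
    }
}
```

Under these assumptions the waiter shows up as TIMED_WAITING on a java.util.ArrayList lock, much like the executor above, while the JVM reports no deadlock, so a hang of this kind has to be diagnosed by reading the full dump by hand.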

      I spoke with Scott Cowan over Sametime and he suggested that it was trying to access a lock on a file on the master filesystem and there might be some shared state between builds. I've had a quick look over our builds and I can't see anything obvious that would cause the problem but I will continue to investigate.

      It should be noted that this only seems to occur for personal builds where a particular user's workspace is being used. Our normal builds have not seen this issue.

      Thanks,
      James

          [JENKINS-24965] Personal builds can encounter deadlock when fetching source

          James Clark created issue -

          Heather Fraser-Dube added a comment -

          What version of the build toolkit are you using?

          Do you allow more than one instance of the same Jenkins job to run at once? By default, I think Jenkins allows only one instance of a job to run at a time.

          Do you know whether the checkout for the job succeeded or an error occurred? The stack trace shows that it is trying to release the lock on the sandbox.

          Do the personal builds use the same sandbox? If so, it is not a good idea to run them in parallel (one could overwrite the sandbox with different contents in the middle of the other's build).

          Are the jobs running on the Jenkins master or on slaves when this occurs?

          James Clark added a comment -

          Hi Heather, thanks for the reply.

          We're using the 4.0.3 version of the toolkit.

          There is only ever one instance of a job running at one time.

          I'm not sure whether the checkout finished or not. I believe the console output in Jenkins keeps saying that it's fetching the source. I'll have to confirm next time I see it happen.

          I'm not too sure what sandbox means here, but when starting a personal build, each user selects their own workspace, with their changes checked in, for the build to use. In this respect, each repository workspace is different. All personal builds use the same job workspace as configured in Jenkins, but as there is only one job running at a time, there shouldn't be any overwriting going on.

          The jobs run on the Jenkins master.

          Thanks!
          JC


          Heather Fraser-Dube added a comment -

          Hi James,

          Sorry for the delay. Things have been a bit busy here and unfortunately I am away next week. I've looked over our code in this area but don't see anything obvious. The problem is at a lower level than our code, but I don't quite understand what would trigger it.

          So a bunch of questions...

          Does this only happen for one user's personal builds, or is it randomly affecting all personal builds?
          If it affects just one user (or a given set of users), can you check their workspace and see if it's visible to the user that the Jenkins job is configured to use? Also see if there are missing or extra components compared with what is normally in the build workspace. If there are extra components, are they visible to the user that the Jenkins job is configured to use? I don't think this is the problem, but I want to rule it out too.

          The sandbox is where the RTC repository workspace is to be loaded. It's defined on the Source Control tab of the build definition as the Load directory. If you set the path to ".", it will end up being in the Jenkins job's workspace.

          I am interested in knowing if you use any special features with respect to loading (e.g. in the build definition, do you have "Delete directory before loading" checked? Is "Create folders for components" checked?)

          When it happens, can you look at the job's log? I am interested in all the messages that are related to the checkout (or possibly corrupted metadata). It will also help to understand how far it got during the load. I think that they would follow the "RTC Checkout : Source control setup" message.

          Does the build result have an activity for "Fetching files..."? If so, was it completed?

          Does trying to cancel the build help when this happens?

          James Clark added a comment -

          Hi Heather, no worries - I'm late replying myself.

          I've seen it happen to a handful of different people, including myself. My workspace is public (as we encourage all of our team to do) so visibility shouldn't be an issue. We only have a couple of components that rarely change so I imagine it's unlikely any of the hanging builds had something different, component-wise.

          The load directory for the build definition is set to 'src' so that it checks out to {JOB_WORKSPACE}/src.

          There are no special actions checked nor any excludes set for the load actions or accept options. I guess these are default values but I can send you exactly what option has what value if you need (assuming some might not be blank by default).

          There haven't been many personal builds lately so I unfortunately don't have the output log to hand. Next time I see a hanging build, I will update this ticket.

          I'm not quite sure what you mean by the build result having an activity for fetching files, but the build doesn't get to the point of actually building anything. It tries to check out the source code and never comes back from there.

          Trying to abort the build via Jenkins has no effect. The only way to stop it is to kill the master Jenkins process.

          Hope that helps, let me know if you need anything else!

          Thanks,
          James
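One possible explanation for the abort having no effect, offered only as a hypothesis: aborting a build in Jenkins works, as far as I know, by interrupting the executor thread, and a lock-acquire loop that swallows InterruptedException and keeps waiting would never notice. The Java sketch below illustrates that behaviour in isolation. The class UninterruptibleWait and its methods are hypothetical, and nothing here is taken from the RTC or plugin source.

```java
public class UninterruptibleWait {
    private final Object monitor = new Object();
    // Hypothetical state: an earlier holder took the lock and never released it.
    private boolean held = true;

    /** Acquire loop that waits in short slices and ignores interrupts (the assumed flaw). */
    void acquireIgnoringInterrupts() {
        synchronized (monitor) {
            while (held) {
                try {
                    monitor.wait(100);
                } catch (InterruptedException e) {
                    // Swallowed: the loop carries on, so Thread.interrupt() cannot end it.
                }
            }
            held = true;
        }
    }

    /** Releases the lock and wakes any waiting thread. */
    void release() {
        synchronized (monitor) {
            held = false;
            monitor.notifyAll();
        }
    }

    /**
     * Returns true if the worker survived an interrupt while the lock was held
     * and finished only after the lock was released.
     */
    static boolean demo() {
        UninterruptibleWait lock = new UninterruptibleWait();
        Thread worker = new Thread(lock::acquireIgnoringInterrupts, "executor-sketch");
        worker.setDaemon(true);
        worker.start();
        try {
            worker.interrupt();      // roughly what an abort request amounts to
            worker.join(500);
            boolean stuckAfterInterrupt = worker.isAlive();
            lock.release();          // only a real release lets the worker finish
            worker.join(2_000);
            return stuckAfterInterrupt && !worker.isAlive();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("interrupt ignored until release: " + demo());
    }
}
```

If the real lock behaves like this, killing the Jenkins process would indeed be the only way out, which matches what was observed, but the sketch does not establish that it does.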


          James Clark added a comment -

          Hi Heather,

          Had another hanging build today. Here is the Jenkins console output:

          Started by user anonymous
          Building on master in workspace /ccbdata/build/TRUNKP
          RTC : checkout...
          RTC : Build initiated by request from RTC
          RTC Checkout : Source control setup
          RTC Checkout : Fetching files to fetch destination "/ccbdata/build/TRUNKP/src" ...

          It got stuck there and just had the loading symbol.

          Thanks,
          James


          Heather Fraser-Dube added a comment -

          Hi James,

          Can you look at the RTC build result for this build that was started in RTC? When you open the build result there are tabs with different pieces of info. Is there an Activities tab? Does the Activities tab have any entries? Specifically, I am interested in one that says "Fetching files" and whether it was complete or not.

          Can you tell if the source code was all loaded or not?
          Was the stack trace showing it was stuck in the same place again?
          Were there other threads (or the other executor) running, and were any of them running RTC or plugin code?

          Did the build prior to this one finish normally or was it cancelled?

          Heather Fraser-Dube added a comment - I have raised work item 336647, "Personal builds can encounter deadlock when fetching source", on jazz.net for this.

          James Clark added a comment -

          Hi Heather,

          I've attached a screenshot of the activities tab for the build result as viewed in RTC to the jazz.net work item that you posted.

          I unfortunately can't tell if all the source code was loaded as subsequent personal builds have been done which have re-used the workspace.

          The thread dump is no longer available for that build, however I did look at it and it was waiting for a lock at the same place.

          I can't say for this build in particular but generally, we have 2 executors running and they will both be RTC builds. Therefore it is likely that there was another build running at the same time that would have checked out source from RTC etc.

          The personal build before that one had failed for a legitimate reason. If you're talking about a build for any job that was running at the same time as the one that hung, I'm not sure which build that was, as the timing info for the hanging build was lost when the Jenkins process was killed.

          Thanks,
          James


          Heather Fraser-Dube added a comment -

          Hi James,

          I think the thread dump (of all the threads) would help (for next time).

          By previous build, I mean the Jenkins build for the same job that ran just before it. For example, it looks like TRUNKP was the job that was run with the personal build workspace. If that was build #73, then how did TRUNKP build #72 end? Was it a legitimate failure (for example a test failure, as opposed to a failure during the load step or a cancellation)? I'm just trying to exclude the idea of a previous build impacting the next one.
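To answer the question about other threads running RTC or plugin code, a full dump can be filtered by package. The Jenkins threadDump page already lists every thread, so the Java sketch below only illustrates the filtering step, and it would have to run inside the Jenkins JVM to see the executor threads. The class RtcThreadDump and its methods are hypothetical names; the prefix "com.ibm.team." is taken from the stack trace in this issue.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class RtcThreadDump {
    /** Returns every live thread with at least one stack frame whose class name starts with the prefix. */
    static List<ThreadInfo> threadsRunning(String classPrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Ask for lock details only where the JVM supports them.
        ThreadInfo[] all = mx.dumpAllThreads(mx.isObjectMonitorUsageSupported(),
                                             mx.isSynchronizerUsageSupported());
        List<ThreadInfo> matches = new ArrayList<>();
        for (ThreadInfo info : all) {
            if (info == null) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    matches.add(info);
                    break;
                }
            }
        }
        return matches;
    }

    /** Formats one thread with its full stack, in roughly the layout of the dump in this issue. */
    static String describe(ThreadInfo info) {
        StringBuilder out = new StringBuilder();
        out.append('"').append(info.getThreadName()).append("\" Id=").append(info.getThreadId())
           .append(' ').append(info.getThreadState());
        if (info.getLockName() != null) {
            out.append(" on ").append(info.getLockName());
        }
        out.append('\n');
        for (StackTraceElement frame : info.getStackTrace()) {
            out.append("    at ").append(frame).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (ThreadInfo info : threadsRunning("com.ibm.team.")) {
            System.out.print(describe(info));
        }
    }
}
```

Comparing the filtered threads from both executors at the moment of a hang should show whether a second build was inside RTC code, and possibly holding the lock, while the first one waited.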

            Assignee: Unassigned
            Reporter: James Clark (jcdclark)