Deleting a Job hangs and blocks all other Perforce activities

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: p4-plugin
    • Labels: None
    • Environment: Master and Slaves on Linux (RedHat), observed behavior on both Jenkins 1.3999 and 1.411, Perforce Client is Rev. P4/LINUX26X86_64/2010.1/265509 (2010/09/23), Perforce server is P4D/LINUX26X86_64/2010.1/265509 (2010/09/23)

      I'm not sure with what version of the Perforce plugin this started, but currently, using 1.2.5, when we try to delete a job using the UI, it sometimes completes normally, printing the following to the log:

      May 11, 2011 4:48:24 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: Workspace is being deleted; enabling one-time force sync.
      May 11, 2011 4:48:24 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: Using remote perforce client: hudson-cjo2011.03-initdb-76240347
      May 11, 2011 4:48:24 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: [cjo2011.03-initdb] $ p4 workspace -o hudson-cjo2011.03-initdb-76240347
      May 11, 2011 4:48:24 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: Changing P4 Client Root to: /home/cruise/hudson/workspace/cjo2011.03-initdb
      May 11, 2011 4:48:24 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: Saving modified client hudson-cjo2011.03-initdb-76240347
      May 11, 2011 4:48:24 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: [cjo2011.03-initdb] $ p4 -s client -i
      May 11, 2011 4:48:24 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: [cjo2011.03-initdb] $ p4 sync -k //hudson-cjo2011.03-initdb-76240347/...#0

      But more often it only gets this far:

      May 11, 2011 4:49:13 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: Workspace is being deleted; enabling one-time force sync.
      May 11, 2011 4:49:13 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: Using remote perforce client: hudson-cjo2011.03-webservices-e2e-76240347
      May 11, 2011 4:49:13 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
      INFO: [cjo2011.03-webservices-e2e] $ p4 workspace -o hudson-cjo2011.03-webservices-e2e-76240347

      At this point the thread handling the request seems to be stuck at:

      Handling POST /view/bugfix/job/cjo2011.03-webservices-e2e/doDelete : RequestHandlerThread62
      java.lang.Object.wait(Native Method)
      hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:173)
      ...
      java.io.BufferedReader.readLine(BufferedReader.java:362)
      com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:329)
      com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:291)
      com.tek42.perforce.parse.Workspaces.getWorkspace(Workspaces.java:53)
      hudson.plugins.perforce.PerforceSCM.getPerforceWorkspace(PerforceSCM.java:1183)
      hudson.plugins.perforce.PerforceSCM.processWorkspaceBeforeDeletion(PerforceSCM.java:2202)

      I've attached the full stack trace from the 'monitoring' page (jenkinsDeleteThread_stackTrace.txt). When this happens, all perforce polling stops. If I kill this thread using the monitor page, perforce activities start up again, but the job that was to be deleted is still present.

      We've also tried deleting a job with the CLI client, and this also hangs, but if we kill the client, the job does get deleted. I've also attached the log output from when this hung client is killed (jenkinsKilledCLIClient_logOutput.txt).

      Let me know if there is any other information I can provide to help debug this.

          [JENKINS-9675] Deleting a Job hangs and blocks all other Perforce activities

          Rob Petti added a comment - - edited
          1. Does it always hang on the same perforce command, or does it hang on different ones occasionally?
          2. When it hangs, what is the output of
            p4 monitor show -ale
          3. What are the java processes doing cpu/memory wise during a hang? Is it using up more and more memory, or 100% cpu? How about the p4 process?
          4. If you run
            p4 workspace -o hudson-cjo2011.03-webservices-e2e-76240347

            from the slave, does it hang as well? If not, how many lines are returned?

          5. Are your slave agents up to date?
          6. Do you ever experience hangs like this during builds and/or polling normally?
          7. And just to clarify, you are trying to delete the job completely, correct? Not just wipe out the workspace?

          emmulator added a comment -

          1. Every time I've seen it hang, it was at the same place.
          2-3. Of course, today I can't replicate it, but as soon as I do, I will answer these points. We pull a branch every month, and the Jobs for those monthly branches only need to live for two months before we get rid of them. I created a bunch of new Jobs to delete today, and they all worked. So maybe there's something special about the Jobs for the monthly branches that wasn't replicated by the test Jobs I made today. I will try again next week and let you know as soon as I can replicate the issue again.
          4. I did run the workspace command on the server when we were seeing the hang, and it output the workspace description as normal.
          5. Yes, they all have the slave.jar from the same version running on the master.
          6. We've seen 'stuck polling threads' before on the 'Manage Jenkins' page, but neither they nor anything else we've seen blocks all perforce activity in this manner.
          7. Yes, we've only seen this when trying to delete the Job, not when just cleaning the workspace.

          Rob Petti added a comment -

          Do all perforce operations hang, or just those from Jenkins?

          Also, do you have any remote depots on your perforce server?

          I'm sort of at a loss as to why this would happen, so I admit that I'm grasping at straws here. It may be time to implement a global timeout/retry/fail on all perforce operations performed from the plugin.
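The global timeout/retry/fail idea mentioned above could look roughly like the following sketch. It is not code from the Perforce plugin: `BoundedCall`, `callWithTimeout`, and the parameter names are hypothetical, and it assumes Java 8 or later. Each attempt runs on a worker thread; if no result arrives within the timeout, the worker is interrupted and the call is either retried or abandoned.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only; not part of the Perforce plugin.
final class BoundedCall {
    private BoundedCall() {}

    /**
     * Runs the task on a worker thread, waiting at most timeoutMillis per attempt.
     * Returns the result, or Optional.empty() if every attempt timed out or failed
     * (a task that returns null is also reported as empty).
     */
    static <T> Optional<T> callWithTimeout(Callable<T> task, long timeoutMillis, int maxAttempts) {
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        try {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = pool.submit(task);
                try {
                    return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
                } catch (TimeoutException e) {
                    // Interrupt the worker; this wakes a thread parked in Object.wait().
                    future.cancel(true);
                } catch (ExecutionException e) {
                    // The task itself threw; fall through to the next attempt.
                } catch (InterruptedException e) {
                    future.cancel(true);
                    Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                    return Optional.empty();
                }
            }
            return Optional.empty();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A task that never finishes on its own stands in for a hung remote call.
        Optional<String> result = callWithTimeout(() -> {
            Thread.sleep(60_000);
            return "workspace spec";
        }, 200, 2);
        System.out.println(result.isPresent() ? "got: " + result.get() : "gave up after retries");
    }
}
```

Cancelling with `cancel(true)` only helps when the blocked operation responds to interruption. The stack trace in the description ends in `Object.wait`, which does, but that would need to be confirmed for every Perforce call the plugin makes.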

          Rob Petti added a comment -

          Is this still occurring for you with the latest p4d updates?

          Keith Richardson added a comment - - edited

          Hi,

          I have been facing this problem for some time too. It happens occasionally; I've not been able to reproduce it reliably either.

          Can the perforce plugin implement the global timeout you suggested earlier?

          I am not related to emmulator nor do I know if he still faces this problem.

          FWIW - I have only seen this happen during complete job deletion. Other threads block while trying to acquire a lock already held by the problematic thread (job deletion thread).

          PS. For others who are facing this problem, you can use our workaround, which does not require a restart: we interrupt the blocked thread manually using a Groovy script. I'll attach it to the defect.

          Tomcat 6.0.29

          Hudson 1.395
          Slave type - SSH Slave 0.14
          Perforce plugin 1.2.2
          P4 Server - 2010.1/251161
          P4 Client - Rev. P4/LINUX26X86/2007.3/143793 (2008/01/09).

          Full stack trace is attached as kr1chard.stack.trace
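The attached Groovy script is not reproduced in this thread. As a rough idea of what such a workaround involves, here is a minimal Java sketch that finds live threads by a fragment of their name and interrupts them. `ThreadInterrupter` and `interruptByName` are hypothetical names, and matching on `/doDelete` is an assumption based on the request-handler thread name shown in the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; this is not the attached script.
final class ThreadInterrupter {
    private ThreadInterrupter() {}

    /** Interrupts every live thread (except the caller) whose name contains the fragment. */
    static List<String> interruptByName(String nameFragment) {
        List<String> interrupted = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != Thread.currentThread() && t.getName().contains(nameFragment)) {
                t.interrupt(); // wakes a thread blocked in Object.wait() with InterruptedException
                interrupted.add(t.getName());
            }
        }
        return interrupted;
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        // Stand-in for the stuck request handler: it waits for a notify that never comes.
        Thread stuck = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    System.out.println("worker was interrupted and can now exit");
                }
            }
        }, "Handling POST /job/demo/doDelete : demo-thread");
        stuck.start();
        System.out.println("interrupted: " + interruptByName("/doDelete"));
        stuck.join();
    }
}
```

Interrupting the thread releases whatever lock it was holding, which is why the other Perforce activity resumes. It does not finish the deletion.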

          Rob Petti added a comment -

          Does this block ALL perforce operations, or just the ones from Jenkins? In other words, does the perforce server stop responding, or continue to work normally?

          Rob Petti added a comment -

          Also, your perforce plugin version is quite out of date. Would you mind trying the latest?

          emmulator added a comment -

          We are currently using Plugin version 1.2.7 on Jenkins 1.417, and we do still see the issue.

          Perforce is only blocked in terms of Jenkins, not for other processes or clients accessing the server. That is, only Jenkins' perforce activity is blocked, but all other uses of perforce continue as normal. I haven't seen any unusual CPU or Memory activity on the server when this happens. And I still haven't had a chance to coordinate with our perforce administrator to get him to run that 'p4 monitor show -ale' command on the server when this is blocked.

          As Keith mentioned, and as I mentioned in the initial report, we are able to work around this by killing the 'delete' thread, and we mostly use the CLI client to delete jobs now, since as I noted, that way the Job is still deleted when we kill the client when it hangs.

          Rob Petti added a comment -

          Are the slaves that the job ran on still up and running when you delete the job?

          Basically, when you request a job deletion (regardless of whether you use the CLI or the web GUI), the perforce plugin will sync the workspace to revision #0 (see JENKINS-8118). If for some reason the slave is unstable or in the process of shutting down, it will hang.

          emmulator added a comment -

          We have definitely seen this behavior when all the slaves were up and responding normally.

          Rob Petti added a comment -

          And it only ever hangs when deleting a job?

          I'm trying to get a handle on why this perforce action would be different from any other that takes place in the plugin...

          emmulator added a comment -

          Yes, we've only ever seen it hang when deleting a Job.

          emmulator added a comment -

          I've received some notifications recently for fixes on <https://issues.jenkins-ci.org/browse/JENKINS-9673> and <https://issues.jenkins-ci.org/browse/JENKINS-9674>, which are duplicates of this issue. All three claim to have been created by me, but I don't recall creating the issue multiple times or having any trouble when I created it. However, if the issue has been fixed, this one should be updated too. I can't actually see what changes were made for the other issues, could someone please take a look at those and update this issue appropriately? Also, it's not clear from those other issues whether the fix affected the core or just the Perforce plugin?

          Richard Mortimer added a comment -

          I think that JIRA has been creating multiple copies of some issues (including this one). The notifications that you received were where those "additional" copies were being closed down as duplicates of this one.

          I am not aware of a fix being made for your issue yet.

          Rob Petti added a comment -

          Yeah, you can see that the issues were resolved with the "Duplicate" resolution, rather than "Fixed". This issue is still not fixed.

          emmulator added a comment -

          Aha – you're right. I guess I got confused because they were actually each closed as duplicates of each other, and neither of them mentioned this ticket, so I thought something had been done there but not here. Now I understand, thanks.

          Ben Sluis added a comment -

          I also see this issue on the latest version of the plugin (1.3.5) with Jenkins 1.437 on Windows OS for master and slaves.
          It occurs every time I try to delete a job that executed on slave machines.

          Log:
          Dec 12, 2011 12:56:23 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
          INFO: [build_debug] $ "C:\Program Files\Perforce\p4.exe" workspace -o jenkins_build_debug--1971592563

          Dec 12, 2011 12:56:22 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
          INFO: Using remote perforce client: jenkins_build_debug--1971592563

          Dec 12, 2011 12:56:20 PM hudson.plugins.perforce.PerforceSCM processWorkspaceBeforeDeletion
          INFO: Workspace is being deleted; enabling one-time force sync.

          How do I view the stack trace?

          Rob Petti added a comment -

          http://jenkinsurl/threadDump will show you the stacks for all threads.

          I fixed a potential hang in remote execution calls that might resolve this issue, at least partially. You can find the snapshot here: http://files.robpetti.com/perforce-plugin/target/perforce.hpi or just wait until the next release (which should be within a week).
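When reading a thread dump for this kind of problem, the useful detail is which thread holds the lock that the others are waiting on. The following Java sketch shows one way to pull that out programmatically with the standard `java.lang.management` API. `BlockedThreadReport` and `describeBlockedThreads` are hypothetical names, and it only reports threads blocked on object monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Jenkins or the plugin.
final class BlockedThreadReport {
    private BlockedThreadReport() {}

    /** Returns one line per thread that is BLOCKED on a monitor, naming the lock owner. */
    static List<String> describeBlockedThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip threads that ended while the dump was taken
            }
            if (info.getThreadState() == Thread.State.BLOCKED && info.getLockOwnerName() != null) {
                lines.add(info.getThreadName() + " is blocked on " + info.getLockName()
                        + ", held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> report = describeBlockedThreads();
        if (report.isEmpty()) {
            System.out.println("no threads are blocked on a monitor");
        } else {
            report.forEach(System.out::println);
        }
    }
}
```

In the situation described in this issue, the owner reported for the blocked polling threads would be expected to be the job deletion request thread, but that is an inference from the comments above rather than something this sketch verifies.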

            Assignee: Unassigned
            Reporter: emmulator
            Votes: 1
            Watchers: 6