[JENKINS-8173] SCM polling always fails using Perforce plugin due to "workspace offline" and causes infinite build loop

Type: Bug
Resolution: Fixed
Priority: Critical
Component/s: p4-plugin
Labels:
None

Similar Issues:
Powered by SuggestiMate

Show

1. Set up an "in demand" slave.
2. Set up a project that only builds on that slave
3. The slave will go offline
4. The SCM poll will see that the workspace is offline and trigger a new build
5. The slave comes online, the build completes
6. The slave goes back offline again as in 3.
7. And here the infinite build loop happens since we now send up back at 4

I'm not sure why Hudson is trying to get the content of the workspace? The perforce plugin knows the last change list number used for the last build, so it should also know by polling perforce that the last changlist number is different and so a build should be triggered?

In our environment each slave is actually a virtual machine controlled by scripts, after a build completes its taken offline so checking the workspace will NEVER work. This completely breaks our setup because builds are only set to trigger for SCM changes.

When a slave is brought back online the VM is reverted to its snapshot to ensure the build is "clean", so again this means checking the workspace content will always fail.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

fix_build_master.patch
3 kB
2011-03-08 21:52
fix_build_master.patch
2 kB
2011-03-08 19:54

Rob Petti added a comment - 2010-11-25 13:33

Polling operations are run on the nodes that the job is configured to run on. Polling won't work if there aren't any nodes available for it to run on. I agree that it shouldn't start a build in that case, but it still needs a valid node in order to poll. It's looking for the workspace so it can acquire a lock on it. If there's already a build running on that workspace, it wouldn't make any sense to try and poll using it, since the client workspace spec might be in the process of changing.

Rob Petti added a comment - 2010-11-25 13:33 Polling operations are run on the nodes that the job is configured to run on. Polling won't work if there aren't any nodes available for it to run on. I agree that it shouldn't start a build in that case, but it still needs a valid node in order to poll. It's looking for the workspace so it can acquire a lock on it. If there's already a build running on that workspace, it wouldn't make any sense to try and poll using it, since the client workspace spec might be in the process of changing.

paulmoran added a comment - 2010-11-26 02:37

Is it not possible to run them on the master too? In my setup there is only a node online when a project is being built since the slaves are always "in demand".

Also the perforce plugin knows the last change list number used to create the last build.. so its just a case of checking is the current changelist number greater than the one used to do the last build?

I figured it would use the master to poll Perforce and then kick off a node when it sees there are changes (hence the reason for the local perforce path??).

It just seems crazy that other people would have this issue unless they always leave their slaves online, perhaps the poll SCM option should be removed if the slave is set to "in demand".

In order to achieve what I want does this mean I will have to make a new Peforce plugin build trigger that polls on the master?

paulmoran added a comment - 2010-11-26 02:37 Is it not possible to run them on the master too? In my setup there is only a node online when a project is being built since the slaves are always "in demand". Also the perforce plugin knows the last change list number used to create the last build.. so its just a case of checking is the current changelist number greater than the one used to do the last build? I figured it would use the master to poll Perforce and then kick off a node when it sees there are changes (hence the reason for the local perforce path??). It just seems crazy that other people would have this issue unless they always leave their slaves online, perhaps the poll SCM option should be removed if the slave is set to "in demand". In order to achieve what I want does this mean I will have to make a new Peforce plugin build trigger that polls on the master?

paulmoran added a comment - 2010-11-26 03:00

Also FYI:

~~JENKINS-1348~~ - seems to be the same and has a patch
JENKINS-8053 - almost the same problem

paulmoran added a comment - 2010-11-26 03:00 Also FYI: JENKINS-1348 - seems to be the same and has a patch JENKINS-8053 - almost the same problem

Rob Petti added a comment - 2010-11-26 07:31

This is specific to the perforce plugin as far as I know. Those other bugs are for different SCM plugins entirely, so ~~JENKINS-1348~~ is a fix JUST for the subversion plugin. This is a rare problem, since I find a lot of people leave their nodes online 100% of the time. In many cases it's simply more efficient to do that than start/stop them constantly.

The issue here is that the configuration provided in the job config can only be considered valid for the nodes in which the job is configured to run. I don't believe it makes sense to try and assume that the same settings will work on the master. This was why it was changed from polling on the master, to polling on the configured nodes, people were complaining that their settings weren't working.

For example, say I have a linux master, and my job is configured to run on windows slaves. The path to P4 will look like "C:\Program Files\Perforce\p4.exe" which obviously won't work on the master, so if we try to use the master to poll, it will fail miserably.

~~JENKINS-5120~~ and JENKINS-2947 will make this a moot point when I finally get around to implementing them, since then the user can ensure a valid configuration for every node, regardless of their network topology (the same perforce server hostname might be valid on on node, but invalid for another, for example) or the location of the p4 binary on disk.

At the moment your best shot would be to create an upstream build that's configured to run on the master, which then polls for you. When the plugin code finishes getting migrated to git, I'll throw in a case that will at least TRY to use those settings on the master when no node is available. Not the best solution, but it will have to do for now until we have time to refactor the plugin configuration.

Rob Petti added a comment - 2010-11-26 07:31 This is specific to the perforce plugin as far as I know. Those other bugs are for different SCM plugins entirely, so JENKINS-1348 is a fix JUST for the subversion plugin. This is a rare problem, since I find a lot of people leave their nodes online 100% of the time. In many cases it's simply more efficient to do that than start/stop them constantly. The issue here is that the configuration provided in the job config can only be considered valid for the nodes in which the job is configured to run. I don't believe it makes sense to try and assume that the same settings will work on the master. This was why it was changed from polling on the master, to polling on the configured nodes, people were complaining that their settings weren't working. For example, say I have a linux master, and my job is configured to run on windows slaves. The path to P4 will look like "C:\Program Files\Perforce\p4.exe" which obviously won't work on the master, so if we try to use the master to poll, it will fail miserably. JENKINS-5120 and JENKINS-2947 will make this a moot point when I finally get around to implementing them, since then the user can ensure a valid configuration for every node, regardless of their network topology (the same perforce server hostname might be valid on on node, but invalid for another, for example) or the location of the p4 binary on disk. At the moment your best shot would be to create an upstream build that's configured to run on the master, which then polls for you. When the plugin code finishes getting migrated to git, I'll throw in a case that will at least TRY to use those settings on the master when no node is available. Not the best solution, but it will have to do for now until we have time to refactor the plugin configuration.

paulmoran added a comment - 2010-12-01 06:17

Point taken about the nodes always being online, but this can never be in the case in my set up, in my environment they are taken offline because they are virtual machines (there are more VM's than the hardware can run at the same time), its the only way to ensure a build is "clean" using a totally fresh machine/slave (due to self polluting build environments).

I have a plugin that runs a script after the build is finished which takes the node offline, and then shutdowns the VM. When the next build needs the slave Hudson runs another script which reverts the snapshot, powers it on, downloads slave.jar using psexec and executes slave.jar using psexec to bring the node back online.

Also regarding the path the p4 binary, should this not always just be "p4" since it should be in the path on all OS'es? This happens on Windows by default but manually on Linux since it has no installer (have to add to path or place in /usr/bin/p4).

I don't quite understand how this workaround is supposed to work either? Create a job on the master node? Because this will poll perforce locally on the master without needing a workspace? A do nothing build that just triggers the build that runs on the slave?

paulmoran added a comment - 2010-12-01 06:17 Point taken about the nodes always being online, but this can never be in the case in my set up, in my environment they are taken offline because they are virtual machines (there are more VM's than the hardware can run at the same time), its the only way to ensure a build is "clean" using a totally fresh machine/slave (due to self polluting build environments). I have a plugin that runs a script after the build is finished which takes the node offline, and then shutdowns the VM. When the next build needs the slave Hudson runs another script which reverts the snapshot, powers it on, downloads slave.jar using psexec and executes slave.jar using psexec to bring the node back online. Also regarding the path the p4 binary, should this not always just be "p4" since it should be in the path on all OS'es? This happens on Windows by default but manually on Linux since it has no installer (have to add to path or place in /usr/bin/p4). I don't quite understand how this workaround is supposed to work either? Create a job on the master node? Because this will poll perforce locally on the master without needing a workspace? A do nothing build that just triggers the build that runs on the slave?

Rob Petti added a comment - 2010-12-01 07:42

I agree that the path to p4 should always be "p4" and that people should add it to the path, but not all our users feel the same way. This is why the path to p4 option exists. In the future it will be refactored into a global config option that can be overridden in the node configuration. I should note that I only used this as the most common example of what could go wrong when trying to poll on a node that the job wasn't configured to use.

The workaround I proposed is pretty much what you've described, only that the polling job still uses a workspace.

Create a job tied to the master node that has no build steps, and simply checks out the code into a workspace on the master. Set it up to poll the SCM. Since the master is always available, the plugin can poll using that workspace without any problem. Then just set your actual job as a downstream job to your new "polling" job, and it will get triggered whenever there are changes.

Rob Petti added a comment - 2010-12-01 07:42 I agree that the path to p4 should always be "p4" and that people should add it to the path, but not all our users feel the same way. This is why the path to p4 option exists. In the future it will be refactored into a global config option that can be overridden in the node configuration. I should note that I only used this as the most common example of what could go wrong when trying to poll on a node that the job wasn't configured to use. The workaround I proposed is pretty much what you've described, only that the polling job still uses a workspace. Create a job tied to the master node that has no build steps, and simply checks out the code into a workspace on the master. Set it up to poll the SCM. Since the master is always available, the plugin can poll using that workspace without any problem. Then just set your actual job as a downstream job to your new "polling" job, and it will get triggered whenever there are changes.

paulmoran added a comment - 2010-12-01 07:50

Thanks I'm happy there is a workaround for this issue

paulmoran added a comment - 2010-12-01 07:50 Thanks I'm happy there is a workaround for this issue

robsimon added a comment - 2010-12-01 07:57

This workaround works only if the master isn't used for other builds as it is in our case. I think I'd rather wait for a real fix and live with the too many builds instead of using this approach via the Master Node.

robsimon added a comment - 2010-12-01 07:57 This workaround works only if the master isn't used for other builds as it is in our case. I think I'd rather wait for a real fix and live with the too many builds instead of using this approach via the Master Node.

paulmoran added a comment - 2010-12-01 08:05

What does that mean? Your master slave is set to "in demand"?

paulmoran added a comment - 2010-12-01 08:05 What does that mean? Your master slave is set to "in demand"?

Paul M added a comment - 2011-02-27 23:14

Hi could a solution to this issue to be add a global option for the Perforce plugin for the "requires workspace" option for polling?

AFAIK the job only needs to track the last synced change list and compare it with the current which means no workspace is needed.

However I'm not sure how plugins deal with the case where a user syncs an older changelist/label to build as this would cause another build right after to build the latest version. Perhaps the polling logic should be:

1. Get the current perforce revision number
2. Check none of the previous builds have used this number to handle the user manually building an older revision
3. If no build has used this revision number then we can assume we need to do a build.

I think that would handle the case of the user syncing rev 2, rev 1, rev 3, rev 1, and then the newest change being 7. It would know that 7 hasn't been used and the last change that was synced is 1, thus it needs to start a build.

Paul M added a comment - 2011-02-27 23:14 Hi could a solution to this issue to be add a global option for the Perforce plugin for the "requires workspace" option for polling? AFAIK the job only needs to track the last synced change list and compare it with the current which means no workspace is needed. However I'm not sure how plugins deal with the case where a user syncs an older changelist/label to build as this would cause another build right after to build the latest version. Perhaps the polling logic should be: 1. Get the current perforce revision number 2. Check none of the previous builds have used this number to handle the user manually building an older revision 3. If no build has used this revision number then we can assume we need to do a build. I think that would handle the case of the user syncing rev 2, rev 1, rev 3, rev 1, and then the newest change being 7. It would know that 7 hasn't been used and the last change that was synced is 1, thus it needs to start a build.

Rob Petti added a comment - 2011-02-28 06:20 - edited

It's not that simple. The plugin tells hudson/jenkins that a workspace is required so that the polling will actually take place on the slave that the job is configured for. Each job is configured for the specific environment it runs in, so it's unreasonable to assume that the same configuration will work on the master (see my comments above...)

Until the plugin supports node-specific perforce configurations, this issue cannot be fixed.

Rob Petti added a comment - 2011-02-28 06:20 - edited It's not that simple. The plugin tells hudson/jenkins that a workspace is required so that the polling will actually take place on the slave that the job is configured for. Each job is configured for the specific environment it runs in, so it's unreasonable to assume that the same configuration will work on the master (see my comments above...) Until the plugin supports node-specific perforce configurations, this issue cannot be fixed.

Paul M added a comment - 2011-02-28 09:04

Is there another issue raised on node specific configuration options? Is there a framework in Jenkins to implement this functionality or will it require a change to the core?

Also even if you configure the perforce binary locations in each slave that will not fix the issue because if the slave is offline then it still can't do the polling. There would still need to be an option that says either "Poll only on Master" or "Don't require workspace for polling".

Paul M added a comment - 2011-02-28 09:04 Is there another issue raised on node specific configuration options? Is there a framework in Jenkins to implement this functionality or will it require a change to the core? Also even if you configure the perforce binary locations in each slave that will not fix the issue because if the slave is offline then it still can't do the polling. There would still need to be an option that says either "Poll only on Master" or "Don't require workspace for polling".

Rob Petti added a comment - 2011-02-28 15:09

Yes there is an issue open for it: ~~JENKINS-5120~~. And yes, there is a framework available.

Once it has been implemented, it's a very simple change to allow polling to be performed on the slave or the master. No additional option is necessary.

Rob Petti added a comment - 2011-02-28 15:09 Yes there is an issue open for it: JENKINS-5120 . And yes, there is a framework available. Once it has been implemented, it's a very simple change to allow polling to be performed on the slave or the master. No additional option is necessary.

martin franklin added a comment - 2011-03-08 18:04

This worked upto version 1.1.9 which had my fix in it for this very issue - http://issues.jenkins-ci.org/browse/JENKINS-6575

Changes made to the plugin after this, in 1.1.10 cause a null pointer issue. Been using 1.1.9 uptil now and hadnt noticed.

I'll take a look and see if I can submit a 'patch' which will get this working again.

martin franklin added a comment - 2011-03-08 18:04 This worked upto version 1.1.9 which had my fix in it for this very issue - http://issues.jenkins-ci.org/browse/JENKINS-6575 Changes made to the plugin after this, in 1.1.10 cause a null pointer issue. Been using 1.1.9 uptil now and hadnt noticed. I'll take a look and see if I can submit a 'patch' which will get this working again.

Rob Petti added a comment - 2011-03-08 18:22

As mentioned, simply changing the output of requiresWorkspaceForPolling will not fix this in all cases. It is trivial to change it, but it will have far reaching consequences, such as polling simply not working at all in most configurations. This is because polling would then be executed on the master with the slave's perforce options, which would only match in the ideal case.

Martin, can you please file a new issue for the NPE (and be sure to include the log dump and perforce plugin version). You might also want to try the latest version, since you are really out of date.

Rob Petti added a comment - 2011-03-08 18:22 As mentioned, simply changing the output of requiresWorkspaceForPolling will not fix this in all cases. It is trivial to change it, but it will have far reaching consequences, such as polling simply not working at all in most configurations. This is because polling would then be executed on the master with the slave's perforce options, which would only match in the ideal case. Martin, can you please file a new issue for the NPE (and be sure to include the log dump and perforce plugin version). You might also want to try the latest version, since you are really out of date.

martin franklin added a comment - 2011-03-08 19:45

Dont think it's worth it to issue a NPE given that the code I was using was so out of date.

martin franklin added a comment - 2011-03-08 19:45 Dont think it's worth it to issue a NPE given that the code I was using was so out of date.

martin franklin added a comment - 2011-03-08 19:54

Here's the patch I've created which 'gets this working' in a limited sense.

If requiresWorkspaceForPolling returns false then you can perform a poll on the master without a node or a workspace. This is what previously worked for us, we just set the slave client format to ${basename}.

Then the AbstractProject class calls scm.pollChanges(this, null, null, listener);

I've changed the code so that if no buildnode is found, instead of just returning false, pollchanges further also checks to see if requiresWorkspaceForPolling returns false. If so it just proceeds to use the master as previously in 1.1.0.

I havent completely tested this yet. But my preliminary tests seem to show that it works.

I believe that my NPE's are related to AbstractProject invoking pollChanges with launcher and workspace set to null. If I still see these occuring I'll file a NPE issue and try to create another patch.

Martin

martin franklin added a comment - 2011-03-08 19:54 Here's the patch I've created which 'gets this working' in a limited sense. If requiresWorkspaceForPolling returns false then you can perform a poll on the master without a node or a workspace. This is what previously worked for us, we just set the slave client format to ${basename}. Then the AbstractProject class calls scm.pollChanges(this, null, null, listener); I've changed the code so that if no buildnode is found, instead of just returning false, pollchanges further also checks to see if requiresWorkspaceForPolling returns false. If so it just proceeds to use the master as previously in 1.1.0. I havent completely tested this yet. But my preliminary tests seem to show that it works. I believe that my NPE's are related to AbstractProject invoking pollChanges with launcher and workspace set to null. If I still see these occuring I'll file a NPE issue and try to create another patch. Martin

martin franklin added a comment - 2011-03-08 21:52

if requiresWorkspaceForPolling returns false - never use a build node, so set it to null. This will force p4 to use the master without needed a workspace synced to the file system. Otherwise proceed as before.
If use this with multiple slaves, dynamic or not, you must use the force sync option.

martin franklin added a comment - 2011-03-08 21:52 if requiresWorkspaceForPolling returns false - never use a build node, so set it to null. This will force p4 to use the master without needed a workspace synced to the file system. Otherwise proceed as before. If use this with multiple slaves, dynamic or not, you must use the force sync option.

Rob Petti added a comment - 2011-03-08 21:58

Setting buildNode to null doesn't actually force it to run on any particular node. It will still use the launcher provided by Jenkins, which could be running on anything.

Rob Petti added a comment - 2011-03-08 21:58 Setting buildNode to null doesn't actually force it to run on any particular node. It will still use the launcher provided by Jenkins, which could be running on anything.

Rob Petti added a comment - 2011-06-14 22:19

This issue is moot now that there is an option to poll only on the master. See ~~JENKINS-9067~~.

Rob Petti added a comment - 2011-06-14 22:19 This issue is moot now that there is an option to poll only on the master. See JENKINS-9067 .

Assignee:: Rob Petti

Reporter:: paulmoran

Votes:: 2 Vote for this issue

Watchers:: 4 Start watching this issue

Created:: 2010-11-25 04:02

Updated:: 2019-12-11 06:02

Resolved:: 2011-06-14 22:19

Jenkins

Details

Description

Attachments

Attachments

Activity

Collapse comment: Rob Petti added a comment - 2010-11-25 13:33

Expand comment: Rob Petti added a comment - 2010-11-25 13:33

Collapse comment: paulmoran added a comment - 2010-11-26 02:37

Expand comment: paulmoran added a comment - 2010-11-26 02:37

Collapse comment: paulmoran added a comment - 2010-11-26 03:00

Expand comment: paulmoran added a comment - 2010-11-26 03:00

Collapse comment: Rob Petti added a comment - 2010-11-26 07:31

Expand comment: Rob Petti added a comment - 2010-11-26 07:31

Collapse comment: paulmoran added a comment - 2010-12-01 06:17

Expand comment: paulmoran added a comment - 2010-12-01 06:17

Collapse comment: Rob Petti added a comment - 2010-12-01 07:42

Expand comment: Rob Petti added a comment - 2010-12-01 07:42

Collapse comment: paulmoran added a comment - 2010-12-01 07:50

Expand comment: paulmoran added a comment - 2010-12-01 07:50

Collapse comment: robsimon added a comment - 2010-12-01 07:57

Expand comment: robsimon added a comment - 2010-12-01 07:57

Collapse comment: paulmoran added a comment - 2010-12-01 08:05

Expand comment: paulmoran added a comment - 2010-12-01 08:05

Collapse comment: Paul M added a comment - 2011-02-27 23:14

Expand comment: Paul M added a comment - 2011-02-27 23:14

Collapse comment: Rob Petti added a comment - 2011-02-28 06:20, Edited by Rob Petti - 2011-02-28 06:31

Expand comment: Rob Petti added a comment - 2011-02-28 06:20, Edited by Rob Petti - 2011-02-28 06:31

Collapse comment: Paul M added a comment - 2011-02-28 09:04

Expand comment: Paul M added a comment - 2011-02-28 09:04

Collapse comment: Rob Petti added a comment - 2011-02-28 15:09

Expand comment: Rob Petti added a comment - 2011-02-28 15:09

Collapse comment: martin franklin added a comment - 2011-03-08 18:04

Expand comment: martin franklin added a comment - 2011-03-08 18:04

Collapse comment: Rob Petti added a comment - 2011-03-08 18:22

Expand comment: Rob Petti added a comment - 2011-03-08 18:22

Collapse comment: martin franklin added a comment - 2011-03-08 19:45

Expand comment: martin franklin added a comment - 2011-03-08 19:45

Collapse comment: martin franklin added a comment - 2011-03-08 19:54

Expand comment: martin franklin added a comment - 2011-03-08 19:54

Collapse comment: martin franklin added a comment - 2011-03-08 21:52

Expand comment: martin franklin added a comment - 2011-03-08 21:52

Collapse comment: Rob Petti added a comment - 2011-03-08 21:58

Expand comment: Rob Petti added a comment - 2011-03-08 21:58

Collapse comment: Rob Petti added a comment - 2011-06-14 22:19

Expand comment: Rob Petti added a comment - 2011-06-14 22:19

People

Dates