JENKINS-8173: SCM polling always fails using Perforce plugin due to "workspace offline" and causes infinite build loop

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component: p4-plugin
    • Labels: None

      1. Set up an "in demand" slave (a sketch of this node setup follows the list).
      2. Set up a project that only builds on that slave.
      3. The slave will go offline.
      4. The SCM poll will see that the workspace is offline and trigger a new build.
      5. The slave comes online, and the build completes.
      6. The slave goes back offline again as in 3.
      7. And here the infinite build loop happens, since we now end up back at 4.
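
      For reference, step 1 amounts to registering a node whose retention strategy is "take online when in demand". A minimal sketch, assuming the classic DumbSlave constructor and a JNLP launcher (the node name, remote FS root, and "vm-build" label are placeholders):

        import java.util.Collections;

        import hudson.model.Node;
        import hudson.slaves.DumbSlave;
        import hudson.slaves.JNLPLauncher;
        import hudson.slaves.NodeProperty;
        import hudson.slaves.RetentionStrategy;
        import jenkins.model.Jenkins;

        public class OnDemandSlaveSetup {
            public static void addSlave() throws Exception {
                // "In demand" retention: launch the node when builds are waiting for it,
                // take it offline again once it has been idle for a while.
                RetentionStrategy.Demand demand = new RetentionStrategy.Demand(
                        1,  // minutes a build must wait in the queue before the node is launched
                        1); // minutes of idle time before the node is taken offline

                DumbSlave slave = new DumbSlave(
                        "vm-slave-1",                    // node name (placeholder)
                        "On-demand VM build slave",      // description
                        "C:\\jenkins",                   // remote FS root on the slave
                        "1",                             // number of executors
                        Node.Mode.EXCLUSIVE,             // only run jobs tied to this node
                        "vm-build",                      // label the project is restricted to
                        new JNLPLauncher(),              // slave connects via slave.jar
                        demand,
                        Collections.<NodeProperty<?>>emptyList());

                Jenkins.getInstance().addNode(slave);
            }
        }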

      I'm not sure why Hudson is trying to get the content of the workspace? The Perforce plugin knows the changelist number used for the last build, so shouldn't it be able to tell, just by polling Perforce, that the latest changelist number is different and that a build should be triggered?

      In our environment each slave is actually a virtual machine controlled by scripts; after a build completes it's taken offline, so checking the workspace will NEVER work. This completely breaks our setup because builds are only set to trigger for SCM changes.

      When a slave is brought back online the VM is reverted to its snapshot to ensure the build is "clean", so again this means checking the workspace content will always fail.
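
      The comparison described above could, at least in principle, be done with nothing more than change numbers. A rough sketch of that idea, shelling out to the p4 command line (the depot path and the stored "last built" number are placeholders; this is not how the plugin actually polls):

        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.InputStreamReader;

        public class ChangelistCheck {

            /** Runs `p4 changes -m1 -s submitted <path>` and parses the change number. */
            static int latestChange(String depotPath) throws IOException {
                Process p = new ProcessBuilder(
                        "p4", "changes", "-m1", "-s", "submitted", depotPath).start();
                try (BufferedReader r = new BufferedReader(
                        new InputStreamReader(p.getInputStream()))) {
                    // Typical output: "Change 12345 on 2010/12/01 by user@client 'message'"
                    String line = r.readLine();
                    return Integer.parseInt(line.split(" ")[1]);
                }
            }

            public static void main(String[] args) throws IOException {
                int lastBuilt = 12300;                             // change number of the last build (placeholder)
                int latest = latestChange("//depot/project/...");  // depot path (placeholder)
                if (latest > lastBuilt) {
                    System.out.println("New change " + latest + " found; a build should be triggered.");
                } else {
                    System.out.println("No changes since change " + lastBuilt + ".");
                }
            }
        }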


          paulmoran created issue -

          Rob Petti added a comment -

          Polling operations are run on the nodes that the job is configured to run on. Polling won't work if there aren't any nodes available for it to run on. I agree that it shouldn't start a build in that case, but it still needs a valid node in order to poll. It's looking for the workspace so it can acquire a lock on it. If there's already a build running on that workspace, it wouldn't make any sense to try and poll using it, since the client workspace spec might be in the process of changing.
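
          For illustration only, a guard along the lines of "report no changes when no node is available" could look like the sketch below. It uses only core Jenkins types (AbstractProject, Node, Computer, TaskListener); the class and method names are hypothetical and this is not the plugin's actual code:

            import hudson.model.AbstractProject;
            import hudson.model.Computer;
            import hudson.model.Node;
            import hudson.model.TaskListener;

            // Hypothetical helper: decide whether a poll can run at all for this job.
            public final class PollGuard {
                private PollGuard() {}

                /** Returns true only if the node the job last built on is online. */
                public static boolean canPoll(AbstractProject<?, ?> project, TaskListener listener) {
                    Node node = project.getLastBuiltOn();
                    Computer computer = (node == null) ? null : node.toComputer();
                    if (computer == null || computer.isOffline()) {
                        // No node available: report "no changes" instead of forcing a build.
                        listener.getLogger().println("Build node offline; skipping Perforce poll.");
                        return false;
                    }
                    return true;
                }
            }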

          Rob Petti made changes -
          Assignee New: Rob Petti [ rpetti ]

          paulmoran added a comment -

          Is it not possible to run them on the master too? In my setup a node is only online when a project is being built, since the slaves are always "in demand".

          Also, the Perforce plugin knows the changelist number used to create the last build, so isn't it just a case of checking whether the current changelist number is greater than the one used for the last build?

          I figured it would use the master to poll Perforce and then kick off a node when it sees there are changes (hence the reason for the local Perforce path option?).

          It just seems crazy that other people would have this issue unless they always leave their slaves online; perhaps the poll SCM option should be removed if the slave is set to "in demand".

          In order to achieve what I want, does this mean I will have to make a new Perforce plugin build trigger that polls on the master?


          paulmoran added a comment -

          Also FYI:

          JENKINS-1348 - seems to be the same and has a patch
          JENKINS-8053 - almost the same problem


          Rob Petti added a comment -

          This is specific to the perforce plugin as far as I know. Those other bugs are for different SCM plugins entirely, so JENKINS-1348 is a fix JUST for the subversion plugin. This is a rare problem, since I find a lot of people leave their nodes online 100% of the time. In many cases it's simply more efficient to do that than start/stop them constantly.

          The issue here is that the configuration provided in the job config can only be considered valid for the nodes on which the job is configured to run. I don't believe it makes sense to assume that the same settings will work on the master. This is why it was changed from polling on the master to polling on the configured nodes: people were complaining that their settings weren't working.

          For example, say I have a linux master, and my job is configured to run on windows slaves. The path to P4 will look like "C:\Program Files\Perforce\p4.exe" which obviously won't work on the master, so if we try to use the master to poll, it will fail miserably.

          JENKINS-5120 and JENKINS-2947 will make this a moot point when I finally get around to implementing them, since then the user can ensure a valid configuration for every node, regardless of their network topology (the same Perforce server hostname might be valid on one node but invalid for another, for example) or the location of the p4 binary on disk.
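
          The per-node override mentioned above could be as simple as "node-specific path if set, otherwise the global path, otherwise plain p4 from the PATH". A hypothetical sketch of that resolution order (none of these names exist in the plugin today):

            import java.util.HashMap;
            import java.util.Map;

            // Hypothetical resolution of the p4 executable: per-node override first,
            // then the global setting, then plain "p4" from the node's PATH.
            public class P4PathResolver {
                private final String globalPath;                       // may be null or empty
                private final Map<String, String> nodeOverrides = new HashMap<>();

                public P4PathResolver(String globalPath) {
                    this.globalPath = globalPath;
                }

                public void setOverride(String nodeName, String path) {
                    nodeOverrides.put(nodeName, path);
                }

                public String resolveFor(String nodeName) {
                    String override = nodeOverrides.get(nodeName);
                    if (override != null) return override;
                    if (globalPath != null && !globalPath.isEmpty()) return globalPath;
                    return "p4"; // rely on the node's PATH
                }

                public static void main(String[] args) {
                    P4PathResolver resolver = new P4PathResolver(null);
                    resolver.setOverride("windows-slave", "C:\\Program Files\\Perforce\\p4.exe");
                    System.out.println(resolver.resolveFor("windows-slave")); // the Windows override
                    System.out.println(resolver.resolveFor("linux-master"));  // falls back to "p4"
                }
            }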

          At the moment your best shot would be to create an upstream build that's configured to run on the master, which then polls for you. When the plugin code finishes getting migrated to git, I'll throw in a case that will at least TRY to use those settings on the master when no node is available. Not the best solution, but it will have to do for now until we have time to refactor the plugin configuration.


          paulmoran added a comment -

          Point taken about the nodes always being online, but this can never be the case in my setup. In my environment they are taken offline because they are virtual machines (there are more VMs than the hardware can run at the same time); it's the only way to ensure a build is "clean" using a totally fresh machine/slave (due to self-polluting build environments).

          I have a plugin that runs a script after the build is finished which takes the node offline and then shuts down the VM. When the next build needs the slave, Hudson runs another script which reverts the snapshot, powers it on, downloads slave.jar using psexec, and executes slave.jar using psexec to bring the node back online.

          Also, regarding the path to the p4 binary, should this not always just be "p4", since it should be on the PATH on all OSes? This happens on Windows by default, but has to be done manually on Linux since there is no installer (you have to add it to the PATH or place it in /usr/bin/p4).

          I don't quite understand how this workaround is supposed to work either. Create a job on the master node? Because this will poll Perforce locally on the master without needing a workspace? A do-nothing build that just triggers the build that runs on the slave?


          Rob Petti added a comment -

          I agree that the path to p4 should always be "p4" and that people should add it to the path, but not all our users feel the same way. This is why the path to p4 option exists. In the future it will be refactored into a global config option that can be overridden in the node configuration. I should note that I only used this as the most common example of what could go wrong when trying to poll on a node that the job wasn't configured to use.

          The workaround I proposed is pretty much what you've described, only that the polling job still uses a workspace.

          Create a job tied to the master node that has no build steps, and simply checks out the code into a workspace on the master. Set it up to poll the SCM. Since the master is always available, the plugin can poll using that workspace without any problem. Then just set your actual job as a downstream job to your new "polling" job, and it will get triggered whenever there are changes.
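
          Expressed against the core Jenkins job API, the wiring described above looks roughly like the sketch below (job names are placeholders, the PerforceSCM construction is elided, and it assumes code running inside the Jenkins JVM, e.g. from a plugin):

            import hudson.model.FreeStyleProject;
            import hudson.model.Result;
            import hudson.tasks.BuildTrigger;
            import hudson.triggers.SCMTrigger;
            import jenkins.model.Jenkins;

            public class PollingJobSetup {
                public static void createPollingJob() throws Exception {
                    Jenkins jenkins = Jenkins.getInstance();

                    // "Polling" job: tied to the master, no build steps, same Perforce view as the real job.
                    FreeStyleProject poller = jenkins.createProject(FreeStyleProject.class, "myproject-poller");
                    poller.setAssignedLabel(jenkins.getLabel("master")); // master is always online
                    // poller.setScm(new PerforceSCM(...));              // same depot view as the real job (elided)
                    poller.addTrigger(new SCMTrigger("*/5 * * * *"));    // poll Perforce every 5 minutes

                    // Kick off the real job (which runs on the on-demand slave) after each polling build.
                    poller.getPublishersList().add(new BuildTrigger("myproject-build", Result.SUCCESS));
                    poller.save();
                }
            }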


          paulmoran added a comment -

          Thanks, I'm happy there is a workaround for this issue.


          robsimon added a comment -

          This workaround only works if the master isn't used for other builds, which it is in our case. I think I'd rather wait for a real fix and live with the extra builds instead of using this approach via the master node.


            Assignee: Rob Petti (rpetti)
            Reporter: paulmoran
            Votes: 2
            Watchers: 4