Bug
Resolution: Unresolved
Major
None
Platform: All, OS: Linux
We don't see these errors with p4win, which is why we suspect it is something in the
Hudson plugin.
Performing sync with Perforce for: //xxx/...
[workspace] $ p4 workspace -o xxx_inspections
Changing P4 Client Root to: http://xxx.com:8082/job/xxx/ws/
Changing P4 Client View to: //xxx/... //xxx/...
[workspace] $ p4 -s client -i
Last sync'd change: 101540
[workspace] $ p4 changes -m 25 //xxx/...
[workspace] $ p4 describe -s 101661
[workspace] $ p4 describe -s 101656
Caught Exception communicating with perforce. Failed to communicate with p4
FATAL: Unable to communicate with perforce. Failed to communicate with p4
java.io.IOException: Unable to communicate with perforce. Failed to communicate with p4
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:340)
at hudson.model.AbstractProject.checkout(AbstractProject.java:574)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:251)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:225)
at hudson.model.Run.run(Run.java:771)
at hudson.model.Build.run(Build.java:85)
at hudson.model.ResourceController.execute(ResourceController.java:70)
at hudson.model.Executor.run(Executor.java:82)
[JENKINS-2062] Occasional Perforce Connection Errors
This happens to us frequently, on different machines. We had been running Hudson 1.254, though I just updated
to 1.255. Our platform is FreeBSD 6.3 with the Diablo JDK:
java -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build diablo-1.5.0-b01)
Java HotSpot(TM) Client VM (build diablo-1.5.0_07-b01, mixed mode)
For those of you who (like me) are suffering from this problem and are
running Hudson on one machine only, here's a nasty hack that gets rid of the
error (tested on RHEL 4, 32- and 64-bit, with Java 6):
--hack--
$ svn diff
Index: hudson/plugins/perforce/src/main/java/hudson/plugins/perforce/PerforceSCM.java
===================================================================
--- hudson/plugins/perforce/src/main/java/hudson/plugins/perforce/PerforceSCM.java (revision 12689)
+++ hudson/plugins/perforce/src/main/java/hudson/plugins/perforce/PerforceSCM.java (working copy)
@@ -124,9 +124,9 @@
 */
 protected Depot getDepot(Launcher launcher, FilePath workspace) {
- HudsonP4ExecutorFactory p4Factory = new HudsonP4ExecutorFactory(launcher,workspace);
+ //HudsonP4ExecutorFactory p4Factory = new HudsonP4ExecutorFactory(launcher,workspace);
- depot = new Depot(p4Factory);
+ depot = new Depot();
 depot.setUser(p4User);
 depot.setPassword(p4Passwd);
 depot.setPort(p4Port);
--hack--
This is not an official patch; it's just something that fixes the problem (I
don't claim that it fixes it when running slaves, as I haven't tested that yet).
The reasoning behind the hack is: I don't know why Hudson's Launcher has to be
used to execute Perforce commands. When the Launcher is used, the
processes/threads that I see spawned are confusing (I'm not saying that's right or
wrong, just that I couldn't understand it straight away), and I figured that
the Perforce API's (tek42) default executor factory would suffice. And it
works [on a master Hudson]!
Some feedback from the author would be appreciated, as I'm still working on the
issue (I want a cleaner solution).
Although it is not necessarily the cause: if you're using P4 login tickets and the same
user logs into the Perforce server instance elsewhere, that login can invalidate your
ticket on the server. This whacks your session and gives this kind of behavior.
We use a P4 user set up as a service account with a LONG timeout for just this
reason, and we also avoid using the service account for multiple purposes.
I have been experiencing this same problem. I have modified the perforce plugin
source (PerforceSCM.java) as described in the comments without success.
Adding myself as cc.
I have problems on java.vm.version 1.6.0_03-b05 with Tomcat 5 and 6.
We have several branches running under a multi-configuration project. Every time
a couple of builds start at the same time, we run into this issue.
Master is running at Fedora Core 4 and JDK6.u11
Caught Exception communicating with perforce. Failed to communicate with p4
FATAL: Unable to communicate with perforce. Failed to communicate with p4
java.io.IOException: Unable to communicate with perforce. Failed to communicate with p4
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:422)
at hudson.model.AbstractProject.checkout(AbstractProject.java:693)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:277)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:239)
at hudson.model.Run.run(Run.java:842)
at hudson.matrix.MatrixBuild.run(MatrixBuild.java:102)
at hudson.model.ResourceController.execute(ResourceController.java:70)
at hudson.model.Executor.run(Executor.java:90)
In our environment the build always fails at the same changelist (on a per-project
basis). We have to sync manually using "p4" in order to continue the build.
Maybe it would be a good idea (as a workaround) to implement a kind of "fallback
mechanism" that calls the command-line "p4"?
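A minimal sketch of the suggested fallback follows. It uses a hypothetical `FallbackRunner.runWithFallback` helper; nothing with that name exists in the plugin or in tek42. The primary path is tried a fixed number of times. Only if every attempt fails does the fallback run, for example invoking the command-line `p4` directly. If the fallback also fails, it carries the last primary error as a suppressed exception, so neither failure is lost.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper illustrating "retry, then fall back"; not part of the Perforce plugin. */
public final class FallbackRunner {

    private FallbackRunner() {}

    /**
     * Calls {@code primary} up to {@code attempts} times and returns the first success.
     * If every attempt throws, calls {@code fallback}; if that fails too, its exception
     * is rethrown with the last primary failure attached as a suppressed exception.
     */
    public static <T> T runWithFallback(Callable<T> primary, Callable<T> fallback, int attempts)
            throws Exception {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        Exception lastFailure = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return primary.call();
            } catch (Exception e) {
                lastFailure = e; // remember why the primary path failed
            }
        }
        try {
            return fallback.call();
        } catch (Exception e) {
            e.addSuppressed(lastFailure); // non-null: at least one primary attempt failed
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = runWithFallback(
                () -> { calls[0]++; throw new IOException("Failed to communicate with p4"); },
                () -> "fallback output",
                3);
        System.out.println(result + " after " + calls[0] + " primary attempts");
    }
}
```

A real fallback would still need to parse the command-line output into whatever the primary path returns. The helper only covers the control flow.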
pompl,
I have never reproduced this problem. What platform are your master and slaves
running on? How many projects are you running concurrently? Can you give us
anything else that tells us more about your environment?
-Tim
Hi, thepner,
We have tried several environments:
- one master, no slaves
- one master, one slave
- one master, three slaves
We have also varied the operating system of master and slave:
- Windows XP
- Linux Ubuntu 6
- Red Hat 4
We also tried deploying the master to Tomcat.
So we assume the environment isn't the exceptional part.
The error is reproducible (99% of the time) and always shows the same log!
Here is the complete console log (parts are replaced, since my company doesn't
like sharing internal information):
-----begin-----
A SCM change trigger started this job
Building on slave
Performing sync with Perforce for: //MyProject/main/...
[workspace] $ p4 workspace -o MyWorkspace
Changing P4 Client Root to: c:\hudson\jobs\MyProject.ENTW\workspace\
Changing P4 Client View to: //MyProject/main/... //MyWorkspace/...
[workspace] $ p4 -s client -i
Last sync'd change: 0
[workspace] $ p4 changes -m 25 //MyProject/main/...
[workspace] $ p4 changes -m 25 //MyProject/main/...@120166
[workspace] $ p4 changes -m 25 //MyProject/main/...@118437
[workspace] $ p4 describe -s 134199
[workspace] $ p4 describe -s 130185
[workspace] $ p4 describe -s 128370
[workspace] $ p4 describe -s 128366
[workspace] $ p4 describe -s 128341
[workspace] $ p4 describe -s 127603
[workspace] $ p4 describe -s 127322
[workspace] $ p4 describe -s 127254
[workspace] $ p4 describe -s 127189
[workspace] $ p4 describe -s 127097
[workspace] $ p4 describe -s 127095
[workspace] $ p4 describe -s 127092
[workspace] $ p4 describe -s 127019
[workspace] $ p4 describe -s 127011
[workspace] $ p4 describe -s 123312
[workspace] $ p4 describe -s 123245
[workspace] $ p4 describe -s 122831
[workspace] $ p4 describe -s 122820
[workspace] $ p4 describe -s 122816
[workspace] $ p4 describe -s 122791
[workspace] $ p4 describe -s 120552
Caught Exception communicating with perforce. Failed to retrieve changelist.
FATAL: Unable to communicate with perforce. Failed to retrieve changelist.
java.io.IOException: Unable to communicate with perforce. Failed to retrieve changelist.
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:340)
at hudson.model.AbstractProject.checkout(AbstractProject.java:807)
at ...
-----end-----
We have eliminated all special characters (like umlauts) from the changelists,
so these aren't the problem either.
By the way, I have examined PerforceSCM.java and have found that the original
exception is caught and a new one is thrown. That is okay, but the causing
exception isn't chained.
Example:
catch(Exception e) {
throw new Exception("Error");
}
instead of
catch(Exception e) {
throw new Exception("Error", e);
}
I tried to fix this in the plugin (in order to get the root cause), but I
am not able to compile the plugin, since a Maven dependency is missing.
Maybe the full stack trace will lead to a solution to this really blocking issue.
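The two snippets above differ only in whether the original exception survives as the cause. The self-contained sketch below illustrates that difference. Its class and method names are made up, and a plain `IOException` stands in for the tek42 failure. Only the chained version lets you recover the root message, which is what would appear as a `Caused by:` section in the build log.

```java
import java.io.IOException;

/** Illustrates wrapping an exception with and without keeping its cause. */
public final class ChainingDemo {

    /** Stands in for a low-level library failure. */
    static void lowLevelCall() throws IOException {
        throw new IOException("Write end dead");
    }

    /** Wraps the failure but drops the cause, as in the first snippet. */
    static void unchained() throws Exception {
        try {
            lowLevelCall();
        } catch (IOException e) {
            throw new Exception("Unable to communicate with perforce");
        }
    }

    /** Wraps the failure and keeps it as the cause, as in the second snippet. */
    static void chained() throws Exception {
        try {
            lowLevelCall();
        } catch (IOException e) {
            throw new Exception("Unable to communicate with perforce", e);
        }
    }

    /** Returns the message of the innermost cause, or null if there is no cause at all. */
    static String rootCauseMessage(Throwable e) {
        Throwable t = e.getCause();
        if (t == null) {
            return null;
        }
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t.getMessage();
    }

    /** Runs one variant and reports the root-cause message it exposes ("null" if none). */
    public static String probe(boolean keepCause) {
        try {
            if (keepCause) {
                chained();
            } else {
                unchained();
            }
            return "no failure";
        } catch (Exception e) {
            return String.valueOf(rootCauseMessage(e));
        }
    }

    public static void main(String[] args) {
        System.out.println("unchained root cause: " + probe(false));
        System.out.println("chained root cause:   " + probe(true));
    }
}
```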
This is the corresponding Perforce server log for the last changelist mentioned.
-----begin-----
Perforce server info:
2009/04/09 00:32:15 pid 26737 pompl@CI_MyProject 10.189.4.248 [p4/2008.2/NTX86/189013] 'user-describe -s 120552'
--- db.rev
---   total lock wait+held read/write 0ms+362ms/0ms+0ms
--- db.revcx
---   total lock wait+held read/write 0ms+362ms/0ms+0ms
--- db.working
---   total lock wait+held read/write 0ms+362ms/0ms+0ms
--- db.change
---   total lock wait+held read/write 0ms+362ms/0ms+0ms
--- db.job
---   total lock wait+held read/write 0ms+362ms/0ms+0ms
--- db.fixrev
---   total lock wait+held read/write 0ms+362ms/0ms+0ms
--- db.boddate
---   total lock wait+held read/write 0ms+103ms/0ms+0ms
--- db.bodtext
---   total lock wait+held read/write 0ms+103ms/0ms+0ms
---   max lock wait+held read/write 0ms+103ms/0ms+0ms
-----end-----
I added additional tracing to my build of the p4 plugin trunk and the resulting
stack trace indicates the error is in the com.tek42 package.
com.tek42.perforce.PerforceException: Failed to communicate with p4
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:218)
at com.tek42.perforce.parse.Workspaces.syncTo(Workspaces.java:92)
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:351)
at hudson.model.AbstractProject.checkout(AbstractProject.java:807)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:314)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:266)
at hudson.model.Run.run(Run.java:923)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:234)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)
Caused by: java.io.IOException: Write end dead
at java.io.PipedInputStream.read(PipedInputStream.java:294)
at java.io.PipedInputStream.read(PipedInputStream.java:361)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:188)
... 9 more
FATAL: Unable to communicate with perforce. Failed to communicate with p4
java.io.IOException: Unable to communicate with perforce. Failed to communicate
with p4
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:406)
at hudson.model.AbstractProject.checkout(AbstractProject.java:807)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:314)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:266)
at hudson.model.Run.run(Run.java:923)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:234)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)
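The `Caused by: java.io.IOException: Write end dead` line comes from the JDK's `java.io.PipedInputStream`. The JDK raises it when the thread that last wrote into the pipe has terminated without closing its end and no buffered data is left. The sketch below uses made-up names and reproduces only that JDK behavior. It does not model how the plugin or tek42 connects Hudson's launcher output to the pipe. When the writer closes its side instead, the reader sees a clean end of stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

/** Reproduces, in isolation, the JDK behavior behind "java.io.IOException: Write end dead". */
public final class PipeDemo {

    /**
     * A writer thread puts one line into a pipe and exits, closing its end only if
     * {@code writerClosesPipe} is true. The caller then reads lines until end of stream.
     *
     * @return "EOF after N line(s)" on a clean end of stream, otherwise the IOException message
     */
    public static String readAfterWriterExits(boolean writerClosesPipe)
            throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            try {
                out.write("sample line\n".getBytes(StandardCharsets.UTF_8));
                if (writerClosesPipe) {
                    out.close(); // marks a clean end of stream for the reader
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();
        writer.join(); // from here on the writer thread is dead

        int lines = 0;
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                lines++;
            }
            return "EOF after " + lines + " line(s)";
        } catch (IOException e) {
            // With a dead writer that never closed the pipe, PipedInputStream throws
            // once the buffered bytes have been consumed.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writer closed pipe: " + readAfterWriterExits(true));
        System.out.println("writer just exited: " + readAfterWriterExits(false));
    }
}
```

To get end-of-stream rather than this exception when the producer stops early, the thread feeding such a pipe would need to close its end in a `finally` block.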
jmax01: This is why I removed the ExecutorFactory that Hudson provides to the
Perforce library, which got rid of the error (tested on a local server only, no
remote nodes).
We have found a strange effect:
We are using Perforce jobs via the Bugzilla integration, so every changelist
must contain a job.
The command "p4 describe -s NNN" returns some weird (non-printable) characters.
These changelists lead to the well-known erroneous behaviour.
And now it gets really strange:
If we remove and re-attach the job, the "describe" command no longer shows
non-printable characters, and the build continues without failure.
Seems to be a problem with Perforce jobs?!?
pompl: if you get non-printable characters when you type p4 describe -s
[changelist] in the CLI, then your problem might not be related to Hudson, and
I'd suggest you contact Perforce support.
If, on the other hand, the non-printable characters appear (or not!) in Hudson,
it could be a problem with the way Hudson escapes characters, which I've fixed
in house but haven't yet submitted as a proper patch upstream.
I tracked this down to the Perforce process ending abruptly in the tek42
libraries on which the plugin is built. I submitted a patch to the developer of
that library and have been running a locally fixed copy for about 2 weeks. I
used to see this problem at least a couple of times a day (we have 100+ builds
running out of Hudson); since my change I have yet to see a failure.
Once revision 0.7.7 of tek42 is released, we should be able to build an updated
Perforce plugin.
stimms: Impressive, thanks! Would you mind sharing the patch here, as tek42
doesn't have an issue tracker? Thanks!
Stimms-
I would really appreciate it if you could please share your patch with us, as it
doesn't appear a new version of tek42 library is available yet.
-JC
stimms is not cc'd on this bug. Try emailing him directly: stimms@dev.java.net
Where can I find the tek42 libraries?
We badly need a fix for this problem.
FYI: I had some email correspondence with Simon, who had posted that he had a patch. He has indicated
that he has actually checked the whole tek42 code base into the Hudson SVN server.
Below is the most recent email I sent him tonight:
Did you check the code into the repository before you made your patches, so the changes from the
original could be tracked?
From what I can tell from
http://fisheye4.atlassian.com/changelog/hudson/trunk/hudson/plugins/perforce/src/main/java
I don't see any commits that sound like they were specifically changes related to the issue
https://hudson.dev.java.net/issues/show_bug.cgi?id=2062
If not, how about wiping it out, re-adding it, and then applying your patches, so it's easy to identify the
necessary change?
Thank you in advance for your help,
JC
On Jun 11, 2009, at 12:55 PM, Simon Timms wrote:
I just corrected the password gaffe I made earlier today. You should be
able to check out the code and build. That being said, I still have some
concerns about what happens when the plugin runs into a workspace which
hasn't been set in the view field. That is the next thing on my list to
check.
On Thu, 2009-06-11 at 11:28 -0600, Simon Timms wrote:
Huh, funny, this is the second e-mail I've had about this today. I
checked all of tek42 into Hudson's SVN, including my patches. You could
grab the latest and build it, but I would wait, as I have just checked in
a change which completely breaks passwords for Perforce.
On Thu, 2009-06-11 at 11:23 -0600, Jon Christiansen wrote:
Stimms-
I would really appreciate it if you could please share your patch
(attach it to the Hudson issue), as it
doesn't appear a new version of tek42 library is available yet.
It would really benefit many people if you could do this.
Thank you in advance for your help,
Jon
After doing a diff between the tek42 source and what's been committed to the Hudson SVN, I've attached what I
think are the diffs Simon made to fix the issue.
Hello,
I've downloaded the latest code changes, but I'm running into problems while
trying to rebuild the source. Any idea when this fix will be included in the
next version of the plugin?
I am blocked by this issue. I'm trying to set up a Hudson environment running on
Windows virtual machines: one master with 3 slaves, all located in the same lab
(same network). This issue prevents Hudson from connecting to Perforce. I'm
getting the same exception that was reported in one of the earlier comments.
Building remotely on bis-autotest-06.xxx
Using remote perforce client: hudson-bis-autotest-06
Caught Exception communicating with perforce. No output for: p4 workspace -o hudson-bis-autotest-06
com.tek42.perforce.PerforceException: No output for: p4 workspace -o hudson-bis-autotest-06
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:326)
at com.tek42.perforce.parse.Workspaces.getWorkspace(Workspaces.java:53)
at hudson.plugins.perforce.PerforceSCM.getPerforceWorkspace(PerforceSCM.java:699)
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:295)
at hudson.model.AbstractProject.checkout(AbstractProject.java:833)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:314)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:266)
at hudson.model.Run.run(Run.java:949)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:116)
Any suggestions would be appreciated!
Thanks,
Charles
Just a general update for this one:
A couple of developers are working on a revamp of the Perforce plugin. We're moving away from tek42's buggy Perforce API (which is what's causing a lot of the headaches seen here) to Perforce's official Java API. We've already seen a marked improvement in performance and reliability, but we're waiting for Perforce to release 2009.2 so we can fix some outstanding issues. Assuming the API is released by the end of the year, I'm hoping to have it merged into trunk sometime in Q1 2010.
That being said, are there people monitoring this issue that are still having regular connection problems with the latest release? If so, I'd like to hear your experiences.
Hi, I'm experiencing a variation on this error on a Windows 2000 slave. The slave node is configured to run as a Windows Service under a Domain account which is a member of the Administrators group on the system. System is a VM with .NET Framework v2.0 and P4V Perforce Visual Client/NTX86/2009.1/205670. Error is as follows:
[dev-envision-4.0-baseline] $ "C:\Program Files\Perforce\p4.exe" workspace -o hudson-dev-envision-4.0-baseline-LAGERWIN2K6
Caught exception communicating with perforce. Connect to server failed; check $P4PORT
com.tek42.perforce.PerforceException: Connect to server failed; check $P4PORT
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:334)
at com.tek42.perforce.parse.Workspaces.getWorkspace(Workspaces.java:53)
at hudson.plugins.perforce.PerforceSCM.getPerforceWorkspace(PerforceSCM.java:723)
at hudson.plugins.perforce.PerforceSCM.checkout(PerforceSCM.java:330)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1014)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
at hudson.model.Run.run(Run.java:1198)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:122)
Finished: FAILURE
We are running Hudson version 1.345 with Perforce plugin version 1.0.21. I can access our Perforce server via the command line when manually executing the same command shown above the stack trace from a Windows shell prompt. I have tried setting %P4PORT%, %P4USER%, and %P4PASSWD% as system environment variables on the system, and have also added those environment variables to the Hudson job configuration, to no avail. Oddly enough, if I specify "No SCM" in the job configuration and specify a Windows batch command of "ECHO P4PORT=%P4PORT%", the job's console output shows the proper %P4PORT%. Completely stumped.
The perforce plugin doesn't use the $P4PORT defined on the system, that's just what the p4 client complains about when it can't connect. The values it uses are what is defined in the perforce configuration in the job config, so if those aren't correct, then it won't be able to connect.
Thanks for the follow-up. I think what's happening in my case is that SystemRoot is somehow being reset to C:\WINDOWS when the P4 plugin runs, which causes hostname resolution to fail, because I'm on Windows 2000, where SystemRoot is C:\WINNT. To test this, I added a p4.exe Debugger registry key string to intercept p4 calls on my system and run a script that dumps the environment, and sure enough, SystemRoot=C:\Windows shows up in that output. However, I'm struggling to figure out where to set SystemRoot in Hudson to override this behavior; I've tried setting SystemRoot as a node environment variable and as a job-specific environment variable.
It's behind the "Advanced" button in the per-job perforce plugin configuration.
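The replies above depend on which environment the spawned `p4` actually receives. The rough illustration below uses plain `java.lang.ProcessBuilder`, not the plugin's code. It shows that a child process gets a copy of the parent's environment plus whatever the caller adds or overrides. As a result, a variable such as `P4PORT` or `SystemRoot` can differ between an interactive shell and a process started from Java. The sketch assumes a POSIX `sh`; on the Windows slave discussed here you would run something like `cmd.exe /c set` instead. The server address is a made-up placeholder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

/** Shows that a child process sees the environment its parent builds for it. */
public final class ChildEnvDemo {

    /**
     * Runs {@code shellCommand} via POSIX sh with {@code extraEnv} added to (or overriding)
     * the inherited environment, and returns the command's trimmed output.
     */
    public static String runWithEnv(String shellCommand, Map<String, String> extraEnv)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("sh", "-c", shellCommand);
        pb.environment().putAll(extraEnv); // affects only the child; the JVM's own environment is unchanged
        pb.redirectErrorStream(true);
        Process p = pb.start();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream stdout = p.getInputStream()) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stdout.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
        }
        p.waitFor();
        return new String(buf.toByteArray(), StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder address: the child shell sees exactly this value as $P4PORT.
        Map<String, String> env = Collections.singletonMap("P4PORT", "perforce.example.com:1666");
        System.out.println(runWithEnv("echo P4PORT=$P4PORT", env));
    }
}
```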
Just to add our experiences... we get this very frequently, sometimes almost immediately upon starting Hudson. The stack trace is similar to others:
1476 [SCM polling for hudson.model.FreeStyleProject@153e5454[Insight.Java.Head.CbtechMonitoringWebServer]] WARN perforce - Perforce process terminated suddenly
1476 [SCM polling for hudson.model.FreeStyleProject@153e5454[Insight.Java.Head.CbtechMonitoringWebServer]] WARN perforce - java.io.IOException: Write end dead
at java.io.PipedInputStream.read(PipedInputStream.java:294)
at java.io.PipedInputStream.read(PipedInputStream.java:361)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:292)
at com.tek42.perforce.parse.Changes.getChangeNumbers(Changes.java:137)
at hudson.plugins.perforce.PerforceSCM.needToBuild(PerforceSCM.java:635)
at hudson.plugins.perforce.PerforceSCM.pollChanges(PerforceSCM.java:542)
at hudson.model.AbstractProject.pollSCMChanges(AbstractProject.java:1067)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:317)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:344)
at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Also, it appears that each time this happens a pipe is left open in the Java process until the limit of open files is reached, at which point the whole server hangs (no response to HTTP requests). Some lsof output:
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
<snip>
java 24381 cbrun 665r FIFO 0,6 264379805 pipe
java 24381 cbrun 679r FIFO 0,6 264379825 pipe
java 24381 cbrun 684r FIFO 0,6 264379833 pipe
java 24381 cbrun 687r FIFO 0,6 264379870 pipe
java 24381 cbrun 691r FIFO 0,6 264379839 pipe
java 24381 cbrun 694r FIFO 0,6 264379857 pipe
<snip>
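Each `Process` started from Java is connected to the JVM by pipes for the child's stdin, stdout, and stderr, and such pipes appear as `FIFO`/`pipe` entries in `lsof`. If a caller abandons a process after an error without closing those streams, the descriptors can stay open longer than intended. That is consistent with, though not proof of, the growth shown above. The defensive pattern below uses a hypothetical `runAndClose` helper, not anything from the plugin or tek42, and assumes a POSIX `sh` for the demo command. It closes all three streams and destroys the process in a `finally` block whether or not reading succeeded.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: run a command and always release its pipes, even on failure. */
public final class ProcessCleanup {

    public static List<String> runAndClose(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so no pipe fills up unread
        Process p = pb.start();
        try {
            p.getOutputStream().close(); // no input is sent, so close the child's stdin at once
            List<String> lines = new ArrayList<>();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    lines.add(line);
                }
            }
            p.waitFor();
            return lines;
        } finally {
            // Runs on success and on any exception: close all three streams, then make
            // sure the child is gone. Closing an already-closed stream has no effect.
            closeQuietly(p.getOutputStream());
            closeQuietly(p.getInputStream());
            closeQuietly(p.getErrorStream());
            p.destroy();
        }
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // nothing useful can be done while cleaning up
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndClose(Arrays.asList("sh", "-c", "echo one; echo two")));
    }
}
```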
Has there been any progress on using the official Perforce Java API?
Thanks,
Simon
The official perforce java api has some severe limitations on certain platform/jre combinations, so I've stopped developing it myself. I think there are still one or two people working on it, but I'm not sure at this point.
The branch for the version of the perforce plugin that uses it is here if you are still interested: https://hudson.dev.java.net/svn/hudson/branches/perforce-p4java
What versions of hudson and the perforce plugin are you using? I used to get leaked pipes a lot, but I haven't had any since upgrading to Perforce Plugin 1.0.23 and Hudson 1.347.
This appears to be fixed on my end. Between the more aggressive closing of pipes and the connection retries, I haven't seen this in ages.
Sorry for the lack of update. We've been keeping up to date with new Hudson and plugin versions and are still seeing this issue constantly.
Are your perforce server and client binaries also up to date? What operating system are you running?
We're running 2009.2 server (although we have upgraded recently from 2008.1 with no change) and 2008.1 client binaries (which I've just noticed so I'll update to 2009.2 and test those). Hudson is running on a CentOS 5.2 Linux server and Perforce is on a Solaris 5.10 x86 server.
Sorry, I take that back about keeping up to date with Hudson versions: we're currently on 1.350. I'll upgrade to 1.369 and try that. We usually see this issue within a few hours, so I'll get back to you.
My system currently still exhibits this as well. It has done this over various server and client versions of the Perforce software.
I don't have control over which version of the Perforce server my whole division uses, but they keep it pretty recent.
Here's the current versions of everything:
hudson.war -> 1.369
Perforce Plugin for Hudson -> 1.1.4
Perforce Server version: P4D/LINUX26X86_64/2009.2/232252 (2010/01/27)
Perforce Client version: Rev. P4/LINUX26X86_64/2009.2/232252 (2010/01/27).
uname -s -r -v -m -p -i -o
Linux 2.6.18-128.1.10.el5.xs5.5.0.51xen #1 SMP Wed Nov 11 08:31:24 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
Here's what is constantly being written to my log file:
15123777 [SCM polling for hudson.maven.MavenModuleSet@27223041[preprod]] WARN perforce - Perforce process terminated suddenly
15123777 [SCM polling for hudson.maven.MavenModuleSet@27223041[preprod]] WARN perforce - java.io.IOException: Write end dead
at java.io.PipedInputStream.read(PipedInputStream.java:294)
at java.io.PipedInputStream.read(PipedInputStream.java:361)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:297)
at com.tek42.perforce.parse.Changes.getChangeNumbers(Changes.java:137)
at hudson.plugins.perforce.PerforceSCM.needToBuild(PerforceSCM.java:818)
at hudson.plugins.perforce.PerforceSCM.pollChanges(PerforceSCM.java:714)
at hudson.scm.SCM.poll(SCM.java:370)
at hudson.model.AbstractProject.poll(AbstractProject.java:1151)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:330)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:359)
at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
What java version are you using? Are you running hudson on top of Tomcat (if so, which version), or are you using the built-in winstone appserver? Are the perforce binaries installed locally on the machine, or are they being served over a network mount? Is the perforce client being run directly, or through some kind of wrapper script?
How often does this occur? Weekly? Daily? Hourly? Constantly? Does it happen on all your slaves, a subset of them (if so, what are their operating systems and java versions?), or does it only happen on the master?
If I have the time, I'm going to see if I can set up an environment exactly as you describe yours to see if I can reproduce the issue...
$ java -version
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
The java binaries are on a local disk.
We're starting hudson with a Hudson war file:
java -Djava.io.tmpdir=/path/to/network/mounted/tmpdir -DHUDSON_HOME=/path/to/network/mounted/hudsondir -jar /path/to/network/mounted/hudson.war
So, yes we are using the built-in winstone appserver.
The perforce binaries are on a network mount and are being run directly.
We're only seeing this on the master as far as I can tell (we do have slaves running on both Solaris 5.10 x86 and other CentOS 5.2 servers).
This can happen within a few hours or up to a day or so after starting Hudson, but once in this state no polling works. I believe manually kicking off a build still works, though.
We have something on the order of 70-80 jobs. We have another instance of Hudson running with about 10 jobs, and I don't think this issue has occurred there (although I'm not sure it's running the same set of versions and environment, so I'm not sure how useful this is).
I think I'm ok to give you stack traces/heap dumps if that's any use to you (I need to check it's ok for me to release that level of data though first).
Try copying the perforce binaries to the local disk as opposed to running them over a network mount. I'm not sure how java and the operating system will interact if java tries to run a file that's temporarily unavailable, but it's possible that it might open a pipe only to kill it when the NFS client times out which would result in exactly the problem you are seeing. It may also hang indefinitely, which could be the cause of polling failing to run.
You might also want to use a local tmpdir for java as well.
Using local p4 binaries here and built-in winstone.
[hudson@hudson01 ~]$ cat hudson
nohup $JAVA_HOME/bin/java -Xms1024m -Xmx1024m -XX:OnOutOfMemoryError="kill -9 %p" -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Dorg.apache.commons.jelly.tags.fmt.timeZone=America/Chicago -jar hudson.war > hudson.log 2>&1 &
[hudson@hudson01 ~]$ which p4
/usr/local/bin/p4
[hudson@hudson01 ~]$ $JAVA_HOME/bin/java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
I gave you the uname output, but forgot to specifically call out that this is Centos 5.2 as well.
My hudson server's been up 6 hours since I last restarted it and this error has occurred 3134 times so far.
grep " WARN perforce - Perforce process terminated suddenly" hudson.log | wc -l
3134
Are there any adverse effects when this happens? It should be noted that this is a Warning, nothing more...
It was noted earlier that pipes were getting leaked, but the plugin has become much more aggressive about cleaning those up. Is this still happening?
I'm not seeing any adverse effects other than all of these stack traces filling up my log file (48MB in two days)
I am seeing this issue on a recent Hudson installation. I see the warnings in the log files and using lsof I can see the number of open pipes increasing, in time with the warnings. Eventually we reach a point where we get a 'too many open files' error and the server has to be restarted.
The warning is always due to the getChangeNumbers call.
Hudson version 1.379
Perforce plugin version 1.1.9
Perforce server version P4D/SOLARIS10X86_64/2010.1/251161 (2010/06/16)
Hudson running on Linux : uname -a
Linux sj10lo01 2.6.9-78.0.25.ELlargesmp #1 SMP Fri Jun 26 07:56:47 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
Perforce server running on Solaris: uname -a
SunOS sj10ut01 5.10 Generic_137138-09 i86pc i386 i86pc
4561451 [SCM polling for hudson.model.FreeStyleProject@4bbf8a41[xxxx-block]] WARN perforce - Perforce process terminated suddenly
4561452 [SCM polling for hudson.model.FreeStyleProject@4bbf8a41[xxxx-block]] WARN perforce - java.io.IOException: Write end dead
at java.io.PipedInputStream.read(PipedInputStream.java:294)
at java.io.PipedInputStream.read(PipedInputStream.java:361)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:297)
at com.tek42.perforce.parse.Changes.getChangeNumbers(Changes.java:137)
at hudson.plugins.perforce.PerforceSCM.needToBuild(PerforceSCM.java:865)
at hudson.plugins.perforce.PerforceSCM.pollChanges(PerforceSCM.java:761)
at hudson.scm.SCM.poll(SCM.java:372)
at hudson.model.AbstractProject.poll(AbstractProject.java:1195)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:417)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:446)
at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
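`Write end dead` is the message `java.io.PipedInputStream.read()` throws when its buffer is empty, the thread that last wrote into the pipe has terminated, and that thread never closed the `PipedOutputStream`. That suggests (though this trace alone does not prove it) that whatever thread feeds `p4` output into the pipe read by `getPerforceResponse` exited without closing its end. The JDK behavior can be reproduced in isolation; this is a standalone demo, not plugin code:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

public class WriteEndDeadDemo {
    // Writes "ok\n" from a short-lived thread, then reads until EOF or failure.
    // Returns "EOF" on a clean end of stream, otherwise the IOException message.
    static String drain(boolean writerCloses) throws Exception {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        Thread writer = new Thread(() -> {
            try {
                out.write("ok\n".getBytes(StandardCharsets.US_ASCII));
                if (writerCloses) {
                    out.close(); // marks the pipe as closed by the writer: reader gets EOF
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();
        writer.join(); // the writing thread is now dead
        try {
            while (in.read() != -1) {
                // consume the buffered bytes
            }
            return "EOF";
        } catch (IOException e) {
            return e.getMessage();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writer closes:     " + drain(true));
        System.out.println("writer just exits: " + drain(false));
    }
}
```

`drain(true)` returns `EOF`, while `drain(false)` returns `Write end dead`; the only difference is whether the writer closes its end before exiting.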
@gordonmcgregor: are your jobs configured to use slaves, and if so, what operating system are they running?
EDIT: also what operating system is your hudson server running on? It's difficult to tell with just a uname dump.
@rpetti
>Are your jobs configured to use slaves, and if so, what operating system are they running?
The server is set up as a single master. All the jobs run on Linux (see below).
>Also what operating system is your hudson server running on? It's difficult to tell with just a uname dump.
The hudson server (and sole build master) is a RedHat Linux system
/etc/redhat-release : Red Hat Enterprise Linux WS release 4 (Nahant Update 7)
/proc/version : Linux version 2.6.9-78.0.25.ELlargesmp (mockbuild@ls20-bc1-14.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Fri Jun 26 07:56:47 EDT 2009
Because of this issue, we have to manually restart Hudson daily. Otherwise it eventually gets to the point shown here.
[105] sj10lo01 ../hudson > ps -aux | grep hudson
verify 22719 16.7 0.6 1506004 768616 ? SNl Oct05 220:08 /usr/bin/java -jar /home/verify/hudson/hudson.war
[107] sj10lo01 ../hudson > lsof -p 22719
[... truncated ... ]
exe 22719 verify 1011r FIFO 0,7 64686985 pipe
exe 22719 verify 1012r FIFO 0,7 64673938 pipe
exe 22719 verify 1013r FIFO 0,7 64766263 pipe
exe 22719 verify 1014r FIFO 0,7 64762154 pipe
exe 22719 verify 1015r FIFO 0,7 64687011 pipe
exe 22719 verify 1016r FIFO 0,7 64712594 pipe
exe 22719 verify 1017r FIFO 0,7 64758276 pipe
exe 22719 verify 1019r FIFO 0,7 64687040 pipe
exe 22719 verify 1020r FIFO 0,7 64766310 pipe
[... truncated ... ]
[107] sj10lo01 ../hudson > lsof -p 22719 | grep pipe | wc
805 6440 60375
With those 805 pipes open (after about a day of uptime on a single master machine with 11 jobs, all polling once a minute), hudson.log starts to fill with the following error for each job:
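As a rough estimate of the leak rate, assuming "about a day" means 24 hours and every job polls exactly once a minute:

```latex
\begin{aligned}
N_{\text{polls}} &= 11\ \text{jobs} \times 1440\ \tfrac{\text{polls}}{\text{job}\cdot\text{day}} = 15\,840\ \text{polls/day} \\
\text{leak rate} &\approx \frac{805\ \text{pipes}}{15\,840\ \text{polls}} \approx 0.051\ \text{pipes per poll}
\end{aligned}
```

That is roughly one leaked pipe per 20 polling cycles. If the per-process limit is the common Linux default of 1024 open files (an assumption; `ulimit -n` shows the real value), the descriptor numbers near 1020 in the `lsof` output above are consistent with the JVM running into that limit within about a day.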
com.tek42.perforce.PerforceException: Could not run perforce command.
at hudson.plugins.perforce.HudsonP4Executor.exec(HudsonP4Executor.java:83)
at com.tek42.perforce.parse.AbstractPerforceTemplate.getPerforceResponse(AbstractPerforceTemplate.java:289)
at com.tek42.perforce.parse.Workspaces.getWorkspace(Workspaces.java:53)
at hudson.plugins.perforce.PerforceSCM.getPerforceWorkspace(PerforceSCM.java:973)
at hudson.plugins.perforce.PerforceSCM.pollChanges(PerforceSCM.java:755)
at hudson.scm.SCM.poll(SCM.java:372)
at hudson.model.AbstractProject.poll(AbstractProject.java:1195)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:417)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:446)
at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Cannot run program "/proj/merlot/bin/p4" (in directory "/proj/merlot/work/verify/hudson-ci-data/jobs/RVL-block/workspace"): java.io.IOException: error=24, Too many open files
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at hudson.Proc$LocalProc.<init>(Proc.java:192)
at hudson.Proc$LocalProc.<init>(Proc.java:164)
at hudson.Launcher$LocalLauncher.launch(Launcher.java:638)
at hudson.Launcher$ProcStarter.start(Launcher.java:273)
at hudson.plugins.perforce.HudsonP4Executor.exec(HudsonP4Executor.java:74)
... 15 more
Caused by: java.io.IOException: java.io.IOException: error=24, Too many open files
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 20 more
If you need any further information to resolve this issue, please let me know.
For now I've scripted Hudson to restart every day.
If I miss a restart for a day, the server locks up with 'Too many open files' errors.
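Instead of restarting on a fixed schedule, the JVM's descriptor usage can be watched so that a restart happens only when it approaches the limit behind `error=24, Too many open files`. On HotSpot-based JVMs on Unix, `com.sun.management.UnixOperatingSystemMXBean` exposes `getOpenFileDescriptorCount()` and `getMaxFileDescriptorCount()`. A minimal sketch; the class name and the 80% threshold are arbitrary choices, and this only detects the leak, it does not fix it:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdHeadroom {
    // Returns {open, max} descriptor counts for this JVM,
    // or null if the platform MXBean does not expose them (e.g. on Windows).
    static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    // True once usage has reached the given fraction of the limit (e.g. 0.8 for 80%).
    static boolean nearLimit(long open, long max, double fraction) {
        return max > 0 && open >= (long) (max * fraction);
    }

    public static void main(String[] args) {
        long[] usage = fdUsage();
        if (usage == null) {
            System.out.println("descriptor counts not available on this platform");
            return;
        }
        System.out.printf("open fds: %d / %d%n", usage[0], usage[1]);
        if (nearLimit(usage[0], usage[1], 0.8)) {
            System.out.println("WARNING: approaching 'Too many open files'");
        }
    }
}
```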
Seeing a similar issue on OS X 10.6.5
Hudson: 1.392
Perforce Plugin: 1.1.13
Java: 1.6.0_22 (running 64bit)
Let me know if there is additional information you'd like to have.
jmcintyre:
I'll need a full description of your problem including logs and version information for all systems and binaries (p4d, p4, perforce server operating system). If it's a connectivity issue, a high level description of where the servers are physically located in relation to one another would also be beneficial.
This particular ticket has been hijacked many times by many people each with a different problem, so when you say you're having a similar issue, I actually have no idea which one you are referring to.
I have a machine that reproduces this many times per day. Fortunately, the
exception rarely has serious effects; the only one I've seen is that jobs
sometimes hang during Perforce steps, and Hudson must be restarted to
recover from that.
Unless someone recommends otherwise, I will work on reproducing this and
gathering more information.