-
Bug
-
Resolution: Fixed
-
Major
-
None
-
Windows XP, Windows 7 using MSBuild or devenv.exe to build MS Visual Studio Projects
-
I run into errors when using a customized build system that uses Visual Studio's devenv.exe under the hood to compile Visual Studio 2005 projects (with the VC++ compiler). When two parallel builds are started with Jenkins (on different code bases), the second job always fails with "Fatal error C1090: PDB API call failed, error code '23' : '(" in exactly the same second the first job finishes processing. Running both jobs outside Jenkins does not produce the error.
This has also been reported for builds executed by MSBuild on the Jenkins user mailing list [1].
I analysed this issue thoroughly and tracked the problem down to the usage of mspdbsrv.exe. This program is spawned automatically when building a Visual Studio project. All Visual Studio instances normally share one common PDB server, which shuts itself down after an idle period (10 minutes by default). "It ensures access to .pdb files is properly serialized in parallel builds when multiple instances of the compiler try to access the same .pdb file" [2].
I assume that Jenkins cleans up its build environment when an automatically started job finishes (as described at http://wiki.jenkins-ci.org/display/JENKINS/Aborting+a+build). I checked mspdbsrv.exe with Process Explorer, and the process does indeed have the variable JENKINS_COOKIE/HUDSON_COOKIE set in its environment if it was started through Jenkins. Killing mspdbsrv.exe while projects are still connected to it breaks compilation.
Jenkins must not kill mspdbsrv.exe, so that more than one Visual Studio project can be built at the same time.
–
[1] http://jenkins.361315.n4.nabble.com/MSBuild-fatal-errors-when-build-triggered-by-timer-td385181.html
[2] http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/b1d1bceb-06b6-47ef-a0ea-23ea752e0c4f/
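The failure mode described above can be illustrated with a small model. This is only a sketch, not Jenkins code: it assumes that cleanup selects every process whose environment carries the finished build's cookie value. The names `Proc` and `processes_to_kill` and the cookie values `job-A` and `job-B` are made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    """Hypothetical stand-in for an OS process: pid, name and environment."""
    pid: int
    name: str
    env: dict = field(default_factory=dict)


def processes_to_kill(processes, cookie, var="JENKINS_COOKIE"):
    """Return the processes whose environment carries the given cookie value."""
    return [p for p in processes if p.env.get(var) == cookie]


# Job A starts first, so the shared mspdbsrv.exe inherits job A's cookie.
procs = [
    Proc(100, "devenv.exe", {"JENKINS_COOKIE": "job-A"}),
    Proc(101, "mspdbsrv.exe", {"JENKINS_COOKIE": "job-A"}),
    Proc(200, "devenv.exe", {"JENKINS_COOKIE": "job-B"}),  # also talks to pid 101
]

# When job A finishes, its cleanup also selects the PDB server job B depends on.
doomed = processes_to_kill(procs, "job-A")
print([p.name for p in doomed])
```

Under that assumption, job B loses its PDB server at the moment job A finishes, which would match the timing reported above.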
- envinject-config.png
- 13 kB
- screenshot.JPG
- 29 kB
- is duplicated by
-
JENKINS-24753 MSBuild fails with error "fatal error C1090: PDB API call failed, error code '23'"
-
- Resolved
-
- is related to
-
JENKINS-19156 Jenkins does not invoke ProcessKillers for Windows recursively
-
- Open
-
-
JENKINS-3105 Configuration UI to disable process tree killer selectively
-
- Resolved
-
[JENKINS-9104] Visual studio builds started by Jenkins fail with "Fatal error C1090" because mspdbsrv.exe gets killed
How does the /Z7 flag affect performance? My impression is that the point of mspdbsrv.exe is to keep the data around for other builds to use, thus decreasing build times for subsequent builds.
It does not affect performance, only the size of the object files. With this option the debug information is stored in each object file instead of in one PDB. At link time, the debug information is written to a PDB file.
Just wanted to note that this also occurs on my slave nodes and each slave node only has one executor. So at first glance, since I'm not running concurrent builds on any individual slave node, it seems like this error occurring on my slave nodes doesn't make any sense.
Code changed in jenkins
User: Daniel Weber
Path:
pom.xml
src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
http://jenkins-ci.org/commit/msbuild-plugin/855a84479b64f32ceb30f73433858dfe2efb5e9f
Log:
[FIXED JENKINS-9104] Veto killing mspdbsrv.exe
Making use of the newly introduced ProcessKillingVeto extension point,
we now make sure that mspdbsrv.exe survives process killing during build
cleanup.
This requires a Jenkins version >= 1.625, the new extension point was
added there. I marked the extension as optional, so that the msbuild
plugin should still work with older Jenkins releases.
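As a rough illustration of the veto idea, here is a sketch in Python. It is not the plugin's Java code, and `should_veto_kill` is a hypothetical name. It assumes the check only needs the executable name of the process about to be killed:

```python
import ntpath


def should_veto_kill(command_line):
    """Return True if the process looks like mspdbsrv.exe and should be spared.

    command_line is the argument list of the process; the first element is
    assumed to be the path of the executable.
    """
    if not command_line:
        return False
    # ntpath understands Windows-style paths regardless of the host OS.
    exe_name = ntpath.basename(command_line[0]).lower()
    return exe_name == "mspdbsrv.exe"
```

A caller would pass something like `[r"C:\VS\Common7\IDE\mspdbsrv.exe", "-start", "-spawn"]`; the comparison is case-insensitive because Windows file names are.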
Code changed in jenkins
User: Gregory Boissinot
Path:
pom.xml
src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
http://jenkins-ci.org/commit/msbuild-plugin/48084be76d434195c9e8b2ddc66f1fb5255a78de
Log:
Merge pull request #19 from DanielWeber/master
[FIXED JENKINS-9104] Veto killing mspdbsrv.exe
Compare: https://github.com/jenkinsci/msbuild-plugin/compare/98f71956d897...48084be76d43
Code changed in jenkins
User: Gregory Boissinot
Path:
pom.xml
src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
http://jenkins-ci.org/commit/msbuild-plugin/b9a5b02117e0ee097aaf030ab2574daa3dcd217d
Log:
Revert "[FIXED JENKINS-9104] Veto killing mspdbsrv.exe"
Code changed in jenkins
User: Gregory Boissinot
Path:
pom.xml
src/main/java/hudson/plugins/msbuild/MsBuildKillingVeto.java
src/test/java/hudson/plugins/msbuild/MsBuildKillingVetoTest.java
http://jenkins-ci.org/commit/msbuild-plugin/031a05982b16e42cba5544c4ba9511515941c62f
Log:
Merge pull request #20 from jenkinsci/revert-19-master
Revert "[FIXED JENKINS-9104] Veto killing mspdbsrv.exe"
Compare: https://github.com/jenkinsci/msbuild-plugin/compare/48084be76d43...031a05982b16
> Revert "[FIXED JENKINS-9104] Veto killing mspdbsrv.exe"
I'm confused: why has the code fix been reverted?
The reason I am looking at this again is that the BUILD_ID work around is no longer working for me.
Neither is the 1.25 msbuild plugin which is meant to have the fix in.
I upgraded from 1.595 to 1.645.
damiandixon: My changes have been reverted by accident, the msbuild plugin release 1.25 does not contain the change required to fix this issue.
There is a new PR reverting the revert: https://github.com/jenkinsci/msbuild-plugin/pull/21
This is still not resolved. We need an update of the msbuild-plugin, see PR https://github.com/jenkinsci/msbuild-plugin/pull/21
danielweber This issue is filed against the core component, and that change has been included a long time ago.
Is there a plan for Visual Studio builds not started by the msbuild-plugin, please?
I'm asking because our job configurations use a "Execute Windows batch command" build step rather than "Build a Visual Studio project or solution using MSBuild" build step (and our batch process is non-trivial).
akb The proposed MSBuild Plugin change only requires the plugin to be installed to be effective (assuming mspdbsrv.exe is what you don't want killed).
That's great - thank you very much for clarifying this, and for your efforts to fix the wider issue - I'm looking forward to having more projects and configurations built automatically in a timely fashion through judicious use of parallelization
akb Forwarding the praise to my (first)namesake danielweber who did all the work
danielbeck: Well, the core stuff is done. But from a user's perspective the issue still exists.
How can I get someone to merge the pending PR and create a release of the msbuild plugin?
What's happened to this fix? It sounds like it's ready to go. How can we get a new release of the plugin?
I tried parallel builds with MSBuild plugin 1.25 on top of Jenkins 1.580.1 but unfortunately I still get this error (fatal error C1090: PDB API call failed, error code '23'). Did I miss something ?
When will you publish a new version of the plugin with the fix? It's been a month since you released the version with(out) the fix...
I'm in need of a fix for this too, it's consistently failing numerous jobs for me. Is there an old version of Jenkins to revert to that avoids this particular problem? I'm willing to go that route as a workaround.
So far this has caused a pretty bad first impression for a team I set up a CI build for, who had never seen Jenkins before.
I'm using VS2010 devenv.exe to build the solution files.
Hello Jaime,
I found a solution.
I think it is a workaround, but it works for me.
I set an additional String parameter for every project.
Go to the Jenkins Project and set "This build is parameterized", “Name” – “BUILD_ID”, “Default Value” – “DoNotKillMe”.
Stumbled upon this issue immediately after trying parallel builds. Been open for 5 years now, so I guess you can simply check for 'mspdbsrv.exe' and leave it alone? Please free us of our pain.
Somebody, publish the new version please. Apparently, the fix is already in the source code on GitHub. Can someone else (other than the maintainer) release the new version?
FWIW, we implemented a workaround to this issue that doesn't involve wiping out the BUILD_ID variable (as we need to use it). Having a release with the Veto would be better, but this avoids random crashes in the meantime.
Instead of allowing the MSBuild process to start the daemon itself, you cause the daemon to start using an environment that you choose. MSBuild then just uses the instance you started rather than starting its own.
The Powershell we use is as follows. Use the Powershell plugin to run this as a step before the MSBuild plugin step (could be translated to Windows batch too if you like).
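    # https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller
    # Remember the real BUILD_ID so it can be restored afterwards
    $originalBuildID = $Env:BUILD_ID
    $Env:BUILD_ID = "DoNotKillMe"
    try {
        # The daemon inherits the modified BUILD_ID from this environment
        Start-Process mspdbsrv -ArgumentList '-start','-spawn' -NoNewWindow
    }
    catch {}
    $Env:BUILD_ID = $originalBuildID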
msbuild-1.26 should contain the fix. Can we finally resolve this, or is something missing?
*sigh*
1.26 is tagged in GitHub but no artifacts are uploaded. Looks like a failed release. Sorry about that.
Note that MSBuild Plugin is almost certainly not currently maintained, as Gregory stopped working on his plugins, so if someone here wants to take over (danielweber perhaps?) that should be possible.
As a workaround I have created a Jenkins job that executes a Windows batch command on the Jenkins node where Visual Studio is installed.
The Jenkins job triggers the batch command once a day and has worked in my environment for several years now.
The batch command looks like this:
    set MSPDBSRV_EXE=mspdbsrv.exe
    set MSPDBSRV_PATH=C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE
    set PATH=%MSPDBSRV_PATH%;%PATH%
    set ORIG_BUILD_ID=%BUILD_ID%
    set BUILD_ID=DoNotKillMe
    echo stop mspdbsrv.exe
    %MSPDBSRV_EXE% -stop
    echo wait 7 sec
    %windir%\system32\ping.exe -n 7 localhost> nul
    echo restart mspdbsrv.exe with a shutdowntime of 25 hours
    start /b %MSPDBSRV_EXE% -start -spawn -shutdowntime 90000
    set BUILD_ID=%ORIG_BUILD_ID%
    set ORIG_BUILD_ID=
    exit 0
What the batch command does is:
- stop mspdbsrv.exe to free up resources
- start mspdbsrv.exe with BUILD_ID=DoNotKillMe and a shutdown time of 25 hours (90000 seconds); this leaves the mspdbsrv process running without Jenkins killing it, and it stays up for 25 hours so that other build jobs can use the already running process
You may have to change the path to mspdbsrv -> set MSPDBSRV_PATH=C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE
Updating the msbuild plugin won't work in our situation. We run into this issue, but we don't have the plugin installed. Rather the issue comes for us in the Final Builder scripts we run via Jenkins that call msbuild.
Set the environment variable
_MSPDBSRV_ENDPOINT_=$JENKINS_COOKIE
(The variable starts and ends with a single '_')
This will lead to a separate instance of mspdbsrv being started.
mwinter69, thanks for the pointer.
We couldn't get it working with $JENKINS_COOKIE but managed to correct it by adding the following property via EnvInject prior to kicking off the build
_MSPDBSRV_ENDPOINT_=$BUILD_TAG
This resulted in a separate process being initiated for each build and no conflicts/error.
Edit: Correction due to formatting. Refer below
It is
_MSPDBSRV_ENDPOINT_
(with underlines) not MSPDBSRV_ENDPOINT.
Just realized it myself that it's a formatting issue. If you enclose the word in underlines it will get italicised and the underlines disappear.
We recently re-encountered this on our build network and I did some investigation, here's what I found:
- On the master node, the veto from the MSBuild plugin works properly; I was able to confirm this from the log message.
- On a slave node, I do not see the log message from the veto. Instead I see a message that my process is being killed recursively (I was watching the process list to get the id during the build).
It appears that the veto logic doesn't execute on the slave nodes. Is there something special that has to be done in order for it to be detected and executed there? I don't understand enough about how the remoting logic in Jenkins operates to know the answer to this.
Most of the other work-arounds for this are ones that we cannot easily deploy in our environment. If this is truly the issue, does anyone have an idea what it would take to fix it and how long that would take to carry out?
I spent some more time chasing code and I have a suspicion as to the cause of the issue. In ProcessTree.java, there are two different functions that appear to need information from the master and yet operate in different manners
- getVeto() is how the whitelist extension is accessed to block the killing of the process. This function just gets the list as it exists, no attempt to go ask the master for any information.
- getKillers() is used to access the list of ProcessKillers if there are any classes implementing that extension point. This function gets the channel back to the master so it can ask for the master's list of classes implementing this extension.
I think that getVeto() needs to have part of it implemented more like getKillers(), so that it will go to the master for the list. It may also be that the accessor belongs in ProcessTree instead, so that it caches the data and doesn't go back to the master quite as much. Then, I think the veto logic should work properly on both a master and a slave. Unfortunately, this means a change to Jenkins core and upgrading the full instance to fix the issue instead of just a fix to the plugin itself.
Is there any workaround to this issue, because it completely breaks our usage of Jenkins?
Hi grillba, thanks a lot for your suggestion. It seems that this solved our issues.
Little side note: It might not be sufficient to just specify the _MSPDBSRV_ENDPOINT_ env variable in order to avoid conflicts. I recommend additionally setting TMP, TEMP and TEMPDIR to an isolated folder if you plan on invoking MSBUILD in parallel, as various plugins for MSBUILD as well as MSBUILD itself will place files there.
A further catch of using _MSPDBSRV_ENDPOINT_ is that serialization of parallel builds in the same working directory now breaks in return, unless you have made sure that the temporary files for the different architectures (e.g. the temporary program database created alongside the individual object files, commonly named just e.g. "Debug\vc120.pdb"; notice the lack of an architecture prefix) are completely isolated as well. Otherwise the different mspdbsrv instances will collide when accessing the same file.
grillba, walteste Hi there, we've got this issue too, and we followed your suggestions and configured the master Jenkins node like this:
Configure system > Environment variables > Add new key value pair below:
KEY: _MSPDBSRV_ENDPOINT_
VALUE: $BUILD_TAG
But nothing changed; the error is still raised on the Windows slave. Could you please explain the solution in detail? Should we set this key-value pair on the slave node? Thanks in advance
@billhoo,
You need to do it at the Job level - Not the system level. Use envinject to add the environment variable
Have a look here for how to use envinject, https://wiki.jenkins.io/display/JENKINS/EnvInject+Plugin
Make sure you follow the "Inject variables as a build step" topic
Regards
Mark
Thanks for the timely reply. We've followed your guide and found that there were already 3 separate mspdbsrv.exe processes running in the background (for test purposes, we ran 3 jobs concurrently on one Windows slave), so it seems to have worked, but unfortunately one of our jobs still failed with the C1090 error.
This is the screenshot of EnvInject in each of our 3 Pipeline jobs configuration page,
I don't think there's anything wrong here. Am I missing something?
Thanks,
Bill.
Just in case this helps anyone, I was able to fix all problems mentioned so far in this issue and comments by following the recommendations on this blog post:
http://blog.peter-b.co.uk/2017/02/stop-mspdbsrv-from-breaking-ci-build.html
The solution involves
1. Installing the MSBuild plugin ver. 1.26 or higher in Jenkins. Configuring it for use on the server is optional; it only needs to be installed. This stops Jenkins from killing the mspdbsrv process automatically.
2. Using the _MSPDBSRV_ENDPOINT_ environment variable as done in the comment above.
3. Spawning and killing a new specific mspdbsrv instance of the right Visual Studio version at the beginning and end of each job which uses it.
Powershell implementation of the Python solution in the blog (change VS140COMNTOOLS to the version of Visual Studio being used):
    # Manually start mspdbsrv so a parallel job's instance isn't used, works because _MSPDBSRV_ENDPOINT_ is set to a unique value
    # (otherwise results in "Fatal error C1090: PDB API call failed, error code '23'" when one of the builds completes).
    $mspdbsrv_proc = Start-Process -FilePath "${env:VS140COMNTOOLS}\..\IDE\mspdbsrv.exe" -ArgumentList ('-start','-shutdowntime','-1') -passthru
    .\{PowershellBuildScriptName}.ps1
    # Manually kill mspdbsrv once the build completes using the previously saved process id
    Stop-Process $mspdbsrv_proc.Id
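The same start/build/stop pattern can also be sketched in Python. This is my own minimal sketch, not the code from the blog post: `pdb_server` and `run_build` are hypothetical names, and the path to mspdbsrv.exe and the build command are placeholders you would supply. It assumes `_MSPDBSRV_ENDPOINT_` is already set in the environment the script inherits. Unlike the snippet above, the `finally` block stops the server even when the build step raises.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def pdb_server(command):
    """Start a helper process, yield it, and always stop it afterwards."""
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        # Runs even if the build raises, so no stray server is left behind.
        proc.terminate()
        proc.wait(timeout=30)


def run_build(mspdbsrv_exe, build_command):
    """Run build_command while a dedicated mspdbsrv instance is alive.

    mspdbsrv_exe: full path to mspdbsrv.exe of the Visual Studio version in use.
    build_command: argument list of the actual build (placeholder).
    """
    with pdb_server([mspdbsrv_exe, "-start", "-shutdowntime", "-1"]):
        return subprocess.call(build_command)
```

`run_build` returns the exit code of the build command, so a calling script can pass it on to Jenkins.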
I had the same problem with parallel builds (e.g. running job A from trunk and job A from a branch in parallel). I tried the solution with _MSPDBSRV_ENDPOINT_ set to BUILD_TAG and it worked for almost all jobs. In one situation I still got the error, so I replaced BUILD_TAG with the JOB_NAME environment variable and suddenly it was fine; for now we are out of problems. If anyone still has the problem with the ENDPOINT solution, try changing BUILD_TAG to something else. If you do not allow parallel builds within a single job, JOB_NAME should be enough; otherwise you can try a JOB_NAME + BUILD_NUMBER combination.
Maybe ENDPOINT has some restrictions, but I did not have time to look into this more deeply. What I know is that the problematic job has the longest name in my Jenkins, approx. 48 characters.
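If the length of the value really is the problem (that is only a suspicion, nothing in this thread confirms a limit), one way to sidestep it would be to derive a short, fixed-length value from JOB_NAME and BUILD_NUMBER. A minimal sketch, where `endpoint_name` and the length of 16 are arbitrary choices of mine:

```python
import hashlib
import os


def endpoint_name(job_name, build_number, length=16):
    """Derive a short, deterministic endpoint value from job name and build number."""
    digest = hashlib.sha1(f"{job_name}#{build_number}".encode("utf-8")).hexdigest()
    return digest[:length]


# JOB_NAME and BUILD_NUMBER are set by Jenkins; the fallbacks are only for local runs.
name = endpoint_name(os.environ.get("JOB_NAME", "local"),
                     os.environ.get("BUILD_NUMBER", "0"))
# Child processes started from this script inherit the variable.
os.environ["_MSPDBSRV_ENDPOINT_"] = name
```

The value stays the same length no matter how long the job name is, and it still differs per job and per build.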
Please can anyone advise me how to set _MSPDBSRV_ENDPOINT_ with value BUILD_TAG in a pipeline declarative script?
I don’t really understand the difference between defining and injecting an environment variable. I could do:
stage('build_VisualStudio') {
environment { _MSPDBSRV_ENDPOINT_=$BUILD_TAG }
etc.
Would that be sufficient or must environment variable injection be done in a different way?
Code changed in jenkins
User: Daniel Beck
Path:
content/_data/changelogs/weekly.yml
http://jenkins-ci.org/commit/jenkins.io/0391fcb9b4c957e9e41fde03409de330a3de571d
Log:
Remove JENKINS-9104 fix from release to unblock it
Code changed in jenkins
User: Daniel Beck
Path:
content/_data/changelogs/weekly.yml
http://jenkins-ci.org/commit/jenkins.io/62409d42a5769cac66337cbd4b5df5754f0e2384
Log:
Merge pull request #1522 from daniel-beck/changelog-2.119-amended
Remove JENKINS-9104 fix from release to unblock it
Compare: https://github.com/jenkins-infra/jenkins.io/compare/58f029c79331...62409d42a576
Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/util/ProcessTree.java
test/src/test/java/hudson/util/ProcessTreeKillerTest.java
http://jenkins-ci.org/commit/jenkins/3465da4764c322baf4fb5b90651ef6b9bcd409fb
Log:
Merge pull request #3419 from dwnusbaum/JENKINS-9104-test-fix
Fix test failure by cleaning up static state after tests
Compare: https://github.com/jenkinsci/jenkins/compare/ddbc4bbce7d3...3465da4764c3
Jenkins 2.120 contains a fix for the previous problem of the ProcessKillingVeto extension point not working on agents.
I'm occasionally getting this error with the latest versions of Jenkins and all the plugins. It started in recent months; it hadn't been a problem for a year before that. The problem seems NOT to have been resolved, or it has possibly re-emerged.
What can I do, is there a workaround? Sporadic build failures for no reason are super annoying.
Same error with latest Jenkins ver. 2.150.3
The error always occurs when running two jobs concurrently on the same agent with VS2015:
fatal error C1090: PDB API
billhoo, thanks for the tip! I was running VS 2017 (v141 toolset), but there were indeed two simultaneous jobs! So the workaround is to limit this agent to one job at a time. Pity, as it's a pretty powerful multicore server, but it's better than flaky builds.
vuiletgiraffe, totally the same. We have many different jobs which use MSVC14 as the toolchain, but now we can only perform one build at a time; it's a huge waste of machine resources ;(
Hope it can be truly solved.
Solution is still the same, before invoking `msbuild`, set the following environment variables to something unique:
    _MSPDBSRV_ENDPOINT_=<UUID>
    TMP=<Unique Tempdir>
    TEMP=$TMP
    TMPDIR=$TMP
Once you have done that, you can launch as many parallel MSBuild instances as you like, even mixing different msbuild versions or whatever. They will not interfere in any way. I've been doing that on a regular basis with mixed MSVC12, MSVC14 and MSVC15 toolchains on the same machine, and haven't had any issues since.
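For anyone driving the build from a script, here is a minimal Python sketch of the same idea. The helper names `isolated_build_env` and `run_msbuild` are hypothetical, and it assumes `msbuild` is on PATH. It builds a per-invocation environment with a fresh UUID as the endpoint and a fresh temp directory:

```python
import os
import subprocess
import tempfile
import uuid


def isolated_build_env(base_env=None):
    """Return a copy of the environment with a unique PDB endpoint and temp dir."""
    env = dict(os.environ if base_env is None else base_env)
    tmp = tempfile.mkdtemp(prefix="msbuild-")
    env["_MSPDBSRV_ENDPOINT_"] = str(uuid.uuid4())
    for var in ("TMP", "TEMP", "TMPDIR"):
        env[var] = tmp
    return env


def run_msbuild(args):
    """Invoke msbuild (assumed to be on PATH) with the isolated environment."""
    return subprocess.call(["msbuild", *args], env=isolated_build_env())
```

Note that the sketch does not delete the temp directory afterwards; a real job would want to clean it up once the build is done.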
The "official" fix for this problem (trying not to kill the job scheduler) is plain wrong, and causes massive issues. Mostly because MSBuild itself isn't exactly stable either when using the same job server for multiple parallel builds. And if the builds are using different toolchains, a crash is ensured.
I used ext3h's solution:
we solved it like this in a jenkins github multi-branch setup with jenkinsfiles:
    bat """
    mkdir tmp
    set _MSPDBSRV_ENDPOINT_= ${BUILD_TAG}
    set TMP=${Workspace}\\tmp
    set TEMP=${Workspace}\\tmp
    set TMPDIR=${Workspace}\\tmp
    build.bat
    """
When you use the commandline switch /Z7 the debug info is stored in the object and no server process is needed. This should also solve the problem.