Type: Bug
Resolution: Fixed
Priority: Major
Environment: Windows Server 2008 R2; VMware
The Jenkins server crashes with a Blue Screen of Death about twice a day. After we sent the dump to Microsoft, it turned out that the Java-related binary D:\Jenkins_CI\war\WEB-INF\lib\winp.x64.35A3F35D2ED629A6CEC5B41DD1D280C3.dll had made a call to terminate csrss.exe. Because csrss.exe is a critical process, terminating it caused the machine to bug check.
What is CSRSS.EXE?
This is the user-mode portion of the Win32 subsystem (with Win32k.sys being the kernel-mode portion). Csrss stands for client/server run-time subsystem and is an essential subsystem that must be running at all times. Csrss is responsible for console windows, creating and/or deleting threads, and some parts of the 16-bit virtual MS-DOS environment.
[JENKINS-24453] Jenkins server got blue screen to death
Integrated in jenkins_main_trunk #3700
[FIXED JENKINS-24453] (Revision d000cd65a7b0152334532bc7abd148095d5cee09)
Result = SUCCESS
kohsuke : d000cd65a7b0152334532bc7abd148095d5cee09
Files :
- changelog.html
We've been running into this issue since at least March 2013. One of our 5 nodes seems to blue screen every few days. I always figured it was a Java or driver issue, since I didn't think a userspace program could crash the whole machine, so I just worked around it.
We're currently running Jenkins LTS 1.565.2 on a Linux master with Windows 7 64-bit slaves.
I have a complete crash dump (8.3 GB) from a BSOD today. Here is its bugcheck analysis. I'm not familiar enough with Windows internals to dig much deeper, but let me know if I can help debug further.
Kernel Complete Dump File: Full address space is available
Symbol search path is: srv*D:\MSSymbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 7 Kernel Version 7601 (Service Pack 1) MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 7601.18229.amd64fre.win7sp1_gdr.130801-1533
Machine Name:
Kernel base = 0xfffff800`03049000 PsLoadedModuleList = 0xfffff800`0328c6d0
Debug session time: Thu Sep 25 13:58:37.568 2014 (UTC - 5:00)
System Uptime: 2 days 1:04:26.686
Loading Kernel Symbols
...............................................................
................................................................
....
Loading User Symbols
.......................................................
Loading unloaded module list
........Unable to enumerate user-mode unloaded modules, NTSTATUS 0xC0000147
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck F4, {3, fffffa800aa49060, fffffa800aa49340, fffff800033c50d0}
*** ERROR: Symbol file could not be found. Defaulted to export symbols for winp.x64.FEF9CB80B43534DCA303AC36686258E8.dll -
Probably caused by : _
Followup: MachineOwner
---------
2: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
CRITICAL_OBJECT_TERMINATION (f4)
A process or thread crucial to system operation has unexpectedly exited or been
terminated.
Several processes and threads are necessary for the operation of the
system; when they are terminated (for any reason), the system can no
longer function.
Arguments:
Arg1: 0000000000000003, Process
Arg2: fffffa800aa49060, Terminating object
Arg3: fffffa800aa49340, Process image file name
Arg4: fffff800033c50d0, Explanatory message (ascii)
Debugging Details:
------------------
PROCESS_OBJECT: fffffa800aa49060
IMAGE_NAME: _
DEBUG_FLR_IMAGE_TIMESTAMP: 0
MODULE_NAME: _
FAULTING_MODULE: 0000000000000000
PROCESS_NAME: java.exe
BUGCHECK_STR: 0xF4_java.exe
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
CURRENT_IRQL: 0
LAST_CONTROL_TRANSFER: from fffff8000344cd92 to fffff800030beb80
STACK_TEXT:
fffff880`072b89c8 fffff800`0344cd92 : 00000000`000000f4 00000000`00000003 fffffa80`0aa49060 fffffa80`0aa49340 : nt!KeBugCheckEx
fffff880`072b89d0 fffff800`033f91db : ffffffff`ffffffff fffffa80`10987b50 fffffa80`0aa49060 fffffa80`0a3ccb30 : nt!PspCatchCriticalBreak+0x92
fffff880`072b8a10 fffff800`03378ec4 : ffffffff`ffffffff 00000000`00000001 fffffa80`0aa49060 00000000`00000008 : nt! ?? ::NNGAKEGL::`string'+0x17476
fffff880`072b8a60 fffff800`030bde13 : fffffa80`0aa49060 00000000`ffffffff fffffa80`10987b50 00000000`00000000 : nt!NtTerminateProcess+0xf4
fffff880`072b8ae0 00000000`770c157a : 000007fe`fd35402f 00000000`00000000 00000000`00000000 00000000`12c2ec30 : nt!KiSystemServiceCopyEnd+0x13
00000000`12c2e998 000007fe`fd35402f : 00000000`00000000 00000000`00000000 00000000`12c2ec30 00000000`00000ab0 : ntdll!NtTerminateProcess+0xa
00000000`12c2e9a0 000007fe`fa3c193c : 00000000`0000075c 00000000`00000001 00000000`00000000 00000000`0000075c : KERNELBASE!TerminateProcess+0x2f
00000000`12c2e9d0 00000000`0000075c : 00000000`00000001 00000000`00000000 00000000`0000075c 00000000`0052c290 : winp_x64_FEF9CB80B43534DCA303AC36686258E8!Java_org_jvnet_winp_Native_noop+0x178
00000000`12c2e9d8 00000000`00000001 : 00000000`00000000 00000000`0000075c 00000000`0052c290 000007fe`fa3c1a40 : 0x75c
00000000`12c2e9e0 00000000`00000000 : 00000000`0000075c 00000000`0052c290 000007fe`fa3c1a40 00000000`000007f0 : 0x1
STACK_COMMAND: kb
FOLLOWUP_NAME: MachineOwner
FAILURE_BUCKET_ID: X64_0xF4_java.exe_IMAGE__
BUCKET_ID: X64_0xF4_java.exe_IMAGE__
Followup: MachineOwner
---------
Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
core/pom.xml
http://jenkins-ci.org/commit/jenkins/21fb398bfa7068f6f72154763bdbb09baddf313d
Log:
JENKINS-24453[ZD-20503] Incorporated a newer version of winp.
This version checks the critical flag of the process, as killing such a
process results in BSoD.
(cherry picked from commit 040b7d4047b324f17c0971f2511f954eb173fa23)
Conflicts:
changelog.html
Compare: https://github.com/jenkinsci/jenkins/compare/b105627b7357^...21fb398bfa70
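The commit message above says the newer winp checks the critical flag of a process before killing it. The thread does not show how winp does this, so the following is only a minimal Java sketch of the guard pattern. `SafeKiller`, `isCritical`, and `terminate` are hypothetical names for illustration, not winp's API; a real probe would have to ask the operating system whether the process is marked critical.

```java
import java.util.function.IntPredicate;

/** Hypothetical guard around a kill operation; not winp's actual API. */
final class SafeKiller {
    // Assumed probe: returns true if the OS marks this pid as a critical process.
    private final IntPredicate isCritical;
    // Assumed action: attempts the termination and returns true on success.
    private final IntPredicate terminate;

    SafeKiller(IntPredicate isCritical, IntPredicate terminate) {
        this.isCritical = isCritical;
        this.terminate = terminate;
    }

    /** Returns true only if the process was not critical and the termination succeeded. */
    boolean kill(int pid) {
        if (isCritical.test(pid)) {
            // Terminating a critical process bug checks the machine, so refuse.
            return false;
        }
        return terminate.test(pid);
    }
}
```

The point of the design is that the refusal happens before any termination call is made, so a wrongly selected target such as csrss.exe is skipped rather than killed.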
Integrated in jenkins_main_trunk #4292
JENKINS-24453[ZD-20503] Incorporated a newer version of winp. (Revision 21fb398bfa7068f6f72154763bdbb09baddf313d)
Result = UNSTABLE
ogondza : 21fb398bfa7068f6f72154763bdbb09baddf313d
Files :
- core/pom.xml
Hello, we are facing an issue similar to the one described in this thread. Based on the comments I thought it had been solved in version 1.580.1, but we are running version 2.19.1 and see the same behavior there.
One of our Windows boxes crashed with a BSOD. When we inspected the memory dump we found the following details.
Debugging Details:
------------------
DUMP_CLASS: 1
DUMP_QUALIFIER: 401
BUILD_VERSION_STRING: 7601.23572.amd64fre.win7sp1_ldr.161011-0600
SYSTEM_MANUFACTURER: VMware, Inc.
VIRTUAL_MACHINE: VMware
SYSTEM_PRODUCT_NAME: VMware Virtual Platform
SYSTEM_VERSION: None
BIOS_VENDOR: Phoenix Technologies LTD
BIOS_VERSION: 6.00
BIOS_DATE: 09/17/2015
BASEBOARD_MANUFACTURER: Intel Corporation
BASEBOARD_PRODUCT: 440BX Desktop Reference Platform
BASEBOARD_VERSION: None
DUMP_TYPE: 1
BUGCHECK_P1: 3
BUGCHECK_P2: fffffa8182e4c2a0
BUGCHECK_P3: fffffa8182e4c580
BUGCHECK_P4: fffff80001994b70
PROCESS_NAME: csrss.exe
CRITICAL_PROCESS: csrss.exe
IMAGE_NAME: csrss.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 0
MODULE_NAME: csrss
FAULTING_MODULE: 0000000000000000
EXCEPTION_CODE: (HRESULT) 0x855ee060 (2237587552) - <Unable to get error code text>
ERROR_CODE: (NTSTATUS) 0x855ee060 - <Unable to get error code text>
CPU_COUNT: 4
CPU_MHZ: 960
CPU_VENDOR: GenuineIntel
CPU_FAMILY: 6
CPU_MODEL: f
CPU_STEPPING: 1
CPU_MICROCODE: 6,f,1,0 (F,M,S,R) SIG: 428'00000000 (cache) 428'00000000 (init)
DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT
BUGCHECK_STR: 0xF4
CURRENT_IRQL: 0
ANALYSIS_SESSION_HOST: NVPC100
ANALYSIS_SESSION_TIME: 03-01-2017 12:25:49.0246
ANALYSIS_VERSION: 10.0.14321.1024 amd64fre
STACK_TEXT:
fffff880`069abb18 fffff800`01a1e852 : 00000000`000000f4 00000000`00000003 fffffa81`82e4c2a0 fffffa81`82e4c580 : nt!KeBugCheckEx
fffff880`069abb20 fffff800`019dc09b : 00000000`00000001 fffffa81`855ee060 fffffa81`82e4c2a0 fffffa81`83d68b01 : nt!PspCatchCriticalBreak+0x92
fffff880`069abb60 fffff800`01945454 : 00000000`00000001 00000000`00000328 fffffa81`82e4c2a0 fffffa81`00000008 : nt! ?? ::NNGAKEGL::`string'+0x27296
fffff880`069abbb0 fffff800`01689693 : 00000000`00000328 fffffa81`855ee060 fffffa81`82e4c2a0 00000000`00000328 : nt!NtTerminateProcess+0x284
fffff880`069abc20 00000000`776cbffa : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`1ac8e008 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x776cbffa
STACK_COMMAND: kb
FOLLOWUP_NAME: MachineOwner
FAILURE_BUCKET_ID: X64_0xF4_csrss.exe_BUGCHECK_CRITICAL_PROCESS_TERMINATED_BY_jenkins-slave.exe_855ee060
BUCKET_ID: X64_0xF4_csrss.exe_BUGCHECK_CRITICAL_PROCESS_TERMINATED_BY_jenkins-slave.exe_855ee060
PRIMARY_PROBLEM_CLASS: X64_0xF4_csrss.exe_BUGCHECK_CRITICAL_PROCESS_TERMINATED_BY_jenkins-slave.exe_855ee060
TARGET_TIME: 2017-03-01T09:52:59.000Z
OSBUILD: 7601
OSSERVICEPACK: 1000
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
SUITE_MASK: 400
PRODUCT_TYPE: 3
OSPLATFORM_TYPE: x64
OSNAME: Windows 7
OSEDITION: Windows 7 Server (Service Pack 1) TerminalServer DataCenter SingleUserTS
OS_LOCALE:
USER_LCID: 0
OSBUILD_TIMESTAMP: 2016-10-11 16:57:55
BUILDDATESTAMP_STR: 161011-0600
BUILDLAB_STR: win7sp1_ldr
BUILDOSVER_STR: 6.1.7601.23572.amd64fre.win7sp1_ldr.161011-0600
ANALYSIS_SESSION_ELAPSED_TIME: 9e0
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:x64_0xf4_csrss.exe_bugcheck_critical_process_terminated_by_jenkins-slave.exe_855ee060
FAILURE_ID_HASH: {795c17b0-7869-c608-f0a3-7708c7ee022c}
Followup: MachineOwner
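The failure bucket in this dump names the process that requested the termination (jenkins-slave.exe). When triaging several dumps, that name can be pulled out with a small helper. The sketch below assumes the `_TERMINATED_BY_<image>_<hex>` suffix seen in this one dump; that layout is inferred from the text above, not taken from a documented format, and `BucketIds` is a hypothetical name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical triage helper; the bucket layout is assumed from a single dump. */
final class BucketIds {
    // Captures the image name between "_TERMINATED_BY_" and the trailing "_<hex>" part.
    private static final Pattern TERMINATED_BY =
            Pattern.compile("_TERMINATED_BY_(.+)_[0-9a-fA-F]+$");

    /** Returns the terminating image name, or empty if the bucket has no such suffix. */
    static Optional<String> terminatingImage(String bucketId) {
        Matcher m = TERMINATED_BY.matcher(bucketId);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Buckets without the suffix, such as `X64_0xF4_java.exe_IMAGE__` from the earlier dump, yield an empty result.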
Yes, this is a known issue in WinP: https://github.com/kohsuke/winp/issues/22 . The fix was released in winp-1.24 and then integrated into Jenkins 2.34: https://github.com/jenkinsci/jenkins/commit/63c2f6c5d7d154a3a0f58c54f04f9b1a25ea5385 .
According to the code, the fix has also been backported to Jenkins 2.32.1, though the changelogs do not say so explicitly. I will make sure the changelogs get updated.
The recommendation is to update the core to the latest LTS.
Thank you, Oleg, for your quick response. We will upgrade and let you know if the issue recurs.
Code changed in jenkins
User: Oleg Nenashev
Path:
content/_data/changelogs/lts.yml
content/_partials/changelog-weekly.html
http://jenkins-ci.org/commit/jenkins.io/1e74ef665f37621c5d7fda4c0f5a42e3e4462a59
Log:
JENKINS-24453 - Noting fix in Jenkins 2.34 and 2.32.1
It was a missing fix, since I didn't notice there is a Jenkins JIRA issue for it.
This PR updates the Weekly release entry and also adds the missing entry for the Stable release
Code changed in jenkins
User: R. Tyler Croy
Path:
content/_data/changelogs/lts.yml
content/_partials/changelog-weekly.html
http://jenkins-ci.org/commit/jenkins.io/89b2c800c392195aca397baac4ceb91af08ad7f4
Log:
Merge pull request #723 from oleg-nenashev/changelog/JENKINS-24453
JENKINS-24453 - Noting fix in Jenkins 2.34 and 2.32.1
Compare: https://github.com/jenkins-infra/jenkins.io/compare/c49bb48b33bc...89b2c800c392
csrss.exe wouldn't have the right environment variables, so it should never match the killing criteria.
In 1.583 we made a change to winp so that at least it won't terminate a critical process, but we'll likely still have to figure out why that process was selected as a target of killing.
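The environment-variable criterion mentioned above can be illustrated with a short Java sketch: a process is a kill target only if its environment contains every marker variable with the expected value. This is an illustration under stated assumptions, not the Jenkins implementation; `EnvMatcher` and the variable names used with it are hypothetical.

```java
import java.util.Map;

/** Hypothetical matcher for the "kill only processes carrying our marker" criterion. */
final class EnvMatcher {
    /**
     * Returns true if every marker entry is present in processEnv with an equal value.
     * An empty marker never matches, because otherwise it would select every process.
     */
    static boolean matches(Map<String, String> processEnv, Map<String, String> marker) {
        if (marker.isEmpty()) {
            return false;
        }
        for (Map.Entry<String, String> e : marker.entrySet()) {
            // A missing variable yields null from get(), which never equals the expected value.
            if (!e.getValue().equals(processEnv.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

Under this rule a process whose environment lacks the marker, or could not be read and is therefore seen as empty, is never selected, which is why csrss.exe being chosen as a target is unexpected.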