-
Bug
-
Resolution: Unresolved
-
Critical
-
None
-
Jenkins ver. 2.178
Ubuntu 16.04
We've not been able to narrow down the exact repro.
We've had 2 different servers running the Jenkins master crash in a similar way. Each crash is a SIGSEGV (SEGV_MAPERR), and the faulting address appears to be a random memory address every time.
All I could narrow it down to is that it happens during the day, when devs are likely to be in the UI, using plugins, pushing changes that might trigger a job, etc. Outside of those hours I haven't noticed any of these crashes. Even so, on some days there were no crashes at all, while on other days it crashed many times throughout the day. I couldn't find any particular pattern.
We also appear to run fine after downgrading to 2.168; no crashes there yet on either of the servers.
I can provide more of the log files it generates if that helps. I can also attach the logs from the other crashes, but they're all very similar.
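Since the crash logs are all very similar, one way to compare them quickly is to pull out just the header fields from each one. The following is a minimal sketch, assuming the logs follow the format of the one included in this report; the function name `summarize_hs_err` and the regular expressions are illustrative and not part of any Jenkins or JDK tooling.

```python
import re
import sys
from typing import Dict, Optional

# Patterns written against the fields visible in the crash log in this report.
SIGNAL_RE = re.compile(
    r"(SIG[A-Z]+) \((0x[0-9a-f]+)\) at pc=(0x[0-9a-f]+), pid=(\d+), tid=(0x[0-9a-f]+)"
)
SIGINFO_RE = re.compile(r"si_code: \d+ \((\w+)\), si_addr: (0x[0-9a-f]+)")
FRAME_RE = re.compile(r"Problematic frame:\s*#\s*([A-Za-z]+)\s+(\S+)")


def summarize_hs_err(text: str) -> Dict[str, Optional[str]]:
    """Extract signal, pc, fault address and problematic frame from one crash log."""
    summary: Dict[str, Optional[str]] = {
        "signal": None, "pc": None, "pid": None, "tid": None,
        "si_code": None, "si_addr": None, "frame_type": None, "frame": None,
    }
    m = SIGNAL_RE.search(text)
    if m:
        summary.update(signal=m.group(1), pc=m.group(3), pid=m.group(4), tid=m.group(5))
    m = SIGINFO_RE.search(text)
    if m:
        summary.update(si_code=m.group(1), si_addr=m.group(2))
    m = FRAME_RE.search(text)
    if m:
        summary.update(frame_type=m.group(1), frame=m.group(2))
    return summary


if __name__ == "__main__":
    # Usage: python summarize.py <crash log> [<crash log> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            info = summarize_hs_err(fh.read())
        same = info["pc"] is not None and info["pc"] == info["si_addr"]
        print(path, info, "pc == si_addr" if same else "pc != si_addr")
```

Running it over every crash log from both servers would show at a glance whether the signal, the frame type and the relation between `pc` and `si_addr` are the same in each crash, even though the addresses themselves differ.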
I'm new to debugging Java issues and searched for ways to narrow this down, but because the crash happens in native code it has been difficult for me to provide more useful information. I even tried increasing the logging verbosity, but that didn't seem to change much, if the extra output goes to the same jenkins.log files.
If you know of any other information or tools I could use to narrow down the issue, please let me know. I'd like to get this figured out so that we can update to a version of Jenkins that doesn't regularly crash.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fff342d05d0, pid=689704, tid=0x00007fff9cbfd700
#
# JRE version: OpenJDK Runtime Environment (8.0_212-b03) (build 1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03)
# Java VM: OpenJDK 64-Bit Server VM (25.212-b03 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C 0x00007fff342d05d0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#

--------------- T H R E A D ---------------

Current thread is native thread

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007fff342d05d0

Registers:
RAX=0x0000000000000001, RBX=0x00007fff9cbfda68, RCX=0x00007fff9cbfd700, RDX=0x00007fff342d05d0
RSP=0x00007fff9cbfcf18, RBP=0x00007fffee018318, RSI=0x0000000000000000, RDI=0x00007ffcd8049740
R8 =0x00007ffcd803c460, R9 =0x0000000000000001, R10=0x00000000000010e0, R11=0x00007fffee2002d8
R12=0x0000000000000000, R13=0x00007fffee0182c8, R14=0x0000000000000004, R15=0x00007fff9cbfdc18
RIP=0x00007fff342d05d0, EFLAGS=0x0000000000010206, CSGSFS=0x004b000000000053, ERR=0x0000000000000014
TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007fff9cbfcf18)
0x00007fff9cbfcf18: 00007fffede06439 00007fffa6bfd10f
0x00007fff9cbfcf28: 0000000000000000 0000000000000000
0x00007fff9cbfcf38: 0000000000000000 00007fffcccfe17f
0x00007fff9cbfcf48: 0000000000100000 00007ffd680027d0
0x00007fff9cbfcf58: 00007fffede07870 0000000000000000
0x00007fff9cbfcf68: 00007fff9cbfd700 00007fff9cbfd700
0x00007fff9cbfcf78: a68917a264a42f63 0000000000000000
0x00007fff9cbfcf88: 00007fffcccfe17f 0000000000100000
0x00007fff9cbfcf98: 00007ffd680027d0 59762eddfa642f63
0x00007fff9cbfcfa8: 5976cc62885e2f63 0000000000000000
0x00007fff9cbfcfb8: 0000000000000000 0000000000000000
0x00007fff9cbfcfc8: 0000000000000000 0000000000000000
0x00007fff9cbfcfd8: 0000000000000000 0000000000000000
0x00007fff9cbfcfe8: 0000000000000000 00007fff9cbfd700
0x00007fff9cbfcff8: 00007fffeeb0741d 0000000000000000
0x00007fff9cbfd008: 0000000000000000 0000000000000000
0x00007fff9cbfd018: 0000000000000000 0000000000000000
0x00007fff9cbfd028: 0000000000000000 0000000000000000
0x00007fff9cbfd038: 0000000000000000 0000000000000000
0x00007fff9cbfd048: 0000000000000000 0000000000000000
0x00007fff9cbfd058: 0000000000000000 0000000000000000
0x00007fff9cbfd068: 0000000000000000 0000000000000000
0x00007fff9cbfd078: 0000000000000000 0000000000000000
0x00007fff9cbfd088: 0000000000000000 0000000000000000
0x00007fff9cbfd098: 0000000000000000 0000000000000000
0x00007fff9cbfd0a8: 0000000000000000 0000000000000000
0x00007fff9cbfd0b8: 0000000000000000 0000000000000000
0x00007fff9cbfd0c8: 0000000000000000 0000000000000000
0x00007fff9cbfd0d8: 0000000000000000 0000000000000000
0x00007fff9cbfd0e8: 0000000000000000 0000000000000000
0x00007fff9cbfd0f8: 0000000000000000 0000000000000000
0x00007fff9cbfd108: 0000000000000000 0000000000000000

Instructions: (pc=0x00007fff342d05d0)
0x00007fff342d05b0:
[error occurred during error reporting (printing registers, top of stack, instructions near pc), id 0xb]

Register to memory mapping:

RAX=0x0000000000000001 is an unknown value
RBX=0x00007fff9cbfda68 is an unknown value
RCX=0x00007fff9cbfd700 is an unknown value
RDX=0x00007fff342d05d0 is an unknown value
RSP=0x00007fff9cbfcf18 is an unknown value
RBP=0x00007fffee018318: <offset 0x218318> in /lib/x86_64-linux-gnu/libpthread.so.0 at 0x00007fffede00000
RSI=0x0000000000000000 is an unknown value
RDI=0x00007ffcd8049740 is an unknown value
R8 =0x00007ffcd803c460 is an unknown value
R9 =0x0000000000000001 is an unknown value
R10=0x00000000000010e0 is an unknown value
R11=0x00007fffee2002d8: <offset 0x2d8> in /lib/x86_64-linux-gnu/libdl.so.2 at 0x00007fffee200000
R12=0x0000000000000000 is an unknown value
R13=0x00007fffee0182c8: <offset 0x2182c8> in /lib/x86_64-linux-gnu/libpthread.so.0 at 0x00007fffede00000
R14=0x0000000000000004 is an unknown value
R15=0x00007fff9cbfdc18 is an unknown value

Stack: [0x00007fff9cafd000,0x00007fff9cbfe000], sp=0x00007fff9cbfcf18, free space=1023k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C 0x00007fff342d05d0
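One possible reading of the register dump, offered as a sketch rather than a diagnosis: TRAPNO=0xe is the x86 page-fault vector, and on x86 the ERR field carries the page-fault error code. The helper below (the name `decode_page_fault_error` is made up for illustration) decodes that code using the bit layout from the Intel architecture manuals.

```python
from typing import Dict


def decode_page_fault_error(err: int) -> Dict[str, bool]:
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # Bit 0: 0 = page not present, 1 = protection violation on a present page
        "page_present": bool(err & 0x01),
        # Bit 1: 0 = read access, 1 = write access
        "write_access": bool(err & 0x02),
        # Bit 2: 1 = fault happened in user mode
        "user_mode": bool(err & 0x04),
        # Bit 3: 1 = reserved bit set in a paging-structure entry
        "reserved_bit_violation": bool(err & 0x08),
        # Bit 4: 1 = fault was caused by an instruction fetch
        "instruction_fetch": bool(err & 0x10),
    }


if __name__ == "__main__":
    # ERR value taken from the register dump in this report.
    for name, value in decode_page_fault_error(0x14).items():
        print(f"{name}: {value}")
```

For ERR=0x14 this gives a user-mode instruction fetch from a page that is not present, which matches pc and si_addr being the same address. That would be consistent with the thread jumping to code that is not mapped (or no longer mapped), but the log alone does not say how it got there.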
Well, we thought rolling back to 2.168 was going to help here, but we have now had 2 crashes today that are very similar to the one attached. It seems to happen less frequently with this build, though it could very well be that whatever triggers it is something unusual. One crash occurred just after we turned on a service that sends data to the Jenkins master to trigger builds, though that looks like it may have been a load issue at the time, since there has been only one other failure in the 6 or so hours since.
Is there any way for us to provide more information here to help get this solved?