[JENKINS-40842] swarm 2.2 SEGV w/ java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • Component/s: swarm-plugin
    • Labels: None
    • Environment: CentOS release 6.8 (Final)
      java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64

      I updated two CentOS 6 and four CentOS 7 swarm slaves to Java 1.8.0.111-0.b15 yesterday morning. Overnight, both of the CentOS 6 slaves died with a SIGSEGV:

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00007fce95b91ee0, pid=7403, tid=0x00007fce9dfe7700
      #
      # JRE version: OpenJDK Runtime Environment (8.0_111-b15) (build 1.8.0_111-b15)
      # Java VM: OpenJDK 64-Bit Server VM (25.111-b15 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      # C  0x00007fce95b91ee0
      #
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
      #
      # An error report file with more information is saved as:
      # /home/jenkins-slave/hs_err_pid7403.log
      #
      # If you would like to submit a bug report, please visit:
      #   http://bugreport.java.com/bugreport/crash.jsp
      #
      
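When comparing crashes across several slaves, it can help to pull the header fields out of each fatal-error banner. The following is a minimal sketch that assumes only the banner lines shown in this report (the `SIGSEGV ... at pc=...` line, the `JRE version` line, and the line after `Problematic frame:`). `parse_crash_banner` is a hypothetical helper, not part of Jenkins or the JDK, and it does not read the full hs_err_pid*.log file.

```python
import re

# Patterns for the header lines HotSpot prints in a fatal-error banner like the
# one above. Only the fields visible in this report are handled.
_SIGNAL_RE = re.compile(
    r"#\s+(?P<signal>SIG\w+) \(0x[0-9a-f]+\) at pc=(?P<pc>0x[0-9a-f]+), "
    r"pid=(?P<pid>\d+), tid=(?P<tid>0x[0-9a-f]+)"
)
_JRE_RE = re.compile(r"#\s+JRE version: .*\(build (?P<build>[^)]+)\)")
# Frame kinds: C = native code, J/j = compiled/interpreted Java, V/v = VM code.
_FRAME_RE = re.compile(r"#\s+(?P<kind>[CJjVv])\s+(?P<frame>\S.*)$")


def parse_crash_banner(text: str) -> dict:
    """Extract signal, pc, pid, tid, JRE build and problematic frame."""
    info = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if m := _SIGNAL_RE.search(line):
            info.update(
                signal=m["signal"],
                pc=int(m["pc"], 16),
                pid=int(m["pid"]),
                tid=m["tid"],
            )
        elif m := _JRE_RE.search(line):
            info["jre_build"] = m["build"]
        elif "Problematic frame:" in line and i + 1 < len(lines):
            # The frame itself is printed on the line after the label.
            if m := _FRAME_RE.search(lines[i + 1]):
                info["frame_kind"] = m["kind"]
                info["frame"] = m["frame"].strip()
    return info
```

Run against both banners in this issue, it reports a `C` (native code) frame given only as a bare address with no library name, so the attached hs_err_pid*.log, which also lists the loaded dynamic libraries, is likely needed to map that address to a specific library.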

          Joshua Hoblitt created the issue.
          Joshua Hoblitt made changes - Attachment removed: hs_err_pid7403.log [ 35358 ]
          Joshua Hoblitt made changes - Attachment added: hs_err_pid7403.log [ 35360 ]

          Joshua Hoblitt added a comment -

          I am continuing to see occasional slave SIGSEGVs on EL6 after updating Java to `java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64`:

          #
          # A fatal error has been detected by the Java Runtime Environment:
          #
          #  SIGSEGV (0xb) at pc=0x00007f838915bee0, pid=1559, tid=0x00007f8391396700
          #
          # JRE version: OpenJDK Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
          # Java VM: OpenJDK 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 compressed oops)
          # Problematic frame:
          # C  0x00007f838915bee0
          #
          # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
          #
          # An error report file with more information is saved as:
          # /tmp/hs_err_pid1559.log
          #
          # If you would like to submit a bug report, please visit:
          #   http://bugreport.java.com/bugreport/crash.jsp
          #
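Both banners note that core dumps were disabled. If the slave is started from a wrapper rather than an interactive shell, the `ulimit -c unlimited` advice has to be applied to the process that launches Java. A minimal Python sketch of that idea follows. The `java -jar /home/jenkins-slave/swarm-client.jar` command is a hypothetical placeholder, not the invocation used in this report, and because an unprivileged process can only raise its soft limit up to the hard limit, this is the closest unprivileged equivalent of `ulimit -c unlimited` rather than an exact one.

```python
import resource
import subprocess

# Hypothetical launch command: the jar path and arguments are placeholders,
# not the actual swarm client invocation used in this report.
SWARM_CMD = ["java", "-jar", "/home/jenkins-slave/swarm-client.jar"]


def enable_core_dumps() -> None:
    """Raise the soft core-file size limit to the hard limit.

    This approximates `ulimit -c unlimited`: if the hard limit is already
    unlimited, the soft limit becomes unlimited too.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may only raise the soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def launch(cmd=SWARM_CMD) -> subprocess.Popen:
    # preexec_fn runs in the child after fork() and before exec(), so the raised
    # limit applies to the launched JVM without changing this Python process.
    return subprocess.Popen(cmd, preexec_fn=enable_core_dumps)
```

Where the core file actually ends up still depends on the host's core-dump configuration (for example `/proc/sys/kernel/core_pattern` on Linux).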

          Ryan Fox added a comment -

          I am seeing similar SIGSEGVs on Oracle Java JDK 1.8.0_112-b15.

          Spencer Malone added a comment (edited) -

          We're also seeing this, and I feel the ticket priority should be raised until a workaround is available. It's bad enough that we're rewriting the puppet-jenkins module to support SSH slaves instead of using this plugin, because the unreliability is causing regular job failures. With 4 slave workers, we were seeing 1-2 of them go down per day. Since switching away from the swarm plugin, we haven't had a single node go down in about a week.

          It's also unclear whether newer versions have this problem, and it's hard to update to 3.3 when so much of the changelog seems to be missing. Does the 3.x changelog combine all the changes from the prior failed releases?

          Spencer Malone changed the priority from Minor to Critical.

          Oleg Nenashev added a comment -

          KK (Kohsuke Kawaguchi) no longer maintains this plugin. Moving it to unassigned to set expectations.

          Oleg Nenashev changed the assignee from Kohsuke Kawaguchi (kohsuke) to Unassigned.
          Basil Crow changed the priority from Critical to Major.

            Assignee: Unassigned
            Reporter: Joshua Hoblitt (jhoblitt)
            Votes: 2
            Watchers: 6