JENKINS-40842

swarm 2.2 SEGV w/ java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • Component: swarm-plugin
    • Labels: None
    • Environment: CentOS release 6.8 (Final)
      java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64

      I updated two CentOS 6 and four CentOS 7 swarm slaves to Java 1.8.0.111-0.b15 yesterday morning. Overnight, both of the CentOS 6 slaves had died with a SEGV.

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00007fce95b91ee0, pid=7403, tid=0x00007fce9dfe7700
      #
      # JRE version: OpenJDK Runtime Environment (8.0_111-b15) (build 1.8.0_111-b15)
      # Java VM: OpenJDK 64-Bit Server VM (25.111-b15 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      # C  0x00007fce95b91ee0
      #
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
      #
      # An error report file with more information is saved as:
      # /home/jenkins-slave/hs_err_pid7403.log
      #
      # If you would like to submit a bug report, please visit:
      #   http://bugreport.java.com/bugreport/crash.jsp
      #
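
      For anyone trying to capture more diagnostics, enabling core dumps before launching the Swarm client, as the crash banner suggests, looks roughly like this. This is a minimal sketch: the JAR name, error-file path, and connection options are placeholders, not the exact invocation from this report.

      # Allow core dumps in the shell that launches the client, per the
      # banner's "ulimit -c unlimited" suggestion.
      ulimit -c unlimited

      # -XX:ErrorFile is a standard HotSpot flag; %p expands to the PID, so
      # crash logs land somewhere predictable instead of $HOME or /tmp.
      java -XX:ErrorFile=/var/log/jenkins-slave/hs_err_pid%p.log \
           -jar swarm-client.jar \
           -master https://jenkins.example.org/ \
           -name el6-slave-1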
      

          Joshua Hoblitt added a comment -

          I am continuing to see occasional slave SEGVs on el6 after updating Java to `java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64`:

          #
          # A fatal error has been detected by the Java Runtime Environment:
          #
          #  SIGSEGV (0xb) at pc=0x00007f838915bee0, pid=1559, tid=0x00007f8391396700
          #
          # JRE version: OpenJDK Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
          # Java VM: OpenJDK 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 compressed oops)
          # Problematic frame:
          # C  0x00007f838915bee0
          #
          # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
          #
          # An error report file with more information is saved as:
          # /tmp/hs_err_pid1559.log
          #
          # If you would like to submit a bug report, please visit:
          #   http://bugreport.java.com/bugreport/crash.jsp
          #
          


          Ryan Fox added a comment -

          I am seeing similar SIGSEGVs on Oracle JDK 1.8.0_112-b15.


          Spencer Malone added a comment - edited

          We're also seeing this, and I feel the ticket priority should be raised until a workaround is available. It's bad enough that we're rewriting the puppet-jenkins module to support SSH slaves instead of this plugin, because the unreliability is causing regular job failures. With four slave workers, we were seeing one or two go down per day. Since switching off the Swarm plugin, we haven't had a single node go down in about a week.

          It's also unclear whether newer versions have this problem, but it's hard to update to 3.3 with so much of the changelog seemingly missing. Does the 3.x changelog combine all the changes from the prior failed releases?


          Oleg Nenashev added a comment -

          KK does not maintain this plugin anymore. Moving to unassigned to set expectations.


          Basil Crow added a comment -

          The crash was in com.kenai.jffi.PageManager, and the latest version of the Swarm client doesn't even have that library or that class in the JAR, so it is safe to say this is no longer a bug. Please ensure you are running the latest LTS release of Jenkins, the latest release of the Swarm plugin, and the latest release of the Swarm client.
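
          A quick way to check this on any given client build is to list the JAR's entries and look for the jffi classes; the JAR filename below is a placeholder for whatever version you are running.

          # List the Swarm client JAR's contents and filter for the jffi
          # classes implicated in this crash; no output means the library
          # is not bundled in that build.
          unzip -l swarm-client.jar | grep 'com/kenai/jffi'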


            Assignee: Unassigned
            Reporter: Joshua Hoblitt (jhoblitt)
            Votes: 2
            Watchers: 6