libzfs native API changed for OpenZFS - causes crashes on newer illumos distributions

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • core
    • None

      Context: jenkins.war ships with libzfs-0.5.jar, which is enabled and loaded on operating systems detected as Solaris or similar. That is currently a broad family: it includes the proprietary Oracle Solaris as well as the various open-source distributions that descend from OpenSolaris. The latter are now based on the illumos core, which is closely tied to the OpenZFS spin-off project and shares code with *BSD, ZoL and perhaps the (unofficial) macOS support for ZFS.

      Problem: libzfs.jar provides a JNA wrapper around the internal (not public, not committed) API/ABI of the native libzfs.so, which is de facto the best (and only) binding there is. Unfortunately, these non-committed APIs changed after the Oracle and FOSS codebases split. The change happened in the OpenZFS codebase about 5 years ago and was picked up by illumos-gate about 9 months ago, so the issue began manifesting around June 2016 in rolling-release OSes that deploy a new illumos core as commits land. When the existing JAR calls a native function with the wrong signature, the JVM segfaults. Currently libzfs.jar assumes that ZFS is only present on Solaris-like OSes, so the issue has not manifested on other platforms that do in fact support ZFS; those platforms do not get to take advantage of ZFS either (a separate issue, though).

      As discussed with ci_jenkinsci_org at FOSDEM, there are in fact several codebases for libzfs.jar itself too: one on his GitHub account at https://github.com/kohsuke/libzfs4j (which already has the fix for the newer ZFS API in https://github.com/kohsuke/libzfs4j/commit/05067e754e56e7249e320d86cea769c3b878aeeb), and another at java.net (https://java.net/projects/zfs), which has the older API and so seems to be the one shipped in Jenkins.

      It seems feasible and safe to detect the new API by querying libzfs.so for the presence of routines that support ZFS feature flags (something that Oracle ZFS will likely never have), and so to distinguish OpenZFS from Oracle ZFS. Then, further assuming a recent OpenZFS, set a Java boolean flag and use the new function signature behind a Java if-clause.
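
The detection idea could be sketched roughly as follows. This is only an illustration, not the code from libzfs4j: the class and enum names are made up, and the probed symbol name `zpool_prop_get_feature` is an assumption about what a feature-flag-aware libzfs.so exports. The symbol lookup is passed in as a predicate so that the decision logic stays independent of the native binding.

```java
import java.util.function.Predicate;

/** Which native libzfs ABI the wrapper should assume (hypothetical enum). */
enum ZfsAbi { LEGACY, OPENZFS }

/** Hypothetical helper: guesses the ABI from the presence of one symbol. */
final class ZfsAbiDetector {
    // Assumption: this symbol exists only in feature-flag-aware libzfs builds.
    static final String FEATURE_FLAG_SYMBOL = "zpool_prop_get_feature";

    private ZfsAbiDetector() {}

    /**
     * @param hasSymbol answers "does libzfs.so export this symbol?".
     *        With JNA this could wrap
     *        NativeLibrary.getInstance("zfs").getFunction(name),
     *        which throws UnsatisfiedLinkError when the symbol is absent.
     */
    static ZfsAbi detect(Predicate<String> hasSymbol) {
        try {
            return hasSymbol.test(FEATURE_FLAG_SYMBOL) ? ZfsAbi.OPENZFS : ZfsAbi.LEGACY;
        } catch (LinkageError e) {
            // Library or symbol could not be resolved: assume the old signatures.
            return ZfsAbi.LEGACY;
        }
    }
}
```

The wrapper would then call `ZfsAbiDetector.detect(...)` once at load time and branch on the result wherever the old and new signatures differ.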

      As a fallback for cases where the automagic guess is wrong, and/or as the initial implementation, the same toggle can be set through an envvar in the application server's initscript/service. At least this fix will hold for the lifetime of the server, across updates of jenkins.war (currently a custom build of libzfs.jar has to be substituted as part of the upgrade procedure).
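
A minimal sketch of such an envvar override, under stated assumptions: the variable name `ZFS_ABI_OVERRIDE` and the values `openzfs` / `legacy` are hypothetical and chosen only for this example, as is the class name. The environment is passed in as a map so the logic can be exercised without touching the real process environment.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: lets an envvar override the auto-detected ABI. */
final class ZfsAbiToggle {
    // Hypothetical variable name, chosen only for this sketch.
    static final String ENV_VAR = "ZFS_ABI_OVERRIDE";

    private ZfsAbiToggle() {}

    /**
     * Decide whether to use the new (OpenZFS) function signatures.
     *
     * @param env          the environment, normally System.getenv()
     * @param autoDetected result of the automatic detection
     * @return true to use the new signatures, false for the legacy ones
     */
    static boolean useOpenZfsAbi(Map<String, String> env, boolean autoDetected) {
        String raw = env.get(ENV_VAR);
        if (raw == null) {
            return autoDetected; // no override set: trust the detection
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "openzfs":
                return true;
            case "legacy":
                return false;
            default:
                return autoDetected; // unknown value: fall back to the detection
        }
    }
}
```

In a service script the administrator would export the variable before starting the application server, and the wrapper would call `ZfsAbiToggle.useOpenZfsAbi(System.getenv(), detected)` once at startup.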

      Another option to explore is to integrate instead with libzfs_core (https://github.com/openzfs/openzfs/tree/master/usr/src/lib/libzfs_core, https://www.illumos.org/issues/2882), which AFAIK is the attempt at a stable and public API/ABI for tools that use and manage ZFS in OpenZFS and illumos, and maybe with libzfs_jni (https://github.com/openzfs/openzfs/tree/master/usr/src/lib/libzfs_jni). The latter seems stale, though, and AFAIK there is a desire to evict it and remove the build-time dependency of OpenZFS and illumos-gate on Java.

          [JENKINS-41932] libzfs native API changed for OpenZFS - causes crashes on newer illumos distributions

          Jim Klimov added a comment - PR proposed at https://github.com/kohsuke/libzfs4j/pull/3

          Jim Klimov added a comment - edited

          The initial PR was accepted and merged into Kohsuke's libzfs4j codebase. Another PR, https://github.com/kohsuke/libzfs4j/pull/4, is still pending; it takes care of similar signature changes that took place over time.

          The task is not closed for now, because JENKINS-42176 should be solved first (to actually deliver this code with binary distributions of `jenkins.war`).

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/pom.xml
          http://jenkins-ci.org/commit/jenkins/e9f0995f4d87e41a048185d29d98515ba33bcef4
          Log:
          [JENKINS-41932, JENKINS-42176] - Update libzfs4j from 0.5 to 0.8 (#2776)

          • Update libzfs4j from 0.5 to 0.7
          • Change the groupId of libzfs
          • Pick libzfs 0.8 with compatibility fixes

          Oleg Nenashev added a comment -

          The fix has been integrated towards 2.55. jimklimov I am not 100% sure it's safe to backport it to LTS, but please mark it as lts-candidate if you need it in 2.46.3

            kohsuke Kohsuke Kawaguchi
            jimklimov Jim Klimov