  1. Jenkins
  2. JENKINS-67995

SystemdLifecycle logging "Operation not permitted" calling sd_notify(3) during startup


    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • core
    • Jenkins 2.332.1 (just upgraded)
      Debian 11 (bullseye)
    • 2.339, 2.332.2

      Having just upgraded to the new Jenkins LTS 2.332.1, I'm noticing lots of warnings in the logs during startup, reporting a failure to talk to systemd to send sd_notify startup notifications:

      Mar 10 07:01:33 l.maxb.eu jenkins[1411]: 2022-03-10 07:01:33.875+0000 [id=22]        WARNING        h.lifecycle.SystemdLifecycle#notify
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]: com.sun.jna.LastErrorException: [1] Operation not permitted
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at com.sun.jna.Native.invokeInt(Native Method)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at com.sun.jna.Function.invoke(Function.java:426)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at com.sun.jna.Function.invoke(Function.java:361)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at com.sun.jna.Library$Handler.invoke(Library.java:265)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at hudson.lifecycle.$Proxy21.sd_notify(Unknown Source)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at hudson.lifecycle.SystemdLifecycle.notify(SystemdLifecycle.java:64)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at hudson.lifecycle.SystemdLifecycle.onReady(SystemdLifecycle.java:35)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at hudson.WebAppMain$3.run(WebAppMain.java:258) 

      Similar tracebacks are reported at multiple points during the startup procedure, as multiple systemd notifications are sent.

      It seems Jenkins is being denied access to write to the systemd notification socket, for some reason.

       

      UPDATE: I captured an strace, which reveals what part of the operation is actually failing:

      socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 309
      getsockopt(309, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
      setsockopt(309, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
      getsockopt(309, SOL_SOCKET, SO_SNDBUF, [425984], [4]) = 0
      setsockopt(309, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
      getuid()                                = 111
      geteuid()                               = 111
      getgid()                                = 117
      getegid()                               = 117
      sendmsg(309, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=22, msg_iov=[{iov_base="READY=1", iov_len=7}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 7
      close(309)                              = 0 

       

      From https://linux.die.net/man/7/socket:

      SO_SNDBUFFORCE (since Linux 2.6.14)
      Using this socket option, a privileged (CAP_NET_ADMIN) process can perform the same task as SO_SNDBUF, but the wmem_max limit can be overridden.

      This seems like an incorrect thing for code running within the Jenkins process to be attempting.

       

      UPDATE 2: The SO_SNDBUFFORCE call is actually made inside libsystemd, which deliberately ignores any error it returns.

      It even casts the return code to void to make this explicit:

      https://github.com/systemd/systemd/blob/b62c27050320c697392d40167b5ebaaa0057e5f0/src/libsystemd/sd-daemon/sd-daemon.c#L482

      The problem here is that Jenkins' JNA wrapper around sd_notify is misinterpreting the return value. The canonical reference is

      https://www.freedesktop.org/software/systemd/man/sd_notify.html

      Return Value

      On failure, these calls return a negative errno-style error code. If $NOTIFY_SOCKET was not set and hence no status message could be sent, 0 is returned. If the status was sent, these functions return a positive value. In order to support both service managers that implement this scheme and those which do not, it is generally recommended to ignore the return value of this call. Note that the return value simply indicates whether the notification message was enqueued properly, it does not reflect whether the message could be processed successfully. Specifically, no error is returned when a file descriptor is attempted to be stored using FDSTORE=1 but the service is not actually configured to permit storing of file descriptors (see above).
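
The convention quoted above can be written down as a small classifier. This is only a sketch in Java: the class, enum and method names are hypothetical and are not taken from Jenkins, JNA or libsystemd. It assumes nothing beyond the three cases the man page describes (negative, zero, positive).

```java
/** Hypothetical helper (not Jenkins or JNA code): interprets an sd_notify(3) return value. */
final class SdNotifyResult {
    enum Outcome { SENT, NO_SOCKET, FAILED }

    /** Maps the raw return value onto the three documented cases. */
    static Outcome classify(int rc) {
        if (rc > 0) {
            return Outcome.SENT;       // positive: the status message was sent
        }
        if (rc == 0) {
            return Outcome.NO_SOCKET;  // zero: $NOTIFY_SOCKET was not set
        }
        return Outcome.FAILED;         // negative: errno-style error code
    }

    /** For a failure, the errno value is the negated return code. */
    static int errnoOf(int rc) {
        if (rc >= 0) {
            throw new IllegalArgumentException("not a failure: " + rc);
        }
        return -rc;
    }

    public static void main(String[] args) {
        System.out.println(classify(1));   // SENT
        System.out.println(classify(0));   // NO_SOCKET
        System.out.println(classify(-1));  // FAILED
    }
}
```

Under this reading, only a negative value signals a problem, and the global errno is never consulted.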

      The JNA wrapper appears to ignore the successful positive return value and to look only at errno, which was left set by the failed SO_SNDBUFFORCE call (a failure that libsystemd deliberately ignores). It therefore raises an exception, because it does not implement the return-value convention that sd_notify actually uses.
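
The mismatch can be illustrated with a self-contained simulation. This is a sketch, not the Jenkins code: `fakeSdNotify` is a hypothetical stand-in for the native call, and the static `errno` field merely models the thread's errno. It assumes, as the strace suggests, that an ignored internal step leaves errno set to EPERM while the send itself succeeds.

```java
/** Simulation only: models a native call that succeeds but leaves errno set. */
final class ErrnoVersusReturnValue {
    static final int EPERM = 1;

    /** Stand-in for the calling thread's errno. */
    static int errno = 0;

    /** Hypothetical stand-in for sd_notify: an optional step fails, the send succeeds. */
    static int fakeSdNotify() {
        errno = EPERM;  // the SO_SNDBUFFORCE attempt was denied; the library ignores this
        return 1;       // positive: the message was sent
    }

    /** Mirrors the behavior described in the report: only errno is consulted. */
    static void notifyCheckingErrno() {
        errno = 0;
        fakeSdNotify();
        if (errno != 0) {
            throw new IllegalStateException("[" + errno + "] Operation not permitted");
        }
    }

    /** Follows the documented convention: only a negative return value is a failure. */
    static void notifyCheckingReturnValue() {
        int rc = fakeSdNotify();
        if (rc < 0) {
            throw new IllegalStateException("sd_notify failed, errno " + (-rc));
        }
    }

    public static void main(String[] args) {
        notifyCheckingReturnValue();  // completes quietly
        try {
            notifyCheckingErrno();
        } catch (IllegalStateException e) {
            System.out.println("spurious failure: " + e.getMessage());
        }
    }
}
```

The errno-based check reports a failure even though the notification went out, which matches the warnings seen in the log. The return-value check stays quiet.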

            Assignee: Basil Crow (basil)
            Reporter: Max (maxb)