Jenkins / JENKINS-67995

SystemdLifecycle logging "Operation not permitted" calling sd_notify(3) during startup

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Component: core
    • Environment: Jenkins 2.332.1 (just upgraded), Debian 11 (bullseye)
    • Released As: 2.339, 2.332.2

      Having just upgraded to the new Jenkins LTS 2.332.1, I'm noticing lots of warnings in the logs during startup, reporting a failure to talk to systemd to send sd_notify startup notifications:

      Mar 10 07:01:33 l.maxb.eu jenkins[1411]: 2022-03-10 07:01:33.875+0000 [id=22]        WARNING        h.lifecycle.SystemdLifecycle#notify
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]: com.sun.jna.LastErrorException: [1] Operation not permitted
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at com.sun.jna.Native.invokeInt(Native Method)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at com.sun.jna.Function.invoke(Function.java:426)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at com.sun.jna.Function.invoke(Function.java:361)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at com.sun.jna.Library$Handler.invoke(Library.java:265)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at hudson.lifecycle.$Proxy21.sd_notify(Unknown Source)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at hudson.lifecycle.SystemdLifecycle.notify(SystemdLifecycle.java:64)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at hudson.lifecycle.SystemdLifecycle.onReady(SystemdLifecycle.java:35)
      Mar 10 07:01:33 l.maxb.eu jenkins[1411]:         at hudson.WebAppMain$3.run(WebAppMain.java:258) 

      Similar tracebacks are reported at multiple points during the startup procedure, as multiple systemd notifications are sent.

      It seems Jenkins is being denied access to write to the systemd notification socket, for some reason.

       

      UPDATE: I captured an strace, which reveals what part of the operation is actually failing:

      socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 309
      getsockopt(309, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
      setsockopt(309, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
      getsockopt(309, SOL_SOCKET, SO_SNDBUF, [425984], [4]) = 0
      setsockopt(309, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
      getuid()                                = 111
      geteuid()                               = 111
      getgid()                                = 117
      getegid()                               = 117
      sendmsg(309, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=22, msg_iov=[{iov_base="READY=1", iov_len=7}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 7
      close(309)                              = 0 

       

      From https://linux.die.net/man/7/socket:

          SO_SNDBUFFORCE (since Linux 2.6.14)
          Using this socket option, a privileged (CAP_NET_ADMIN) process can perform the same task as SO_SNDBUF, but the wmem_max limit can be overridden.

      This seems like an incorrect thing for code running within the Jenkins process to be attempting.

       

      UPDATE 2: This call is actually made inside libsystemd, which deliberately ignores any error it returns.

      It even casts the return code to void to make this explicit:

      https://github.com/systemd/systemd/blob/b62c27050320c697392d40167b5ebaaa0057e5f0/src/libsystemd/sd-daemon/sd-daemon.c#L482

      The problem here is that Jenkins' JNA wrapper around sd_notify is misinterpreting the return value. The canonical reference is https://www.freedesktop.org/software/systemd/man/sd_notify.html, which says under "Return Value":

      On failure, these calls return a negative errno-style error code. If $NOTIFY_SOCKET was not set and hence no status message could be sent, 0 is returned. If the status was sent, these functions return a positive value. In order to support both service managers that implement this scheme and those which do not, it is generally recommended to ignore the return value of this call. Note that the return value simply indicates whether the notification message was enqueued properly, it does not reflect whether the message could be processed successfully. Specifically, no error is returned when a file descriptor is attempted to be stored using FDSTORE=1 but the service is not actually configured to permit storing of file descriptors (see above).
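
The return convention quoted above can be written out as a small classifier. This is only an illustrative sketch: the class, enum, and method names are hypothetical and are not taken from Jenkins or libsystemd. The only thing it encodes is the documented rule (negative means failure, zero means no notification socket, positive means sent).

```java
public class SdNotifyResult {
    /** The three outcomes the sd_notify(3) man page distinguishes. */
    enum Outcome { SENT, NO_SOCKET, FAILED }

    /** Classify a raw sd_notify return value according to the documented convention. */
    static Outcome classify(int rc) {
        if (rc > 0) {
            return Outcome.SENT;       // message was enqueued
        }
        if (rc == 0) {
            return Outcome.NO_SOCKET;  // $NOTIFY_SOCKET was not set
        }
        return Outcome.FAILED;         // negative errno-style code
    }

    /** For a failing call, the errno-style code is the negated return value. */
    static int errnoOf(int rc) {
        if (rc >= 0) {
            throw new IllegalArgumentException("not a failure: " + rc);
        }
        return -rc;
    }

    public static void main(String[] args) {
        System.out.println(classify(1));
        System.out.println(classify(0));
        System.out.println(classify(-1) + " errno=" + errnoOf(-1));
    }
}
```

Under this convention, the error code comes from the return value alone, so the process-wide errno does not need to be consulted at all.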

      The JNA wrapper appears to ignore the successful positive return value and to look only at errno, which was left set by the SO_SNDBUFFORCE operation (a call that is allowed to fail and whose result libsystemd ignores). It therefore raises an exception because it does not implement the return convention that sd_notify actually uses.
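
The mismatch described above can be reproduced in miniature without any native code. The sketch below is a simulation, not JNA and not libsystemd: `errno` is a plain static field standing in for the C library's errno, `fakeSdNotify` is a hypothetical stand-in that mimics the strace (a best-effort step fails with EPERM, then the send succeeds), and the errno-based check is an assumption about how the wrapper behaves, inferred from the LastErrorException in the log.

```java
public class StaleErrnoDemo {
    static final int EPERM = 1;

    // Stand-in for the C library's errno; purely illustrative.
    static int errno = 0;

    /**
     * Hypothetical stand-in for sd_notify: a best-effort step fails and
     * leaves errno set, but the message itself is sent successfully.
     */
    static int fakeSdNotify(String state) {
        errno = EPERM; // SO_SNDBUFFORCE denied; the library ignores this
        return 1;      // sendmsg succeeded, so the call reports success
    }

    /** Errno-based check: any non-zero errno after the call counts as failure. */
    static boolean okByErrno(String state) {
        errno = 0;
        fakeSdNotify(state);
        return errno == 0;
    }

    /** Return-value check: only a negative result counts as failure. */
    static boolean okByReturnValue(String state) {
        return fakeSdNotify(state) >= 0;
    }

    public static void main(String[] args) {
        System.out.println("errno-based check ok? " + okByErrno("READY=1"));
        System.out.println("return-value check ok? " + okByReturnValue("READY=1"));
    }
}
```

The errno-based check reports a failure even though the notification was delivered, while the return-value check reports success, which matches what the strace shows.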


          Max created issue -
          Basil Crow made changes -
          Assignee New: Basil Crow [ basil ]
          Basil Crow made changes -
          Summary Original: SystemdLifecycle logging "Operation not permitted" calling sd_notify during startup New: SystemdLifecycle logging "Operation not permitted" calling sd_notify(3) during startup
          Basil Crow made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Basil Crow made changes -
          Status Original: In Progress [ 3 ] New: In Review [ 10005 ]
          Basil Crow made changes -
          Remote Link New: This issue links to "jenkinsci/jenkins#6359 (Web Link)" [ 27480 ]
          Basil Crow made changes -
          Labels New: lts-candidate regression

          Fabian Pijcke added a comment -

          I see the same warnings after updating to version 2.332.1 on Debian 11.

          My /etc/systemd/system/jenkins.service.d/override.conf is as follows, everything else is untouched.

          [Service]
          Environment="JENKINS_PORT=8090"
          Environment="JENKINS_JAVA_CMD=/usr/lib/jvm/java-11-openjdk-amd64/bin/java"
          

