• Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • core
    • Jenkins 2.426.3

      All of our Jenkins agents start failing after a random period of uptime (anywhere from an hour to a day). The error occurs with any script the pipeline runs. As a more direct test, I opened the script console of the affected node and ran `println "uname -a".execute().text`, which produced the following error:

      java.io.IOException: error=0, Failed to exec spawn helper: pid: 15842, signal: 11
          at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
          at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
          at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
          at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
      Caused: java.io.IOException: Cannot run program "uname": error=0, Failed to exec spawn helper: pid: 15842, signal: 11
          at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
          at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
          at java.base/java.lang.Runtime.exec(Runtime.java:594)
          at java.base/java.lang.Runtime.exec(Runtime.java:418)
          at java.base/java.lang.Runtime.exec(Runtime.java:315)
          at org.codehaus.groovy.runtime.ProcessGroovyMethods.execute(ProcessGroovyMethods.java:544)
          at org.codehaus.groovy.runtime.dgm$895.invoke(Unknown Source)
          at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
          at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
          at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
          at Script1.run(Script1.groovy:1)
          at groovy.lang.GroovyShell.evaluate(GroovyShell.java:574)
          at groovy.lang.GroovyShell.evaluate(GroovyShell.java:612)
          at groovy.lang.GroovyShell.evaluate(GroovyShell.java:583)
          at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:149)
          at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:118)
          at hudson.remoting.UserRequest.perform(UserRequest.java:211)
          at hudson.remoting.UserRequest.perform(UserRequest.java:54)
          at hudson.remoting.Request$2.run(Request.java:377)
          at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
          at java.base/java.lang.Thread.run(Thread.java:840)

      The problem is not specific to `uname -a`: every attempt to execute an external process fails the same way.

      Once a node is in this state, every job that runs on it fails right at the start.

      Strangely, restarting Jenkins makes everything work again, with the same nodes that were already created.
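For checking a node outside a pipeline, the same spawn test can be written as a small Java program. This is a minimal sketch; `SpawnProbe` and `probe` are hypothetical names, not part of Jenkins. Note that a freshly started JVM uses whatever JRE is currently installed, so (going by the analysis later in this thread) it would most likely pass even on a broken node. The failure only shows up inside the long-running agent process, which is why the script-console test against the live agent is the meaningful one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical diagnostic (not part of Jenkins): tries to launch an external
 * command and reports whether the launch succeeded, mirroring the
 * "uname -a".execute().text check from the script console.
 */
public class SpawnProbe {

    static String probe(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();                  // a crashing spawn helper fails here
            String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exit = p.waitFor();
            return "OK (exit " + exit + "): " + out.trim();
        } catch (IOException e) {
            // Launch failures end up here: a missing binary, or
            // "Failed to exec spawn helper ... signal: 11" as in the trace above.
            return "FAILED: " + e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "FAILED: interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("uname", "-a"));
    }
}
```

A launch failure is reported as a string rather than rethrown, so the probe can be run repeatedly (for example, from a scheduled job) without aborting on the first broken node.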

          [JENKINS-72665] Agent error before some executions

          Jonatan added a comment -

          Thanks, Mark. Indeed, we use Amazon Linux 2 on the agents. We are going to update to Amazon Linux 2023.

          Enda added a comment -

          We had the same issue, but our agents run on Azure Virtual Machines. Our controller is a Kubernetes pod running on AKS, and everything is managed via Flux. Adding the vfork option to CasC seems to have fixed the errors.

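The "vfork option" is most likely the JDK system property `jdk.lang.Process.launchMechanism=VFORK`. That is an assumption, since the CasC snippet was not shared. As far as the OpenJDK sources go, jspawnhelper is only involved in the POSIX_SPAWN mechanism (the Linux default in recent JDKs), while VFORK and FORK start the child without it. That would explain why this workaround sidesteps the crash. The sketch below (hypothetical class name) selects VFORK programmatically, which only works if set before the JVM launches its first process. The usual form is the JVM flag on the agent's command line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/**
 * Sketch: select the VFORK launch mechanism, which execs the child directly
 * instead of going through jspawnhelper (used by the POSIX_SPAWN mechanism).
 */
public class LaunchMechanismDemo {
    // Read by the JDK's Unix ProcessImpl once, when it is first initialized.
    static final String KEY = "jdk.lang.Process.launchMechanism";

    static String runEcho(String text) {
        try {
            Process p = new ProcessBuilder("echo", text).start();
            String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            p.waitFor();
            return out.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        // Only takes effect if no process has been launched yet in this JVM.
        // The usual form is the JVM flag -Djdk.lang.Process.launchMechanism=VFORK
        // (VFORK is a Linux-only value).
        if (System.getProperty(KEY) == null) {
            System.setProperty(KEY, "VFORK");
        }
        System.out.println(KEY + "=" + System.getProperty(KEY));
        System.out.println(runEcho("launched with " + System.getProperty(KEY)));
    }
}
```

This is a workaround, not a fix: the thread below identifies restarting the Java process after the JRE upgrade as the actual remedy.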

          Dimitry Andric added a comment -

          See also https://bugs.launchpad.net/ubuntu/+source/openjdk-17/+bug/2055280. In my case it is a Jenkins agent running on Ubuntu 22.04.4 LTS, and after the OpenJDK 17 JRE was updated to `17.0.10+7-1~22.04.1`, jspawnhelper consistently started segfaulting.

          It appears to be related to this change in openjdk17: https://github.com/openjdk/jdk17u/commit/cd6cb730c934d8e16d4bd8e3342e59e806f158f9 which alters the command-line handling of jspawnhelper.

          I am unsure exactly how Jenkins invokes jspawnhelper, but it looks likely that some interface was subtly broken.

          Dimitry Andric added a comment -

          Note: I just posted https://bugs.launchpad.net/ubuntu/+source/openjdk-17/+bug/2055280/comments/5, which I will partially reproduce here for informational purposes:

          The root cause is that unattended-upgrades (or some other apt upgrade) performs an openjdk-17 package update while a java process is running. After this minor upgrade, the protocol between the JRE's forkAndExec JNI function and the jspawnhelper tool has changed! The jspawnhelper tool now expects argv[0] to be its own executable name, argv[1] to be a "%d:%d" format string with two file descriptors, and argv[2] to be NULL.

          However, any already-running java process will still use the old protocol, which invoked jspawnhelper with the "%d:%d" format string in argv[0] and argv[1] set to NULL. This is what makes the new jspawnhelper executable segfault.

          Therefore, with this particular openjdk-17 upgrade, even though it is a minor 'patch' upgrade, it is vital that ALL java processes that intend to spawn external processes are immediately terminated and restarted.

          See also https://bugs.openjdk.org/browse/JDK-8310265 ("(process) jspawnhelper should not use argv[0]") and the related https://bugs.openjdk.org/browse/JDK-8325567 ("jspawnhelper without args fails with segfault"). The latter bug also notes that the protocol between the JRE and jspawnhelper changed (even in a supposedly 'minor' update!), and that this will cause segfaults.

          Therefore, when nodes running Jenkins agents receive this particular OpenJDK update, the agents should be relaunched, or, if more brute force is desired, the whole node should be restarted.
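To make the argv mismatch concrete, here is a toy Java model of the contract described above. The real jspawnhelper is a small C program; this is not its code, and `SpawnHelperArgs` and `parseNewProtocol` are hypothetical names. The model parses argv the way the newer helper expects it and shows that the layout an old, still-running JRE produces does not fit.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy Java model of the jspawnhelper argv contract described above;
 * the real helper is a small C program and this is not its code.
 */
public class SpawnHelperArgs {
    // "%d:%d": the two file descriptors handed over by the JRE
    private static final Pattern FDS = Pattern.compile("(\\d{1,9}):(\\d{1,9})");

    /** Parses argv the way the newer helper expects it; null means "would crash or abort". */
    static int[] parseNewProtocol(String[] argv) {
        if (argv.length != 2) {
            // With the old layout, argv[1] is the terminating NULL in C; handing
            // that to the fd parser is the kind of access that segfaults.
            return null;
        }
        Matcher m = FDS.matcher(argv[1]);
        if (!m.matches()) {
            return null;
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }

    public static void main(String[] args) {
        String[] fromNewJre = {"jspawnhelper", "3:4"}; // argv[0] = own name, argv[1] = fds
        String[] fromOldJre = {"3:4"};                 // argv[0] = fds, argv[1] = NULL
        System.out.println("new JRE -> " + Arrays.toString(parseNewProtocol(fromNewJre)));
        System.out.println("old JRE -> " + Arrays.toString(parseNewProtocol(fromOldJre)));
    }
}
```

Running it prints `new JRE -> [3, 4]` and `old JRE -> null`. Where the Java model returns null, the C helper reads past the end of argv instead.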

          Dimitry Andric added a comment - - edited

          As for Amazon Linux 2, if you get the latest `java-11-openjdk-headless` package, this corresponds to openjdk upstream version jdk-11.0.22-ga:

          # rpm -q --changelog java-11-openjdk-headless|head -6
          * Wed Jan 10 2024 Andrew Hughes <gnu.andrew@redhat.com> - 1:11.0.22.0.7-1
          - Update to jdk-11.0.22+7 (GA)
          - Update release notes to 11.0.22+7
          - Switch to GA mode for release
          - ** This tarball is embargoed until 2024-01-16 @ 1pm PT. **
          - Resolves: RHEL-20966
          

          The upstream commit that introduces the same non-backwards compatible jspawnhelper is here:
          https://github.com/openjdk/jdk11u/commit/416c48e9d30ba4232bc9d592693992eea6819211

          Hence, on Amazon Linux 2 you see the same issue as on Ubuntu 22.04: if you upgrade the JRE but don't terminate and restart all Java processes, segfaults will occur if those Java processes attempt to spawn external processes.

          Roman Zwi added a comment -

          For the record: see also the discussion in the Jenkins Forum.

          Basil Crow added a comment -

          If I am understanding this ticket correctly, it appears upstream OpenJDK has made a breaking change, but once all Java runtimes are upgraded past the breaking change (including both controller and agents) then there are no issues, in which case the most we can do in the Jenkins project is to highlight this in the documentation. Are there any issues that persist after both controller and agents have been upgraded past the breaking change?

          Dimitry Andric added a comment -

          The upstream OpenJDK change means that a running old copy of the JRE invokes a new copy of the jspawnhelper tool in a way that makes it segfault. Terminating the old JRE and re-running the agent under the new JRE solves the problem. The easiest fix is therefore probably to restart every Jenkins agent, either by rebooting the machine/VM/instance it runs on, or by disconnecting and reconnecting it.
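Deciding which agents need a restart can be partly automated. The following is a heuristic, Linux-only sketch (hypothetical class name `StaleJreCheck`). It assumes the package manager installs the new JRE by replacing files rather than overwriting them in place, which is common for dpkg and rpm. In that case the kernel lists the old, still-mapped files of a running JVM with a ` (deleted)` suffix in `/proc/<pid>/maps`. Pass the agent's PID: the default `self` only inspects the checker's own, freshly started JVM. Reading another process's maps needs the same user or root.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Heuristic, Linux-only check: has the JRE on disk been replaced since a given
 * JVM process started? Replaced-but-still-mapped files show up as
 * " (deleted)" in /proc/<pid>/maps.
 */
public class StaleJreCheck {

    static boolean jreReplacedOnDisk(List<String> mapsLines, String jreDir) {
        String prefix = jreDir.endsWith("/") ? jreDir : jreDir + "/";
        for (String line : mapsLines) {
            if (line.contains(prefix) && line.endsWith(" (deleted)")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Usage: StaleJreCheck [pid] [jre-dir]; defaults to this JVM and its own java.home.
        String pid = args.length > 0 ? args[0] : "self";
        String jreDir = args.length > 1
                ? args[1]
                : Paths.get(System.getProperty("java.home")).toRealPath().toString();
        Path maps = Paths.get("/proc", pid, "maps");
        // ISO-8859-1 never rejects bytes, so unusual file names cannot break the read.
        List<String> lines = Files.readAllLines(maps, StandardCharsets.ISO_8859_1);
        System.out.println(jreReplacedOnDisk(lines, jreDir)
                ? "JRE files under " + jreDir + " were replaced; restart process " + pid
                : "No replaced JRE files found in the mappings of process " + pid);
    }
}
```

A negative result is not proof that the process is safe. It only means no replaced JRE files are currently mapped, so restarting after any JRE upgrade remains the conservative choice.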

          Basil Crow added a comment -

          Thanks dimitry_unified, in that case I think the most we can do in the Jenkins project is add a note to the Upgrade Guide saying something to the effect of:

          When upgrading the Java runtime to OpenJDK 11.0.22 or later, or 17.0.10 or later, the Java process must be restarted. This applies to both controllers and agents: when upgrading the Java runtime on the controller, restart the Java process on the controller; when upgrading the Java runtime on the agent, restart the Java process on the agent.

          It would be nice if the OpenJDK project would make this rough edge a bit smoother, but that ship appears to have already sailed for 11.0.22 and 17.0.10 at least.

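The proposed Upgrade Guide wording can be encoded as a small check, for example in a provisioning script that decides whether to schedule an agent restart. This is a sketch with hypothetical names. It encodes only the two release lines the note names, so `false` means "not named in the note", not "safe to skip the restart". It takes a Java version string such as `17.0.10+7`. `Runtime.Version.parse` rejects distribution package versions like `17.0.10+7-1~22.04.1`.

```java
import java.util.Map;

/**
 * Encodes the Upgrade Guide wording proposed above: upgrading to OpenJDK
 * 11.0.22+ or 17.0.10+ requires restarting the Java process.
 */
public class RestartNoteCheck {
    // Feature release -> first update release named in the note.
    private static final Map<Integer, Integer> FIRST_NAMED_UPDATE = Map.of(11, 22, 17, 10);

    /** true if the note explicitly applies; false only means "not named in the note". */
    static boolean noteApplies(String targetVersion) {
        Runtime.Version v = Runtime.Version.parse(targetVersion);
        Integer first = FIRST_NAMED_UPDATE.get(v.feature());
        return first != null && v.update() >= first;
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : Runtime.version().toString();
        System.out.println(target + ": restart required by the note? " + noteApplies(target));
    }
}
```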

          Dimitry Andric added a comment -

          Btw, note that in the Ubuntu bug I submitted (https://bugs.launchpad.net/ubuntu/+source/openjdk-17/+bug/2055280), the maintainer said he would bring it up with the OpenJDK people.

          I think it should not be difficult to patch the jspawnhelper tool so that it accepts both "old" and "new" invocations, which would make JRE upgrades smoother. But obviously this is out of the hands of the Jenkins project, and you cannot assume that all JRE distributions will get such a fix.
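Here is a sketch of the backwards-compatible parsing suggested above, written in Java for illustration. A real fix would live in the JDK's C sources, and this is not an actual OpenJDK patch. `TolerantSpawnHelperArgs` is a hypothetical name. The key idea is to decide which layout was used from the argument count before touching argv[1]. Anything unrecognized is then rejected cleanly instead of crashing.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of backwards-compatible argv parsing for a spawn
 * helper; illustrative only, not an actual OpenJDK patch.
 */
public class TolerantSpawnHelperArgs {
    private static final Pattern FDS = Pattern.compile("(\\d{1,9}):(\\d{1,9})");

    /** Returns the two fds for either argv layout, or null for anything else. */
    static int[] parse(String[] argv) {
        String fds;
        if (argv.length == 2) {
            fds = argv[1];   // new layout: own name, then "fd:fd"
        } else if (argv.length == 1) {
            fds = argv[0];   // old layout: "fd:fd" in argv[0]
        } else {
            return null;     // unknown layout: reject cleanly instead of crashing
        }
        Matcher m = FDS.matcher(fds);
        if (!m.matches()) {
            return null;     // e.g. the helper run by hand with no arguments
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse(new String[] {"jspawnhelper", "3:4"})));
        System.out.println(Arrays.toString(parse(new String[] {"3:4"})));
        System.out.println(Arrays.toString(parse(new String[] {"jspawnhelper"})));
    }
}
```

The three calls print `[3, 4]`, `[3, 4]` and `null`. Both JRE generations are served, and the no-argument case from JDK-8325567 fails without a segfault.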

            Assignee: Unassigned
            Reporter: Jonatan (jmayoranotandil)
            Votes: 1
            Watchers: 9