• Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: core
    • Jenkins 2.426.3

      I am experiencing an issue with all Jenkins agents after a random period of runtime (anywhere from an hour to a day). The error occurs with any script that triggers the pipeline. As a more direct test, I opened the script console of the affected node and executed println "uname -a".execute().text. This resulted in the following error:

      java.io.IOException: error=0, Failed to exec spawn helper: pid: 15842, signal: 11
          at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
          at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
          at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
          at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
      Caused: java.io.IOException: Cannot run program "uname": error=0, Failed to exec spawn helper: pid: 15842, signal: 11
          at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
          at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
          at java.base/java.lang.Runtime.exec(Runtime.java:594)
          at java.base/java.lang.Runtime.exec(Runtime.java:418)
          at java.base/java.lang.Runtime.exec(Runtime.java:315)
          at org.codehaus.groovy.runtime.ProcessGroovyMethods.execute(ProcessGroovyMethods.java:544)
          at org.codehaus.groovy.runtime.dgm$895.invoke(Unknown Source)
          at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
          at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
          at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
          at Script1.run(Script1.groovy:1)
          at groovy.lang.GroovyShell.evaluate(GroovyShell.java:574)
          at groovy.lang.GroovyShell.evaluate(GroovyShell.java:612)
          at groovy.lang.GroovyShell.evaluate(GroovyShell.java:583)
          at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:149)
          at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:118)
          at hudson.remoting.UserRequest.perform(UserRequest.java:211)
          at hudson.remoting.UserRequest.perform(UserRequest.java:54)
          at hudson.remoting.Request$2.run(Request.java:377)
          at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
          at java.base/java.lang.Thread.run(Thread.java:840)


      It is not specific to the uname -a command; the issue occurs with any script execution attempt.

      The same error also appears automatically at the start of a job when the node is in this broken state.

      The strange thing is that if I restart Jenkins, everything works fine again with the nodes already created.

          [JENKINS-72665] Agent error before some executions

          Jonatan created issue -

          saad added a comment - - edited

          We are also experiencing this issue, and it comes and goes randomly.

          We tried multiple things, like upgrading/downgrading the JDK version and changing the specific commands or paths that fail to load, etc.

          We cannot pinpoint the source of the issue yet and it is impacting our builds; any tip is welcome.

          Here is an example with git failing (other times it's 'helper', 'nohup', ...):

          Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to EC2 (ec2) - EC2 Agent Medium (i-016a23d9b7e325030)
          		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1787)
          		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
          		at hudson.remoting.Channel.call(Channel.java:1003)
          		at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:153)
          		at jdk.internal.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
          		at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          		at java.base/java.lang.reflect.Method.invoke(Method.java:566)
          		at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:138)
          		at com.sun.proxy.$Proxy208.execute(Unknown Source)
          		at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1222)
          		at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1305)
          ....
          Caused by: java.io.IOException: Cannot run program "git" (in directory "/home/ec2-user/workspace/XXXXXX"): error=0, Failed to exec spawn helper: pid: 16292, signal: 11
          	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
          	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
          	at hudson.Proc$LocalProc.<init>(Proc.java:252)
          	at hudson.Proc$LocalProc.<init>(Proc.java:221)
          	at hudson.Launcher$LocalLauncher.launch(Launcher.java:994)
          	at hudson.Launcher$ProcStarter.start(Launcher.java:506)
          	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2835)
          	... 15 more
          Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 16292, signal: 11
          	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
          	at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
          	at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
          	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)

           


          Mark Waite added a comment -

          https://stackoverflow.com/questions/61301818/java-failed-to-exec-spawn-helper-error-since-moving-to-java-14-on-linux offers a suggested workaround if you are running Java 17 or Java 21. It also suggests checking the permission settings of the jspawnhelper program in the jre/lib directory.
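          The suggestion above can be sketched as a quick check to run on the agent machine. This is a hedged sketch, not a verified fix: the JAVA_HOME path and the agent.jar invocation are assumptions; adjust them for your JDK install and agent launch method.

          ```shell
          # Inspect the jspawnhelper binary shipped with the agent's JDK.
          # (Java 13+ uses posix_spawn plus jspawnhelper by default on Linux.)
          # The JAVA_HOME default below is an assumed path; adjust for your image.
          JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-17-openjdk}"
          if [ -x "$JAVA_HOME/lib/jspawnhelper" ]; then
            echo "jspawnhelper is executable"
          else
            echo "jspawnhelper missing or not executable at $JAVA_HOME/lib/jspawnhelper"
          fi

          # Workaround from the linked answer: fall back to the legacy vfork
          # launch mechanism when starting the agent JVM.
          AGENT_OPTS="-Djdk.lang.Process.launchMechanism=vfork"
          echo java $AGENT_OPTS -jar agent.jar
          ```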


          saad added a comment - - edited

          The workaround seems to be working; I haven't tested it much yet.
          I added it to the JVM Options in the advanced tab of the EC2 cloud configuration for each label I have.

          In JCasC you can add this key to a node template:
          jvmopts: "-Djdk.lang.Process.launchMechanism=vfork"

          For information: I am running agents of different sizes (from t2.small to t3.large), all provisioned the same way:

          • AMI source: amzn2-ami-hvm-2.0
          • tooling like git, docker, jdk, node, etc.
          • EBS
          • HVM

          For me, since the workaround fixes the problem, the main culprit is the Java version; in my case:
          java-17-openjdk

          I only used the vfork solution; I didn't try changing the permissions.
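          For reference, a minimal JCasC sketch of where that key sits in an EC2 cloud template. This is an illustrative fragment, not a complete configuration: the cloud name, region, AMI, and label values are placeholders; only the jvmopts line is the actual workaround.

          ```yaml
          jenkins:
            clouds:
              - amazonEC2:
                  cloudName: "ec2"           # placeholder
                  region: "us-east-1"        # placeholder
                  templates:
                    - ami: "ami-xxxxxxxx"    # placeholder
                      labelString: "linux"   # placeholder
                      # the workaround: force the legacy vfork launch mechanism
                      jvmopts: "-Djdk.lang.Process.launchMechanism=vfork"
          ```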


          Jonatan added a comment -

          Yes, this solution worked correctly. Thank you so much.


          Mark Waite added a comment -

          selmernssi and jmayoranotandil could you share more information about the configuration of the machines that are hosting the agents so that others who encounter the issue can more easily identify it? Are you running Amazon Linux 2, Amazon Linux 2023, or some other Linux variant on the machines where you see the problem? Are you running from a local file system, a network file system based on EFS, network based block storage, or some other form of network storage?


          saad added a comment -

          I added more detail to my previous comment


          Mark Waite added a comment - - edited

          Thanks selmernssi. If my guess of the Amazon Linux version is correct, then you are running a version of Amazon Linux that the Jenkins project no longer supports. Amazon Linux 2 is based on Red Hat Enterprise Linux 7. The Jenkins project stopped supporting Red Hat Enterprise Linux 7 and its derivatives in November 2023. Refer to the end of life operating system blog post for more information.

          The Jenkins project continues to support Amazon Linux 2023 and many other Linux distributions.

          I suspect the root of the problem is an interaction between Amazon Linux 2 and EBS. We had sporadic reports of issues with Red Hat Enterprise Linux 7 derivatives where file permissions changed unexpectedly. This may be another example.


          Jonatan added a comment -

          Thanks Mark. Indeed we use Amazon Linux 2 on the agents. We are going to update to Amazon Linux 2023.


          Enda added a comment -

          We had the same issue, but we are using Azure Virtual Machines for our agents. Our controller is a Kubernetes pod running on AKS and everything is managed via Flux. Adding the vfork option to CasC seems to have fixed the errors.


            Assignee: Unassigned
            Reporter: jmayoranotandil (Jonatan)
            Votes: 1
            Watchers: 9