• Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: core
    • Jenkins 2.426.3

      I am experiencing an issue with all Jenkins agents after a random period of runtime (anywhere from an hour to a day). The error occurs with any script that triggers the pipeline. As a more direct test, I opened the script console of the affected node and executed println "uname -a".execute().text. This resulted in the following error:

      java.io.IOException: error=0, Failed to exec spawn helper: pid: 15842, signal: 11
          at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
          at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
          at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
          at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
      Caused: java.io.IOException: Cannot run program "uname": error=0, Failed to exec spawn helper: pid: 15842, signal: 11
          at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
          at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
          at java.base/java.lang.Runtime.exec(Runtime.java:594)
          at java.base/java.lang.Runtime.exec(Runtime.java:418)
          at java.base/java.lang.Runtime.exec(Runtime.java:315)
          at org.codehaus.groovy.runtime.ProcessGroovyMethods.execute(ProcessGroovyMethods.java:544)
          at org.codehaus.groovy.runtime.dgm$895.invoke(Unknown Source)
          at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
          at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
          at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
          at Script1.run(Script1.groovy:1)
          at groovy.lang.GroovyShell.evaluate(GroovyShell.java:574)
          at groovy.lang.GroovyShell.evaluate(GroovyShell.java:612)
          at groovy.lang.GroovyShell.evaluate(GroovyShell.java:583)
          at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:149)
          at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:118)
          at hudson.remoting.UserRequest.perform(UserRequest.java:211)
          at hudson.remoting.UserRequest.perform(UserRequest.java:54)
          at hudson.remoting.Request$2.run(Request.java:377)
          at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
          at java.base/java.lang.Thread.run(Thread.java:840)


      It is not specific to the uname -a command; the issue occurs with any script execution attempt.

      The same error also appears automatically at the start of a job when the node is in this broken state.

      The strange thing is that if I restart Jenkins, everything works fine again with the nodes already created.

          [JENKINS-72665] Agent error before some executions

          Jonatan created issue -

          saad added a comment - - edited

          We are also experiencing this issue, and it comes and goes randomly.

          We tried multiple things, like upgrading/downgrading the JDK version and changing the specific commands or paths that fail to load, etc.

          We cannot pinpoint the source of the issue yet and it is impacting our builds; any tip is welcome.

          Here is an example with git failing (other times it's 'helper', 'nohup', ...):

          Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to EC2 (ec2) - EC2 Agent Medium (i-016a23d9b7e325030)
          		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1787)
          		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
          		at hudson.remoting.Channel.call(Channel.java:1003)
          		at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:153)
          		at jdk.internal.reflect.GeneratedMethodAccessor810.invoke(Unknown Source)
          		at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          		at java.base/java.lang.reflect.Method.invoke(Method.java:566)
          		at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:138)
          		at com.sun.proxy.$Proxy208.execute(Unknown Source)
          		at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1222)
          		at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1305)
          ....
          Caused by: java.io.IOException: Cannot run program "git" (in directory "/home/ec2-user/workspace/XXXXXX"): error=0, Failed to exec spawn helper: pid: 16292, signal: 11
          	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
          	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
          	at hudson.Proc$LocalProc.<init>(Proc.java:252)
          	at hudson.Proc$LocalProc.<init>(Proc.java:221)
          	at hudson.Launcher$LocalLauncher.launch(Launcher.java:994)
          	at hudson.Launcher$ProcStarter.start(Launcher.java:506)
          	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2835)
          	... 15 more
          Caused by: java.io.IOException: error=0, Failed to exec spawn helper: pid: 16292, signal: 11
          	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
          	at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
          	at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
          	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)

           


          Mark Waite added a comment -

          https://stackoverflow.com/questions/61301818/java-failed-to-exec-spawn-helper-error-since-moving-to-java-14-on-linux offers a suggested workaround if you are running Java 17 or Java 21. It also suggests checking the permission settings of the jspawnhelper program in the jre/lib directory.
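          The suggestion above can be sketched as a quick check to run on the agent machine. This is a hedged sketch, not a verified fix: the JAVA_HOME path and the agent.jar invocation are assumptions; adjust them for your JDK install and agent launch method.

          ```shell
          # Inspect the jspawnhelper binary shipped with the agent's JDK.
          # (Java 13+ uses posix_spawn plus jspawnhelper by default on Linux.)
          # The JAVA_HOME default below is an assumed path; adjust for your image.
          JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-17-openjdk}"
          if [ -x "$JAVA_HOME/lib/jspawnhelper" ]; then
            echo "jspawnhelper is executable"
          else
            echo "jspawnhelper missing or not executable at $JAVA_HOME/lib/jspawnhelper"
          fi

          # Workaround from the linked answer: fall back to the legacy vfork
          # launch mechanism when starting the agent JVM.
          AGENT_OPTS="-Djdk.lang.Process.launchMechanism=vfork"
          echo java $AGENT_OPTS -jar agent.jar
          ```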


          saad added a comment - - edited

          The workaround seems to be working; I haven't tested it much yet.
          I added it to the JVM Options in the advanced tab of the EC2 cloud configuration for each label I have.

          In JCasC you can add this key to a node template:
          jvmopts: "-Djdk.lang.Process.launchMechanism=vfork"

          For information: I am running agents of different sizes (from t2.small to t3.large), all provisioned the same way:

          • AMI source: amzn2-ami-hvm-2.0
          • tooling like git, docker, jdk, node, etc.
          • EBS
          • HVM

          For me, since the workaround fixes the problem, the main culprit is the Java version; in my case:
          java-17-openjdk

          I only used the vfork solution; I didn't try changing the permissions.
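          For reference, a minimal JCasC sketch of where that key sits in an EC2 cloud template. This is an illustrative fragment, not a complete configuration: the cloud name, region, AMI, and label values are placeholders; only the jvmopts line is the actual workaround.

          ```yaml
          jenkins:
            clouds:
              - amazonEC2:
                  cloudName: "ec2"           # placeholder
                  region: "us-east-1"        # placeholder
                  templates:
                    - ami: "ami-xxxxxxxx"    # placeholder
                      labelString: "linux"   # placeholder
                      # the workaround: force the legacy vfork launch mechanism
                      jvmopts: "-Djdk.lang.Process.launchMechanism=vfork"
          ```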


          Jonatan added a comment -

          Yes, this solution worked correctly. Thank you so much.


          Mark Waite added a comment -

          selmernssi and jmayoranotandil could you share more information about the configuration of the machines that are hosting the agents so that others who encounter the issue can more easily identify it? Are you running Amazon Linux 2, Amazon Linux 2023, or some other Linux variant on the machines where you see the problem? Are you running from a local file system, a network file system based on EFS, network based block storage, or some other form of network storage?


          saad added a comment -

          I added more detail to my previous comment


          Mark Waite added a comment - - edited

          Thanks selmernssi. If my guess of the Amazon Linux version is correct, then you are running a version of Amazon Linux that the Jenkins project no longer supports. Amazon Linux 2 is based on Red Hat Enterprise Linux 7. The Jenkins project stopped supporting Red Hat Enterprise Linux 7 and its derivatives in November 2023. Refer to the end of life operating system blog post for more information.

          The Jenkins project continues to support Amazon Linux 2023 and many other Linux distributions.

          I suspect the root of the problem is an interaction between Amazon Linux 2 and EBS. We had sporadic reports of issues with Red Hat Enterprise Linux 7 derivatives where file permissions changed unexpectedly. This may be another example.


          Jonatan added a comment -

          Thanks Mark. Indeed we use Amazon Linux 2 on the agents. We are going to update to Amazon Linux 2023.


          Enda added a comment -

          We had the same issue, but we are using Azure Virtual Machines for our agents. Our controller is a Kubernetes pod running on AKS and everything is managed via Flux. Adding the vfork option to CasC seems to have fixed the errors.


            Assignee: Unassigned
            Reporter: jmayoranotandil (Jonatan)
            Votes: 1
            Watchers: 9