[JENKINS-53888] Batch step running on a node other than the master fails

Type: Bug
Resolution: Unresolved
Priority: Critical
Component/s: durable-task-plugin, workflow-durable-task-step-plugin
Labels:
None
Environment:
Windows Server 2012 R2 (both master and agent)
Jenkins 2.138.1
Pipeline 2.6
Durable task plugin 1.26
Pipeline Nodes and Processes 2.22

Running a batch command on another node takes several minutes and then fails. Example (all Windows; the echo output is never printed):

stage('1') {
  node('uitest') {
    bat 'echo something'
  }
}

After 10 minutes the console output shows this and the build fails:

ERROR: script apparently exited with code 0 but asynchronous notification was lost

In addition the system log gets these exceptions:

java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer
hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from

Attachments:
log.txt (3 kB, 2018-10-04 08:51)

relates to

JENKINS-52165 Use push rather than pull for durable task logging

Reopened

Yoav Miles added a comment - 2018-10-04 10:11

Ok the problem was with the Java runtime... I reverted from 10 to 8 and everything works again.

Yoav Miles added a comment - 2018-10-07 07:10

JRE was 10 instead of 8

Bhushan Shah added a comment - 2018-10-15 13:20

Despite using JRE 8, I am still having this issue with the latest Jenkins upgrade.

Éric Louvard added a comment - 2018-10-19 12:36 - edited

Issue seen on Windows XP with Java(TM) SE Runtime Environment (build 1.8.0_121-b13).
Jenkins 2.147
Windows slave, version 3.7 + 3.27

Oleg Nenashev added a comment - 2018-10-23 19:15

CC svanoort jglick. Could it be related to the recent regression in Pipeline?

mishal shah added a comment - 2018-10-23 21:16 - edited

Started seeing this after updating Pipeline: Nodes and Processes (2.22 -> 2.24) and Durable Task Plugin (1.25 -> 1.26).

ERROR: script apparently exited with code 0 but asynchronous notification was lost

Sam Van Oort added a comment - 2018-10-24 07:15

ericlouvard bshah Do you also have problems with Linux build agents? And do you see the same message in your logs as in the original description?

I don't think this relates to the recent Pipeline regression, though it could be related in some obscure way to the Controller.watch APIs added by jglick.

Jesse Glick added a comment - 2018-10-24 13:00

The

java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer
	at hudson.util.ProcessTree.get(ProcessTree.java:399)

suggests some basic problem with the agent connection. Unfortunately due to JDK-8051847 the original problem is not available in this log. It may be displayed in the agent’s own log, which can be seen in the UI, or by installing the support-core plugin and grabbing a Support bundle. Certainly an error like that would be consistent with accidentally running the agent on an unsupported version of Java—i.e., anything but 8, as in towel’s case. Whether this has anything to do with recent plugin updates, it is hard to say. I do not see any obvious connection, and towel says that indeed there was none.

The other commenters are probably encountering a totally unrelated issue, for which we have almost no diagnostics. If the error is reproducible but running with the JVM option -Dorg.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING=false reliably fixes it, then that would be a clear signal that it is related. The asynchronous notification was lost error is definitely related. Is this only happening for users of bat, as opposed to sh?

Bhushan Shah added a comment - 2018-10-24 13:02

For me, this bug also happens with the pipeline running on a Linux node and with sh instead of bat.

Jesse Glick added a comment - 2018-10-24 13:33

For what it is worth, I am unable to reproduce any such issue with a Windows 10 agent running a simple bat script. Possibly there are special conditions triggering it.

Jesse Glick added a comment - 2018-10-25 12:40

Anyone who is encountering the asynchronous notification was lost error: assuming you do not know how to reproduce the issue from scratch, please create a custom logger tracking org.jenkinsci.plugins.workflow.steps.durable_task and org.jenkinsci.plugins.durabletask at FINE and report details. Installing the support-core plugin is ideal as it allows these logs and other things to be recorded as a single ZIP file.
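As a rough illustration of what logging these two packages "at FINE" means, here is a minimal standalone java.util.logging sketch. It is not how a Jenkins log recorder is configured (that is normally done through the Jenkins UI), and the class name DurableTaskLogging and its helper method are hypothetical; only the two logger names come from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical standalone sketch; not code from Jenkins or its plugins.
class DurableTaskLogging {
    // Logger names quoted in the comment above.
    static final String[] LOGGER_NAMES = {
        "org.jenkinsci.plugins.workflow.steps.durable_task",
        "org.jenkinsci.plugins.durabletask"
    };

    // Keep strong references: java.util.logging holds loggers weakly,
    // so a configured logger could otherwise be garbage collected.
    static final List<Logger> CONFIGURED = new ArrayList<>();

    static void enableFine() {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE); // console handlers default to INFO
        for (String name : LOGGER_NAMES) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(Level.FINE);
            logger.addHandler(handler);
            CONFIGURED.add(logger);
        }
    }

    public static void main(String[] args) {
        enableFine();
        Logger.getLogger(LOGGER_NAMES[1]).fine("FINE messages are now emitted");
    }
}
```

Both the logger and the handler need their level lowered; setting only one of them leaves FINE records filtered out.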

Jesse Glick added a comment - 2018-10-26 20:34

Also be sure to pick up the workflow-api 2.31 release with the purported fix of JENKINS-54073.

Note that workflow-durable-task-step 2.25 has disabled watch mode by default, so if you have accepted that update and wish to help test this, please run with: -Dorg.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING=true
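For readers unfamiliar with -D options: they set a JVM system property at startup. The sketch below assumes, without confirmation from this thread, that the flag is read as an ordinary boolean system property; the class WatchFlag is hypothetical and is not the plugin's code. Only the property name is taken from the option above.

```java
// Hypothetical standalone sketch; not code from the plugin.
class WatchFlag {
    // Property name taken from the option quoted above.
    static final String PROPERTY =
        "org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING";

    // Boolean.getBoolean returns true only when the system property
    // exists and equals "true" (ignoring case); unset means false.
    static boolean watchingRequested() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + watchingRequested());
    }
}
```

Running this class with the -D option above on the java command line makes watchingRequested() return true; without the option it returns false.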

Florian Ramillien added a comment - 2018-10-30 10:26

About workflow-api (Pipeline API plugin): I was still on 2.30 when this problem occurred. For now, I plan to upgrade to the latest plugin versions and re-enable watch mode next week.

About "special conditions triggering it": in my case it is huge log files (the example in this report worked perfectly). See: https://issues.jenkins-ci.org/browse/JENKINS-54081?focusedCommentId=352021&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-352021

Jesse Glick added a comment - 2018-10-30 12:56

framillien

in my case it's huge logs files

Then you may rather have hit a symptom of JENKINS-54073, whereas other reporters (bshah, ericlouvard, shahmishal, but again excluding the false initial report by towel) seem to have hit something very different.

Jesse Glick added a comment - 2018-11-13 15:11

Anyone still seeing this using up-to-date plugins and -Dorg.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING=true? If so, please make sure you have custom loggers set up as in my comment of 2018-10-25 and see if the problem can be reproduced in a clean environment.

Shriram Datar added a comment - 2018-11-27 06:37

I am seeing the same issue with the latest Jenkins and all plugins up to date. I am running only echo 1234 in the batch file. I have exactly the same logs as those attached to this ticket.

milo6 added a comment - 2018-11-27 06:50

I changed from 32-bit to 64-bit Java and increased the JVM heap size to 4 GB; it seems to be working.

Jesse Glick added a comment - 2018-11-27 14:44

shriramd if you are seeing the java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer then, like the original reporter, this suggests that you were using an incompatible JRE for the agent, a problem which the switch to watching mode might have incidentally triggered, but not really caused. amol_malokar’s issue sounds similar: possibly the use of 32-bit Java led to some incompatibility with JNA that wound up crashing a lot of class loading. Hard to know without being able to reproduce from scratch, including details of the Java installation packages used.
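To check which Java runtime and bitness a machine actually uses, a minimal sketch like the following can be run with the same java executable that launches the agent. The class AgentJavaCheck is hypothetical; note that sun.arch.data.model is a HotSpot-specific property and may be absent on other JVMs.

```java
// Hypothetical standalone sketch for checking which JVM a process runs on.
class AgentJavaCheck {
    // java.specification.version is "1.8" on Java 8 and "9", "10", ... on later releases.
    static boolean isJava8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    // Builds a one-line summary; dataModel is "32" or "64" where available, else null.
    static String describe(String specVersion, String dataModel) {
        String bits = dataModel == null ? "unknown" : dataModel + "-bit";
        String warning = isJava8(specVersion) ? "" : " - not Java 8";
        return "Java " + specVersion + " (" + bits + ")" + warning;
    }

    public static void main(String[] args) {
        System.out.println(describe(
            System.getProperty("java.specification.version"),
            System.getProperty("sun.arch.data.model")));
    }
}
```

This only reports on the JVM it is run with, so it helps only if the agent is started with that same installation.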

Assignee: Unassigned
Reporter: Yoav Miles
Votes: 7
Watchers: 17
Created: 2018-10-03 15:49
Updated: 2018-11-27 14:44
