[JENKINS-53888] Batch step running on a node other than the master fails

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Windows Server 2012 R2 (both master and agent)
      Jenkins 2.138.1
      Pipeline 2.6
      Durable Task Plugin 1.26
      Pipeline: Nodes and Processes 2.22

      Running a batch command on a node other than the master hangs for several minutes and then fails. Attached example (all nodes run Windows; the echo output is never printed):

      stage('1') {
        node('uitest') {
          bat 'echo something'
        }
      }
      

      After 10 minutes, the console output shows the following and the build fails:

      ERROR: script apparently exited with code 0 but asynchronous notification was lost

      In addition, the system log records these exceptions:

      • java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer
      • hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from

          Yoav Miles added a comment -

          OK, the problem was with the Java runtime. I reverted from Java 10 to Java 8 and everything works again.

          Yoav Miles added a comment -

          JRE was 10 instead of 8

          Bhushan Shah added a comment -

          Despite using JRE 8, I am still having this issue after the latest Jenkins upgrade.

          Éric Louvard added a comment - - edited

          Issue seen on Windows XP with Java(TM) SE Runtime Environment (build 1.8.0_121-b13).
          Jenkins 2.147
          Windows slave, version 3.7 + 3.27

          Oleg Nenashev added a comment -

          CC svanoort jglick. Could it be related to the recent regression in Pipeline?

          mishal shah added a comment - - edited

          Started seeing this after updating Pipeline: Nodes and Processes (2.22 -> 2.24) and Durable Task Plugin (1.25 -> 1.26). 

           ERROR: script apparently exited with code 0 but asynchronous notification was lost

          Sam Van Oort added a comment -

          ericlouvard bshah Do you also have problems with Linux build agents? And do you see the same message in your logs as in the original description?

          I don't think this relates to the recent Pipeline regression, though it could be related in some obscure way to the Controller.watch APIs added by jglick.

          Jesse Glick added a comment -

          The

          java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer
          	at hudson.util.ProcessTree.get(ProcessTree.java:399)
          

          suggests some basic problem with the agent connection. Unfortunately due to JDK-8051847 the original problem is not available in this log. It may be displayed in the agent’s own log, which can be seen in the UI, or by installing the support-core plugin and grabbing a Support bundle. Certainly an error like that would be consistent with accidentally running the agent on an unsupported version of Java—i.e., anything but 8, as in towel’s case. Whether this has anything to do with recent plugin updates, it is hard to say. I do not see any obvious connection, and towel says that indeed there was none.

          The other commenters are probably encountering a totally unrelated issue, for which we have almost no diagnostics. If the error is reproducible but running with the JVM option -Dorg.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING=false reliably fixes it, then that would be a clear signal that it is related. The "asynchronous notification was lost" error is definitely related. Is this only happening for users of bat, as opposed to sh?
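A quick way to check the first point above, namely which JRE is actually running the agent, is a tiny program launched with the same java executable the agent uses. This is a hypothetical sketch (the class name AgentJavaCheck is made up and is not a Jenkins tool), assuming, as stated in the comment, that Java 8 was the only supported agent runtime at the time:

```java
/**
 * Hypothetical diagnostic, not part of Jenkins: reports whether the JVM it runs in
 * is Java 8, the only agent runtime the comments above describe as supported.
 * Run it with the same java executable that launches the agent.
 */
public class AgentJavaCheck {

    // Maps a java.specification.version value to a major version:
    // "1.8" -> 8 (pre-Java 9 scheme), "10" -> 10 (Java 9+ scheme).
    static int majorVersion(String specVersion) {
        if (specVersion.startsWith("1.")) {
            return Integer.parseInt(specVersion.substring(2));
        }
        return Integer.parseInt(specVersion);
    }

    // True only for Java 8, matching the support statement in this thread.
    static boolean isSupported(String specVersion) {
        return majorVersion(specVersion) == 8;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.version               = " + System.getProperty("java.version"));
        System.out.println("java.specification.version = " + spec);
        System.out.println(isSupported(spec)
                ? "OK: running on Java 8"
                : "WARNING: not Java 8, so the agent may fail as described in this issue");
    }
}
```

A WARNING line from the agent's own java executable would match towel's situation (Java 10 instead of 8).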

          Bhushan Shah added a comment -

          For me this bug also happens when the pipeline runs on a Linux node and uses sh instead of bat.

          Jesse Glick added a comment -

          For what it is worth, I am unable to reproduce any such issue with a Windows 10 agent running a simple bat script. Possibly there are special conditions triggering it.

          Jesse Glick added a comment -

          Anyone who is encountering the "asynchronous notification was lost" error: assuming you do not know how to reproduce the issue from scratch, please create a custom logger tracking org.jenkinsci.plugins.workflow.steps.durable_task and org.jenkinsci.plugins.durabletask at FINE and report details. Installing the support-core plugin is ideal as it allows these logs and other things to be recorded as a single ZIP file.
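Jenkins log recorders are built on java.util.logging, so a custom logger at FINE amounts to lowering the level of the named loggers and attaching a handler to them. The standalone sketch below illustrates that with the two logger names from the comment. The class FineLogCapture and its in-memory handler are hypothetical; in Jenkins itself you would normally add the recorder through the System Log page rather than write code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical illustration of a FINE-level log recorder using plain java.util.logging. */
public class FineLogCapture {

    // Logger names taken from the comment above.
    static final String[] LOGGER_NAMES = {
        "org.jenkinsci.plugins.workflow.steps.durable_task",
        "org.jenkinsci.plugins.durabletask",
    };

    // Strong references: java.util.logging holds loggers weakly, so a level set
    // on an otherwise unreferenced logger could be lost after garbage collection.
    static final List<Logger> LOGGERS = new ArrayList<>();

    // In-memory handler that keeps every record it accepts.
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getLoggerName() + " [" + record.getLevel() + "] " + record.getMessage());
            }
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    // Sets both loggers to FINE (child loggers without their own level inherit it)
    // and attaches the handler, which child loggers reach via their parent chain.
    static void install(CapturingHandler handler) {
        handler.setLevel(Level.FINE);
        for (String name : LOGGER_NAMES) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(Level.FINE);
            logger.addHandler(handler);
            LOGGERS.add(logger);
        }
    }

    public static void main(String[] args) {
        CapturingHandler handler = new CapturingHandler();
        install(handler);
        Logger child = Logger.getLogger("org.jenkinsci.plugins.durabletask.Example");
        child.fine("fine-level message is captured");
        child.finest("finest-level message is dropped");
        handler.messages.forEach(System.out::println);
    }
}
```

Because child loggers inherit the level and forward records to their parents' handlers, anything logged under those two packages at FINE or above is captured, while FINER and FINEST messages are dropped.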

          Jesse Glick added a comment -

          Also be sure to pick up the workflow-api 2.31 release with the purported fix for JENKINS-54073.

          Note that workflow-durable-task-step 2.25 has disabled watch mode by default, so if you have accepted that update and wish to help test this, please run with: -Dorg.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING=true
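The flag only helps if it actually reaches the right JVM. This is presumably the master's, since that is where the Pipeline step executes, but treat that as an assumption. A hypothetical sketch (the class name WatchFlagCheck is made up) reports what the running JVM received. It shows how Boolean.parseBoolean would read the value and makes no claim about how the plugin itself parses it:

```java
/**
 * Hypothetical check, not Jenkins code: reports whether the USE_WATCHING system
 * property reached the JVM it runs in. Compile it, then run it with the same -D option.
 */
public class WatchFlagCheck {

    static final String PROPERTY =
        "org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING";

    // null means the option was not passed at all, so the plugin's own default applies
    // (per the comment above, watch mode is off by default as of 2.25).
    // Boolean.parseBoolean accepts only "true" (case-insensitive); anything else reads as false.
    static String describe(String raw) {
        if (raw == null) {
            return "not set (plugin default applies)";
        }
        return "set to \"" + raw + "\" (Boolean.parseBoolean reads it as " + Boolean.parseBoolean(raw) + ")";
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + ": " + describe(System.getProperty(PROPERTY)));
    }
}
```

When starting jenkins.war from the command line, the -D option must come before -jar; anything after the JAR name is passed to Jenkins as a program argument rather than set as a system property.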

          Florian Ramillien added a comment -

          About workflow-api (Pipeline API plugin): I was still on 2.30 when this problem occurred. I plan to upgrade to the latest plugin versions and re-enable watch mode next week.

          About "special conditions triggering it": in my case it is huge log files (the example in this report was working perfectly). See: https://issues.jenkins-ci.org/browse/JENKINS-54081?focusedCommentId=352021&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-352021

          Jesse Glick added a comment -

          framillien: "in my case it's huge log files"

          Then you may rather have hit a symptom of JENKINS-54073, whereas other reporters (bshah, ericlouvard, shahmishal, but again excluding the false initial report by towel) seem to have hit something very different.

          Jesse Glick added a comment -

          Anyone still seeing this using up-to-date plugins and -Dorg.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.USE_WATCHING=true? If so, please make sure you have custom loggers set up as in my comment of 2018-10-25 and see if the problem can be reproduced in a clean environment.

          Shriram Datar added a comment -

          I am seeing the same issue with the latest Jenkins and all plugins up to date. I am running only echo 1234 in the batch step, and I have exactly the same logs as attached to this ticket.

          milo6 added a comment -

          I changed from 32-bit to 64-bit Java and increased the JVM heap size to 4 GB, and it seems to be working now.

          Jesse Glick added a comment -

          shriramd: if you are seeing java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer then, like the original reporter, this suggests that you were using an incompatible JRE for the agent, a problem which the switch to watching mode might have incidentally triggered but not really caused. amol_malokar's issue sounds similar: possibly the use of 32-bit Java led to some incompatibility with JNA that wound up crashing a lot of class loading. It is hard to know without being able to reproduce from scratch, including details of the Java installation packages used.
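To collect the Java installation details asked for here, and to check the two settings milo6 changed (bitness and heap size), a similar hypothetical sketch (the class name JvmShapeCheck is made up) prints the JVM's data model and maximum heap. It relies on the sun.arch.data.model property, which HotSpot-based JVMs such as Oracle and OpenJDK builds set but other JVMs may not:

```java
/**
 * Hypothetical diagnostic, not Jenkins code: prints whether the JVM is 32- or 64-bit
 * and its maximum heap. Run it with the agent's java executable.
 */
public class JvmShapeCheck {

    // Maps a sun.arch.data.model value to a label; null or other values mean "unknown".
    static String bitness(String dataModel) {
        if ("32".equals(dataModel)) {
            return "32-bit";
        }
        if ("64".equals(dataModel)) {
            return "64-bit";
        }
        return "unknown";
    }

    // Converts a byte count to whole mebibytes for display.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + bitness(System.getProperty("sun.arch.data.model")));
        System.out.println("os.arch:        " + System.getProperty("os.arch"));
        // maxMemory() is the most the JVM will try to use (roughly the -Xmx limit),
        // or Long.MAX_VALUE when there is no inherent limit.
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap:       "
                + (max == Long.MAX_VALUE ? "no limit reported" : toMiB(max) + " MiB"));
    }
}
```

Output showing a 32-bit data model and a small maximum heap would match the configuration milo6 moved away from.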

            Assignee: Unassigned
            Reporter: Yoav Miles (towel)
            Votes: 7
            Watchers: 17