  1. Jenkins
  2. JENKINS-60107

durabletask v1.33 - process apparently never started in /var/lib/jenkins/workspace/local-cloud-regression-test@tmp/durable-fa896164


Details

    Description

      As mentioned in this bug, which I filed a few days ago:

      https://issues.jenkins-ci.org/browse/JENKINS-59838

       

      I have updated Durable Task Plugin to v1.33 but I am still facing the same issue. 

      I have captured the logs from the "System Log".

      The output I get from that log does not mean much to me:

       

      Here is the error that appeared in the job console:

       

      And the "system log" changed to:

       

       

      This has been a major headache because I get this error every time.

      Attachments

        1. image-2019-11-08-14-16-55-929.png
          75 kB
          Jon Udaondo
        2. image-2019-11-08-14-17-40-455.png
          254 kB
          Jon Udaondo
        3. image-2019-11-08-14-18-42-774.png
          46 kB
          Jon Udaondo
        4. image-2019-11-08-14-19-32-129.png
          286 kB
          Jon Udaondo

        Activity

          carroll Carroll Chiou added a comment -

          Thanks for including the logs. In the future, it would be better to include the actual log text rather than screenshots. Text is easier to read and guarantees that nothing has been left out.

          So, looking at JENKINS-59838, the problem might very well be that you are executing an extremely long shell step that is not giving any output because it is over `ssh`. That is most likely tricking the shell wrapper into thinking that the shell step is not running?
          If this is indeed the problem, I can think of two solutions:

          1. Enable the binary wrapper by passing the system property org.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_BINARY_WRAPPER=true
          2. Switch to the SSH plugin
          dnusbaum Devin Nusbaum added a comment -

          joudaon Please send an email to the jenkinsci-users mailing list for a general investigation like this. In your email, include the output you get once you enable LAUNCH_DIAGNOSTICS, and some details: whether this was working for you previously (and if so, with what versions of the plugin and Jenkins), what version of Jenkins you are running now, what OS you are using on the Jenkins master and on the agent used by the shell step, how the agent is configured, what other configurations you have tried, etc.

          Basic guidance on how to enable LAUNCH_DIAGNOSTICS is here. The property you want is named org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS, and you want to set it to true. Exactly where the system property is specified will depend on how you run Jenkins. From your last ticket, it looks like you are running it as a service, so you want to configure the Java arguments in the service configuration. Once you have it set correctly, the message "running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true ..." will no longer be present in the build log, and hopefully the actual error will be printed to the build log for the failing build, which you should include in your email.

          I am going to close this issue for now. If it looks like there is a bug after you provide more info on the mailing list, we’ll reopen the ticket, so please do not open another ticket for your issue.
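
For a service install where the JVM options live in a shell variable such as `JAVA_ARGS`, the property could be appended along these lines. This is only a minimal sketch: `add_java_prop` is a hypothetical helper, not part of Jenkins or the plugin, and the starting `JAVA_ARGS` value is a made-up example.

```shell
# Hypothetical helper (not part of Jenkins): append a -D option to an
# argument string unless that exact option is already present.
add_java_prop() {
  args="$1"
  prop="$2"
  case " $args " in
    *" $prop "*) printf '%s\n' "$args" ;;        # already set, leave unchanged
    *)           printf '%s\n' "$args $prop" ;;  # append the new option
  esac
}

# Example starting value; a real file will contain its own options.
JAVA_ARGS="-Xmx2048m -Dfile.encoding=UTF8"
JAVA_ARGS=$(add_java_prop "$JAVA_ARGS" \
  "-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true")
echo "$JAVA_ARGS"
```

The helper is idempotent, so running it twice does not duplicate the option. Jenkins has to be restarted before a changed system property takes effect.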

          joudaon Jon Udaondo added a comment - - edited

          dnusbaum I sent an email to the jenkinsci-users mailing list and got no response.

          https://groups.google.com/forum/?hl=es#!searchin/jenkinsci-users/durable$20task%7Csort:date/jenkinsci-users/8knTNDW4jf8/BCmL5rlgAwAJ

          I also set "-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true", but the error is not printed in the build log.

          In my /etc/default/jenkins file I have the following:

          # Allow graphs etc. to work even when an X server is present
          JAVA_ARGS="-Xmx2048m -XX:MaxPermSize=512m -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_BINARY_WRAPPER=true -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true -Dfile.encoding=UTF8"
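
Setting the option in the file does not by itself prove that the running JVM was started with it, so one thing worth checking is the command line of the live process. The following is a rough sketch only: `has_java_prop` is a hypothetical helper, and the sample command line (including the `jenkins.war` path) is illustrative rather than taken from this system.

```shell
# Hypothetical helper: succeed if a JVM command line contains -D<property>.
has_java_prop() {
  case " $1 " in
    *" -D$2 "*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Illustrative command line. On a Linux host the real one could be obtained
# with something like: ps -o args= -C java
cmdline='/usr/bin/java -Xmx2048m -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true -jar /usr/share/jenkins/jenkins.war'

if has_java_prop "$cmdline" 'org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true'; then
  diag_status=enabled
else
  diag_status=missing
fi
echo "LAUNCH_DIAGNOSTICS: $diag_status"
```

If the property is missing from the live command line, the service is probably reading its JVM options from somewhere other than the edited file, or has not been restarted since the change.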
          
          joudaon Jon Udaondo added a comment - - edited

          dnusbaum

          The issue is still there.

          It happened in another job while running "terraform destroy -f". The console output of the job shows the following:

          00:00:26.617  module.redis02.vsphere_virtual_machine.vm: Destroying... [id=42068f3c-c1f8-70e9-4262-0f63e7c9c77e]
          00:00:26.617  module.cassandra01.vsphere_virtual_machine.vm: Destroying... [id=4206a140-090b-3066-35ba-71c30007b204]
          00:05:18.718  process apparently never started in /var/lib/jenkins/workspace/destroy-cloud-environment@tmp/durable-3ee2246a
          [Pipeline] }
          00:05:18.748  ERROR: script returned exit code -2
          

          Below that, no further output about the error is displayed, despite "-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true" being set.

           

          dnusbaum Devin Nusbaum added a comment - - edited

          00:00:26.617 module.redis02.vsphere_virtual_machine.vm: Destroying... [id=42068f3c-c1f8-70e9-4262-0f63e7c9c77e]
          00:00:26.617 module.cassandra01.vsphere_virtual_machine.vm: Destroying... [id=4206a140-090b-3066-35ba-71c30007b204]

          Is this output from the script you passed to sh? If so, I am a bit curious about the timestamps. It looks like it ran for 5 hours, and then failed with "process apparently never started"? Can you confirm that downgrading to Durable Task Plugin 1.30 (you will also need to downgrade Pipeline: Nodes and Processes Plugin to 2.34 if you have already updated to 2.35) makes these problems go away?
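
To check how long the step ran before failing, the gap between the last "Destroying..." line and the failure line can be computed directly. This sketch assumes the leading column is elapsed time in HH:MM:SS.mmm form; that is an assumption, since the thread does not say how the timestamps are configured. `to_ms` is a hypothetical helper name.

```shell
# Convert an HH:MM:SS.mmm stamp to milliseconds.
# awk reads fields such as "05" as decimal numbers, so leading zeros are safe.
to_ms() {
  printf '%s\n' "$1" |
    awk -F'[:.]' '{ print (($1 * 60 + $2) * 60 + $3) * 1000 + $4 }'
}

start_ms=$(to_ms "00:00:26.617")   # last "Destroying..." line
end_ms=$(to_ms "00:05:18.718")     # "process apparently never started" line
gap_ms=$((end_ms - start_ms))
echo "gap: ${gap_ms} ms"
```

Under that assumed format the gap is 292101 ms, a little under five minutes. If the column uses a different format, the result would not apply.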


          People

            Unassigned Unassigned
            joudaon Jon Udaondo