Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-31314

Running asynchronous code inside a @NonCPS method should fail cleanly

    • workflow-cps 2.71

      User-visible bit: throws an error

       

      Currently if you run a step with asynchronous execution (which is to say, almost all of them, since AbstractSynchronousNonBlockingStepExecution counts) inside a method marked @NonCPS, or otherwise not transformed (for example because of JENKINS-26481), you get bizarre results: the return value of the step becomes the return value of the method, regardless of what was supposed to happen in the rest of the method. This is because CpsCallableInvocation is thrown from DSL.invokeMethod and interrupts the method's execution.

      This is very confusing behavior. It would be better for CpsCallableInvocation.<init> to somehow detect that it is inside an untransformed method and throw a comprehensible error.

          [JENKINS-31314] Running asynchronous code inside a @NonCPS method should fail cleanly

          Code changed in jenkins
          User: Jesse Glick
          Path:
          aggregator/src/test/java/org/jenkinsci/plugins/workflow/SerializationTest.java
          http://jenkins-ci.org/commit/workflow-plugin/e5d9b321b7490d811cf3db5e4e93630c2f912412
          Log:
          JENKINS-31314 Demonstrating problem in test.
          The sample flow cannot possibly work, yet it runs without errors, just weird behavior.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: aggregator/src/test/java/org/jenkinsci/plugins/workflow/SerializationTest.java http://jenkins-ci.org/commit/workflow-plugin/e5d9b321b7490d811cf3db5e4e93630c2f912412 Log: JENKINS-31314 Demonstrating problem in test. The sample flow cannot possibly work, yet it runs without errors, just weird behavior.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          aggregator/src/test/java/org/jenkinsci/plugins/workflow/SerializationTest.java
          http://jenkins-ci.org/commit/workflow-plugin/59e6cf394ba79dea4db3dd2b99b7cca891341f16
          Log:
          Merge pull request #233 from jglick/test-JENKINS-31314

          JENKINS-31314 Demonstrating problem in test

          Compare: https://github.com/jenkinsci/workflow-plugin/compare/1b83632c5b8d...59e6cf394ba7

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: aggregator/src/test/java/org/jenkinsci/plugins/workflow/SerializationTest.java http://jenkins-ci.org/commit/workflow-plugin/59e6cf394ba79dea4db3dd2b99b7cca891341f16 Log: Merge pull request #233 from jglick/test- JENKINS-31314 JENKINS-31314 Demonstrating problem in test Compare: https://github.com/jenkinsci/workflow-plugin/compare/1b83632c5b8d...59e6cf394ba7

          Code changed in jenkins
          User: Jesse Glick
          Path:
          aggregator/src/test/java/org/jenkinsci/plugins/workflow/SerializationTest.java
          http://jenkins-ci.org/commit/workflow-cps-plugin/7a03a4bcf2c08fa05afb7c5dd39ed73b28fef48c
          Log:
          JENKINS-31314 Demonstrating problem in test.
          The sample flow cannot possibly work, yet it runs without errors, just weird behavior.
          Originally-Committed-As: e5d9b321b7490d811cf3db5e4e93630c2f912412

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: aggregator/src/test/java/org/jenkinsci/plugins/workflow/SerializationTest.java http://jenkins-ci.org/commit/workflow-cps-plugin/7a03a4bcf2c08fa05afb7c5dd39ed73b28fef48c Log: JENKINS-31314 Demonstrating problem in test. The sample flow cannot possibly work, yet it runs without errors, just weird behavior. Originally-Committed-As: e5d9b321b7490d811cf3db5e4e93630c2f912412

          Code changed in jenkins
          User: Jesse Glick
          Path:
          aggregator/src/test/java/org/jenkinsci/plugins/workflow/SerializationTest.java
          http://jenkins-ci.org/commit/workflow-cps-plugin/f1b67bb381d1e55ccfe494c80e9d7ec3037d3e71
          Log:
          Merge pull request #233 from jglick/test-JENKINS-31314

          JENKINS-31314 Demonstrating problem in test
          Originally-Committed-As: 59e6cf394ba79dea4db3dd2b99b7cca891341f16

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: aggregator/src/test/java/org/jenkinsci/plugins/workflow/SerializationTest.java http://jenkins-ci.org/commit/workflow-cps-plugin/f1b67bb381d1e55ccfe494c80e9d7ec3037d3e71 Log: Merge pull request #233 from jglick/test- JENKINS-31314 JENKINS-31314 Demonstrating problem in test Originally-Committed-As: 59e6cf394ba79dea4db3dd2b99b7cca891341f16

          Thanks for the first explanation of what is goes wrong when I try writing any non-trivial CPS script. Usually I'm prepared to do the following:

          1. Define @NonCPS functions at the top level of the script, and only call them from the master node
          2. Use data processing in CPS very carefully on any non-master node (e.g. no closures, not even a regexp for string manipulation)
          3. Expect any feature of Groovy to fail at any time.

          Needless to say, that I have a hard time explaining people how to solve a problem with CPS, even if my solutions demonstrate that it can work. Maybe it would be easier to raise the priority of such issues if CPS didn't work at all.

          Thomas Goeppel added a comment - Thanks for the first explanation of what is goes wrong when I try writing any non-trivial CPS script. Usually I'm prepared to do the following: 1. Define @NonCPS functions at the top level of the script, and only call them from the master node 2. Use data processing in CPS very carefully on any non-master node (e.g. no closures, not even a regexp for string manipulation) 3. Expect any feature of Groovy to fail at any time. Needless to say, that I have a hard time explaining people how to solve a problem with CPS, even if my solutions demonstrate that it can work. Maybe it would be easier to raise the priority of such issues if CPS didn't work at all.

          Jesse Glick added a comment -

          only call them from the master node

          All Pipeline scripts runs inside the master process. node blocks have no effect on this whatsoever. They only control where nested sh steps and the like connect to.

          Jesse Glick added a comment - only call them from the master node All Pipeline scripts runs inside the master process. node blocks have no effect on this whatsoever. They only control where nested sh steps and the like connect to.

          Is there a potential workaround for this issue?  I tried catching CpsCallableInvocation but in the versions I'm using it doesn't seem to be raised.

          When calling httpRequest within an ArrayList.each block, it preemptively returns the value of only the first invocation of httpRequest rather than continuing with the rest of the code in the parent method. @NonCPS on the method does not seem to have an effect.

          adam patterson added a comment - Is there a potential workaround for this issue?  I tried catching CpsCallableInvocation but in the versions I'm using it doesn't seem to be raised. When calling httpRequest within an ArrayList.each  block, it preemptively returns the value of only the first invocation of httpRequest  rather than continuing with the rest of the code in the parent method. @NonCPS on the method does not seem to have an effect.

          Jesse Glick added a comment -

          There is no possible “workaround” for a purely diagnostic issue. Either you understand why your code failed, or you do not. What adampatts is apparently asking for is not a workaround for this issue but for Array.each to actually work in CPS mode, which is accomplished by updating the workflow-cps plugin as per its changelog.

          Jesse Glick added a comment - There is no possible “workaround” for a purely diagnostic issue. Either you understand why your code failed, or you do not. What adampatts is apparently asking for is not a workaround for this issue but for Array.each to actually work in CPS mode, which is accomplished by updating the workflow-cps plugin as per its changelog.

          Jesse Glick added a comment -

          Extremely vague sketch of solution for at least the simple case of

          @NonCPS
          def play() {
            for (int i = 0; i < 10; i++) {
              sh "echo oops, will not work #$i"
            }
          }
          node {
            play()
          }
          

          would be as follows: ContinuationGroup.methodCall is expecting to call WorkflowScript.play. DSL.invokeStep calls Continuable.suspend which throws the CpsCallableInvocation; it should pass in the name of the method (virtual method DSL.sh in this case) that actually suspended, and this should be recorded in the CpsCallableInvocation. When methodCall catches it, it can then see that there is a mismatch between the method it thought it was calling and the method that actually suspended, and throw an error (or perhaps just print a warning).

          (Or, since Caller.record is already being called, we could perhaps do a check in CpsCallableInvocation.<init> that the methods match.)

          The tricky part would be tracking down all the less common cases where CpsCallableInvocation is thrown or caught, and either fixing those too, or making the fix conservative so it quietly lets those pass.

          Jesse Glick added a comment - Extremely vague sketch of solution for at least the simple case of @NonCPS def play() { for ( int i = 0; i < 10; i++) { sh "echo oops, will not work #$i" } } node { play() } would be as follows: ContinuationGroup.methodCall is expecting to call WorkflowScript.play . DSL.invokeStep calls Continuable.suspend which throws the CpsCallableInvocation ; it should pass in the name of the method (virtual method DSL.sh in this case) that actually suspended, and this should be recorded in the CpsCallableInvocation . When methodCall catches it, it can then see that there is a mismatch between the method it thought it was calling and the method that actually suspended, and throw an error (or perhaps just print a warning). (Or, since Caller.record is already being called, we could perhaps do a check in CpsCallableInvocation.<init> that the methods match.) The tricky part would be tracking down all the less common cases where CpsCallableInvocation is thrown or caught, and either fixing those too, or making the fix conservative so it quietly lets those pass.

          Jesse Glick added a comment -

          A clearer example for svanoort: prior to the introduction of CpsDefaultGroovyMethods,

          def r = [1, 2, 3].collect {x -> readFile "data$x"}
          

          would return a String (from data1) rather than a 3-element List<String> as you would expect, due to CpsCallableInvocation breaking out of the loop in DefaultGroovyMethods.collect, and being misinterpreted as the CPS return value of that method. Ouch!

          Jesse Glick added a comment - A clearer example for svanoort : prior to the introduction of CpsDefaultGroovyMethods , def r = [1, 2, 3].collect {x -> readFile "data$x" } would return a String (from data1 ) rather than a 3-element List<String> as you would expect, due to CpsCallableInvocation breaking out of the loop in DefaultGroovyMethods.collect , and being misinterpreted as the CPS return value of that method. Ouch!

          Jesse Glick added a comment -

          perhaps just print a warning

          This is the most attractive option since any mistakes would not result in a serious regression, just a misleading message. I think this could be accomplished with the help of a new method PrintStream Invoker.getLog() which CpsFlowExecution.start could delegate to owner.getListener().getLogger(), thus making messages appear in the build log (outside the scope of any particular step).

          Jesse Glick added a comment - perhaps just print a warning This is the most attractive option since any mistakes would not result in a serious regression, just a misleading message. I think this could be accomplished with the help of a new method PrintStream Invoker.getLog() which CpsFlowExecution.start could delegate to owner.getListener().getLogger() , thus making messages appear in the build log (outside the scope of any particular step).

          Andrew Bayer added a comment -

          jglick - so, I'm lost. DSL.invokeStep is only happening for actual Step implementations, right? So any other CPS-transformed code never calls Continuable.suspend. And CpsClosure throws CpsCallableInvocation from its call and doCall methods. Given that most of the problem cases are actually CpsClosure instances throwing CpsCallableInvocation in non-CPS contexts, how would any of what you said work there?

          Andrew Bayer added a comment - jglick - so, I'm lost. DSL.invokeStep is only happening for actual Step implementations, right? So any other CPS-transformed code never calls Continuable.suspend . And CpsClosure throws CpsCallableInvocation from its call and doCall methods. Given that most of the problem cases are actually CpsClosure instances throwing CpsCallableInvocation in non-CPS contexts, how would any of what you said work there?

          Jesse Glick added a comment -

          Not sure. Linked to a sketch of doing the detection for the originally suggested case. Definitely there are others, and those might be harder to handle—I am not sure. Maybe you can figure something out from these hints.

          Jesse Glick added a comment - Not sure. Linked to a sketch of doing the detection for the originally suggested case. Definitely there are others, and those might be harder to handle—I am not sure. Maybe you can figure something out from these hints.

          Jesse Glick added a comment -

          Seems to be catching the cases I would want it to catch. For now it is catching too much, and it will be some work to prune the warnings back to legitimate cases.

          Jesse Glick added a comment - Seems to be catching the cases I would want it to catch. For now it is catching too much, and it will be some work to prune the warnings back to legitimate cases.

          Andrew Bayer added a comment -

          I’ll take a look tomorrow.

          Andrew Bayer added a comment - I’ll take a look tomorrow.

          Any progress on this?

          Yngvar Kristiansen added a comment - Any progress on this?

          Jesse Glick added a comment -

          No, I had a prototype but stopped work on this.

          Jesse Glick added a comment - No, I had a prototype but stopped work on this.

          Devin Nusbaum added a comment -

          A fix for this issue was released in Pipeline: Groovy Plugin version 2.71.

          Devin Nusbaum added a comment - A fix for this issue was released in Pipeline: Groovy Plugin version 2.71.

          Hi dnusbaum, I’m afraid this change broke our Jenkinsfile runner based testing because the warnings will be thrown as an exception there. Reproducer here https://github.com/fwilhe/repro-cps-issue

          I know we have to fix this in a proper way, but is there some way to turn it into warnings also on Jenkinsfile runner and not have it throw an exception?

          Florian Wilhelm added a comment - Hi dnusbaum , I’m afraid this change broke our Jenkinsfile runner based testing because the warnings will be thrown as an exception there. Reproducer here  https://github.com/fwilhe/repro-cps-issue I know we have to fix this in a proper way, but is there some way to turn it into warnings also on Jenkinsfile runner and not have it throw an exception?

          Jesse Glick added a comment -

          fwilhe JFR is being tracked in JENKINS-58407.

          Jesse Glick added a comment - fwilhe JFR is being tracked in JENKINS-58407 .

          Sorry, closed by mistake

          Sergei Parshev added a comment - Sorry, closed by mistake

          kutzi added a comment -

          Why is this resolved?
          As far as I can see, those mismatches are only logged as warnings and builds often run completely different groovy code than the author expects.
          Can we at least have a global property to have builds fail with a hard error, if this happens?

          kutzi added a comment - Why is this resolved? As far as I can see, those mismatches are only logged as warnings and builds often run completely different groovy code than the author expects. Can we at least have a global property to have builds fail with a hard error, if this happens?

          Devin Nusbaum added a comment -

          kutzi Feel free to open a PR to change this line to also support activation via system property, see for example a similar property here.

          Devin Nusbaum added a comment - kutzi Feel free to open a PR to change this line to also support activation via system property, see for example a similar property here .

          Jesse Glick added a comment -

          those mismatches are only logged as warnings

          Because there are tons of corner cases and it is hard to be sure there are no false positives.

          Jesse Glick added a comment - those mismatches are only logged as warnings Because there are tons of corner cases and it is hard to be sure there are no false positives.

            jglick Jesse Glick
            jglick Jesse Glick
            Votes:
            16 Vote for this issue
            Watchers:
            27 Start watching this issue

              Created:
              Updated:
              Resolved: