Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-25623

timeout step should be able to kill infinite loop

      Timeout step is currently only able to interrupt executions that are pausing (such as the 'sh' step.)

      It would be nice if it can also interrupt infinite loops like while(true) ; by magically throwing an exception.

      To safely doing this without race seems to require that CpsThread.runNextChunk spontaneously return even without the CPS code explicitly yielding. When resuming, any abnormal Outcome should act as if it's throwing the exception, and normal Outcome should get ignored and its natural result should be used instead.

      The spontaneous return could happen every N instructions. Or CpsTransformer can insert Block that does it, like HotSpot inserts safepoint checks at the beginning of every function and every loop.

          [JENKINS-25623] timeout step should be able to kill infinite loop

          Jesse Glick added a comment -

          We probably need a multistaged approach here.

          • A FlowInterruptedException arising from a containing step like timeout should be able to interrupt CPS-transformed code not inside a StepExecution at a safe point (runNextChunk). I suppose similar to how we implemented the pause function, except aborting rather than pausing.
          • A build interrupt should be able to do the same. This is coming in via different code path but should delegate to the same implementation.
          • If the CPS VM is running native code, Thread.interrupt should be called. It should be given a limited grace period—say, a few seconds—to terminate; after that, resort to Thread.stop¹, making sure we are able to provide a fresh Thread for the pool so we can still run finally blocks or whatever.
          • We may also need some sort of per-build CPS VM CPU quota, distinct from timeout in that we do not care about wall clock time spent running a shell script on an agent, we just care about not overloading the master. Alternately, if a given build starts taking too much CPU time (measurable via System.nanoTime around runNextChunk), gradually being delaying its chunk execution (i.e., CpsThreadGroup.scheduleRun may call schedule rather than submit) so that it does not hog the system, and also institute a hard time limit for individual chunks (such as slow native methods).

          ¹Ignore the stern deprecation warnings. In practice it works fine—have used it for years in NetBeans “internal execution”, such as is used for Ant scripts—and any harm it might cause is AFAIK purely theoretical and a much lesser evil than JENKINS-32986.

          Jesse Glick added a comment - We probably need a multistaged approach here. A FlowInterruptedException arising from a containing step like timeout should be able to interrupt CPS-transformed code not inside a StepExecution at a safe point ( runNextChunk ). I suppose similar to how we implemented the pause function, except aborting rather than pausing. A build interrupt should be able to do the same. This is coming in via different code path but should delegate to the same implementation. If the CPS VM is running native code, Thread.interrupt should be called. It should be given a limited grace period—say, a few seconds—to terminate; after that, resort to Thread.stop ¹, making sure we are able to provide a fresh Thread for the pool so we can still run finally blocks or whatever. We may also need some sort of per-build CPS VM CPU quota, distinct from timeout in that we do not care about wall clock time spent running a shell script on an agent, we just care about not overloading the master. Alternately, if a given build starts taking too much CPU time (measurable via System.nanoTime around runNextChunk ), gradually being delaying its chunk execution (i.e., CpsThreadGroup.scheduleRun may call schedule rather than submit ) so that it does not hog the system, and also institute a hard time limit for individual chunks (such as slow native methods). ¹Ignore the stern deprecation warnings. In practice it works fine—have used it for years in NetBeans “internal execution”, such as is used for Ant scripts—and any harm it might cause is AFAIK purely theoretical and a much lesser evil than JENKINS-32986 .

          Jesse Glick added a comment -

          Another issue is unbounded heap consumption (without consuming much CPU). I doubt we can do anything about this, as Java does not even provide a means to track it. (From CPS-transformed code, perhaps, but not from native code including @NonCPS.)

          Jesse Glick added a comment - Another issue is unbounded heap consumption (without consuming much CPU). I doubt we can do anything about this, as Java does not even provide a means to track it. (From CPS-transformed code, perhaps, but not from native code including @NonCPS .)

          I came here to mention safepoints like HotSpot.

          Kohsuke Kawaguchi added a comment - I came here to mention safepoints like HotSpot.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          pom.xml
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsGroovyShellFactory.java
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
          src/main/java/org/jenkinsci/plugins/workflow/cps/Safepoint.java
          src/test/java/org/jenkinsci/plugins/workflow/WorkflowJobNonRestartingTest.java
          http://jenkins-ci.org/commit/workflow-cps-plugin/95e73725ec0479685916ace0e1b2d80801519e43
          Log:
          JENKINS-25623

          previously, CpsThreadGroup.run() was greedy. It was running as much as
          it can before it returns. This means if the Pipeline script in question
          has an infinite loop, this method never returns. This prevents other
          activities to take place on CPS VM thread, most notably the attempt to
          kill the execution.

          In this change, CpsThreadGroup.run() is modified to execute just a
          little bit, not as much as it can, by using the new safepoint capability
          in groovy-cps. CpsThreadGroup.scheduleRun() is modified so that if the
          Pipeline Script is still runnable, the next chunk of execution is
          scheduled.

          Future implementation from scheduleRun() was simplified a bit. This code
          is really only used for testing, so it's not that important that the
          cancel() and other methods work correctly.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: pom.xml src/main/java/org/jenkinsci/plugins/workflow/cps/CpsGroovyShellFactory.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java src/main/java/org/jenkinsci/plugins/workflow/cps/Safepoint.java src/test/java/org/jenkinsci/plugins/workflow/WorkflowJobNonRestartingTest.java http://jenkins-ci.org/commit/workflow-cps-plugin/95e73725ec0479685916ace0e1b2d80801519e43 Log: JENKINS-25623 previously, CpsThreadGroup.run() was greedy. It was running as much as it can before it returns. This means if the Pipeline script in question has an infinite loop, this method never returns. This prevents other activities to take place on CPS VM thread, most notably the attempt to kill the execution. In this change, CpsThreadGroup.run() is modified to execute just a little bit, not as much as it can, by using the new safepoint capability in groovy-cps. CpsThreadGroup.scheduleRun() is modified so that if the Pipeline Script is still runnable, the next chunk of execution is scheduled. Future implementation from scheduleRun() was simplified a bit. This code is really only used for testing, so it's not that important that the cancel() and other methods work correctly.

          Implemented the safepoints & kill.

          This change does:

          • allow interruption to kill infinite loops and recursions
          • timeout to work correctly even when the thread is not paused at step
            This change does not:
          • allow interruption to escalate to Thread.interrup() / Thread.stop() to forcibly stop an execution that's happening in non-CPS code
          • do any CPU usage accounting.

          Kohsuke Kawaguchi added a comment - Implemented the safepoints & kill. This change does: allow interruption to kill infinite loops and recursions timeout to work correctly even when the thread is not paused at step This change does not: allow interruption to escalate to Thread.interrup() / Thread.stop() to forcibly stop an execution that's happening in non-CPS code do any CPU usage accounting.

          Jesse Glick added a comment -

          Maybe we can leave the open tasks to JENKINS-32986.

          Jesse Glick added a comment - Maybe we can leave the open tasks to JENKINS-32986 .

          Code changed in jenkins
          User: Jesse Glick
          Path:
          pom.xml
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsGroovyShellFactory.java
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
          src/main/java/org/jenkinsci/plugins/workflow/cps/Safepoint.java
          src/test/java/org/jenkinsci/plugins/workflow/WorkflowJobNonRestartingTest.java
          http://jenkins-ci.org/commit/workflow-cps-plugin/f4a660726b6d7a6d149c919ae7c5971ac9bb4818
          Log:
          Merge pull request #46 from jenkinsci/JENKINS-25623

          JENKINS-25623 time out & kill infinite recursion

          Compare: https://github.com/jenkinsci/workflow-cps-plugin/compare/1100650f3c9b...f4a660726b6d

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: pom.xml src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsGroovyShellFactory.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java src/main/java/org/jenkinsci/plugins/workflow/cps/Safepoint.java src/test/java/org/jenkinsci/plugins/workflow/WorkflowJobNonRestartingTest.java http://jenkins-ci.org/commit/workflow-cps-plugin/f4a660726b6d7a6d149c919ae7c5971ac9bb4818 Log: Merge pull request #46 from jenkinsci/ JENKINS-25623 JENKINS-25623 time out & kill infinite recursion Compare: https://github.com/jenkinsci/workflow-cps-plugin/compare/1100650f3c9b...f4a660726b6d

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
          http://jenkins-ci.org/commit/workflow-cps-plugin/22fa38ac38ea8fd61c9b615779f5b793cd381214
          Log:
          Avoid saving program.dat if we are about to run another chunk anyway.
          Reverts this aspect of behavior to that prior to the fix of JENKINS-25623.
          Also logging time spent saving.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java http://jenkins-ci.org/commit/workflow-cps-plugin/22fa38ac38ea8fd61c9b615779f5b793cd381214 Log: Avoid saving program.dat if we are about to run another chunk anyway. Reverts this aspect of behavior to that prior to the fix of JENKINS-25623 . Also logging time spent saving.

            kohsuke Kohsuke Kawaguchi
            kohsuke Kohsuke Kawaguchi
            Votes:
            1 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: