
Disable CPS transform when build is not resumable anyway

      If you have set WorkflowJob.resumeBlocked to true, there is no benefit to using the groovy-cps library, so there is no reason to pay its costs: compilation overhead, runtime overhead, poor fidelity of Groovy constructs, etc. In fact, if you are running without the sandbox (as you normally would in jenkinsfile-runner), you may as well run stock Groovy.

      This is not as simple as just turning off the CPS transformer, though: various parts of workflow-cps expect to be interacting with continuations, so there would need to be some sort of stub layer, parallel would need to be modified to use native Threads, etc. It might be easier to just create a fresh kind of FlowDefinition in a separate plugin, though there is likely a lot of code currently in workflow-cps which really ought to have been in workflow-support.
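
To make the "native Threads" point concrete, here is a minimal sketch of what a thread-based parallel could look like outside the CPS engine. The class NativeParallel and its run method are hypothetical names for illustration only; nothing here is existing workflow-cps API, and a real step would also need to handle logging, flow nodes, and abort handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a non-CPS parallel step; not part of workflow-cps. */
class NativeParallel {
    /** Runs each branch on its own native thread and returns results keyed by branch name. */
    static <T> Map<String, T> run(Map<String, Callable<T>> branches)
            throws InterruptedException, ExecutionException {
        Map<String, T> results = new LinkedHashMap<>();
        if (branches.isEmpty()) {
            return results; // a fixed thread pool cannot be created with zero threads
        }
        ExecutorService pool = Executors.newFixedThreadPool(branches.size());
        try {
            // Start every branch before waiting on any of them.
            Map<String, Future<T>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<T>> branch : branches.entrySet()) {
                futures.put(branch.getKey(), pool.submit(branch.getValue()));
            }
            // The calling thread blocks here; there is no continuation to save or resume.
            for (Map.Entry<String, Future<T>> future : futures.entrySet()) {
                results.put(future.getKey(), future.getValue().get());
            }
            return results;
        } finally {
            pool.shutdownNow(); // interrupt any branch still running if one of them failed
        }
    }
}
```

The trade-off is visible in the blocking get() call: the build's state lives only on live thread stacks, which is exactly why such a build could not be resumed after a restart.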

          [JENKINS-63706] Disable CPS transform when build is not resumable anyway

          jerry wiltse added a comment -

          TL;DR: This would be an exciting capability. I personally know many people in OSS and enterprise who would gladly give up "resumption" as a feature if it meant that shared libraries could be written in "stock Groovy" (or as nearly stock as possible).

          I have worked on several large Jenkins shared library projects at several different companies, and the cost of CPS has always been exceptionally high in those cases, although, interestingly, not in the ways you have listed. Even with very large shared libraries, the compilation time and execution time of the Groovy code have NEVER been a source of discussion. They've never been a significant contributor to overall job times. In contrast, the cost I'm thinking about has come up in just about every enterprise and OSS community discussion about Jenkins shared libraries which I've been part of, so I'll summarize it here.

          I'll first refer to a metaphor popularized in an old blog post which gained a lot of notoriety and became a meme in the wider programming language community. This post describes the fundamental situation pretty poetically, but with respect to async/sync functions.

          https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

          In summary, CPS "forks" the functions in a program into two colors, in almost exactly the same way the addition of async/await does. CPS makes Groovy functions red. Any functions which can't be CPS-transformed and are marked @NonCPS are blue, and crossing between the two is either impossible, or fraught with peril and misfortune when it is possible. Local testing, debugging, and predicting the outcome of a change across many pipelines all become vastly more complicated because of the existence of CPS.
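
The red/blue split can be sketched in plain Java as an analogy. This is not how groovy-cps is implemented; the class FunctionColors and all of its methods are made up for illustration. A "blue" function returns its result directly, while a "red" function hands its result to a continuation.

```java
import java.util.function.IntConsumer;

/** Illustrative analogy only; none of these names come from groovy-cps. */
class FunctionColors {
    // "Blue": direct style. The result comes back to the caller on the native stack.
    static int addOne(int x) {
        return x + 1;
    }

    // "Red": continuation-passing style. The result is delivered to the continuation k.
    static void addOneCps(int x, IntConsumer k) {
        k.accept(x + 1);
    }

    // Red calling red: the continuation has to be threaded through every call.
    static void addTwoCps(int x, IntConsumer k) {
        addOneCps(x, y -> addOneCps(y, k));
    }

    // Blue calling red: the caller has to smuggle the result out through a side channel.
    // This only works because the continuation happens to run before addTwoCps returns.
    // If the engine suspended the red function instead, box[0] would still be 0 here.
    static int addTwoFromBlue(int x) {
        int[] box = new int[1];
        addTwoCps(x, v -> box[0] = v);
        return box[0];
    }
}
```

The last method is the "fraught with peril" case: it compiles and appears to work, but its correctness depends on an assumption about when the continuation runs that the blue caller cannot check.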

          I know this isn't news, but as a user, this complexity is the most significant cost of CPS by a wide margin.

          I love the study of CPS from a programming language standpoint, and the use of CPS to provide the capability of resumption for Jenkins Pipelines is a novel and brilliant thing. I'm sure there were some teams and companies who needed such a capability and benefitted from it greatly. I've certainly been pleasantly surprised a few times myself by having it work during unexpected or unfortunate Jenkins master restarts.

          Nonetheless, it's worth pointing out that across the multiple devops teams I've worked with as a consultant, and the ones I've worked on as a team member, successful resumption of pipelines during a restart of master was never expected, trusted, or planned around. Resumption can simply never be guaranteed. Restarting Jenkins was a major event and required a maintenance window. If jobs were scheduled or still in flight during such windows, it was widely understood that they would almost certainly fail (and they almost always did, for one reason or another). There are just too many conditions outside of Jenkins which can undermine resumption: plugin upgrades, jobs which call Jenkins internal APIs and REST APIs, the implicit checkouts that take place on master, etc.

          So, from my perspective (which is still very limited compared to many others who might read this), I think many organizations would say that they would rather not pay the total cost of ownership for the ability to resume. My guess is that it wasn't feasible to implement it as opt-in from the beginning, but now that there might be a possibility to opt out in the future, it's very exciting.


          Justin Vallon added a comment -

          Speaking as a Jenkins user and (non-Jenkins) software developer: part of the purpose of the CPS code is that the state of the Groovy interpreter can be saved. If CPS transformation does not take place, the state of the program would need to be maintained in a live thread, with a Groovy interpreter remaining in-process, waiting for any "sh" or similar command to complete. That would incur a cost in terms of resources.

          Of course, serializing the interpreter state in order to "swap-out" the running Groovy code might incur a different cost.
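
The trade-off described above can be sketched in Java. This is a simplified illustration under stated assumptions, not how workflow-cps stores program state: the class SavedProgram, its fields, and its three-step "program" are all hypothetical. The idea is that when program state is an explicit serializable object rather than a native thread stack, it can be written out between steps and picked up again later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical program whose entire state is two fields, so it can be swapped out. */
class SavedProgram implements Serializable {
    private static final long serialVersionUID = 1L;
    static final int STEPS = 3;

    int nextStep = 0; // plays the role of the saved "program counter"
    int total = 0;    // plays the role of the saved local variables

    /** Runs a single step and reports whether more steps remain. */
    boolean runOneStep() {
        if (nextStep >= STEPS) {
            return false;
        }
        total += (nextStep + 1) * 10;
        nextStep++;
        return nextStep < STEPS;
    }

    /** Serializes the whole program state; no thread has to stay alive afterwards. */
    byte[] save() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        }
        return bytes.toByteArray();
    }

    /** Restores a program from saved bytes so it can continue where it stopped. */
    static SavedProgram load(byte[] saved) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(saved))) {
            return (SavedProgram) in.readObject();
        }
    }
}
```

Both costs mentioned in the comment show up here: without save(), a thread must hold nextStep and total on its stack for as long as a step takes; with it, every pause pays for serialization instead.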


          Jesse Glick added a comment -

          "That would incur a cost in terms of resources."

          Likely orders of magnitude smaller than using the CPS transformer.

          Anyway, this is likely to be a difficult feature to implement.


          Marco added a comment -

          Our use case is regarding a maintenance window. There is never a safe time to do a restart without corrupting some running declarative pipelines. 

          The "Prepare for shutdown" / "Safe restart" functionality pauses pipelines at the next step instead of letting them finish. So a simpler way to overcome this CPS pain is to wait until all jobs have REALLY finished instead of pausing them. This implies that we should just prevent new jobs from being started and leave them in the queue.
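
The drain behaviour proposed above can be sketched as a small Java model. It is purely illustrative: DrainingScheduler and its methods are hypothetical names and do not correspond to Jenkins core APIs. While quieting, new jobs stay in the queue, running jobs are left alone, and a restart is considered safe only once nothing is running.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical model of "drain, then restart"; not a Jenkins API. */
class DrainingScheduler {
    private final Queue<String> queued = new ArrayDeque<>();
    private final Set<String> running = new HashSet<>();
    private boolean quieting = false;

    /** Queues a job; it starts immediately unless a shutdown is being prepared. */
    synchronized void submit(String job) {
        queued.add(job);
        startQueuedJobs();
    }

    /** Marks a running job as finished. */
    synchronized void finish(String job) {
        running.remove(job);
    }

    /** Stops starting new jobs; jobs already running are not paused. */
    synchronized void prepareForShutdown() {
        quieting = true;
    }

    /** Leaves quiet mode and starts everything that piled up in the queue. */
    synchronized void cancelShutdown() {
        quieting = false;
        startQueuedJobs();
    }

    /** A restart is safe only when quieting and every running job has finished. */
    synchronized boolean safeToRestart() {
        return quieting && running.isEmpty();
    }

    synchronized int queuedCount() {
        return queued.size();
    }

    private void startQueuedJobs() {
        if (quieting) {
            return; // leave jobs in the queue while draining
        }
        while (!queued.isEmpty()) {
            running.add(queued.remove());
        }
    }
}
```

In this model a long-running job delays the restart indefinitely, which is the price of never pausing a pipeline mid-flight.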


            Unassigned Unassigned
            jglick Jesse Glick