JENKINS-47173: Offer a high-performance storage engine for pipeline at some cost to resumability


Details

    • Sprint: Pipeline - October, Pipeline - December

    Description

      As a user of pipeline, I WANNA GO FAST.  

      To do that, I'm okay with sacrificing some of the ability to resume a pipeline after a hard Jenkins crash, as long as it means pipelines run fast. It would be nice if I could still resume the pipeline after a clean/safe restart of the master, though!

      This differs from https://issues.jenkins-ci.org/browse/JENKINS-30896 because for this we're sacrificing some of the guarantees pipeline normally provides.

      Concrete needs:

      • Implement storage engine (workflow-support)
      • Implement options to select that storage engine (workflow-job)
      • Implement ability to RUN that storage engine (workflow-cps)
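
      For context, the "options to select that storage engine" ended up being exposed, in the later Pipeline Durability Settings work, as a per-job durability hint (plus a global default under Manage Jenkins). Below is a minimal Jenkinsfile sketch of opting one job into the fast, less-durable mode; it assumes that later durabilityHint option/property, not anything defined in this issue.

        // Declarative Pipeline: sketch only, assuming the durabilityHint option
        // from the later Durability Settings work.
        pipeline {
            agent any
            options {
                // Trades resumability after a hard crash for far less disk I/O
                durabilityHint('PERFORMANCE_OPTIMIZED')
            }
            stages {
                stage('Build') {
                    steps {
                        echo 'running with the high-performance storage mode'
                    }
                }
            }
        }

        // Scripted Pipeline equivalent via the corresponding job property:
        properties([durabilityHint('PERFORMANCE_OPTIMIZED')])
        node {
            echo 'running with the high-performance storage mode'
        }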


          Activity

            svanoort Sam Van Oort added a comment -

            basil Thank you for reporting your issues, and I'm sorry to hear they bit you rather hard. I've been doing some investigation on this – it sounds like you may be hitting a pre-existing bug that we saw reported but couldn't reproduce (very similar symptoms around nesting of blocks and executors). However, whether it's pre-existing or a regression from this, if a nested pair of nodes enables us to reproduce it (and thus solve it), then you can look forward to a fix soonish.
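
            For reference, a minimal sketch of the "nested pair of nodes" shape mentioned above (a hypothetical reproduction attempt with placeholder agent labels, not a confirmed reproducer):

                // Hypothetical reproduction sketch: one node block nested inside another,
                // so two executor/workspace leases are held at the same time. Restarting
                // Jenkins while the inner step runs exercises the resume/lease logic.
                node('outer-agent') {
                    node('inner-agent') {
                        sh 'sleep 300'
                    }
                }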

            svanoort Sam Van Oort added a comment -

            basil I have not been able to reproduce what you describe with restarts using the new version – after digging deeper, my suspicion is that it might be an issue with the program data persisted before shutdown (using older plugin versions), or, as you say, it may be a race condition where a job somehow claims a workspace it is not entitled to (since it should be owned by another job) – though I've not seen that before.

            The WorkspaceListLease un/pickling logic itself hasn't changed, and we did a fair bit of testing around upgrade/downgrade scenarios. But of course this logic is quite subtle and complex, and in some cases the prior bugs fixed in the course of this work could only be observed if actions occurred in specific time windows.

            Have you observed / are you able to reproduce this behavior during restarts with the new plugin versions in place?

            georgecnj George C added a comment -

            I'm also seeing something similar with a pipeline job. This happened after I did two "Reload Configuration From Disk" operations about 5 minutes apart. We see that the job's queue only shows builds up to #6, but builds newer than #6 exist. This is on Jenkins 2.46.3.

            [Pipeline] End of Pipeline
            java.lang.IllegalStateException: JENKINS-37121: something already locked /Users/hudson/build/workspace/ci-ios-bento-sample-apps-apple
                    at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:75)
                    at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:51)
                    at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:92)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
                    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            Caused: java.io.IOException: Failed to load build state
                    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:697)
                    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:695)
                    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:744)
                    at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
                    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                    at java.lang.Thread.run(Thread.java:745)
            Finished: FAILURE

             

            basil Basil Crow added a comment -

            svanoort Thanks for your reply and for looking into this.

            Have you observed / are you able to reproduce this behavior during restarts with the new plugin versions in place?

            No, I haven't observed this again and I haven't been able to reproduce it. I have also started doing clean shutdowns with the /exit API to ensure that the build queue gets properly flushed to disk before doing Jenkins upgrades.
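
            For reference, a minimal Groovy sketch of driving that clean-shutdown endpoint (the URL, user, and API token are placeholders; the endpoint requires a POST by an administrator):

                // Sketch only: ask Jenkins to persist queue/build state and exit before an upgrade.
                def jenkinsUrl = 'https://jenkins.example.com'
                def auth = 'admin:API_TOKEN'.bytes.encodeBase64().toString()

                def conn = new URL("${jenkinsUrl}/exit").openConnection()
                conn.setRequestMethod('POST')                               // /exit only accepts POST
                conn.setRequestProperty('Authorization', "Basic ${auth}")   // API-token basic auth
                println "Shutdown request returned HTTP ${conn.responseCode}"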

            Thanks again!

            svanoort Sam Van Oort added a comment -

            basil georgecnj I just released workflow-cps 2.47 and workflow-job 2.18, which include a ton of fixes to the related logic and likely fix the cause(s) of this – if you see this issue recur, could you please open a new JIRA issue and assign it to me? Thanks!


            People

              Assignee: svanoort Sam Van Oort
              Reporter: svanoort Sam Van Oort
              Votes: 1
              Watchers: 7
