Jenkins / JENKINS-53775

FileNotFoundException for program.dat when running a Pipeline Job concurrently with the Job DSL plugin

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Labels: None
    • job-dsl 1.76

      Unfortunately, I don't have an easy reproducer, but this bug happens to me fairly regularly (at least a few times a month). For this failure to occur:

      • A Freestyle job is running and performing a "Process Job DSLs" step. Part of this involves updating an existing Pipeline job.
      • That existing Pipeline job is also running (or starting) at around the same time as the "Process Job DSLs" step is trying to update it.

      When the timing is just right, the Pipeline job fails with:

      java.io.FileNotFoundException: /var/jenkins_home/jobs/devops-gate/jobs/projects/jobs/dx4linux/jobs/delphix-build-and-snapshots/jobs/ami-snapshots/builds/1173/program.dat (No such file or directory)
      	at java.io.FileInputStream.open0(Native Method)
      	at java.io.FileInputStream.open(FileInputStream.java:195)
      	at java.io.FileInputStream.<init>(FileInputStream.java:138)
      	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverReader.openStreamAt(RiverReader.java:188)
      	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverReader.restorePickles(RiverReader.java:136)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.loadProgramAsync(CpsFlowExecution.java:773)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:739)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:875)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:745)
      	at hudson.model.RunMap.retrieve(RunMap.java:225)
      	at hudson.model.RunMap.retrieve(RunMap.java:57)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:499)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:481)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:379)
      	at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:231)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:234)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:105)
      	at hudson.model.Run.fromExternalizableId(Run.java:2436)
      	at org.jenkinsci.plugins.workflow.support.steps.build.RunWrapper.getRawBuild(RunWrapper.java:71)
      	at org.jenkinsci.plugins.workflow.support.steps.build.RunWrapper.build(RunWrapper.java:75)
      	at org.jenkinsci.plugins.workflow.support.steps.build.RunWrapper.setResult(RunWrapper.java:87)
      	at sun.reflect.GeneratedMethodAccessor820.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
      	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
      	at groovy.lang.MetaClassImpl.setProperty(MetaClassImpl.java:2725)
      	at groovy.lang.MetaClassImpl.setProperty(MetaClassImpl.java:3770)
      	at org.codehaus.groovy.runtime.InvokerHelper.setProperty(InvokerHelper.java:201)
      	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.setProperty(ScriptBytecodeAdapter.java:484)
      	at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:347)
      	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onSetProperty(GroovyInterceptor.java:84)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onSetProperty(SandboxInterceptor.java:197)
      	at org.kohsuke.groovy.sandbox.impl.Checker$7.call(Checker.java:344)
      	at org.kohsuke.groovy.sandbox.impl.Checker.checkedSetProperty(Checker.java:351)
      	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.setProperty(SandboxInvoker.java:33)
      	at com.cloudbees.groovy.cps.impl.PropertyAccessBlock.rawSet(PropertyAccessBlock.java:24)
      	at com.cloudbees.groovy.cps.impl.PropertyishBlock$ContinuationImpl.set(PropertyishBlock.java:88)
      	at com.cloudbees.groovy.cps.impl.AssignmentBlock$ContinuationImpl.assignAndDone(AssignmentBlock.java:70)
      	at sun.reflect.GeneratedMethodAccessor706.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
      	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
      	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$101(SandboxContinuable.java:34)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.lambda$run0$0(SandboxContinuable.java:59)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:58)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:332)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:83)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:244)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:232)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
      Caused: java.io.IOException: Failed to load build state
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:854)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:852)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:906)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Finished: FAILURE
      [Pipeline] stage
      [Pipeline] { (Cloning repos)
      [Pipeline] node
      19:23:07 Running on jenkins-agent3 in /var/tmp/jenkins_slaves/jenkins-ops/workspace/devops-gate/projects/dx4linux/delphix-build-and-snapshots/ami-snapshots
      [Pipeline] {
      [Pipeline] checkout
      19:23:07 Wiping out workspace first.
      19:23:08 Cloning the remote Git repository
      [...]
      [Pipeline] // timestamps
      [Pipeline] End of Pipeline
      Finished: SUCCESS
      

      Several things are unusual:

      • The job continues running, even after the Pipeline plugin printed "Finished: FAILURE."
      • If the job was running from a parent job, the parent job shows the job as failed (even though the child job is still running).
      • The job gets to the finally block of my pipeline and sends me a notification that it failed; however, the value of currentBuild.result is paradoxically SUCCESS. This leads me to believe that the pipeline somehow reached my finally block but not my catch block (which sets currentBuild.result to FAILURE on any exception). While the details of this behavior might be specific to my pipeline, the general behavior is very wonky.


          Basil Crow added a comment -

          svanoort, does this seem similar to other issues you've investigated where jobs mysteriously resume after they've "completed"?


          Basil Crow added a comment -

          Hit this again today. This has become an annoyance for us, since when we push a change to a Job DSL job, we frequently see this behavior on any running instances of that job.


          Sam Van Oort added a comment -

          jtaboada Could I persuade you to please take a look at this? It looks like it should be a synchronization bug around pickle loading. I'm not actually as concerned with JobDSL + Pipeline because that's kind of an edge case situation, but I strongly suspect that a threading issue here will be causing other failures that people just don't know the cause of. Thanks!


          Sam Van Oort added a comment -

          jtaboada Could you please provide some details? There should not be a race condition possible (if so, that's a bugfix we need to do). It sounds like you have what you need to fix that, from what you're saying. Thanks!


          Will Freeman added a comment -

          Also seeing this error pop up fairly consistently when a number of PR builds all fire at the same time on a multibranch job.  After the error, the jobs all stick around until the master is rebooted.


          Basil Crow added a comment -

          Hit this again today.

          I'm not actually as concerned with JobDSL + Pipeline because that's kind of an edge case situation

          svanoort, can you explain why you consider this an edge case situation? Is it because you'd rather that users use a Jenkinsfile with the Pipeline Multibranch Plugin rather than Job DSL? I'd love to do that, but I can't. My organization centralizes company-wide build logic in one team, and one repository within that team. Due to JENKINS-43749, the Pipeline Multibranch Plugin doesn't work for my use case. In that bug Jesse recommended the use of Job DSL. So that is what I am doing. Is this an edge case situation? I'd be happy to move to a different setup if one was available, but I don't see any other options, I'm afraid. And this bug continues to affect us.


          Basil Crow added a comment -

          Hit this again today.


          Nikita Bochenko added a comment (edited) -

          Constantly running into the same issue. Using both Job DSL+Pipeline and Pipeline directly to generate new jobs or update their configuration. Have no idea how to reproduce it as it seems very random.

          It seems if I reload configuration from disk, the error goes away for some time.


          Basil Crow added a comment -

          mbochenk and therealwaldo, I made an interesting observation today. I've only seen this problem on jobs where Job DSL was setting concurrentBuild false. I've never seen the problem on jobs where concurrentBuild is set to true. Does this match your experience? It seems too uncanny to be a coincidence. And whenever I hit this bug, the two jobs seem to in fact be running concurrently, which "should" never happen because concurrentBuild is set to false. I wonder if this could actually be a Jenkins core issue, where two jobs are scheduled to run concurrently (even though they shouldn't be) and later trample on each other.


          Basil Crow added a comment -

          Here is the behavior I have observed:

          1. A WorkflowJob is created using Job DSL with concurrentBuilds false.
          2. The job starts running.
          3. Another build of the WorkflowJob is scheduled (due to an SCM commit) but remains in the queue, blocked on the first build.
          4. A Job DSL job runs again and updates the above WorkflowJob's job definition.
          5. At this point, what should be impossible happens: the second build starts running concurrently with the first build (even though concurrentBuilds is false in the DSL!). I can clearly see both running in the Jenkins classic UI.
          6. The first build fails with the java.io.FileNotFoundException mentioned above (program.dat (No such file or directory)).

          Here is some background information. When Job DSL creates a pipeline job with concurrentBuilds false, it emits the following XML in the flow definition:

          <concurrentBuild>false</concurrentBuild>
          

          Now, note that this field is deprecated in WorkflowJob:

          /** @deprecated replaced by {@link DisableConcurrentBuildsJobProperty} */
          private @CheckForNull Boolean concurrentBuild;
          

          In fact, the getter and setter in WorkflowJob use this deprecated field just to set a DisableConcurrentBuildsJobProperty property on the job:

              @Exported
              @Override public boolean isConcurrentBuild() {
                  return getProperty(DisableConcurrentBuildsJobProperty.class) == null;
              }
          [...]
              public void setConcurrentBuild(boolean b) throws IOException {
                  concurrentBuild = null;
          
                  boolean propertyExists = getProperty(DisableConcurrentBuildsJobProperty.class) != null;
          
                  // If the property exists, concurrent builds are disabled. So if the argument here is true and the
                  // property exists, we need to remove the property, while if the argument is false and the property
                  // does not exist, we need to add the property. Yay for flipping boolean values around!
                  if (propertyExists == b) {
                      BulkChange bc = new BulkChange(this);
                      try {
                          removeProperty(DisableConcurrentBuildsJobProperty.class);
                          if (!b) {
                              addProperty(new DisableConcurrentBuildsJobProperty());
                          }
                          bc.commit();
                      } finally {
                          bc.abort();
                      }
                  }
              }
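
          The boolean flipping that the inline comment jokes about can be modeled in isolation. The following is a minimal, hypothetical stand-alone sketch (not Jenkins code; propertyAfterSet is an invented helper) showing the invariant: after setConcurrentBuild(b), the property is present exactly when b is false.

          ```java
          public class ToggleDemo {
              // Models the end state of DisableConcurrentBuildsJobProperty after
              // setConcurrentBuild(b), given whether the property existed beforehand.
              // Property present == concurrent builds disabled.
              static boolean propertyAfterSet(boolean propertyExists, boolean b) {
                  if (propertyExists == b) {
                      // The two states disagree: remove the property, then re-add
                      // it only when concurrent builds are being disabled.
                      return !b;
                  }
                  return propertyExists; // states already agree; nothing to do
              }

              public static void main(String[] args) {
                  System.out.println(propertyAfterSet(false, false)); // true
                  System.out.println(propertyAfterSet(true, true));   // false
                  System.out.println(propertyAfterSet(true, false));  // true
                  System.out.println(propertyAfterSet(false, true));  // false
              }
          }
          ```

          In all four cases the result equals !b, which is why the real setter can safely call removeProperty first and re-add the property afterward.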
          

          The deserialization from XML takes place in WorkflowJob#onLoad:

              @Override public void onLoad(ItemGroup<? extends Item> parent, String name) throws IOException {
                  super.onLoad(parent, name);
          
                  if (buildMixIn == null) {
                      buildMixIn = createBuildMixIn();
                  }
                  buildMixIn.onLoad(parent, name);
                  if (triggers != null && !triggers.isEmpty()) {
                      setTriggers(triggers.toList());
                  }
                  if (concurrentBuild != null) {
                      setConcurrentBuild(concurrentBuild);
                  }
          

          We know that Job DSL is writing out the XML with <concurrentBuild>false</concurrentBuild>. So when the job is deserialized, the deprecated field concurrentBuild must be set to false. The onLoad method checks this field, sees that it is not null, and calls WorkflowJob#setConcurrentBuild(false). This changes the value of the field from false to null and adds the DisableConcurrentBuildsJobProperty property to the job in a bulk change.

          My theory is that while this is taking place, another caller concurrently invokes WorkflowJob#isConcurrentBuild. Since the DisableConcurrentBuildsJobProperty is not yet set on the job, this method returns true. Hence the scheduler starts running this job concurrently, erroneously. Then later on, we reach a pathological state in Pipeline and the java.io.FileNotFoundException is thrown.

          Next, I will try to prove my theory. I'll post updates in this bug, but I welcome any suggestions.
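
          The suspected window can be sketched deterministically. This is a simplified, hypothetical model (an invented class, not the actual WorkflowJob) that splits the two halves of the setter apart to show that a reader calling isConcurrentBuild() after the deprecated field is cleared, but before the property is added, is told concurrent builds are allowed:

          ```java
          public class RaceWindowSketch {
              // Deprecated field as deserialized from <concurrentBuild>false</concurrentBuild>.
              private volatile Boolean concurrentBuild = Boolean.FALSE;
              // Models whether DisableConcurrentBuildsJobProperty has been added yet.
              private volatile boolean hasDisableProperty = false;

              // Mirrors the shape of WorkflowJob#isConcurrentBuild: consults only the property.
              boolean isConcurrentBuild() {
                  return !hasDisableProperty;
              }

              // The two halves of setConcurrentBuild(false), deliberately split
              // apart to expose the window between them.
              void clearDeprecatedField() { concurrentBuild = null; }
              void addDisableProperty()   { hasDisableProperty = true; }

              public static void main(String[] args) {
                  RaceWindowSketch job = new RaceWindowSketch();
                  job.clearDeprecatedField();
                  // Inside the window: a caller checking now is told concurrent
                  // builds are allowed, contradicting the persisted XML.
                  System.out.println(job.isConcurrentBuild()); // true (the bug window)
                  job.addDisableProperty();
                  System.out.println(job.isConcurrentBuild()); // false, but possibly too late
              }
          }
          ```

          In the real code the window is the body of setConcurrentBuild between `concurrentBuild = null` and the bulk change committing the property, during which another thread could schedule a second build.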


          Basil Crow added a comment -

          I verified my understanding of the above by checking config.xml after Job DSL had generated the job with concurrentBuild false. It had the <concurrentBuild>false</concurrentBuild> attribute. I then checked how this had been deserialized in the Script Console:

          println job.@concurrentBuild
          println job.concurrentBuild
          

          This printed null and false, which was what I expected. The field had been set to null, and the getter was instead relying on the property. Next, I called job.save() from the Script Console and checked the contents of config.xml again. Now, <concurrentBuild>false</concurrentBuild> was gone. In its place, in the properties section was <org.jenkinsci.plugins.workflow.job.properties.DisableConcurrentBuildsJobProperty/>.

          Based on the above, I think the fix should be for Job DSL to generate the property rather than setting the deprecated concurrentBuild field. In other words, there seems to be a race in converting the deprecated field to the new-style property. If Job DSL simply used the new-style property in the first place, we would avoid the race and therefore wouldn't put Pipeline in a pathological state that ultimately results in a FileNotFoundException.

          daspilker, do you have any concerns with this approach? If not, I will prepare a PR.


          Basil Crow added a comment -

          Thinking about this some more, the fix might be even simpler. The Dynamic DSL already supports the new-style property. So all that we need to do in Job DSL is deprecate concurrentBuild for Pipeline jobs and encourage users to migrate to the new property via the Dynamic DSL. In other words, users should convert this syntax …

          concurrentBuild false
          

          … to this syntax …

          properties {
            disableConcurrentBuilds()
          }
          

          I confirmed that this generates the new-style XML and that the "Do not allow concurrent builds" option was checked when I viewed the generated job in the Jenkins UI.


          Nikita Bochenko added a comment (edited) -

          basil thank you for extensive investigation. It seems highly plausible. I did observe multiple jobs running where there should not be, or a job stuck in running state although it has been completed. This also could explain why sometimes reload configuration helps to find "missing" jobs that are stuck in this state.

          We do generate some jobs via DSL and some directly from the scripted pipeline, so I'd need to check how concurrentBuild is set up there, I suspect it is also using ye olde way of setting concurrent builds values.

          One additional observation: it also happens on the jobs that are using Throttle Concurrent Builds plugin - the cause could be a similar one. Basically I just set something like these:

          throttleConcurrentBuilds {
            categories(['some-category'])
          }
          
          throttleConcurrentBuilds {   
            maxPerNode(2)
            maxTotal(8)
          }
          


          Nikita Bochenko added a comment -

          I need to clarify: I believe we are explicitly setting concurrentBuild to either true or false for most, if not all, builds. Even the ones with throttleConcurrentBuilds, so it might not be related to that plugin itself. I will investigate this when I get some time; right now I am working on some projects that do not give me enough time to investigate this deeper.


          Nikita Bochenko added a comment -

          P.S. Had a quick look, and in many places we are using setConcurrentProperty, which is not marked deprecated in the docs. This is not related to DSL; however, it seems to me that this is the same bug, just in a different context.


          Nikita Bochenko added a comment -

          P.P.S. For freestyle jobs I have to set up concurrentBuild(true). I think by default freestyle jobs are not allowed concurrent execution. Not related to Pipeline, of course, but it feels inconsistent and may cause confusion.


          Basil Crow added a comment -

          I'm not surprised Throttle Concurrent Builds is at the scene of the crime here, but I don't want to jump to any conclusions yet without knowing more details about your configuration. It may or may not be related to the Pipeline race being described here. As an aside, if you're using Throttle Concurrent Builds with categories and Pipeline, you really should be using my patch from throttle-concurrent-builds-plugin#57, which improves CPU usage drastically and also has some correctness benefits. But that may or may not be related to this bug.


          Basil Crow added a comment -

          PS if you use that Throttle Concurrent Builds patch and find that it helps, please comment on the PR. I've been using it successfully for over 6 months and trying to get it merged/released (including asking for the privileges to merge/release it myself) but so far have received no response.


          Basil Crow added a comment -

          I've been running with the new syntax described in my previous comment for about a month, and this issue hasn't occurred again. Previously it occurred almost every time I deployed changes to a Job DSL pipeline.


            Assignee: Daniel Spilker (daspilker)
            Reporter: Basil Crow (basil)
            Votes: 3
            Watchers: 7