Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-50474

ArrayIndexOutOfBoundsException cause job "hang"

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Minor Minor
    • workflow-cps-plugin
    • Jenkins ver. 2.113
      Red-hat 6

      Since some time I'm getting jobs hand when executing node("master"), today I found this stack-trace in node description.

      Any ideas ?

      <wf.a.ErrorAction plugin="workflow-api@2.26">
      <error class="java.lang.ArrayIndexOutOfBoundsException">
      <detailMessage>0</detailMessage>
      <stackTrace>
      <trace>org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.invokeBody(DSL.java:588)</trace>
      <trace>org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.eval(DSL.java:559)</trace>
      <trace>org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:184)</trace>
      <trace>org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:331)</trace>
      <trace>org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:82)</trace>
      <trace>org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:243)</trace>
      <trace>org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:231)</trace>
      <trace>org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)</trace>
      <trace>java.util.concurrent.FutureTask.run(FutureTask.java:266)</trace>
      <trace>hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)</trace>
      <trace>jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)</trace>
      <trace>java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)</trace>
      <trace>java.util.concurrent.FutureTask.run(FutureTask.java:266)</trace>
      <trace>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)</trace>
      <trace>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)</trace>
      <trace>java.lang.Thread.run(Thread.java:748)</trace>
      </stackTrace>
      <suppressedExceptions class="java.util.Collections$UnmodifiableRandomAccessList" resolves-to="java.util.Collections$UnmodifiableList">
      <c class="list"/>
      <list reference="../c"/>
      </suppressedExceptions>
      </error>
      </wf.a.ErrorAction>
      

          [JENKINS-50474] ArrayIndexOutOfBoundsException cause job "hang"

          Andrew Bayer added a comment -

          Can you provide the Jenkinsfile that you're running, or a minimal reproduction case?

          svanoort - this looks to be FlowHead wackiness. Before I try to figure it out, anything jump out at you?

          Andrew Bayer added a comment - Can you provide the Jenkinsfile that you're running, or a minimal reproduction case? svanoort - this looks to be FlowHead wackiness. Before I try to figure it out, anything jump out at you?

          Sam Van Oort added a comment -

          abayer I don't see anything looking like it's FlowHead wackiness here, but lets get some more info:

          clockman Do you have Jenkins logs for this one? Anything like "failed to resume" with the job name? And does the build log show what durability mode was used here?

          Sam Van Oort added a comment - abayer I don't see anything looking like it's FlowHead wackiness here, but lets get some more info: clockman Do you have Jenkins logs for this one? Anything like "failed to resume" with the job name? And does the build log show what durability mode was used here?

          Andrew Bayer added a comment -

          Sorry svanoort - I meant the exception is here in DSL, which is FlowHead-related. Sorry for not being more clear/detailed - that's what I get for commenting from my phone. =)

          Andrew Bayer added a comment - Sorry svanoort - I meant the exception is here in DSL , which is FlowHead-related. Sorry for not being more clear/detailed - that's what I get for commenting from my phone. =)

          Piotr Zegar added a comment -

          I going to check logs tomorrow.

          Piotr Zegar added a comment - I going to check logs tomorrow.

          Piotr Zegar added a comment -

          Few things,

          I started to see these exceptions after I added usage of this function before every state:

           

          def isNodeActive()
          {
          try
          {
          def nodes = jenkins.model.Jenkins.getInstance().getNodes();
          def nodeName = "${NODE_NAME}";
          for(Slave node in nodes)
          {
          if (node.getNodeName() != nodeName)
          {
          continue;
          }
          
          if (node.getComputer().isAcceptingTasks())
          {
          return true;
          }
          }
          }
          catch (Exception e) {
          return false;
          }
          return false;
          }
          

          I running most of tests in single jobs mainly during night on 40+ servers.

           

          And wanted to stop starting new tests when servers run offline due to schedule.

          I also have separate jobs that run this code, to kill these hanging jobs (that hang on node allocation).

          import hudson.model.FreeStyleBuild
          import hudson.model.Job
          import hudson.model.Result
          import hudson.model.Run
          import java.util.Calendar
          import jenkins.model.Jenkins
          import org.jenkinsci.plugins.workflow.job.WorkflowRun
          import org.jenkinsci.plugins.workflow.support.steps.StageStepExecution
          
          //24 hours in a day, 3600 seconds in 1 hour, 1000 milliseconds in 1 second
          long time_in_millis = 24*3600*1000
          Calendar rightNow = Calendar.getInstance()
          
          for(Job job in Jenkins.instance.getAllItems(Job.class))
          {
          if (!job.isBuilding())
          {
          continue;
          }
          
          for(Run item in job.builds)
          {
          if (!item.isBuilding())
          {
          continue;
          }
          
          if ((rightNow.getTimeInMillis() - item.getStartTimeInMillis()) < time_in_millis)
          {
          continue;
          }
          
          if(item in WorkflowRun) {
          WorkflowRun run = (WorkflowRun) item
          //hard kill
          run.doKill()
          //release pipeline concurrency locks
          StageStepExecution.exit(run)
          println "Killed ${run}"
          } else if(item in FreeStyleBuild) {
          FreeStyleBuild run = (FreeStyleBuild) item
          run.executor.interrupt(Result.ABORTED)
          println "Killed ${run}"
          } else {
          println "WARNING: Don't know how to handle ${item.class}"
          }
          }
          }
          

          But above function shouldn't be executed in this case:

           

          [Pipeline] node
          Running on Jenkins in /var/lib/jenkins/workspace/checkCommit@46
          [Pipeline] End of Pipeline
          java.lang.ArrayIndexOutOfBoundsException
          Finished: FAILURE
          

          Above "node" is allocated in try - finally block, but looks like finally block where not executed.

          And then job hangs (still taking place in node - could be above script issue).

          Piotr Zegar added a comment - Few things, I started to see these exceptions after I added usage of this function before every state:   def isNodeActive() { try { def nodes = jenkins.model.Jenkins.getInstance().getNodes(); def nodeName = "${NODE_NAME}" ; for (Slave node in nodes) { if (node.getNodeName() != nodeName) { continue ; } if (node.getComputer().isAcceptingTasks()) { return true ; } } } catch (Exception e) { return false ; } return false ; } I running most of tests in single jobs mainly during night on 40+ servers.   And wanted to stop starting new tests when servers run offline due to schedule. – I also have separate jobs that run this code, to kill these hanging jobs (that hang on node allocation). import hudson.model.FreeStyleBuild import hudson.model.Job import hudson.model.Result import hudson.model.Run import java.util.Calendar import jenkins.model.Jenkins import org.jenkinsci.plugins.workflow.job.WorkflowRun import org.jenkinsci.plugins.workflow.support.steps.StageStepExecution //24 hours in a day, 3600 seconds in 1 hour, 1000 milliseconds in 1 second long time_in_millis = 24*3600*1000 Calendar rightNow = Calendar.getInstance() for (Job job in Jenkins.instance.getAllItems(Job.class)) { if (!job.isBuilding()) { continue ; } for (Run item in job.builds) { if (!item.isBuilding()) { continue ; } if ((rightNow.getTimeInMillis() - item.getStartTimeInMillis()) < time_in_millis) { continue ; } if (item in WorkflowRun) { WorkflowRun run = (WorkflowRun) item //hard kill run.doKill() //release pipeline concurrency locks StageStepExecution.exit(run) println "Killed ${run}" } else if (item in FreeStyleBuild) { FreeStyleBuild run = (FreeStyleBuild) item run.executor.interrupt(Result.ABORTED) println "Killed ${run}" } else { println "WARNING: Don't know how to handle ${item.class}" } } } – But above function shouldn't be executed in this case:   [Pipeline] node Running on Jenkins in / var /lib/jenkins/workspace/checkCommit@46 [Pipeline] End of Pipeline java.lang.ArrayIndexOutOfBoundsException Finished: FAILURE Above "node" is allocated in try - finally block, but looks like finally block where not executed. And then job hangs (still taking place in node - could be above script issue).

          Devin Nusbaum added a comment - - edited

          FWIW I ran into this trying to reproduce a totally different issue. The following (contrived) test in workflow-cps reproduces it somewhat consistently for me. Changing my nonBlocking step for a simpler block-scope step like timeout made the problem go away, so perhaps this only happens if the step execution is some kind of non-blocking execution that runs on a separate thread.

          @Issue("JENKINS-56890") // The otherwise unrelated issue I was investigating.
          @Test public void generalNonBlockingStepExecutionCpsThreadSafety() {
              rr.then(r -> {
                  WorkflowJob p = r.jenkins.createProject(WorkflowJob.class, "demo");
                  p.setDefinition(new CpsFlowDefinition(
                          "Map branches = [failFast: false]\n" +
                          "for (int i = 0; i < 250; i++) {\n" + // Increasing this value seems to make JENKINS-50474 more likely. 
                          "  branches[\"branch${i}\"] = {\n" +
                          "    nonBlocking {\n" +
                          "      sleep(time: Math.random() * 100, unit: 'MILLISECONDS')\n" +
                          "    }\n" +
                          "  }\n" +
                          "}\n" +
                          "parallel(branches)\n", false));
                  WorkflowRun b = r.buildAndAssertSuccess(p);
              });
          }
          
          /**
           * Step that uses {@link StepContext} in GeneralNonBlockingStepExecution.TailCall while not on the CPS VM thread
           * to demonstrate a concurrency issue in {@link CpsStepContext}.
           */
          public static class NonBlockingStep extends Step {
              @DataBoundConstructor
              public NonBlockingStep() { }
              @Override
              public StepExecution start(StepContext context) throws Exception {
                  return new Execution(context);
              }
              private static class Execution extends GeneralNonBlockingStepExecution {
                  public Execution(StepContext context) {
                      super(context);
                  }
                  @Override
                  public boolean start() throws Exception {
                      run(() -> {
                          getContext().newBodyInvoker().withCallback(new Callback()).start();
                      });
                      return false;
                  }
                  private final class Callback extends TailCall {
                      @Override
                      protected void finished(StepContext context) throws Exception {
                          Run<?, ?> run = context.get(Run.class);
                          TaskListener listener = context.get(TaskListener.class);
                          listener.getLogger().println("Finished executing nonBlockingStep in " + run);
                      }
                  }
              }
              @TestExtension("generalNonBlockingStepExecutionThreadSafety")
              public static class DescriptorImpl extends StepDescriptor {
                  @Override
                  public String getFunctionName() {
                      return "nonBlocking";
                  }
                  @Override
                  public boolean takesImplicitBlockArgument() {
                      return true;
                  }
                  @Override
                  public Set<? extends Class<?>> getRequiredContext() {
                      return ImmutableSet.of(Run.class, TaskListener.class);
                  }
              }
          }
          

          In addition to the exception in the issue description I have also seen this one:

          java.util.ConcurrentModificationException
              at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
              at java.util.ArrayList$Itr.next(ArrayList.java:859)
              at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.invokeBody(DSL.java:623)
              at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.eval(DSL.java:593)
              at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:184)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:348)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:260)
              at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:248)
              at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
          

          I've also seen the test just hang indefinitely and time out, and as far as I can tell from the thread dump nothing is running so I'm not sure what is happening in that case.

          Devin Nusbaum added a comment - - edited FWIW I ran into this trying to reproduce a totally different issue. The following (contrived) test in workflow-cps reproduces it somewhat consistently for me. Changing my nonBlocking step for a simpler block-scope step like timeout made the problem go away, so perhaps this only happens if the step execution is some kind of non-blocking execution that runs on a separate thread. @Issue( "JENKINS-56890" ) // The otherwise unrelated issue I was investigating. @Test public void generalNonBlockingStepExecutionCpsThreadSafety() { rr.then(r -> { WorkflowJob p = r.jenkins.createProject(WorkflowJob.class, "demo" ); p.setDefinition( new CpsFlowDefinition( "Map branches = [failFast: false ]\n" + " for ( int i = 0; i < 250; i++) {\n" + // Increasing this value seems to make JENKINS-50474 more likely. " branches[\" branch${i}\ "] = {\n" + " nonBlocking {\n" + " sleep(time: Math .random() * 100, unit: 'MILLISECONDS' )\n" + " }\n" + " }\n" + "}\n" + "parallel(branches)\n" , false )); WorkflowRun b = r.buildAndAssertSuccess(p); }); } /** * Step that uses {@link StepContext} in GeneralNonBlockingStepExecution.TailCall while not on the CPS VM thread * to demonstrate a concurrency issue in {@link CpsStepContext}. */ public static class NonBlockingStep extends Step { @DataBoundConstructor public NonBlockingStep() { } @Override public StepExecution start(StepContext context) throws Exception { return new Execution(context); } private static class Execution extends GeneralNonBlockingStepExecution { public Execution(StepContext context) { super (context); } @Override public boolean start() throws Exception { run(() -> { getContext().newBodyInvoker().withCallback( new Callback()).start(); }); return false ; } private final class Callback extends TailCall { @Override protected void finished(StepContext context) throws Exception { Run<?, ?> run = context.get(Run.class); TaskListener listener = context.get(TaskListener.class); listener.getLogger().println( "Finished executing nonBlockingStep in " + run); } } } @TestExtension( "generalNonBlockingStepExecutionThreadSafety" ) public static class DescriptorImpl extends StepDescriptor { @Override public String getFunctionName() { return "nonBlocking" ; } @Override public boolean takesImplicitBlockArgument() { return true ; } @Override public Set<? extends Class <?>> getRequiredContext() { return ImmutableSet.of(Run.class, TaskListener.class); } } } In addition to the exception in the issue description I have also seen this one: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) at java.util.ArrayList$Itr.next(ArrayList.java:859) at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.invokeBody(DSL.java:623) at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.eval(DSL.java:593) at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:184) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:348) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:260) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:248) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) I've also seen the test just hang indefinitely and time out, and as far as I can tell from the thread dump nothing is running so I'm not sure what is happening in that case.

          I'm also having the same issue with hanging jobs. See error below:

           

           [Pipeline] End of Pipeline
           java.util.ConcurrentModificationException
           	at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
           	at java.util.ArrayList$Itr.next(ArrayList.java:859)
           	at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.invokeBody(DSL.java:645)
           	at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.eval(DSL.java:615)
           	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:196)
           	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:370)
           	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
           	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:282)
           	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:270)
           	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:66)
           	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
           	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
           	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
           	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           	at java.lang.Thread.run(Thread.java:748)
           Finished: FAILURE
          

          Ebrahim Moshaya added a comment - I'm also having the same issue with hanging jobs. See error below:   [Pipeline] End of Pipeline java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) at java.util.ArrayList$Itr.next(ArrayList.java:859) at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.invokeBody(DSL.java:645) at org.jenkinsci.plugins.workflow.cps.DSL$ThreadTaskImpl.eval(DSL.java:615) at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:196) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:370) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:282) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:270) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:66) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang. Thread .run( Thread .java:748) Finished: FAILURE

            Unassigned Unassigned
            clockman Piotr Zegar
            Votes:
            2 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated: