Stage view is completely missing for long durations after restarts

      When running any pipeline job, I can see that the build progresses, but the stage blocks don't always show up on the main job screen (localhost:8080/job/pipeline1). Nor do I see stages when pressing 'Full stage view' (I get a white screen). The console, pipeline syntax and Jenkins logs are OK and show nothing of interest, and the jobs run to completion.

      This behavior occurs at the beginning of the day; it usually resolves itself around noon or in the afternoon when some pipeline job with stages is run (meaning the stage blocks suddenly appear, apparently without my having done anything). It should be noted that the Jenkins Windows server (the OS) is restarted every night at 10 PM (this is mandatory and beyond my control). I'm not sure whether this is related, as all other server and plugin functionality is fine in the mornings.

      I reinstalled the stage view plugin several times now, to no avail.

      After extensive searches all over the net, I can't find anything on this particular issue and would like to ask for directions on how to proceed.

          [JENKINS-40057] Stage view is completely missing for long durations after restarts

          Sam Van Oort added a comment -

          nashpaz23 Okay, so it's a relief to hear that the issue is on 2.10 and not 2.11. I'm going to reverse my advice here after having a spark of insight – I think a plugin change isn't going to fix this one. This is linked to a deserialization issue we've seen in rare cases. The pattern fits perfectly.

          There's one (or a few) runs that are a problem. You can find them with this Groovy script in the script console – warning: it will touch all the build data for every pipeline execution run, so do it during a quiet period and be patient while it runs.

          import jenkins.model.Jenkins;
          import org.jenkinsci.plugins.workflow.job.WorkflowJob;
          import org.jenkinsci.plugins.workflow.job.WorkflowRun;
          import org.jenkinsci.plugins.workflow.graph.FlowGraphWalker;
          import org.jenkinsci.plugins.workflow.graph.FlowNode;
          import org.jenkinsci.plugins.workflow.flow.FlowExecution;
          import org.jenkinsci.plugins.workflow.graph.FlowStartNode;
          
          // Collects every suspect node found during the scan.
          List<FlowNode> failures = new ArrayList<FlowNode>();
          for (WorkflowJob job : Jenkins.instance.getAllItems(WorkflowJob.class)) {
              for (WorkflowRun run : job.builds) {
                  FlowExecution fe = run.execution;
                  // Skip runs that have no execution or no head nodes.
                  if (fe == null || fe.currentHeads.size() == 0) { continue; }
                  FlowNode lastNode = fe.currentHeads.get(0);
                  // Check 1: the head (last) node of the run has no parents.
                  if (lastNode.getParents().size() == 0) {
                      out.println("Parentless end flownode!!! [Job, Build, NodeId]: ["+job.fullName+", "+run.id+", "+lastNode.id+"]");
                      failures.add(lastNode);
                      continue;
                  }
                  // Check 2: walk the whole graph looking for nodes without actions
                  // (the start node is excluded from this check).
                  for (FlowNode f : new FlowGraphWalker(fe)) {
                      if (f.getActions().size() == 0 && !(f instanceof FlowStartNode)) {
                          out.println("Actionless FlowNode!!! [Job, Build, NodeID, class, isLast]: ["
                              +job.fullName+", "
                              +run.id+", "
                              +f.id+", "
                              +f.class.name.replace('org.jenkinsci.plugins.','o.j.p.')
                              +", "+(f.equals(lastNode))
                          +"]");
                          failures.add(f);
                      }
                  }
              }
          }
          out.println("Suspect nodes found: " + failures.size());
          

          If you remove the offending run(s), pipeline will just work. I believe this will also be one more data point for fixing the issue in question.
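
          Once the script above has printed a [Job, Build] pair, that run can also be removed from the script console. This is only a minimal sketch: the job name `my-folder/pipeline1` and build number 42 are hypothetical placeholders for whatever the scan printed. Deleting a build is permanent, so back up the build directory first if the data matters.

```groovy
import jenkins.model.Jenkins
import org.jenkinsci.plugins.workflow.job.WorkflowJob

// Hypothetical values: replace with a [Job, Build] pair printed by the scan script.
String jobName = 'my-folder/pipeline1'
int buildNumber = 42

WorkflowJob job = Jenkins.instance.getItemByFullName(jobName, WorkflowJob.class)
if (job == null) {
    println "No pipeline job named ${jobName}"
    return
}

def run = job.getBuildByNumber(buildNumber)
if (run == null) {
    println "Job ${jobName} has no build #${buildNumber}"
    return
}

// Removes the build record and its data on disk; this cannot be undone.
run.delete()
println "Deleted ${jobName} #${buildNumber}"
```

          The same result can be had by deleting the build from its page in the UI; the script form is mainly convenient when several runs were reported.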

          Nash Paz added a comment -

          By 'runs' do you mean jobs? Or builds of the jobs?
          I'm in an off-the-internet environment so it'll be a bit of a mess to bring this in. I could just remove all the jobs, and delete all the builds (I'm the only one currently using pipelines). Should that fix the issue?

          Nash Paz added a comment - - edited

          No dice.
          I've run the script, and it did identify a bunch of runs that were stuck/disruptive. I deleted all the pipelines on the server, made sure nothing remained, then restarted the Jenkins service. A new pipeline shows the stage view, but after the nightly restart the stage view is still gone (for pipelines that existed before the restart).
          Right now I'm dropping this issue, as I truly can't find time to deal with it. The log is enough for the DevOps team; for management showoffs we can use the workaround.

          Joachim Herb added a comment - - edited

          I see exactly the same behavior.
          Jenkins: 2.44
          stage view plugin: 2.4
          Windows 7

          Perhaps interesting information: I changed

          -Dcom.cloudbees.workflow.rest.external.JobExt.maxRunsPerJob=50
          

          in jenkins.xml. After that the stage view was empty all the time.
          Then I changed it to

          -Dcom.cloudbees.workflow.rest.external.JobExt.maxRunsPerJob=5
          

          and restarted Jenkins.
          Until and including the completion of the 5th build of the pipeline job, the stage view did not show up. But after I started the sixth build, it worked again and also showed the first 5 builds.

          For whatever reason, on Linux it seems to work (even with 50 builds to be shown in the stage view).
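
          To confirm that the flag was actually picked up after the restart, the JVM system property can be read back from the script console. A minimal sketch: it only reads the property named in the jenkins.xml flag above and says nothing about how the plugin uses the value.

```groovy
// Property name taken from the -D flag added to jenkins.xml.
String key = 'com.cloudbees.workflow.rest.external.JobExt.maxRunsPerJob'
String value = System.getProperty(key)

// A null value means the JVM was started without the flag.
println(value == null ? "${key} is not set on this JVM" : "${key} = ${value}")
```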

          Joachim Herb added a comment -

          Possible duplicate of https://issues.jenkins-ci.org/browse/JENKINS-40096

          Marcos Brigante added a comment -

          Same issue here, on Jenkins 2.45. I just migrated all my old chained jobs to pipeline scripts, only to discover that they stop working after a restart. =[
          Any tips would be appreciated, thanks.

          Joachim Herb added a comment -

          Possible duplicate of https://issues.jenkins-ci.org/browse/JENKINS-39143

          Clint Chapman added a comment -

          I think this is a duplicate of https://issues.jenkins-ci.org/browse/JENKINS-39143

          It appears that some state is intermittently not being persisted to files. The error doesn't occur until the Jenkins master is restarted and tries to reload state from disk. It also appears this might be a Windows-only problem. I don't believe this is a client/browser-side issue.

          Daniel Geißler added a comment -

          It appears not only on Windows but on our Linux Jenkins instance too.

          Deleting the mentioned build from the history helps to show the stage view again, but that's not exactly a solution to the cause of the problem.

          Mor L added a comment -

          possible duplicate according to the log

            Assignee: Unassigned
            Reporter: Nash Paz (nashpaz23)
            Votes: 13
            Watchers: 18