[JENKINS-68254] Jenkins build queue is cleared after restart (regression in 2.333)

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core
    • Environment: Jenkins 2.332.2 LTS JDK11
    • Released As: 2.343

      After updating Jenkins from 2.319.3 LTS to 2.332.2 LTS, we noticed that shutting down the Jenkins Docker container no longer preserves the build queue: all scheduled builds are lost when Jenkins starts again. After rolling back to 2.319.3, it works as expected again.

      Observed behavior:

      • Jenkins build queue (/var/jenkins_home/queue.xml) is cleared when shutting down the jenkins container

      Expected behavior:

      • Jenkins build queue is preserved in /var/jenkins_home/queue.xml when shutting down the jenkins container and loaded at the next Jenkins startup

       

      Steps to reproduce:

      1. Run the Jenkins docker container with tag 2.332.2-lts-jdk11

      docker run -d --name jenkins -p 8080:8080 -p 50000:50000 --restart always -v jenkins_home:/var/jenkins_home jenkins/jenkins:2.332.2-lts-jdk11

       

      2. Install the "Schedule Build" plugin

      3. Create a simple testjob

      4. Navigate to the created testjob and press the "Schedule Build" button. Enter a date and time in the future (e.g. in 1 hour) and press "Schedule". The build queue now shows a build waiting to be executed at the time you entered.

      5. Monitor the file /var/jenkins_home/queue.xml at the location where the volume is mounted outside the container. The file isn't created immediately, but it should appear within a minute after the build has been scheduled and contain the information for your scheduled build:

      <?xml version='1.1' encoding='UTF-8'?>
      <hudson.model.Queue_-State>
        <counter>18</counter>
        <items>
          <hudson.model.Queue_-WaitingItem>
            <actions>
              <hudson.model.CauseAction>
                <causeBag class="linked-hash-map">
                  <entry>
                    <hudson.model.Cause_-UserIdCause>
                      <userId>admin</userId>
                    </hudson.model.Cause_-UserIdCause>
                    <int>1</int>
                  </entry>
                </causeBag>
              </hudson.model.CauseAction>
            </actions>
            <id>18</id>
            <task class="hudson.model.FreeStyleProject">testjob</task>
            <inQueueSince>1649775922437</inQueueSince>
            <timestamp>
              <time>1649800801433</time>
              <timezone>Etc/UTC</timezone>
            </timestamp>
          </hudson.model.Queue_-WaitingItem>
        </items>
      </hudson.model.Queue_-State>

      6. Stop the Jenkins container

      docker stop jenkins

      7. Monitor /var/jenkins_home/queue.xml in your volume mount again. The file has been rewritten, and the entry for the scheduled build is gone:

      <?xml version='1.1' encoding='UTF-8'?>
      <hudson.model.Queue_-State>
        <counter>18</counter>
        <items/>
      </hudson.model.Queue_-State>

      8. Start the Jenkins container

      docker start jenkins

      9. Jenkins Build queue is empty
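
The manual check in steps 5 and 7 can be automated. The following is a minimal sketch, assuming the queue.xml layout shown above, where each queued build is a child element of <items>. The class QueueXmlCheck and its method countQueuedItems are hypothetical names for illustration, not part of Jenkins.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Jenkins: counts the builds stored in a queue.xml.
class QueueXmlCheck {

    // Returns the number of element children of the first <items> element.
    static int countQueuedItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList itemsNodes = doc.getElementsByTagName("items");
            if (itemsNodes.getLength() == 0) {
                return 0; // no <items> element at all
            }
            NodeList children = itemsNodes.item(0).getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                // Skip whitespace text nodes; only elements are queue entries.
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse queue.xml content", e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java QueueXmlCheck.java <path-to-queue.xml>");
            return;
        }
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("queued items: " + countQueuedItems(xml));
    }
}
```

Running it against the mounted file before and after "docker stop jenkins" should show whether the count drops to zero on an affected version.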

       


          Basil Crow added a comment -

          You have not provided enough information for others to be able to help you. I followed your instructions to the letter, and at Step 4 ("Schedule a build for the testjob") my build completed without creating a queue.xml file.

          Provide detailed instructions to reproduce the problem from scratch.


          Christian Stoff added a comment -

          basil I've updated the description for Step 4. The job needs to be in the queue and waiting to be executed. I hope it's more clear now.

          Basil Crow added a comment -

          I followed your instructions to the letter, but I cannot reproduce this problem. At step 7, I see the correct queue with the job still enqueued. And we also have an integration test that covers this scenario, which is passing. I have no idea what is going wrong on your setup.

          If you can reproduce this reliably, check to see if the problem occurs on the latest weekly. And if it doesn't, try cherry-picking commit a6fa7b3abd onto jenkins-2.332.2 and see if that commit makes a difference.


          Basil Crow added a comment -

          After adding this debug code

          diff --git a/core/src/main/java/hudson/model/Queue.java b/core/src/main/java/hudson/model/Queue.java
          index e0abed68cc..b937cfc231 100644
          --- a/core/src/main/java/hudson/model/Queue.java
          +++ b/core/src/main/java/hudson/model/Queue.java
          @@ -465,10 +465,14 @@ public class Queue extends ResourceController implements Saveable {
                           if (item.task instanceof TransientTask)  continue;
                           state.items.add(item);
                       }
          +            if (state.items.isEmpty()) {
          +                new Throwable("Saving empty queue!").printStackTrace(System.err);
          +            }
          

          I am able to see the problem:

          2022-04-12 18:37:56.887+0000 [id=26]    INFO    winstone.Logger#logInternal: JVM is terminating. Shutting down Jetty
          java.lang.Throwable: Saving empty queue!
                  at hudson.model.Queue.save(Queue.java:469)
                  at jenkins.model.Jenkins._cleanUpPersistQueue(Jenkins.java:3835)
                  at jenkins.model.Jenkins.cleanUp(Jenkins.java:3590)
                  at hudson.lifecycle.Lifecycle.lambda$new$0(Lifecycle.java:63)
                  at java.base/java.lang.Thread.run(Thread.java:829)
          

          This stack trace is coming from our cleanup handler, not Winstone's. Our cleanup handler is running at the wrong point in the lifecycle and is saving an empty queue.

          This bug was introduced in commit 1a42044347 (https://github.com/jenkinsci/jenkins/pull/6230) in Jenkins 2.333. It was reverted in commit a6fa7b3abd (https://github.com/jenkinsci/jenkins/pull/6454) in 2.343. I verified that, after cherry-picking commit a6fa7b3abd onto jenkins-2.332.2, I could no longer reproduce the problem. Therefore I am marking this bug as resolved.

          Unfortunately the offending commit 1a42044347 was cherry-picked onto stable-2.332 as commit 3097bb612a and is present in 2.332.1 and 2.332.2. Therefore I have added the lts-candidate label to this PR in order to request that commit a6fa7b3abd be backported to stable-2.332 and released in 2.332.3.
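
The failure mode described in this comment can be illustrated with a small model. This is only a sketch under stated assumptions, not Jenkins code: the issue does not spell out why the queue was already empty when the cleanup handler ran, so the sketch simply assumes that an earlier teardown step had cleared the in-memory state. All names (LateCleanupSketch, tearDown, shutdown) are hypothetical. It also reuses the idea from the debug patch of capturing a Throwable whenever an empty queue is saved.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the failure mode; none of these names are Jenkins APIs.
class LateCleanupSketch {
    private final List<String> inMemoryQueue = new ArrayList<>();
    private List<String> persisted = new ArrayList<>(); // stands in for queue.xml
    private Throwable emptySaveCaller;                  // call stack of an empty save, if any

    void schedule(String job) {
        inMemoryQueue.add(job);
    }

    // Writes the in-memory state out unconditionally, like a persist-on-cleanup handler.
    void save() {
        if (inMemoryQueue.isEmpty()) {
            // Same trick as the debug patch: record who is saving an empty queue.
            emptySaveCaller = new Throwable("Saving empty queue!");
        }
        persisted = new ArrayList<>(inMemoryQueue);
    }

    // Assumed teardown step that discards in-memory state during shutdown.
    void tearDown() {
        inMemoryQueue.clear();
    }

    List<String> persisted() {
        return persisted;
    }

    boolean savedEmptyQueue() {
        return emptySaveCaller != null;
    }

    // Runs a shutdown in one of the two possible orders and returns the resulting model.
    static LateCleanupSketch shutdown(boolean saveBeforeTearDown) {
        LateCleanupSketch q = new LateCleanupSketch();
        q.schedule("testjob");
        q.save(); // save while running: the "file" now holds testjob
        if (saveBeforeTearDown) {
            q.save();
            q.tearDown();
        } else {
            q.tearDown();
            q.save(); // too late: overwrites the good file with an empty queue
        }
        return q;
    }

    public static void main(String[] args) {
        System.out.println("save before teardown: " + shutdown(true).persisted());
        System.out.println("save after teardown:  " + shutdown(false).persisted());
    }
}
```

The point of the model is that an unconditional save is only safe if it runs while the in-memory state is still intact; run later, it replaces a correct file with an empty one.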


          Basil Crow added a comment - - edited

          To summarize, the problem exists in 2.333 through 2.342 inclusive as well as 2.332.1 and 2.332.2. We will need verification prior to backporting this to 2.332.3. chri4774 reinholdfuereder Would you be able to verify the fix? You can do this by upgrading to 2.343, or else cherry-picking commit a6fa7b3abd onto stable-2.332 and upgrading to the resulting WAR. Either way, confirmation that https://github.com/jenkinsci/jenkins/pull/6454 resolved the issue will help facilitate the backport to 2.332.3. Thanks!
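
The affected ranges in this summary can be written down as a small check. This is a sketch only: AffectedVersions and isAffected are hypothetical names, not a Jenkins API, and the code treats 2.332.3 as unaffected on the assumption that the backport lands as planned.

```java
// Hypothetical helper, not a Jenkins API: encodes the affected ranges stated in the summary.
class AffectedVersions {

    // Accepts "2.342", "2.332.2", or an image tag such as "2.332.2-lts-jdk11".
    static boolean isAffected(String version) {
        // Drop any tag suffix after the first '-' and split into numeric parts.
        String[] parts = version.trim().split("-")[0].split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Unexpected version: " + version);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        if (major != 2) {
            return false;
        }
        if (parts.length == 2) {
            // Weekly releases: 2.333 through 2.342 inclusive.
            return minor >= 333 && minor <= 342;
        }
        // LTS releases: only 2.332.1 and 2.332.2.
        int patch = Integer.parseInt(parts[2]);
        return minor == 332 && (patch == 1 || patch == 2);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"2.319.3", "2.332.2-lts-jdk11", "2.342", "2.343-jdk11"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```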


          Christian Stoff added a comment -

          basil thanks for the fast response! I've just tested it with the jenkins/jenkins:2.343-jdk11 docker image and can confirm that this new version fixed the issue.

          Basil Crow added a comment -

          Thanks chri4774 for confirming that the issue has been resolved.


          Basil Crow added a comment -

          This fix should be present in the forthcoming 2.332.3 release candidate, but for anyone who wants to try a preview, this incremental is available:

          https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/main/jenkins-war/2.332.3-rc32044.d35b_42c4e6e3/


            Assignee: Basil Crow (basil)
            Reporter: Christian Stoff (chri4774)