Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-33926

Jenkins no longer appears to shutdown correctly

    XMLWordPrintable

Details

    Description

      Start Jenkins with java -jar jenkins-2.0-beta-1.jar on windows

      In the console hit ctrl+c

      expected behaviour

      Jenkins does a graceful shutdown and this is appropriately logged

      actual behaviour

      it appears as though Jenkins is brutally terminated and does not do a graceful shutdown.
      There are no logs to indicate a graceful termination

      In 1.x I see the following logs
      Mar 31, 2016 12:02:39 PM winstone.Logger logInternal INFO: JVM is terminating. Shutting down Winstone

      in 2.x I do not see those entries - and there is nothing to indicate in the log that Jenkins is shutting down gracefully

      Attachments

        Issue Links

          Activity

            teilo James Nord added a comment - - edited

            whilst this sometimes occurs in 1.x testing (<10% of the time) during my 2.0 testing I get it > 90% of the time in 2.x implying something is worse somewhere.

            teilo James Nord added a comment - - edited whilst this sometimes occurs in 1.x testing (<10% of the time) during my 2.0 testing I get it > 90% of the time in 2.x implying something is worse somewhere.

            Critical to investigate for RC. If this is only a problem for interactive runs, we could probably live with it for GA, but if this is affecting production instances running as services, that would be really nasty.

            swashbuck1r Spike Washburn added a comment - Critical to investigate for RC. If this is only a problem for interactive runs, we could probably live with it for GA, but if this is affecting production instances running as services, that would be really nasty.
            danielbeck Daniel Beck added a comment -

            Same issue for me on OS X so does not seem limited to Windows.

            If this means that e.g. service restarts don't do a clean restart, and e.g. lose the queue contents, that would be bad.

            If it's just a case of Jetty 9 no longer logging this, we'll survive

            danielbeck Daniel Beck added a comment - Same issue for me on OS X so does not seem limited to Windows. If this means that e.g. service restarts don't do a clean restart, and e.g. lose the queue contents, that would be bad. If it's just a case of Jetty 9 no longer logging this, we'll survive

            While investigating, I didn't see any error show up consistently before missing the log statement. I didn't see this occur as often on 2.0 before clicking the "Install New Features". Luckily it doesn't seem to be disruptive to my Jenkins, so that's a plus. Trying to see if there's a problem after "Installing new features" or running it under heavy stress.

            kwhetstone Kristin Whetstone added a comment - While investigating, I didn't see any error show up consistently before missing the log statement. I didn't see this occur as often on 2.0 before clicking the "Install New Features". Luckily it doesn't seem to be disruptive to my Jenkins, so that's a plus. Trying to see if there's a problem after "Installing new features" or running it under heavy stress.
            danielbeck Daniel Beck added a comment -

            Would be interesting to know whether the shutdown procedure (that includes saving the queue to disk) gets executed every time, log statement or not.

            danielbeck Daniel Beck added a comment - Would be interesting to know whether the shutdown procedure (that includes saving the queue to disk) gets executed every time, log statement or not.

            Right, the investigation needs to prove/disprove if the shutdown sequence is consistently running as expected. Are there some clear signatures to this that Kristin could rely on? Examples: Some expected file update times change, a debug message that a Servlet.destroy got called, etc.

            swashbuck1r Spike Washburn added a comment - Right, the investigation needs to prove/disprove if the shutdown sequence is consistently running as expected. Are there some clear signatures to this that Kristin could rely on? Examples: Some expected file update times change, a debug message that a Servlet.destroy got called, etc.
            danielbeck Daniel Beck added a comment -

            swashbuck1r queue.xml gets written (possibly only when there are items in the queue, not sure).

            danielbeck Daniel Beck added a comment - swashbuck1r queue.xml gets written (possibly only when there are items in the queue, not sure).
            kwhetstone Kristin Whetstone added a comment - - edited

            I think this (JENKINS-30909) is similar since it's referring to a missing queue log.

            In testing where I physically go through the whole computer shutdown process, it appears that things run properly. (I find a properly timestamped queue.xml.bak file) When I just stop the process from the command line, I don't see any queue.xml file. I assume this case falls under this defect.

            So far, the behavior I've seen echos this work item. I'll have to look to a different mechanism (James Nord brought up TermMilestone) to see if there's another issue.

            kwhetstone Kristin Whetstone added a comment - - edited I think this ( JENKINS-30909 ) is similar since it's referring to a missing queue log. In testing where I physically go through the whole computer shutdown process, it appears that things run properly. (I find a properly timestamped queue.xml.bak file) When I just stop the process from the command line, I don't see any queue.xml file. I assume this case falls under this defect. So far, the behavior I've seen echos this work item. I'll have to look to a different mechanism (James Nord brought up TermMilestone) to see if there's another issue.
            kwhetstone Kristin Whetstone added a comment - - edited

            After installing Jenkins 2.0 as a Windows service and running a heavy mock load, I'm still seeing the queue.xml file appear. Since the service is able to start up earlier than I can access my file system, I know this file is created because of queue.xml.bak which isn't removed & contains the same information.

            The logs left in jenkins.out.log and jenkins.err.log don't have any messages for shutting down, but I have a feeling they're overwritten on service restart, so they're likely not useful. Checking the log on the Jenkins instance also only covers the current run.

            I linked to JENKINS-30909 to highlight the behavior I'm seeing as part of restarting/destroying-recreating the service. Nothing really appears to be out of the ordinary.

            From what I can tell during these tests, I don't think this is a blocker issue for 2.0 release.

            kwhetstone Kristin Whetstone added a comment - - edited After installing Jenkins 2.0 as a Windows service and running a heavy mock load, I'm still seeing the queue.xml file appear. Since the service is able to start up earlier than I can access my file system, I know this file is created because of queue.xml.bak which isn't removed & contains the same information. The logs left in jenkins.out.log and jenkins.err.log don't have any messages for shutting down, but I have a feeling they're overwritten on service restart, so they're likely not useful. Checking the log on the Jenkins instance also only covers the current run. I linked to JENKINS-30909 to highlight the behavior I'm seeing as part of restarting/destroying-recreating the service. Nothing really appears to be out of the ordinary. From what I can tell during these tests, I don't think this is a blocker issue for 2.0 release.
            danielbeck Daniel Beck added a comment -

            When I just stop the process from the command line, I don't see any queue.xml file. I assume this case falls under this defect.

            Would be interesting to know whether a Linux service behaves differently in shutting down. Not sure which signals each sends. Loss of the queue on every shutdown would be a regression.

            danielbeck Daniel Beck added a comment - When I just stop the process from the command line, I don't see any queue.xml file. I assume this case falls under this defect. Would be interesting to know whether a Linux service behaves differently in shutting down. Not sure which signals each sends. Loss of the queue on every shutdown would be a regression.
            teilo James Nord added a comment - - edited

            kwhetstoneno need to restart the OS, just start stop the service either via the SCM or sc stop jenkins so you always have access to the filesystem

            teilo James Nord added a comment - - edited kwhetstone no need to restart the OS, just start stop the service either via the SCM or sc stop jenkins so you always have access to the filesystem

            So I ran through a number of scenarios on Windows and Linux on 2.0 and 1.X and it appears that for Windows we never actually save the queue when we stop the service. Linux is fine; I started and stopped the service while every executor is used, and everything is saved and restarted. I think this might be a difference in the command signals sent to the service when stopping in Windows vs Linux, but that's just a theory at this point.

            As to other shutdown actions taken when the system is shutting down, since I'm not able to replicate the initial condition, all of my exit conditions (TermMilestones, etc) are hit. Other than this queue conditional, I don't see anything odd with the service.

            kwhetstone Kristin Whetstone added a comment - So I ran through a number of scenarios on Windows and Linux on 2.0 and 1.X and it appears that for Windows we never actually save the queue when we stop the service. Linux is fine; I started and stopped the service while every executor is used, and everything is saved and restarted. I think this might be a difference in the command signals sent to the service when stopping in Windows vs Linux, but that's just a theory at this point. As to other shutdown actions taken when the system is shutting down, since I'm not able to replicate the initial condition, all of my exit conditions (TermMilestones, etc) are hit. Other than this queue conditional, I don't see anything odd with the service.

            Summary of test results from Kristin:

            • Does the message get printed on shutdown? (Windows: 100% appears consistently, Linux: 100% appears consistently)
            • Does the shutdown sequence seem to complete?
            • Marker:queue.xml gets saved (yes on Linux, yes on Windows cmdline, no on Windows service). No on windows service seems consistent with 1.x line of Jenkins.
            • Marker:plugin shutdown() is being called (yes on Linux and Windows)

            Decision [Daniel, Kristin]: At this time, there's no indication of a regression on beta2 vs 1.x, so this is not a stop-ship

            swashbuck1r Spike Washburn added a comment - Summary of test results from Kristin: Does the message get printed on shutdown? (Windows: 100% appears consistently, Linux: 100% appears consistently) Does the shutdown sequence seem to complete? Marker:queue.xml gets saved (yes on Linux, yes on Windows cmdline, no on Windows service). No on windows service seems consistent with 1.x line of Jenkins. Marker:plugin shutdown() is being called (yes on Linux and Windows) Decision [Daniel, Kristin] : At this time, there's no indication of a regression on beta2 vs 1.x, so this is not a stop-ship

            Next steps:

            • Daniel will add testcase for this for the RC and future LTSs.
            • At this time, this issues appears to be not reproducible...Daniel will close.
            • Kristin will open another ticket about the problem where queue.xml doesn't get saved via Windows Service shutdown.
            swashbuck1r Spike Washburn added a comment - Next steps: Daniel will add testcase for this for the RC and future LTSs. At this time, this issues appears to be not reproducible...Daniel will close. Kristin will open another ticket about the problem where queue.xml doesn't get saved via Windows Service shutdown.
            danielbeck Daniel Beck added a comment -

            Resolving as Cannot Reproduce for now. I'll add test cases to RC/LTS testing plans so we make sure to check this for the 2.0 RC.

            danielbeck Daniel Beck added a comment - Resolving as Cannot Reproduce for now. I'll add test cases to RC/LTS testing plans so we make sure to check this for the 2.0 RC.
            teilo James Nord added a comment -

            While investigating, I didn't see any error show up consistently before missing the log statement. I didn't see this occur as often on 2.0 before clicking the "Install New Features". Luckily it doesn't seem to be disruptive to my Jenkins, so that's a plus.

            Seems to imply it is reproduceable from the command line?

            Anyway I can reproduce it so you can always assign it to me

            teilo James Nord added a comment - While investigating, I didn't see any error show up consistently before missing the log statement. I didn't see this occur as often on 2.0 before clicking the "Install New Features". Luckily it doesn't seem to be disruptive to my Jenkins, so that's a plus. Seems to imply it is reproduceable from the command line? Anyway I can reproduce it so you can always assign it to me
            svanoort Sam Van Oort added a comment -

            100% reproducible with the 2.0-rc WAR running on Mac.

            Test case:

            • Create a pipeline job

            node {
            echo 'stuffs'
            sleep 100
            stage 'Stage 2'
            }

            • Create a freestyle job, set to allow concurrent builds, with the following shell step:
              echo 'I do stuff'
              sleep 100
            • Start up several builds of the pipeline job, to consume executors.
            • Queue up the freestyle project
            • Shutdown jenkins via Ctrl+C

            Result:

            • Command line reports winstone/jetty shutting down
            • queue.xml & queue.xml.bak files are created but contain no information about the queued build or builds. Pipelines that were running will restart however.
            • if Jenkins is started again, build will not run from queue. Indeed it is as if the freestyle project was never queued: new builds will use its build #, no record it ever existed.
            svanoort Sam Van Oort added a comment - 100% reproducible with the 2.0-rc WAR running on Mac. Test case: Create a pipeline job node { echo 'stuffs' sleep 100 stage 'Stage 2' } Create a freestyle job, set to allow concurrent builds, with the following shell step: echo 'I do stuff' sleep 100 Start up several builds of the pipeline job, to consume executors. Queue up the freestyle project Shutdown jenkins via Ctrl+C Result: Command line reports winstone/jetty shutting down queue.xml & queue.xml.bak files are created but contain no information about the queued build or builds. Pipelines that were running will restart however. if Jenkins is started again, build will not run from queue. Indeed it is as if the freestyle project was never queued: new builds will use its build #, no record it ever existed.
            svanoort Sam Van Oort added a comment -

            Reopening because it exists in a painful and reproducible way (see previous comments); the entire build queue is lost on shutdown.

            svanoort Sam Van Oort added a comment - Reopening because it exists in a painful and reproducible way (see previous comments); the entire build queue is lost on shutdown.

            James, I'm sending it over to you since I couldn't reproduce it on beta while Ctrl-C or while stopping the service. I'm going to see if I can reproduce it on the RC, but I must not be going through the correct steps or something.

            Note: the part about losing the job queue when restarted as a Windows service looks like an older problem. I definitely think that this should be a bug if it's not already.

            kwhetstone Kristin Whetstone added a comment - James, I'm sending it over to you since I couldn't reproduce it on beta while Ctrl-C or while stopping the service. I'm going to see if I can reproduce it on the RC, but I must not be going through the correct steps or something. Note: the part about losing the job queue when restarted as a Windows service looks like an older problem. I definitely think that this should be a bug if it's not already.

            Sam, it sounds like your defect is something different than what the original problem covered by this work item. While the queue is saved, it's not saving everything. That might be a bug in the queue.save(). In the Windows version it's not getting saved at all during Ctrl-C. We should open a ticket for that problem so it could be fixed.

            kwhetstone Kristin Whetstone added a comment - Sam, it sounds like your defect is something different than what the original problem covered by this work item. While the queue is saved, it's not saving everything. That might be a bug in the queue.save(). In the Windows version it's not getting saved at all during Ctrl-C. We should open a ticket for that problem so it could be fixed.
            svanoort Sam Van Oort added a comment -

            Yeah, I can't tell for sure if it's the same issue, something similar, or these are all different manifestations of the same underlying cause.

            The previous issue with losing the build queue apparently had some sort of handling with nextBuildNumber that resolved it (I think?). This one seems to be some combination of shutdown logging & queue writing, which could also be two separate issues as well.

            svanoort Sam Van Oort added a comment - Yeah, I can't tell for sure if it's the same issue, something similar, or these are all different manifestations of the same underlying cause. The previous issue with losing the build queue apparently had some sort of handling with nextBuildNumber that resolved it (I think?). This one seems to be some combination of shutdown logging & queue writing, which could also be two separate issues as well.

            To me, the file showing up in the first place and that it's successfully read on startup means the file was created correctly (there's no missing information when being written and it's not formatted incorrectly) it might just be incomplete. Obviously I'm making some assumptions on the queue behavior, and spent some time trying to go through where the queue was written so figure out how it came into being. Apparently there are 2 types of shutdown: one that writes the queue and one that's "quick" not counting the error case, the secret not-so-fun shutdown. From what I understand the error case just completely tanks jetty, et al. If it's able to save state enough to pick back up on a restart, I think that's slightly better than not having anything at all.

            I'd check out if the queue being incomplete issue exists on the latest 1.x release. That will at least give some context into whether it's something new or not. Either way it's not good, but at least it's not worse!

            kwhetstone Kristin Whetstone added a comment - To me, the file showing up in the first place and that it's successfully read on startup means the file was created correctly (there's no missing information when being written and it's not formatted incorrectly) it might just be incomplete. Obviously I'm making some assumptions on the queue behavior, and spent some time trying to go through where the queue was written so figure out how it came into being. Apparently there are 2 types of shutdown: one that writes the queue and one that's "quick" not counting the error case, the secret not-so-fun shutdown. From what I understand the error case just completely tanks jetty, et al. If it's able to save state enough to pick back up on a restart, I think that's slightly better than not having anything at all. I'd check out if the queue being incomplete issue exists on the latest 1.x release. That will at least give some context into whether it's something new or not. Either way it's not good, but at least it's not worse!
            svanoort Sam Van Oort added a comment -

            Oh! I'd tested on CJE 1.625.16.1 - the queue is maintained when killed by Ctrl+C. Just rechecked, and that is also true on 1.642.x - so it's clearly a regression since then.

            To be clear: we're not saving state to pick back up on restart at all. The queue.xml is empty of items, this is one sample:

            <?xml version='1.0' encoding='UTF-8'?>
            <hudson.model.Queue_-State>
              <counter>42</counter>
              <items/>
            </hudson.model.Queue_-State>
            
            svanoort Sam Van Oort added a comment - Oh! I'd tested on CJE 1.625.16.1 - the queue is maintained when killed by Ctrl+C. Just rechecked, and that is also true on 1.642.x - so it's clearly a regression since then. To be clear: we're not saving state to pick back up on restart at all. The queue.xml is empty of items, this is one sample: <?xml version= '1.0' encoding= 'UTF-8' ?> <hudson.model.Queue_-State> <counter>42</counter> <items/> </hudson.model.Queue_-State>
            danielbeck Daniel Beck added a comment -

            Hmmm… could this be related to JENKINS-34029?

            danielbeck Daniel Beck added a comment - Hmmm… could this be related to JENKINS-34029 ?

            Cool, so that's actually a regression from beta2 where that information was saved. I think that's a completely separate defect since this jetty issue happened before then. Good find during TestFest!

            kwhetstone Kristin Whetstone added a comment - Cool, so that's actually a regression from beta2 where that information was saved. I think that's a completely separate defect since this jetty issue happened before then. Good find during TestFest!
            teilo James Nord added a comment -

            Not saving the queue on windows was fixed recently on master.
            I'm unassigning from myself for the moment as I'm not looking at this. If I get to look at it and it's not taken I will re-assign it.

            teilo James Nord added a comment - Not saving the queue on windows was fixed recently on master. I'm unassigning from myself for the moment as I'm not looking at this. If I get to look at it and it's not taken I will re-assign it.
            svanoort Sam Van Oort added a comment -

            Closing this since Kristin opened https://issues.jenkins-ci.org/browse/JENKINS-34281 which looks to be a completely separate issue.

            svanoort Sam Van Oort added a comment - Closing this since Kristin opened https://issues.jenkins-ci.org/browse/JENKINS-34281 which looks to be a completely separate issue.
            svanoort Sam Van Oort added a comment -

            Closing this, since Kristin has forked this out.

            svanoort Sam Van Oort added a comment - Closing this, since Kristin has forked this out.

            People

              Unassigned Unassigned
              teilo James Nord
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: