
Jenkins incorrectly reports timestamps are inconsistent

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Component: core
    • Labels: None

      After recently upgrading to Jenkins LTS v1.596.3, we noticed that dozens, if not hundreds, of builds per day are being reported as having "inconsistent" time stamps. Here are a few specific details that may help isolate the problem:

      • I have confirmed that the build number (ie: integer values) for the offending builds as reported on the Jenkins dashboard is consistent with what is recorded in the build.xml files on the server, which are also consistent with the symbolic links stored on disk that point to those builds
      • I have confirmed that the build IDs (ie: the time-stamped formatted identifiers) for the offending builds as reported on the Jenkins dashboard are consistent with what is recorded in the build.xml files on the server, which are also consistent with the names of the log folders for each of these builds
      • I have confirmed that the <startTime> elements in the build.xml files resolve to the same build IDs used to store and organize the builds (see the sketch after this list)
      • I have confirmed that none of these builds interfere or overlap with previous or subsequent builds of the same jobs (ie: if offending build is #536, I have confirmed that it does in fact run after build #535 and before build #537, and that all 3 such builds have consistent references to build numbers and IDs throughout the logs)
      • After upgrading to this version I discovered that all build agents connected to our master are now reporting their clocks as being out of sync, despite the fact that I have confirmed that this is not actually correct (ie: as described under JENKINS-18671). I suspect that this incorrect time reporting may be somehow confusing Jenkins into thinking builds have incorrect timestamps when in fact they don't.
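
      A minimal sketch of the third check above, i.e. deriving the build ID from <startTime> and comparing it with the directory the build is stored in. It assumes the pre-1.597 layout (one directory per build, named yyyy-MM-dd_HH-mm-ss in the master's local time zone) and a placeholder job path; note that Jenkins actually derives the ID from the build's <timestamp>, which is normally identical to <startTime> but can differ slightly.

          import os
          import re
          import xml.etree.ElementTree as ET
          from datetime import datetime

          BUILDS_DIR = "/var/lib/jenkins/jobs/example-job/builds"  # placeholder path

          for entry in sorted(os.listdir(BUILDS_DIR)):
              # only the time-stamped build directories, not the numeric symlinks
              if not re.match(r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}$", entry):
                  continue
              build_xml = os.path.join(BUILDS_DIR, entry, "build.xml")
              if not os.path.isfile(build_xml):
                  continue
              text = ET.parse(build_xml).getroot().findtext("startTime")
              if text is None:
                  continue
              # build IDs were rendered in the master's local time zone
              derived = datetime.fromtimestamp(int(text) / 1000.0).strftime("%Y-%m-%d_%H-%M-%S")
              if derived != entry:
                  print("MISMATCH: directory %s vs startTime %s" % (entry, derived))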

      I probably should also mention that we've attempted to upgrade to the latest LTS edition, v1.609.1, on the off chance there may be some improvements included there that resolve this issue (ie: since it appears to have re-structured how builds are stored on disk, eliminating the 'time stamp' labelling mechanism, which may correct this problem); however, due to other production-stop defects included in that release (ie: JENKINS-28513) we were forced to remain on the 1.596.3 version for the time being.

          [JENKINS-29060] Jenkins incorrectly reports timestamps are inconsistent

          Kevin Phillips added a comment -

          I probably should also pose the question: what criteria does Jenkins use to decide whether the timestamps on a job are "inconsistent" or not? (ie: what are they inconsistent with?) As described above, so far as I can tell all of the data stored in the actual build logs looks correct to me, however I may be overlooking something.

          Daniel Beck added a comment -

          Unfortunately, 1.596.x LTS is no longer supported so I'll have to reject this bug report. Since 1.597 changed the storage layout of builds on disk, the reported issue should no longer occur there.

          Consider asking for advice on the jenkinsci-users mailing list, or in #jenkins on Freenode.


          Daniel Beck added a comment -

          The out-of-order monitoring code is at https://github.com/jenkinsci/jenkins/tree/stable-1.596/core/src/main/java/jenkins/diagnostics/ooom, maybe that helps.

          Basically this occurs when a build with a higher build number has an earlier start date. Note that other bugs may result in the same build number being assigned multiple times (and only one of those can be loaded by Jenkins). You need to compare the files on disk, not the loaded build records.
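
          As a rough illustration of the rule Daniel describes (this is not the actual ooom code, which is Java; the path is a placeholder), the following sketch walks the numeric build entries in a job's builds directory and flags any build whose <startTime> is earlier than that of the previous build number:

              import os
              import xml.etree.ElementTree as ET

              BUILDS_DIR = "/var/lib/jenkins/jobs/example-job/builds"  # placeholder path

              def start_time(build_dir):
                  """Return <startTime> (epoch millis) from build.xml, or None if absent."""
                  xml_path = os.path.join(build_dir, "build.xml")
                  if not os.path.isfile(xml_path):
                      return None
                  text = ET.parse(xml_path).getroot().findtext("startTime")
                  return int(text) if text else None

              # numeric entries are the build numbers (symlinks in the pre-1.597 layout)
              numbers = sorted(int(n) for n in os.listdir(BUILDS_DIR) if n.isdigit())

              previous = None  # (number, startTime) of the last build seen
              for number in numbers:
                  t = start_time(os.path.join(BUILDS_DIR, str(number)))
                  if t is None:
                      continue
                  if previous and t < previous[1]:
                      print("OUT OF ORDER: #%d starts before #%d" % (number, previous[0]))
                  previous = (number, t)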


          Kevin Phillips added a comment -

          > Since 1.597 changed the storage layout of builds on disk, the reported issue should no longer occur there.

          Good to know. Hopefully the other product-stop bugs that were introduced by the 1.609.1 update can be fixed sooner rather than later so we can get those fixes as well.

          > You need to compare the files on disk, not the loaded build records.

          Understood. I did actually compare the on-disk log and configuration files as stored on the Jenkins master when examining the data and could see nothing that would suggest that build numbers or time stamps were in any way out of order or duplicated with any other. That's what confused me.


          Kevin Phillips added a comment -

          One last question: assuming you understand the Jenkins internals better than I do, can you say whether the weird 'agents out of sync' problem described by JENKINS-18671 may be the cause of this out-of-order build issue? (ie: if the agent says a build started at 12:59:07pm but the clock on the Jenkins master says the current time is 12:59:03, could that lead Jenkins to think the build times are broken when in fact they aren't?)

          Kevin Phillips added a comment - edited

          NOTE: Based on my brief review of the source code in the link provided above, the 'inconsistent' build labels are defined as build numbers with ascending values that have descending time stamps, or vice versa. As I mentioned in my bug report I have confirmed that none of the builds being reported by our Jenkins instance fall into that category.

          When I examine the build.xml files on disk on the Jenkins master, the build numbers, time stamps, folder names and symbolic links are all consistent with one another, and all of the values are always in ascending order (ie: build 123 has a time stamp larger than build 122 and smaller than build 124, in every reported case).
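
          For completeness, a small sketch of the symlink part of that cross-check, assuming the pre-1.597 layout where builds/<number> is a symlink to the time-stamped build directory (the path is a placeholder):

              import os

              BUILDS_DIR = "/var/lib/jenkins/jobs/example-job/builds"  # placeholder path

              for name in sorted((n for n in os.listdir(BUILDS_DIR) if n.isdigit()), key=int):
                  link = os.path.join(BUILDS_DIR, name)
                  if not os.path.islink(link):
                      print("NOT A SYMLINK: %s" % name)
                      continue
                  target = os.path.realpath(link)
                  status = "ok" if os.path.isdir(target) else "DANGLING"
                  # compare the target directory name against build.xml by eye, or with the earlier sketch
                  print("build #%s -> %s (%s)" % (name, os.path.basename(target), status))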

          I suppose this point may be moot if the underlying architecture has been changed in such a way that prevents any of these kinds of problems from happening, but I just thought I should mention it in case a similar bug may still be present in the latest version as well.


          Daniel Beck added a comment -

          > Hopefully the other product-stop bugs that were introduced by the 1.609.1 update can be fixed sooner rather than later so we can get those fixes as well.

          I know about the queue/build blocker/block while something else is building issues, and expect 1.609.3 to have them resolved. Anything else a blocker for you?

          No idea re JENKINS-18671. I doubt you're seeing the same issue. Make sure your nodes are all in synced time.


          Kevin Phillips added a comment -

          > Anything else a blocker for you?

          One other behavioral change in 1.609.1 is that the BUILD_ID property was changed from a time stamp to a numeric identifier. All of our build configurations and release processes were dependent upon a build's identifier being a time stamp, and we more-or-less use that BUILD_ID variable everywhere (scripts, code, etc.). I did find a plugin that allows one to create a new time-stamp based environment variable for each build (Zen Time Stamp) but we'll need to reconfigure our infrastructure to use the new variable name throughout.
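
          As an aside, a small sketch of how a build script might cope with both BUILD_ID formats during such a transition; BUILD_TIMESTAMP here is only an assumption about what a time-stamp plugin injects, so substitute whatever variable your setup actually defines:

              import os
              import re
              from datetime import datetime

              def build_stamp():
                  build_id = os.environ.get("BUILD_ID", "")
                  if re.match(r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}$", build_id):
                      return build_id  # pre-1.597: BUILD_ID is already a time stamp
                  plugin_stamp = os.environ.get("BUILD_TIMESTAMP")  # assumed plugin-provided variable
                  if plugin_stamp:
                      return plugin_stamp
                  # last resort: approximate with the current time (not the true build start)
                  return datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

              print(build_stamp())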

          We hit a LOT of problems with the SVN plugin as well but I believe we've finally sorted all those out.

          I'm also currently investigating some bugs with the way the new Jenkins version handles jobs that are enqueued for builds. There are a couple of cases where jobs may get deadlocked in the queue for various reasons. Still trying to isolate specific use cases though before I file any more bugs.

          NOTE: We may have to stick with v1.596.3 for the time being. Assuming there are a great number of new changes / improvements / fixes included in the next LTS release, we'll likely have to invest considerable time and expense making sure none of those additional changes break anything else.

          > No idea re JENKINS-18671. I doubt you're seeing the same issue. Make sure your nodes are all in synced time.

          All of our agents, and the master, are synchronized using the same NTP server, and I have confirmed via direct examination of the agents that the physical clocks on all machines are in fact correct and in sync with each other (to within < 1s variance at least). It's only the Jenkins dashboard that suggests otherwise. If you believe this is a different issue than JENKINS-18671 (or the underlying code change from JENKINS-18438), just let me know and I'll create a new defect for it.
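
          For what it's worth, the skew Jenkins itself has recorded per node can be read back from the master instead of the dashboard. A rough sketch, assuming the computer API exposes the ClockMonitor data under monitorData and that anonymous read access is allowed (the URL is a placeholder; add authentication as needed):

              import json
              from urllib.request import urlopen

              JENKINS_URL = "http://jenkins.example.com"  # placeholder

              data = json.load(urlopen(JENKINS_URL + "/computer/api/json?depth=1"))
              for computer in data.get("computer", []):
                  monitor = (computer.get("monitorData") or {}).get("hudson.node_monitors.ClockMonitor") or {}
                  # 'diff' is the clock difference in milliseconds as seen by the master
                  print("%-20s clock diff: %s ms" % (computer.get("displayName"), monitor.get("diff")))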

          If this problem is in fact a new / different bug, and it is a contributing factor to the inconsistent time stamps being reported, then I would say that would be a significant issue that we'd also want fixed in the next LTS release as well. We can't have build logs getting corrupted even if the problem is intermittent. We use Jenkins as a central service now for all of our production work and we need to be able to rely on its ability to preserve the details of those build operations for auditing purposes. (although, I guess this is a problem we're currently stuck with on 1.596.3 so it hopefully wouldn't get any worse if we were to move to a newer LTS version at least).


          Daniel Beck added a comment -

          > We hit a LOT of problems with the SVN plugin as well but I believe we've finally sorted all those out.

          More information would be nice, but be aware of JENKINS-21785 etc. (i.e. 'Additional Credentials' is deliberately needed in many situations for security concerns).

          > I'm also currently investigating some bugs with the way the new Jenkins version handles jobs that are enqueued for builds.

          I was referring to those in my earlier comment. I expect further fixes in 1.609.3.


          Kevin Phillips added a comment - edited

          > be aware of JENKINS-21785 etc

          That was the issue that helped us resolve the major SVN issues we were experiencing. Thanks for the link though.

          The difficulty involved updating our 1000+ jobs to include the appropriate credentials definitions, which were not a requirement in previous LTS releases. Most, but not all, of our jobs monitor SVN repositories, but it was a bit tricky to ensure that all the affected jobs, and only those affected by this change, got updated. And unfortunately we gleaned no benefit from the improved security, as all of our Jenkins instances run behind a firewall and operate within a Windows domain, so only trusted systems have access to our key services. The net result for us was a lot of superfluous work for little to no gain.

          For future reference, any time you include an invasive change like this in an LTS edition, you probably should provide an easier way for users to upgrade their configurations instead of leaving them to their own devices to hack around the changes. It would greatly reduce the pain of upgrading.

          > I was referring to those in my earlier comment.

          Sorry - my misunderstanding.


            Assignee: Unassigned
            Reporter: Kevin Phillips (leedega)
            Votes: 0
            Watchers: 1
