Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-5393

Jenkins falsely fails builds due to aggressive up-to-date check on junit artifacts

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Major Major
    • junit-plugin
    • None
    • RedHat Linux VMware

      I'm getting this error at the end of a build in Hudson that should have been successful. I don't know why this age check is failing but I would be satisfied if I could simply disable the check.

      Archiving artifacts
      Recording test results
      Test reports were found but none of them are new. Did tests run?
      For example, /opt/hudson-workspace/workspace/project/target/test/reports/unittest/Ant suite/Ant test.xml is 3 min 13 sec old

      These files are, without question, up to date and generated only by the current built.

      Here are my questions:
      1: The comparison is between the date of the file and... what other thing? What do "new" and "old" mean in this context?
      2: How can I disable or deceive this comparison? It's completely unwanted and unnecessary.

      I know there was a bug for this in earlier versions of Hudson, but apparently the error message was the only thing changed.

      The problem still occurs in 1.341. As for the environment, these are RedHat Linux virtual machines, and the dates on the system are synchronized to within 1 second. I cannot understand why Hudson thinks these files are too old.

          [JENKINS-5393] Jenkins falsely fails builds due to aggressive up-to-date check on junit artifacts

          carltonb created issue -

          Alan Harder added a comment -

          Looking at the code, the comparison appears to be: is the timestamp of all test files more than 3 seconds before this build was started? (with some attempt to account for master/slave clock difference)

          Is your build running on a slave or the master?
          How long does your build take to run?

          Alan Harder added a comment - Looking at the code, the comparison appears to be: is the timestamp of all test files more than 3 seconds before this build was started? (with some attempt to account for master/slave clock difference) Is your build running on a slave or the master? How long does your build take to run?

          carltonb added a comment - - edited

          The build was running on an ssh slave. The build takes 3 to 5 minutes to run depending on activity on the box. Also, there is no time difference between the master and slave as tested by running the unix 'date' command in 2 separate terminal windows.

          I suspect the problem is the attempt to account for the master/slave clock difference. It seems to me that it's just asking for trouble to compare remote file timestamps to a clock on another machine.

          Could you point me to the code where you found this behavior? I would like to take a look and understand this a little better.

          carltonb added a comment - - edited The build was running on an ssh slave. The build takes 3 to 5 minutes to run depending on activity on the box. Also, there is no time difference between the master and slave as tested by running the unix 'date' command in 2 separate terminal windows. I suspect the problem is the attempt to account for the master/slave clock difference. It seems to me that it's just asking for trouble to compare remote file timestamps to a clock on another machine. Could you point me to the code where you found this behavior? I would like to take a look and understand this a little better.

          carltonb added a comment -

          Just a sample of the time differentials:
          Build time: 4 minutes 22 seconds Ant test.xml is 2 min 31 sec old
          Build time: 3 minutes 38 seconds Ant test.xml is 3 min 13 sec old
          Build time: 5 minutes 38 seconds Ant test.xml is 1 min 26 sec old

          I could see maybe 10-20 seconds being too close, but this master/slave has never been 4 minutes apart.

          carltonb added a comment - Just a sample of the time differentials: Build time: 4 minutes 22 seconds Ant test.xml is 2 min 31 sec old Build time: 3 minutes 38 seconds Ant test.xml is 3 min 13 sec old Build time: 5 minutes 38 seconds Ant test.xml is 1 min 26 sec old I could see maybe 10-20 seconds being too close, but this master/slave has never been 4 minutes apart.

          carltonb added a comment -

          After digging around in Fisheye I see where this is happening.

          http://fisheye.jenkins-ci.org/browse/Hudson/trunk/hudson/main/core/src/main/java/hudson/tasks/junit/TestResult.java?r=HEAD

          It looks like some refactoring has taken place, and there is a TODO to make a certain property 'checkTimestamps' configurable as a system property.

          I presume that the person who made that comment delayed this change for a reason and will finish that task when the time is right.

          carltonb added a comment - After digging around in Fisheye I see where this is happening. http://fisheye.jenkins-ci.org/browse/Hudson/trunk/hudson/main/core/src/main/java/hudson/tasks/junit/TestResult.java?r=HEAD It looks like some refactoring has taken place, and there is a TODO to make a certain property 'checkTimestamps' configurable as a system property. I presume that the person who made that comment delayed this change for a reason and will finish that task when the time is right.

          carltonb added a comment -

          I finally discovered the reason for the apparent clock difference. It was not that the slave and master were out of sync. Rather, the slave and master both store all their data on an NFS mount, and the NFS server was 6 minutes behind. Fortunately this is an NFS server that my team controls, so we were able to correct the clock and pass builds.

          IMO this is just another illustration of why we need the option to disable this aggressive config check. There are just too many ways it can falsely fail for reasons that the Hudson admin may not be able to mitigate.

          carltonb added a comment - I finally discovered the reason for the apparent clock difference. It was not that the slave and master were out of sync. Rather, the slave and master both store all their data on an NFS mount, and the NFS server was 6 minutes behind. Fortunately this is an NFS server that my team controls, so we were able to correct the clock and pass builds. IMO this is just another illustration of why we need the option to disable this aggressive config check. There are just too many ways it can falsely fail for reasons that the Hudson admin may not be able to mitigate.
          carltonb made changes -
          Environment Original: RedHad Linux VMware New: RedHat Linux VMware

          Note that if this check is not in place, tests that are removed at one point will continue to be tabulated by Hudson, until you do a clean build.

          Maybe touching a file in the beginning of the build and reading its timestamp back to use as the basis might be the way to go.

          Kohsuke Kawaguchi added a comment - Note that if this check is not in place, tests that are removed at one point will continue to be tabulated by Hudson, until you do a clean build. Maybe touching a file in the beginning of the build and reading its timestamp back to use as the basis might be the way to go.

          carltonb added a comment -

          That sounds like a good plan, as long as the reference file is created in the same way the unit test files are created (on the same slave, in the same workspace) to avoid unnecessary differences.

          carltonb added a comment - That sounds like a good plan, as long as the reference file is created in the same way the unit test files are created (on the same slave, in the same workspace) to avoid unnecessary differences.
          OHTAKE Tomohiro made changes -
          Link New: This issue is duplicated by JENKINS-7375 [ JENKINS-7375 ]

            Unassigned Unassigned
            carltonb carltonb
            Votes:
            14 Vote for this issue
            Watchers:
            15 Start watching this issue

              Created:
              Updated: