• Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • None
    • Hudson 1.354 (?)

      This started out in JENKINS-6254, but I don't think it's limited to Perforce, so I'm submitting it against core.
      Today, after upgrading to 1.354, I'm seeing cases where builds move from their own job into other jobs. In JENKINS-6254 I note one case where two builds (412+413) from one job moved to another job, possibly confusing the Perforce plugin about where to build.
      Earlier I had two jobs where 3 builds from each had switched sides and were now listed as belonging to the other job.
      The switch was complete in the sense that the build directories on the Hudson server were in the wrong places. I stopped the service and moved the directories back, and everything looked fine again.
      One of those jobs consistently failed like this:

      FATAL: null
      java.lang.NullPointerException
      	at hudson.tasks.ArtifactArchiver.prebuild(ArtifactArchiver.java:147)
      	at hudson.model.AbstractBuild$AbstractRunner.preBuild(AbstractBuild.java:595)
      	at hudson.model.AbstractBuild$AbstractRunner.preBuild(AbstractBuild.java:590)
      	at hudson.model.AbstractBuild$AbstractRunner.preBuild(AbstractBuild.java:586)
      	at hudson.model.Build$RunnerImpl.doRun(Build.java:114)
      	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
      	at hudson.model.Run.run(Run.java:1244)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:122)
      

      I'm not really keen on going through all our jobs to scan for switched builds, so I'll likely have to downgrade to 1.353.
      A curious thing, though, is that all of the switched builds found so far were run on April 7th (a week ago). We were not running 1.354 then, but I hadn't seen these problems before now, so I'm mostly inclined to believe it's something to do with 1.354.
      After downgrading I can't really be of much help, as we don't have a "testing" server (but I'm beginning to think we should).

          [JENKINS-6256] Builds are listed under the wrong job

          torbent created issue -

          torbent added a comment -

          I've now moved the one build I know of back to its rightful place, and will keep running 1.354 for the time being. If you need me to check something, just let me know.

          Alan Harder added a comment -

          "moved back" - how, exactly? Was the entire build directory under the wrong job directory, so you moved it back under the right job? Did it have a symlink? In the right or wrong job's builds dir? Any connection between the right/wrong job? (upstream/downstream, same SCM, anything?)

          Very odd (and disturbing) behavior, but w/o steps to reproduce this from a new install, not sure what we can do.

          torbent added a comment -

          Yes, the entire build directory was under the wrong job. I stopped Hudson, moved the directory, and restarted. Appeared to be fine.
          For another job I just moved the directory and asked Hudson to reload the configurations. Also appeared to work.

          The master is a Windows XP machine, so there were no symlinks anywhere, AFAIK.
          The jobs use the same SCM (Perforce - but see also JENKINS-6254).
          Some of them were related up/down, others weren't. I noticed this for 7 builds on 3 jobs.
          All the builds that moved were performed around the same time (within an hour or so) in the afternoon on April 7th. In build.xml, they're marked as having been performed with 1.353. The builds immediately before these were done with 1.351, so perhaps something in the upgrade caused this? I noticed a lot of log output about configurations being modified by plugins.
          During that upgrade I believe I added a couple of extra plugins, but I can't remember offhand which. I can check if you'd like.
          What else? Yes, I had to downgrade the Perforce plugin quite quickly after the upgrade; I can't remember whether that happened before or after these builds.

          Yes, I agree that it's extremely difficult to reproduce. I mostly just wanted to record the failure, in case it wasn't just me.

          torbent added a comment -

          This problem hit me again when upgrading to 1.357 today!
          So I had to stop our server and investigated the files on disk in detail. On closer inspection it appears that only the build.xml file is misplaced. The rest of the build subdirectory (named with a timestamp) belongs to the job it's located in (I looked at changelog.xml where possible, junit-results.xml, archive/ etc.).
          Further, I was able to find the "source" build.xml files in their proper locations in all cases, so they had simply been copied over the files in the wrong places. Apparently, the source and destination builds took place within minutes of each other, but were not necessarily related (up/downstream), although some were. The source builds that were copied (5 or 6 in all) were not performed on the same day.
          I've had to delete the builds that now had bad build.xml files.

          My only hint of suspicion is the DownStreamBuildView plugin, which has been very loud during startup, converting jobs and builds to include such a view. I've disabled it and will try to remember to keep an eye out next time I upgrade.
          The problem might not be related to upgrading as such, btw; upgrading is usually also the only time I restart Hudson.
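The observation that each misplaced build.xml was a copy of a file still present in its proper location suggests a second check. The sketch below is hypothetical (not part of Hudson and not one of the scripts in this thread): it assumes an overwritten build.xml is a byte-for-byte copy of another build's file, and lists every build.xml under jobs/ that shares a checksum and size with another one. It uses only POSIX find, cksum, sort and awk; find does not follow symlinks by default, so numbered build links (where present) are not counted twice. Identical contents are only a heuristic, so check any reported pair by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hudson): list build.xml files under jobs/
# whose contents are byte-identical to another build.xml. A build.xml that was
# copied over another build's file shows up as such a pair.
# Usage: list_duplicate_build_xml [HUDSON_HOME]   (defaults to the current directory)
list_duplicate_build_xml() {
        (
                cd "${1:-.}" || exit 1
                # cksum (POSIX) prints "<checksum> <size> <path>" for each file;
                # a byte-wise sort puts files with equal checksum and size next to each other.
                find jobs -path '*/builds/*' -name build.xml -exec cksum {} + |
                        LC_ALL=C sort |
                        awk '{
                                key = $1 " " $2
                                # The path is fields 3..NF (job names may contain spaces).
                                path = $3
                                for (f = 4; f <= NF; f++) path = path " " $f
                                if (key == prev) {
                                        # Print the first file of a group once, then each repeat.
                                        if (!shown) print "duplicate: " prevpath
                                        print "duplicate: " path
                                        shown = 1
                                } else {
                                        shown = 0
                                }
                                prev = key
                                prevpath = path
                        }'
        )
}
```

Run it from the Hudson home directory, or pass that directory as the first argument; each group of identical files is printed as consecutive "duplicate:" lines.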

          ralfhergert added a comment -

          We ran into this issue right after installing the DownstreamBuildViewPlugin on our Hudson (1.358). After installing the plugin, about 50 of our 550 projects had been rebuilt before we noticed that 8 projects displayed builds with a higher build number than their last regular build. We shut down our server and inspected Hudson's filesystem. It turned out that in every affected build, the build.xml (.hudson/jobs/<project>/builds/<buildnumber>/build.xml) had been overwritten by a build.xml from another project. All other files of the affected builds were correct.

          I guess torbent is right to suspect the DownstreamBuildView plugin.

          torbent added a comment -

          When upgrading to 1.359 the other day (without DownstreamBuildView) I noticed no ill effects (had written a script to detect it!). Will check again when I, according to my plan, upgrade to 1.360 tomorrow. It's looking a bit more like DownstreamBuildView is to blame, so I'll move the issue to that component.
          ralfhergert, you shouldn't only look for higher numbers - I had several cases where the bad number was much lower. And I think there was one where the numbers sort of were ok; that was a bit harder to spot.

          torbent added a comment -

          Moving this to the downstream-buildview component, as that's the prime suspect.

          torbent made changes -
          Component/s New: downstream-buildview [ 15699 ]
          Component/s Original: core [ 15593 ]

          ralfhergert added a comment -

          Hello torbent, we did not rely only on the build numbers shown in Hudson's GUI. We wrote a shell script to find all build.xml files that are located in a job directory they don't belong to. Our script uses the <workspace> tag inside build.xml for the analysis.

          #!/bin/sh
          # List all build.xml files that are located in the wrong job's build directory.
          # Run this script in the directory where Hudson stores all jobs (.hudson),
          # i.e. the directory that contains jobs/.
          for jobdir in jobs/*/
          do
                  i=$(basename "$jobdir")
                  # Skip jobs that have no builds yet.
                  [ -d "jobs/$i/builds" ] || continue
                  echo "testing project: $i"

                  # Check whether the project name in <workspace> equals the directory's project name:
                  # print every <workspace> line that does not mention this job
                  # (/dev/null makes grep prefix each match with its file name).
                  # The subshell keeps the cd local, so a failed cd cannot derail the loop.
                  (
                          cd "jobs/$i/builds" &&
                          find . -name "build.xml" -exec grep -in "workspace" {} /dev/null \; | grep -vF -- "$i"
                  )
          done
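One limitation of this check is that grep -v does a substring match: a build.xml from job foobar that lands under job foo still contains "foo" and is therefore not reported. A stricter sketch, hypothetical and not from the thread, extracts the <workspace> value and requires the job name to appear as a complete path component. It assumes the element sits on a single line (as the grep-based check also does) and turns Windows backslashes into slashes, since torbent's master runs on Windows XP. Jobs whose custom workspace is not in a directory named after the job will show up as false positives.

```shell
#!/bin/sh
# Hypothetical stricter check (not from the thread): report build.xml files whose
# <workspace> value does not contain the job name as a complete path component.
# Assumes <workspace>...</workspace> sits on one line of build.xml.
# Usage: check_workspace_components [HUDSON_HOME]   (defaults to the current directory)
check_workspace_components() {
        (
                cd "${1:-.}" || exit 1
                for jobdir in jobs/*/; do
                        job=$(basename "$jobdir")
                        # Skip jobs without a builds directory.
                        [ -d "jobs/$job/builds" ] || continue
                        find "jobs/$job/builds" -name build.xml | while IFS= read -r f; do
                                # Extract the <workspace> value and turn Windows backslashes into slashes.
                                ws=$(sed -n 's:.*<workspace>\(.*\)</workspace>.*:\1:p' "$f" | tr '\\' '/')
                                [ -n "$ws" ] || continue
                                # Surrounding slashes make "/$job/" match only a whole component.
                                case "/$ws/" in
                                        */"$job"/*) ;;
                                        *) echo "$f: $ws" ;;
                                esac
                        done
                done
        )
}
```

Each reported line has the form "<path to build.xml>: <workspace value>", so the foreign job name is visible directly in the output.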
          

            Assignee: Unassigned
            Reporter: torbent
            Votes: 4
            Watchers: 5