
Persist build(s) log(s) in MySQL (or similar)

      Persisting build logs in MySQL (or a similar database) would make Jenkins updates much easier to manage.

      When the build log format changes, we sometimes end up with a broken system that is often unable to read the logs of old builds. Software changes even break stored build information from time to time.
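
      As a rough illustration of the idea (not a proposal for an actual Jenkins schema), a minimal sketch of writing and reading one build's log over JDBC might look like the following; the table, columns and connection handling are all hypothetical:

          import java.io.InputStream;
          import java.nio.charset.StandardCharsets;
          import java.nio.file.Files;
          import java.nio.file.Path;
          import java.sql.Connection;
          import java.sql.PreparedStatement;
          import java.sql.ResultSet;

          /**
           * Hypothetical sketch only. Assumes a table such as:
           *   CREATE TABLE build_log (job VARCHAR(255), build INT, log LONGBLOB, PRIMARY KEY (job, build));
           */
          public class BuildLogStore {
              private final Connection conn;

              public BuildLogStore(Connection conn) {
                  this.conn = conn;
              }

              /** Store one build's log file as a single row, streaming it instead of loading it into memory. */
              public void save(String job, int build, Path logFile) throws Exception {
                  try (InputStream in = Files.newInputStream(logFile);
                       PreparedStatement ps = conn.prepareStatement(
                               "INSERT INTO build_log (job, build, log) VALUES (?, ?, ?)")) {
                      ps.setString(1, job);
                      ps.setInt(2, build);
                      ps.setBinaryStream(3, in);
                      ps.executeUpdate();
                  }
              }

              /** Fetch one build's log as text, or null if it was never stored. */
              public String load(String job, int build) throws Exception {
                  try (PreparedStatement ps = conn.prepareStatement(
                          "SELECT log FROM build_log WHERE job = ? AND build = ?")) {
                      ps.setString(1, job);
                      ps.setInt(2, build);
                      try (ResultSet rs = ps.executeQuery()) {
                          return rs.next() ? new String(rs.getBytes(1), StandardCharsets.UTF_8) : null;
                      }
                  }
              }
          }

      Streaming the log into the statement (rather than materialising it as a String) matters here, since single build logs can reach hundreds of megabytes, as noted in the comments below.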

          [JENKINS-26545] Persist build(s) log(s) in MySQL (or similar)

          Libero Scarcelli added a comment -

          Daniel... does the number "1.597" ring any bell to you?

          Daniel Beck added a comment -

          Not sure what I've done to deserve condescending answers such as that one. I'm volunteering to do a bit of work for the Jenkins project in my free time. I've warned on the jenkinsci-users mailing list about installing it, I've provided an explanation and workarounds to the disk-usage plugin problem on JENKINS-26496, I've pointed Jesse (the migration code author) to the Windows migration issue JENKINS-26519 and increased its priority, and I've even asked other project members for more public announcements mentioning possible side effects of this change prior to release (which didn't happen, unfortunately).

          I'm unsubscribing from this issue, as it appears to serve no constructive purpose. You're clearly not interested in this leading to any kind of improvement. This should have been a post on your blog.

          Libero Scarcelli added a comment - - edited

          Sorry, Daniel, if you felt offended (although I am not sure whether you are still reading this), but I was just pointing out that using a DB would probably fix many of the problems we have been experiencing (especially on Windows!). Let's say you have a big project with 10,000 builds (it is easy to collect that many if you work for a company with at least 15 developers): do you really want Jenkins to parse 10,000 XML files? And consider that, in case of errors, the system will keep retrying them forever, producing large error log files until the service crashes.

          Yes, I do value the time you have been spending on this project, but allow me to say we could probably improve the system.

          The build log format has changed many times so far, causing many issues. In some cases you can fix the problem by discarding the offending builds (generally the old ones), which of course defeats the main principle of any software pipeline.

          Also, all of those symlinks on Windows... You can back up and restore all projects with no problem, but if you try restoring log files you might run into trouble:

          https://issues.jenkins-ci.org/browse/JENKINS-18924

          That said... my two cents... I was just trying to help

          Daniel Serodio added a comment -

          Build logs are txt files, not XML files. Job configuration, history, etc. could be better stored in a database instead of the filesystem, but not logs. Logs don't belong in a database.

          Tim-Christian Bloss added a comment -

          Hi,

          Maybe I'm not fully aware of the structure builds are stored in, but I think I can understand a little of the intention to store builds, including the build log, in a database.

          To me it looks like Jenkins reads the jobs (jobs/*/config.xml) as well as some parts of each job's builds every time the Jenkins master is started.
          The same goes for the file-system-based config-history.
          At least our split file system shows such activity every time the Jenkins master is started (the split is listed later on).

          From what I see on our file systems, each build of most of our jobs contains something like:

          • "build.xml"
          • "log"
          • "revision.txt"
          • "InjectedEnvVars.txt"
          • "changelog.xml"

          Maybe build/log doesn't get read constantly, but I would suspect that build.xml contains valuable information, possibly even the txt files.
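
          For a rough feel for how many of these XML records a restart may have to parse, a small stand-alone sketch like the one below can count them; the path is only an example, the layout (builds under a jobs/ or relocated builds directory) is an assumption, and nothing here uses the Jenkins API:

              import java.io.IOException;
              import java.nio.file.Files;
              import java.nio.file.Path;
              import java.nio.file.Paths;
              import java.util.stream.Stream;

              /**
               * Counts the config.xml and build.xml files under a directory, e.g. $JENKINS_HOME/jobs
               * or a relocated builds directory, to gauge how many XML records a restart may parse.
               */
              public class CountXmlRecords {
                  public static void main(String[] args) throws IOException {
                      Path root = Paths.get(args.length > 0 ? args[0] : "/opt/jenkins/jobs"); // example path
                      System.out.println("config.xml files: " + countByName(root, "config.xml"));
                      System.out.println("build.xml files:  " + countByName(root, "build.xml"));
                  }

                  private static long countByName(Path root, String name) throws IOException {
                      try (Stream<Path> files = Files.walk(root)) {
                          return files.filter(p -> p.getFileName().toString().equals(name)).count();
                      }
                  }
              }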

          We split Jenkins across different disks to hunt down some massive performance issues we ran into, namely:
          /dev/sdb1 30G 1,2G 27G 5% /opt/jenkins
          /dev/sdg1 40G 7,0G 31G 19% /opt/jenkins/jenkins_jobs_builds ---> defined using <buildsDir>
          tmpfs 4,0G 732K 4,0G 1% /opt/jenkins/temp
          tmpfs 4,0G 0 4,0G 0% /opt/jenkins/fingerprints
          /dev/sdd1 197G 6,0G 181G 4% /opt/jenkins/jobs
          /dev/sdc1 50G 4,2G 43G 10% /opt/jenkins/config-history

          When starting the Jenkins master, disk activity "breaks any scale" in our monitoring for the disks containing:
          /opt/jenkins/jenkins_jobs_builds
          /opt/jenkins/jobs
          /opt/jenkins/config-history

          We've already cut the history down to only two builds per job; otherwise we sometimes got startup times far beyond half an hour (even more than one hour when we were still running the Jenkins master on Windows).
          In all Maven builds we disabled fingerprinting as soon as that became possible, because before that we needed to flush the tmpfs /opt/jenkins/fingerprints nearly once per minute.

          Maybe we're an example of a company using Jenkins excessively, or simply "building too-large projects too often", as some might think.

          Given our experience, it would really be useful to store as much data as possible in a database, limiting reads to just the fields actually used (who started a build and when, what the test results were, and so on) instead of reading build.xml for more than 200 Maven modules per single build run per job.

          Also, regarding portability to test systems (on-site integration testing before upgrading production systems) as well as high availability, it would be easier to store configuration and job data in a (replicated) database rather than on a file system.

          I agree that one would usually say storing logs (as CLOB/BLOB) in a database is a waste of database resources and bad for web page delivery performance, especially when single build logs can reach more than 200 MB in size, but it would at least be useful as an option in enterprise environments where software production is the primary reason to use a CI system.

          Daniel Beck added a comment -

          for more than 200 maven modules per single build run per job.

          Don't use the Maven job type. It has some well known, severe performance problems. Problem solved.

          Daniel Serodio added a comment -

          Thanks for the info danielbeck, is this documented somewhere?

          Daniel Beck added a comment -

          It's more something learned through experience by a number of community members; I doubt it's explicitly documented anywhere. From my limited experience with Maven I'd also consider 200 modules in a single build to be rather unusual, so the performance issues you experience likely aren't hit by most users. When loading a single build involves parsing hundreds of XML files (compared to one in the case of a freestyle job), performance problems are to be expected.

          Note that there are other issues with the job type, which were written up by stephenconnolly, a contributor to Jenkins and Maven, here:
          http://javaadventure.blogspot.de/2013/11/jenkins-maven-job-type-considered-evil.html

          Stephen Connolly added a comment -

          Part of the start-up performance issue with the evil job type is the mutability of build results...

          You can see this in the other side-effect issues: JENKINS-20731, JENKINS-25075

          Basically, back when this job type was truly, completely and utterly evil, it would run each module as a separate "job" in parallel and essentially re-implement the whole Maven Reactor with a bunch of hacks... You can still enable this mode of the evil job type, but at least it is no longer the default.

          So what you have is that you can re-run a single module only... and the build result for an evil job is the aggregate build result of the most recent build of all the modules...

          Thus if foo-project build #5 is broken due to a flaky test in foo-manchu-module, you just re-trigger the failing module and presto! your foo-project build #5 is now overall fixed, because foo-manchu-module build #6 was a success (note that the original build of foo-project #5 produced foo-manchu-module build #5).

          So when Jenkins loads an evil job's build record, not only does it have to parse the latest build descriptors of all child modules, it also has to re-evaluate the latest build results of all modules in the current build...

          And then the weather column wants to evaluate build stability... so it goes asking for the build results of the previous 4 builds...

          So I would contend (and teilo would agree) that the root cause of the performance issues is actually the mutability of the build result in the evil job type... fix that and a lot of the evil job's performance issues can be resolved... there are other performance issues with the evil job... and it will still be evil.

          (FYI: I call it evil from my PoV as a committer to the Apache Maven project because it is a job type that expressly goes completely against the core principles of Maven... one of which is that the build should be completely reproducible given identical environment variables, user permissions, POM and command line... the evil one doesn't do this because it modifies the effective POM in ways you cannot see or deterministically determine without injecting some additional inspection code into your Maven binaries)
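
          To make the mutability problem described above concrete, here is a minimal stand-alone sketch (hypothetical types and module names, not the Jenkins model API) of an aggregate result that is derived from the latest build of each module and therefore changes retroactively when a single module is re-run:

              import java.util.Comparator;
              import java.util.LinkedHashMap;
              import java.util.Map;

              public class MutableAggregateDemo {
                  enum Result { SUCCESS, UNSTABLE, FAILURE }

                  /** Latest build result per module; re-running a module overwrites its entry. */
                  static final Map<String, Result> latestModuleResult = new LinkedHashMap<>();

                  /** The aggregation described above: the job result is the worst of the latest module results. */
                  static Result aggregate() {
                      return latestModuleResult.values().stream()
                              .max(Comparator.comparingInt(Result::ordinal))
                              .orElse(Result.SUCCESS);
                  }

                  public static void main(String[] args) {
                      // foo-project build #5: one module fails because of a flaky test.
                      latestModuleResult.put("foo-core", Result.SUCCESS);
                      latestModuleResult.put("foo-manchu-module", Result.FAILURE);
                      System.out.println("foo-project #5 as originally built: " + aggregate()); // FAILURE

                      // Re-trigger only foo-manchu-module; its build #6 succeeds ...
                      latestModuleResult.put("foo-manchu-module", Result.SUCCESS);

                      // ... and the already recorded result of foo-project #5 silently flips.
                      System.out.println("foo-project #5 after the module re-run: " + aggregate()); // SUCCESS
                  }
              }

          The weather column then asks for the same evaluation for the previous few builds, which multiplies the work on every load.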

          Oleg Nenashev added a comment -

          FTR there was some design work and prototyping in JENKINS-38313

            Assignee: Unassigned
            Reporter: Libero Scarcelli (czerny)
            Votes: 2
            Watchers: 9