Hi,
Maybe I'm not fully aware of the structure builds are stored in, but I think I can understand a little of the intention behind storing builds, including their build logs, in a database.
To me it looks like Jenkins reads all jobs (jobs/*/config.xml) as well as some parts of each job's builds every time the Jenkins master is started.
The same goes for the file-system-based config-history.
At least our split file systems show that kind of activity every time the Jenkins master is started (the split is listed further down).
From what I see on our file systems, each build of most of our jobs contains something like
- "build.xml"
- "log"
- "revision.txt"
- "InjectedEnvVars.txt"
- "changelog.xml"
Maybe build/log doesn't get read all the time, but I would suspect the build.xml contains information Jenkins actually needs, possibly even the txt files.
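From memory, an (abbreviated) build.xml looks roughly like the following; element names vary a bit between Jenkins versions, so take this as a sketch rather than an exact dump:

    <?xml version='1.0' encoding='UTF-8'?>
    <build>
      <actions>...</actions>        <!-- build causes, SCM data, test results, ... -->
      <number>123</number>
      <result>SUCCESS</result>
      <duration>123456</duration>   <!-- milliseconds -->
      <timestamp>1418130000000</timestamp>
      <keepLog>false</keepLog>
      <builtOn>some-slave</builtOn>
    </build>

So even just determining a build's result means parsing the whole file.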
We split Jenkins across different disks to hunt down some massive performance issues we ran into, namely:
/dev/sdb1 30G 1,2G 27G 5% /opt/jenkins
/dev/sdg1 40G 7,0G 31G 19% /opt/jenkins/jenkins_jobs_builds ---> defined using <buildsDir>
tmpfs 4,0G 732K 4,0G 1% /opt/jenkins/temp
tmpfs 4,0G 0 4,0G 0% /opt/jenkins/fingerprints
/dev/sdd1 197G 6,0G 181G 4% /opt/jenkins/jobs
/dev/sdc1 50G 4,2G 43G 10% /opt/jenkins/config-history
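For anyone wondering how the builds directory got relocated: the main ${JENKINS_HOME}/config.xml accepts a <buildsDir> pattern; ours looks roughly like the following (placeholder name written from memory, double-check against the Jenkins documentation):

    <hudson>
      ...
      <buildsDir>/opt/jenkins/jenkins_jobs_builds/${ITEM_FULL_NAME}</buildsDir>
      ...
    </hudson>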
When the Jenkins master starts, disk activity "breaks any scale" in our monitoring on the disks containing
/opt/jenkins/jenkins_jobs_builds
/opt/jenkins/jobs
/opt/jenkins/config-history
We've already cut the build history down to only two builds per job; otherwise we sometimes saw startup times far beyond half an hour (even more than one hour back when we were still running the Jenkins master on Windows).
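That retention limit is just the standard log rotation setting; in each job's config.xml it ends up as something like:

    <logRotator>
      <daysToKeep>-1</daysToKeep>
      <numToKeep>2</numToKeep>
      <artifactDaysToKeep>-1</artifactDaysToKeep>
      <artifactNumToKeep>-1</artifactNumToKeep>
    </logRotator>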
In all Maven builds we disabled fingerprinting as soon as it became possible to do so; before that, we needed to flush the tmpfs /opt/jenkins/fingerprints nearly once per minute.
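If I read our job configs correctly, that option ends up in the Maven job's config.xml as a flag along these lines (element name from memory, so verify on your own installation):

    <maven2-moduleset>
      ...
      <fingerprintingDisabled>true</fingerprintingDisabled>
      ...
    </maven2-moduleset>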
Maybe we're an example of a company using Jenkins excessively, or simply "building too-large projects too often", as some might think.
Given our experience, it would be really useful to store as much data as possible in a database, limiting reads to just the fields actually needed (e.g. who started a build and when, what the test results were, and so on) instead of reading build.xml for more than 200 Maven modules per single build run per job.
Also with regard to portability for test systems (on-site integration testing before upgrading production systems) as well as high availability, it would be easier to store configuration and job data in a (replicated) database than on a file system.
I agree that storing logs (as CLOBs/BLOBs) in a database is usually considered a waste of database resources and bad practice for web page delivery performance, especially when single build logs can reach more than 200 MB, but it would at least be useful as an option in enterprise environments where software production is the primary reason to run a CI system.
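To make the idea concrete, a purely hypothetical schema (names invented for illustration) could keep the small, frequently-read metadata apart from the huge, rarely-read log payload, so that startup only ever touches the first table:

    CREATE TABLE build (
      job_name     VARCHAR(255) NOT NULL,
      build_number INT          NOT NULL,
      started_by   VARCHAR(255),          -- who started the build
      started_at   TIMESTAMP,             -- when it happened
      duration_ms  BIGINT,
      result       VARCHAR(16),           -- SUCCESS, UNSTABLE, FAILURE, ...
      PRIMARY KEY (job_name, build_number)
    );

    CREATE TABLE build_log (
      job_name     VARCHAR(255) NOT NULL,
      build_number INT          NOT NULL,
      log_data     BLOB,                  -- can exceed 200 MB in our case
      PRIMARY KEY (job_name, build_number)
    );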
Daniel... does the number "1.597" ring any bell to you?