
Persist build(s) log(s) in MySQL (or similar)

      Persisting build(s) log(s) in MySQL (or similar) will make Jenkins updates much easier to manage.

      In fact, when build logs change their format we sometimes end up with a broken system that is often unable to read the logs of old builds. Software changes even break stored build info from time to time.

          [JENKINS-26545] Persist build(s) log(s) in MySQL (or similar)

          Daniel Beck added a comment -

          A major architectural change like this needs to be discussed on the dev list.

          FWIW a more exhaustive description of failure modes in the current system is needed – right now it's not clear at all what the problem is, or how use of a database would help.


          Libero Scarcelli added a comment -

          Ok... first of all, how do we correctly back all that stuff up? On Windows, copying all those links is a problem; some backup systems run into issues trying to copy them.

          Then what if a code change breaks the build logs? You end up with a completely broken system. And this is happening on Windows...


          Mads Nielsen added a comment -

          Why did you put this under the logging-plugin component? The logging-plugin has nothing to do with this core change.


          Libero Scarcelli added a comment - edited

          Because after modifying the core you might want to modify the logging-plugin as well, in order to extend it using the new functionality... But yes, it's not necessary.


          Daniel Beck added a comment - edited

          On Windows copying all those links is a problem, some backup systems run into issues trying to copy those links.

          In other words, you want Jenkins to not use OS features because your crap software doesn't support them. Well, if you prohibit creation of symbolic links on Windows through Group Policy, Jenkins will not create them.

          Then what if a code change breaks build logs?

          You're still incredibly vague.

          Please note that build logs in Jenkins are simple text files. They cannot break (unless moving to a completely different architecture like IBM's servers, I suppose). Please mind your terminology, otherwise communication will be incredibly difficult.

          As to code changes making parts of build records unloadable (including disabling plugins that store data in configuration or build records), that's handled by Old Data Monitor. You can discard these plugins' data. For deliberate code changes, plugin authors can write backward compatibility code to transform old data into a new format.
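          That backward-compatibility hook can be sketched roughly as follows. Jenkins persists build records with XStream, which invokes readResolve() after deserializing an object, giving a plugin the chance to migrate old on-disk data. Everything else here (class name, field names, the legacy format) is hypothetical:

          ```java
          // Hypothetical sketch of a plugin's backward-compatibility migration.
          // XStream calls readResolve() after loading an object from build.xml,
          // so the plugin can convert an old field into its new representation.
          class BuildDurationRecord {
              // Legacy releases stored the duration as text like "4 min"; the
              // deprecated field is kept so old build records still deserialize.
              @Deprecated
              String legacyDuration;
              long durationMillis;

              BuildDurationRecord(long durationMillis) {
                  this.durationMillis = durationMillis;
              }

              // Invoked after deserialization: convert old data, then discard it.
              protected Object readResolve() {
                  if (durationMillis == 0 && legacyDuration != null) {
                      long minutes = Long.parseLong(legacyDuration.replace(" min", "").trim());
                      durationMillis = minutes * 60_000L;
                      legacyDuration = null;
                  }
                  return this;
              }
          }
          ```

          A plugin following this pattern keeps the deprecated field only so that old records still load, converts it on read, and writes back the new format on the next save.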

          So what exactly is the problem you're trying to solve, and how is a database supposed to solve it?


          Libero Scarcelli added a comment -

          Also... it's not just about not being able to access old builds; it also leads to a system crash, where Jenkins starts logging continuously because it is unable to read previous builds' information.


          Daniel Beck added a comment -

          This looks a lot like the XY problem. You observe some behavior and propose a solution that you think would help, but you're not telling us what the exact behavior you're seeing is. It would help if you did that.


          Libero Scarcelli added a comment -

          Daniel, you have probably never used Jenkins on Windows extensively. Why should I discard old builds? What's the point of having a build pipeline then?

          By "broken" logs I mean logs that are made unreadable, which once again may lead to a crash (out of memory, CPU starvation, etc.).


          Daniel Beck added a comment -

          You're still incredibly vague, with no actionable information as to what the problems you're observing are (unreadable data is different from OOM situations, and for both, more information is needed to investigate further – see https://wiki.jenkins-ci.org/display/JENKINS/How+to+report+an+issue and https://wiki.jenkins-ci.org/display/JENKINS/Obtaining+a+thread+dump).

          Responding with a variant of "You don't know what you're talking about" when I try to redirect your responses to something constructive isn't helpful. Note that I've never written that you discard old builds (for one thing, I still don't know what the problems are). This report contains no substance, and the justification for requesting a major architectural change is questionable at best.

          So, could you please explain, in detail, what the problems are that you're facing?


          Libero Scarcelli added a comment -

          Daniel... does the number "1.597" ring a bell to you?


          Daniel Beck added a comment -

          Not sure what I've done to deserve condescending answers such as that one. I'm volunteering to do a bit of work for the Jenkins project in my free time. I've warned on the jenkinsci-users mailing list about installing it, I've provided an explanation and workarounds to the disk-usage plugin problem on JENKINS-26496, I've pointed Jesse (the migration code author) to the Windows migration issue JENKINS-26519 and increased its priority, and I've even asked other project members for more public announcements mentioning possible side effects of this change prior to release (which didn't happen, unfortunately).

          I'm unsubscribing from this issue, as it appears to serve no constructive purpose. You're clearly not interested in this leading to any kind of improvement. This should have been a post on your blog.


          Libero Scarcelli added a comment - edited

          Sorry, Daniel, if you felt offended (although I am not sure whether you are reading this one), but I was just pointing out that using a DB would probably fix many problems we have been experiencing (especially on Windows!). In fact, let's say you have a big project with 10,000 builds (yes, it is easy to collect that many if you work for a company with at least 15 developers): do you really want Jenkins to parse 10,000 XML files? Especially considering that, in case of errors, the system will try parsing them forever, producing large error log files until the service crashes?

          Yes, I do value the time you have been spending on this project, but allow me to say we could probably improve the system.

          The build log format has changed many times so far, causing many issues. In some cases you can fix the problem by discarding the offending builds (generally the old ones), which of course defeats the main principle of any software pipeline.

          Also, about using all of those symlinks on Windows... You can back up and restore all projects no problem, but if you try restoring log files you might run into problems:

          https://issues.jenkins-ci.org/browse/JENKINS-18924

          That said... my two cents... I was just trying to help


          Daniel Serodio added a comment -

          Build logs are plain text files, not XML files. Job configuration, history, etc. could be better stored in a database instead of the filesystem, but not logs. Logs don't belong in a database.


          Tim-Christian Bloss added a comment -

          Hi,

          maybe I'm not fully aware of the structure builds are stored in, but I think I can understand a little of the intention to store builds, including the build log, in a database.

          To me it looks like Jenkins reads the jobs (jobs/*/config.xml) as well as some parts of each job's builds every time the Jenkins master gets started.
          The same goes for the file-system-based config history.
          At least our split file system shows such activity every time the Jenkins master gets started (the split layout is listed below).

          From what I see on our file systems, each build of most of our jobs contains something like

          • "build.xml"
          • "log"
          • "revision.txt"
          • "InjectedEnvVars.txt"
          • "changelog.xml"

          Maybe build/log doesn't get read permanently, but I would suspect the build.xml might contain valuable information, possibly even the txt files.
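          As a rough illustration of why startup I/O grows with the build count, here is a sketch (paths and layout hypothetical, modeled on the listing above) that walks a JENKINS_HOME-like tree and touches every build.xml, roughly the way loading all build records does:

          ```java
          import java.io.IOException;
          import java.nio.file.Files;
          import java.nio.file.Path;
          import java.util.concurrent.atomic.AtomicInteger;

          // Toy model of a startup scan over a JENKINS_HOME-like directory tree:
          // the cost is one build.xml per build, so I/O scales with total builds,
          // not with the number of jobs.
          class BuildScan {
              static int countBuildRecords(Path jenkinsHome) throws IOException {
                  AtomicInteger n = new AtomicInteger();
                  try (var stream = Files.walk(jenkinsHome)) {
                      stream.filter(p -> p.getFileName().toString().equals("build.xml"))
                            .forEach(p -> n.incrementAndGet());
                  }
                  return n.get();
              }
          }
          ```

          With tens of thousands of builds, that walk alone means tens of thousands of file opens before the master is ready, which matches the "disk activity breaks any scale" observation below.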

          We split Jenkins across different disks to hunt down some massive performance issues we ran into, namely:
          /dev/sdb1 30G 1,2G 27G 5% /opt/jenkins
          /dev/sdg1 40G 7,0G 31G 19% /opt/jenkins/jenkins_jobs_builds ---> defined using <buildsDir>
          tmpfs 4,0G 732K 4,0G 1% /opt/jenkins/temp
          tmpfs 4,0G 0 4,0G 0% /opt/jenkins/fingerprints
          /dev/sdd1 197G 6,0G 181G 4% /opt/jenkins/jobs
          /dev/sdc1 50G 4,2G 43G 10% /opt/jenkins/config-history

          When starting Jenkins master, disk activity "breaks any scale" on our monitoring on disks containing
          /opt/jenkins/jenkins_jobs_builds
          /opt/jenkins/jobs
          /opt/jenkins/config-history

          We've already cut down job history to only two builds per job; otherwise we sometimes got startup times far beyond half an hour (even more than one hour back when we were running the Jenkins master on Windows).
          In all Maven builds we disabled fingerprinting as soon as it became possible, since before that we needed to flush the tmpfs /opt/jenkins/fingerprints nearly once per minute.

          Maybe we're an example of a company using Jenkins excessively, or simply "building too large projects too often", as some might think.

          Given our experience, it would really be useful to store as much data as possible in a database, limiting reads to just the fields actually in use (maybe who started a build and when, what the test results were, and so on) instead of reading build.xml for more than 200 Maven modules per single build run per job.

          Also, regarding portability for test systems (on-site integration testing before upgrading production systems) as well as high availability, it would be easier to store configuration and job data in a (replicated) database than on a file system.

          I agree that storing logs (as CLOB/BLOB) in a database is usually considered a waste of database resources and bad practice for website delivery performance, especially when single build logs can exceed 200 MB in size, but it would at least be useful as an option in enterprise environments where software production is the primary reason to use a CI system.


          Daniel Beck added a comment -

          for more than 200 maven modules per single build run per job.

          Don't use the Maven job type. It has some well known, severe performance problems. Problem solved.


          Daniel Serodio added a comment -

          Thanks for the info danielbeck, is this documented somewhere?


          Daniel Beck added a comment -

          More learned through experience by a number of community members. I doubt it's explicitly documented somewhere. From my limited experience with Maven I'd also consider 200 modules in a single build to be rather unusual, so the performance issues you experience likely aren't experienced by most users. When loading a single build involves parsing hundreds of XML files (compared to one in the case of freestyle), performance problems are to be expected.

          Note that there are other issues with the job type, which were written up by stephenconnolly, a contributor to Jenkins and Maven, here:
          http://javaadventure.blogspot.de/2013/11/jenkins-maven-job-type-considered-evil.html


          Stephen Connolly added a comment -

          Part of the start-up performance issue for the evil job type is the mutability of build results...

          You can see this in the other side-effect issues: JENKINS-20731, JENKINS-25075

          Basically, back from when this job type was truly completely and utterly evil, it would run each module as a separate "job" in parallel and basically re-implement the whole Maven Reactor with a bunch of hacks... You can still enable this mode of using the evil job type, but it is at least no longer the default.

          So what you have is that you can re-run a single module only... and the build result for an evil job is the aggregate build result of the most recent build of all the modules...

          Thus if foo-project build #5 is broken due to a flaky test in foo-manchu-module, you just re-trigger the failing module and presto! your foo-project build #5 is now fixed overall, because foo-manchu-module build #6 was a success (note that the original foo-project build #5 produced foo-manchu-module build #5)

          So when Jenkins loads an evil job build record, not only does it have to parse all latest build child module descriptors, but it has to re-evaluate all the latest build results of all modules in the current build...
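          The mutable aggregate described above can be sketched with a toy model (all names and types here are hypothetical, not Jenkins APIs): the job's overall result is recomputed from the latest build of each module, so re-running a single module rewrites the result of a past job build:

          ```java
          import java.util.ArrayList;
          import java.util.LinkedHashMap;
          import java.util.List;
          import java.util.Map;

          // Toy model of the "evil" job's aggregate result: the result for a job
          // build is derived from the *latest* build of every module, not from
          // the module builds that actually ran as part of that job build.
          enum Result { SUCCESS, FAILURE }

          class ModuleBuilds {
              // module name -> result history (index = module build number - 1)
              final Map<String, List<Result>> history = new LinkedHashMap<>();

              void record(String module, Result r) {
                  history.computeIfAbsent(module, k -> new ArrayList<>()).add(r);
              }

              // Aggregate over the most recent build of each module -- this is
              // the mutable part: re-running one module changes the aggregate.
              Result aggregate() {
                  for (List<Result> h : history.values()) {
                      if (h.get(h.size() - 1) == Result.FAILURE) return Result.FAILURE;
                  }
                  return Result.SUCCESS;
              }
          }
          ```

          Because the aggregate depends on per-module "latest build" state, loading one job build forces a re-evaluation across all modules, which is exactly the start-up cost described here.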

          And then the weather column wants to evaluate the build stability... so it goes asking for the build result of the previous 4 builds...

          So actually I would contend (and teilo would agree) that the root cause of the performance issues is actually the mutability of the build result in the evil job type... fix that and a lot of the evil job's performance issues can be resolved... there are other performance issues with the evil job... and it will still be evil

          (FYI: I call it evil from my PoV as a committer to the Apache Maven project because it is a job type that expressly goes completely against the core principles of Maven... one of which is that the build should be completely reproducible given identical environment variables, user permissions, pom and command line... the evil one doesn't do this because it modifies the effective pom in ways you cannot see or deterministically determine without injecting additional inspection code into your Maven binaries)


          Oleg Nenashev added a comment -

          FTR there was some design work and prototyping in JENKINS-38313


            Assignee: Unassigned
            Reporter: Libero Scarcelli (czerny)
            Votes: 2
            Watchers: 9