-
Bug
-
Resolution: Fixed
-
Critical
-
This issue is platform independent and happens for any kind of job which sends out final status emails.
-
When a job is configured to send out status emails (such as success, fail, or still failing), it blocks the node for 15 minutes or even more just to send the email. Given that some of our jobs run for only 2 minutes, this is an enormous overhead that we have to get rid of soon. It is mainly blocking our internal QA team from signing off on new releases.
Here is an example of the last lines of a job which has debug mode enabled for email-ext:
00:09:31.082 Archiving artifacts
00:09:31.093 Recording test results
00:09:31.218 Checking for post-build
00:09:31.218 Performing post-build step
00:09:31.219 Checking if email needs to be generated
00:09:31.219 Email was triggered for: Success
00:09:31.219 Sending email for trigger: Success
00:09:31.219 NOT overriding default server settings, using Mailer to create session
00:24:25.439 messageContentType = text/plain; charset=UTF-8
00:24:25.446 Adding recipients from recipient list
00:31:41.324 Successfully created MimeMessage
00:31:41.324 Sending email to: mozmill-ci@mozilla.org
00:31:41.523 Finished: SUCCESS
As you can see, the job itself runs for 9:31 minutes. Then, when trying to add the content (I hope that's right here), it takes about 15 minutes, and then about 7 more minutes to create the MimeMessage.
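For anyone who wants to locate such pauses in a longer console log, here is a minimal sketch. It assumes every relevant line starts with an HH:mm:ss.SSS timestamp followed by a space, as in the log above, and that the log does not wrap past 24 hours. The class name LogGapFinder and its methods are hypothetical and not part of Jenkins or email-ext.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports long pauses between timestamped console lines. */
class LogGapFinder {

    /** A pause that followed a given log line. */
    static final class Gap {
        final String afterLine;
        final Duration length;

        Gap(String afterLine, Duration length) {
            this.afterLine = afterLine;
            this.length = length;
        }
    }

    /** Returns every pause of at least {@code threshold} between consecutive lines. */
    static List<Gap> findGaps(List<String> lines, Duration threshold) {
        List<Gap> gaps = new ArrayList<>();
        LocalTime previousTime = null;
        String previousLine = null;
        for (String line : lines) {
            int space = line.indexOf(' ');
            if (space < 0) {
                continue; // no timestamp prefix
            }
            LocalTime time;
            try {
                time = LocalTime.parse(line.substring(0, space)); // e.g. 00:09:31.219
            } catch (DateTimeParseException e) {
                continue; // first token is not a timestamp
            }
            if (previousTime != null) {
                Duration pause = Duration.between(previousTime, time);
                if (pause.compareTo(threshold) >= 0) {
                    gaps.add(new Gap(previousLine, pause));
                }
            }
            previousTime = time;
            previousLine = line;
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "00:09:31.219 NOT overriding default server settings, using Mailer to create session",
            "00:24:25.439 messageContentType = text/plain; charset=UTF-8",
            "00:24:25.446 Adding recipients from recipient list",
            "00:31:41.324 Successfully created MimeMessage");
        for (Gap gap : findGaps(lines, Duration.ofMinutes(1))) {
            System.out.println(gap.length.toMinutes() + " min after: " + gap.afterLine);
        }
    }
}
```

Run against the four lines from the log above, this flags two pauses: one after "NOT overriding default server settings" and one after "Adding recipients from recipient list".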
During all that time Jenkins has a dramatically high CPU load, consuming nearly all CPU power at 99% load. The more jobs run concurrently, the worse the situation becomes.
Right now we are using version 2.32. We haven't upgraded to the latest version yet, given that we don't see any features or fixes in it that we would benefit from.
Detailed information for our current problems can be found in our own issue tracker: https://github.com/mozilla/mozmill-ci/issues/301
We would appreciate a quick fix if possible. I will be around if you need more information. You can also reach me via IRC; my nickname is whimboo.
Thanks
- hotFix Methods.png
- 84 kB
- Walter Kacynski
- preHotFix Methods.png
- 74 kB
- Walter Kacynski
[JENKINS-20078] email-ext plugin takes a huge amount of time to send out final status emails (30min)
Agreed. In my case the difference is much less extreme; however, there is a noticeable difference between my master installations.
It's possible the Mailer plugin is different between the two installations? What version of the Mailer is on each?
I have tried Mailer 1.6 and Mailer 1.8 on my "slow" installation and both give the same timing effects. I have Mailer 1.6 on my "speedy" installation.
I have captured a number of thread dumps for my problem.
It looks like:
at hudson.plugins.emailext.plugins.ContentBuilder.getPrivateMacros(ContentBuilder.java:76)
is causing a search of a bunch of classes to occur.
Looks like the plugin is loading @EmailToken-annotated macros over and over without any caching. It is unclear to me why SezPoz is being used here directly, when the usual Jenkins @Extension would be easy to use (and would be cached).
There is a reason that I didn't use @Extension...but I don't remember what it is without looking at the code. I do need to add some caching for sure if there is a valid reason I'm not using @Extension.
Ah, the reason I didn't use @Extension was because I didn't want those macros to show up in the help for Token Macro, they are specific to email-ext. I'll add some caching.
Is there a particular reason why this has recently become a problem? Some of my other masters aren't experiencing this problem.
@Walter are they all on the same version of email-ext? Also, do they all have the same plugins installed and so forth?
Yes, unfortunately, I believe there to be a number of differences including the Jenkins core. Attached is "SPEEDY-support.zip".
I'll try and diff the SPEEDY vs others and see if anything pops out at me.
I have also been hitting this problem for a long time and would like to see a final solution:
Overriding default server settings, creating our own session
Overriding charset UTF-8 //very slow, hang
messageContentType = text/html; charset=UTF-8
Adding recipients from recipient list //very slow, hang
Successfully created MimeMessage //very slow, hang
Sending email to:
Is anyone willing to try a test version based on 2.37.2 to see if the fix I am putting in actually fixes the issue?
I have tested the new version and I do see an improvement. So much so that I am unable to capture a thread dump during the run of the email generation. When monitoring the process CPU, I still see a large spike in CPU time, but the duration of this spike is much smaller.
08:23:13 Deleting old workspace snapshot from #19.
08:23:13 Archiving artifacts
08:23:13 Sending artifact delta relative to Deploy » SND » Build_WAS_Environment #19
08:23:14 Archived 418 artifacts
08:23:14 Archive block size is 32768
08:23:14 Received 215 blocks and 142533 bytes
08:23:14 Compression is 98.0%
08:23:14 Checking for post-build
08:23:14 Performing post-build step
08:23:14 Checking if email needs to be generated
08:23:14 Email was triggered for: Success
08:23:14 Sending email for trigger: Success
08:23:14 NOT overriding default server settings, using Mailer to create session
08:23:17 messageContentType = text/html; charset=UTF-8
08:23:17 Adding developers
08:23:17 Sending to requester
08:23:17 Successfully created MimeMessage
08:23:17 Sending email to: ...........
08:23:18 Finished: SUCCESS
One additional comment: the first job that builds an email is still much slower, like the original implementation. The above log is from a second run, so it appears that the caching has improved response time.
@Walter, just to clarify, are you saying the issue is "fixed" or that there is still an issue?
I compared the master CPU usage on "SPEEDY" with this fix, and I would say that the problem is NOT fixed. When generating emails on "SPEEDY" I hardly notice a master CPU spike. With this fix, I see a 10% spike every time an email is generated, although the spike is much shorter than it was without the recent fix.
I have Wily/Introscope available to perform a method level trace of the application. Please let me know the package names that I can instrument that would be helpful to debug the slow areas of code if you are interested in this information. Thanks.
hudson.plugins.emailext.* (mainly ExtendedEmailPublisher and EmailRecipientUtils)
hudson.plugins.emailext.plugins.* (mainly ContentBuilder)
org.jenkinsci.plugins.tokenmacro.*
The only thing I changed in the release was adding caching. Perhaps my caching implementation wasn't a good one.
public class ContentBuilder {

    @CopyOnWrite
    private static volatile List<TokenMacro> privateMacros;

    ...

    public String transformText(String origText, ExtendedEmailPublisherContext context, List<TokenMacro> additionalMacros) {
        ....
        try {
            List<TokenMacro> privMacros = getPrivateMacros();
            if(additionalMacros != null)
                privateMacros.addAll(additionalMacros);
            newText = TokenMacro.expandAll(context.getBuild(), context.getListener(), newText, false, privMacros);
        } catch (MacroEvaluationException e) {
            context.getListener().getLogger().println("Error evaluating token: " + e.getMessage());
        } catch (Exception e) {
            Logger.getLogger(ContentBuilder.class.getName()).log(Level.SEVERE, null, e);
        }
        ...
    }

    public static List<TokenMacro> getPrivateMacros() {
        if(privateMacros != null)
            return privateMacros;

        privateMacros = new ArrayList<TokenMacro>();
        ClassLoader cl = Jenkins.getInstance().pluginManager.uberClassLoader;
        for (final IndexItem<EmailToken, TokenMacro> item : Index.load(EmailToken.class, TokenMacro.class, cl)) {
            try {
                privateMacros.add(item.instance());
            } catch (Exception e) {
                // ignore errors loading tokens
            }
        }
        return privateMacros;
    }
}
I'm not a Java guy really, so if there is a better way to do it in Java, please give me some pointers.
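One possible pattern, offered as a sketch rather than as what the plugin should ship: reading the snippet as quoted, getPrivateMacros() hands out the shared static list itself, so the addAll(additionalMacros) call would grow the cache on every invocation. The list is also assigned to the static field before it is filled, so a concurrent caller could see it half-populated. The sketch below fills a local list first, publishes a read-only copy once, and combines per-call extras in a fresh list. LazyListCache and its loader argument are hypothetical names; the loader stands in for the Index.load(...) scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: runs an expensive loader once and hands out a read-only list. */
final class LazyListCache<T> {
    private final Supplier<List<T>> loader;
    private volatile List<T> cached; // null until the first successful load

    LazyListCache(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    /** Returns the cached list, loading it on first use (double-checked locking). */
    List<T> get() {
        List<T> result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached;
                if (result == null) {
                    // Build the full list locally, then publish it in one step.
                    result = Collections.unmodifiableList(new ArrayList<>(loader.get()));
                    cached = result;
                }
            }
        }
        return result;
    }

    /** Returns a new list holding the cached entries plus per-call extras. */
    List<T> withExtras(List<T> extras) {
        List<T> combined = new ArrayList<>(get());
        if (extras != null) {
            combined.addAll(extras);
        }
        return combined; // the cache itself is never modified
    }
}

class LazyListCacheDemo {
    public static void main(String[] args) {
        final AtomicInteger loads = new AtomicInteger();
        LazyListCache<String> cache = new LazyListCache<String>(() -> {
            loads.incrementAndGet(); // stands in for the expensive class scan
            return Arrays.asList("MACRO_A", "MACRO_B");
        });
        cache.withExtras(Arrays.asList("EXTRA"));
        cache.withExtras(Arrays.asList("EXTRA"));
        System.out.println("loads=" + loads.get() + ", cached=" + cache.get().size());
    }
}
```

The trade-off is one extra list copy per call in withExtras, which should be cheap next to a classpath scan.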
Attached are the method traces for before and after the hot Fix. The hot fix eliminated 4 seconds of time on the getPrivateMacros method.
So, the next question: how do I dig down into the method calls that make up ScriptContent.evaluate?
ScriptContent.evaluate will use GroovyShell to evaluate the template. Not sure if you are currently looking at hudson.plugins.emailext.plugins.content in your traces.
I did a more detailed trace, and most of the time is spent in ScriptContent.renderTemplate. For all my tests, I'm using the same Groovy-based transfer template.
In plugin version 2.36 render time is about 1.1 seconds
In plugin version 2.37.2.1 render time is about 2.2 seconds for script / managed script based
I'm not familiar at all with Groovy and how it works within the JVM. I could try switching to Java 6 to see if there is any performance change.
Overall, I think that the latest caching changes can be pushed up, and I can investigate the rendering performance as a separate item.
I'll look into the script rendering stuff as well, there is some magic happening in that part of the code too.
Can you monitor hudson.plugins.emailext.plugins.content.EmailExtScript? There is some stuff that goes on in there to allow for token macros to be used in the groovy templates. It shouldn't need additional caching because TokenMacro.all caches and getPrivateMacros now caches, but it would be interesting to see if methodMissing of that class is getting called and how long it is taking.
That class seems to be an abstract class. What or who implements a concrete version of it?
The Groovy CompilerConfiguration will subclass it when it creates the script scope, so it's somewhere internal to the Groovy classes.
Hmm, I don't know how to instrument the groovy compiled class then. "methodMissing" disappears for some reason.
I'm not sure either, I've never done it. I guess I could instrument in the code itself...
When will you be publishing 2.37.2.1 as an official fix? I can help pursue the Groovy script issue at a later point in time.
Code changed in jenkins
User: Alex Earl
Path:
src/main/java/hudson/plugins/emailext/plugins/ContentBuilder.java
http://jenkins-ci.org/commit/email-ext-plugin/6dfefca4438cd5d15f2d3c5cd88a29b2ae2f8556
Log:
Fix JENKINS-20078
Added caching to the private macro lookup
We still have an issue on the latest version, 2.38: sending the email still takes a huge amount of time.
This is the output from Debug mode:
Checking for post-build
Performing post-build step
Checking if email needs to be generated
Email was triggered for: Failure - Any
Email was triggered for: Failure - Still
Trigger Failure - Any was overridden by another trigger and will not send an email.
Sending email for trigger: Failure - Still
Overriding default server settings, creating our own session
messageContentType = text/html; charset=UTF-8
It hangs after messageContentType for 20 minutes
See my comment below. Sometimes this job hangs over 40 minutes to send email.
Please obtain a thread dump during the long wait, as described here: https://wiki.jenkins-ci.org/display/JENKINS/Obtaining+a+thread+dump
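To illustrate what a thread dump contains, here is a minimal sketch that prints one for the JVM it runs in, using the standard java.lang.management API. It only sees its own process, so for a running Jenkins the methods on the wiki page above are still the way to go. The class name ThreadDumpSketch is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical demo: formats a thread dump of the current JVM. */
class ThreadDumpSketch {

    /** Returns name, id, state and full stack trace for every live thread. */
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip lock details, which not every JVM supports
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" Id=").append(info.getThreadId())
               .append(' ').append(info.getThreadState()).append('\n');
            // Iterate the frames ourselves; ThreadInfo.toString() truncates the trace.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

A few dumps taken several seconds apart during the pause are usually more telling than a single one, because frames that show up in every dump point at where the time is going.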
One stack trace:
at java.lang.reflect.Method.copy(Unknown Source)
at java.lang.reflect.ReflectAccess.copyMethod(Unknown Source)
at sun.reflect.ReflectionFactory.copyMethod(Unknown Source)
at java.lang.Class.copyMethods(Unknown Source)
at java.lang.Class.getMethods(Unknown Source)
at org.apache.commons.beanutils.MethodUtils.getMatchingAccessibleMethod(MethodUtils.java:973)
at org.apache.commons.beanutils.MappedPropertyDescriptor.getMethod(MappedPropertyDescriptor.java:409)
at org.apache.commons.beanutils.MappedPropertyDescriptor.<init>(MappedPropertyDescriptor.java:111)
at org.apache.commons.beanutils.PropertyUtilsBean.getPropertyDescriptor(PropertyUtilsBean.java:934)
at org.apache.commons.beanutils.BeanUtilsBean.setProperty(BeanUtilsBean.java:935)
at org.apache.commons.beanutils.BeanUtilsBean.populate(BeanUtilsBean.java:830)
at org.apache.commons.beanutils.BeanUtils.populate(BeanUtils.java:433)
at org.apache.commons.digester.SetPropertiesRule.begin(SetPropertiesRule.java:254)
at org.apache.commons.digester.Rule.begin(Rule.java:177)
at org.apache.commons.digester.Digester.startElement(Digester.java:1583)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.commons.digester.Digester.parse(Digester.java:1871)
at hudson.scm.BlameSubversionChangeLogParser.parse(BlameSubversionChangeLogParser.java:68)
at hudson.scm.BlameSubversionChangeLogParser.parse(BlameSubversionChangeLogParser.java:47)
at hudson.model.AbstractBuild.calcChangeSet(AbstractBuild.java:922)
at hudson.model.AbstractBuild.getChangeSet(AbstractBuild.java:896)
at hudson.model.AbstractBuild.hasParticipant(AbstractBuild.java:454)
at hudson.model.AbstractProject.hasParticipant(AbstractProject.java:1644)
at hudson.model.User.getProjects(User.java:479)
at hudson.scm.BlameSubversionMailAddressResolverImpl.findMailAddressFor(BlameSubversionMailAddressResolverImpl.java:23)
at hudson.tasks.MailAddressResolver.resolve(MailAddressResolver.java:112)
at hudson.tasks.Mailer$UserProperty.getAddress(Mailer.java:547)
at hudson.plugins.emailext.EmailRecipientUtils.getUserConfiguredEmail(EmailRecipientUtils.java:110)
at hudson.plugins.emailext.plugins.recipients.CulpritsRecipientProvider.addRecipients(CulpritsRecipientProvider.java:41)
at hudson.plugins.emailext.ExtendedEmailPublisher.createMail(ExtendedEmailPublisher.java:516)
at hudson.plugins.emailext.ExtendedEmailPublisher.sendMail(ExtendedEmailPublisher.java:290)
at hudson.plugins.emailext.ExtendedEmailPublisher._perform(ExtendedEmailPublisher.java:281)
at hudson.plugins.emailext.ExtendedEmailPublisher.access$100(ExtendedEmailPublisher.java:79)
at hudson.plugins.emailext.ExtendedEmailPublisher$1.endBuild(ExtendedEmailPublisher.java:689)
at hudson.matrix.MatrixBuild$MatrixBuildExecution.post2(MatrixBuild.java:403)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:725)
at hudson.model.Run.execute(Run.java:1709)
at hudson.matrix.MatrixBuild.run(MatrixBuild.java:304)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
at hudson.model.OneOffExecutor.run(OneOffExecutor.java:43)
Second stack trace:
owned by "Executor #-1 for plkrbuilddb : executing FlexNet 10 - TeamAUTO » 2 Database Tests » DBU #97" Id=41963
at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:648)
- blocked on hudson.model.RunMap@231579
at jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:381)
at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:219)
at hudson.model.AbstractProject.hasParticipant(AbstractProject.java:1643)
at hudson.model.User.getProjects(User.java:479)
at hudson.scm.BlameSubversionMailAddressResolverImpl.findMailAddressFor(BlameSubversionMailAddressResolverImpl.java:23)
at hudson.tasks.MailAddressResolver.resolve(MailAddressResolver.java:112)
at hudson.tasks.Mailer$UserProperty.getAddress(Mailer.java:547)
at hudson.plugins.emailext.EmailRecipientUtils.getUserConfiguredEmail(EmailRecipientUtils.java:110)
at hudson.plugins.emailext.plugins.recipients.CulpritsRecipientProvider.addRecipients(CulpritsRecipientProvider.java:41)
at hudson.plugins.emailext.ExtendedEmailPublisher.createMail(ExtendedEmailPublisher.java:516)
at hudson.plugins.emailext.ExtendedEmailPublisher.sendMail(ExtendedEmailPublisher.java:290)
at hudson.plugins.emailext.ExtendedEmailPublisher._perform(ExtendedEmailPublisher.java:281)
at hudson.plugins.emailext.ExtendedEmailPublisher.access$100(ExtendedEmailPublisher.java:79)
at hudson.plugins.emailext.ExtendedEmailPublisher$1.endBuild(ExtendedEmailPublisher.java:689)
at hudson.matrix.MatrixBuild$MatrixBuildExecution.post2(MatrixBuild.java:403)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:725)
at hudson.model.Run.execute(Run.java:1709)
at hudson.matrix.MatrixBuild.run(MatrixBuild.java:304)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:231)
at hudson.model.OneOffExecutor.run(OneOffExecutor.java:43)
Ok, so it's the Subversion mail address resolution that is taking a long time. That is not the email-ext plugin's fault.
The Mailer plugin is at version 1.8, but I believe that since I use the override section in the configuration (the "Override Global Settings" setting), this plugin is not used.
The MailAddressResolver is part of the Mailer plugin. email-ext uses this to resolve an address for a username. MailAddressResolver is an extension point that allows other plugins to provide an address resolver, SCM systems generally will provide one to map a committer username to an email address. In the case of the slowdown you are seeing, the BlameSubversionMailAddressResolverImpl is what is taking a long time, not email-ext itself.
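A rough sketch of why one slow resolver stalls the whole email, under the assumption that resolvers are asked one after another until one returns an address. The AddressResolver interface below is a hypothetical stand-in, not the real Mailer plugin API. It borrows the method name findMailAddressFor from the stack traces above, and the per-resolver timing is added purely for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a resolver extension; NOT the real Mailer plugin API. */
interface AddressResolver {
    String name();

    /** Returns an address, or null if this resolver does not know the user. */
    String findMailAddressFor(String username);
}

/** Asks each resolver in turn and records how long each one took. */
final class TimedResolverChain {
    private final List<AddressResolver> resolvers;
    private final Map<String, Long> nanosByResolver = new LinkedHashMap<>();

    TimedResolverChain(List<AddressResolver> resolvers) {
        this.resolvers = resolvers;
    }

    String resolve(String username) {
        for (AddressResolver resolver : resolvers) {
            long start = System.nanoTime();
            String address = resolver.findMailAddressFor(username);
            nanosByResolver.merge(resolver.name(), System.nanoTime() - start, Long::sum);
            if (address != null) {
                return address; // first answer wins
            }
        }
        return null;
    }

    Map<String, Long> timings() {
        return nanosByResolver;
    }
}

class TimedResolverChainDemo {
    /** Builds a resolver that sleeps for delayMillis and then returns a fixed answer. */
    static AddressResolver fixed(final String name, final long delayMillis, final String answer) {
        return new AddressResolver() {
            public String name() {
                return name;
            }

            public String findMailAddressFor(String username) {
                try {
                    Thread.sleep(delayMillis); // simulates an expensive SCM history scan
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return answer;
            }
        };
    }

    public static void main(String[] args) {
        TimedResolverChain chain = new TimedResolverChain(Arrays.asList(
            fixed("slow-scm-resolver", 200, null),
            fixed("fast-resolver", 0, "alice@example.com")));
        System.out.println(chain.resolve("alice"));
        chain.timings().forEach((name, nanos) ->
            System.out.println(name + ": " + nanos / 1_000_000 + " ms"));
    }
}
```

The caller pays for the slow resolver even though the fast one supplies the answer, which matches the picture in the thread dumps: the time is spent below MailAddressResolver.resolve, not in email-ext's own code.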
I think the resolution of this issue should be "Fixed", not "Not A Defect".
Interesting, I'm seeing similar issues after implementing LTS 1.532.1.2 in Jenkins Enterprise. In my OSS 1.532.1 installation, I don't see this problem. I'm wondering if there is a plug-in conflict of some sort. I enabled debug mode, and the majority of the time seems to be spent between these two operations:
14:16:34 NOT overriding default server settings, using Mailer to create session
14:16:40 messageContentType = text/html; charset=UTF-8