Type: Bug
Resolution: Unresolved
Priority: Critical
Operating System :
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.8 (Santiago)
Release: 6.8
Codename: Santiago
JRE/JDK vendors and versions : 1.8.0_45
Jenkins versions : 2.46.2
Gerrit Trigger plugin versions : 2.27.5
pipeline plugin version : 2.5
script-security plugin version : 1.44
running Jenkins directly or in a container : Jenkins is running on a Virtual machine on tomcat.
How you installed Jenkins : jenkins.war file to install Jenkins.
how you're launching any involved slave nodes : yes, we have slaves using ssh.
Your web browser : Chrome Version 67.0.3396.99 (Official Build) (64-bit)
We are experiencing delays in Gerrit-triggered jobs on our Jenkins instance.
Attached is the stack trace of the blocking threads.
Jul 18, 2018 5:07:57 PM com.sonymobile.tools.gerrit.gerritevents.GerritHandler checkQueueSize
WARNING: The Gerrit incoming events queue contains 28095 items! Something might be stuck, or your system can't process the commands fast enough. Try to increase the number of receiving worker threads. Current thread-pool size: 30
Jul 18, 2018 6:54:37 PM com.sonymobile.tools.gerrit.gerritevents.GerritJsonEventFactory getEvent
FINE: Constructor with JSONObject as parameter missing, trying default constructor.
java.lang.NoSuchMethodException: com.sonymobile.tools.gerrit.gerritevents.dto.events.RefUpdated.<init>(net.sf.json.JSONObject)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at com.sonymobile.tools.gerrit.gerritevents.GerritJsonEventFactory.getEvent(GerritJsonEventFactory.java:69)
at com.sonymobile.tools.gerrit.gerritevents.workers.AbstractJsonObjectWork.perform(AbstractJsonObjectWork.java:69)
at com.sonymobile.tools.gerrit.gerritevents.workers.StreamEventsStringWork.perform(StreamEventsStringWork.java:67)
at com.sonymobile.tools.gerrit.gerritevents.workers.EventThread.run(EventThread.java:66)
at com.sonyericsson.hudson.plugins.gerrit.trigger.SystemEventThread.run(SystemEventThread.java:66)
[JENKINS-52636] Gerrit triggered jobs getting delayed
I investigated the issue, as I am experiencing it on my own Jenkins server, and here are my findings.
The issue only happens when Gerrit-Trigger is connected to a Gerrit server with a lot of activity (events). Even though Gerrit-Trigger uses a thread pool (3 threads by default) to process events, it cannot keep up: the queue of events to process grows until the backlog delays the triggering of builds, sometimes by a few hours.
Cranking up the number of threads does not really help, because there is a major bottleneck in the Gerrit-Trigger code: the GerritMissedEventsPlaybackManager.persist method. This method serializes the last received event to disk, so that the playback manager knows which event was processed last and can replay the events missed while Gerrit-Trigger was down or the connection to Gerrit was lost. The problem is that the method is synchronized and is called by the threads processing the events. So even with 100 receiving threads, they all bottleneck in that method, which is very slow because it does I/O; this is even worse if the Jenkins home is on NFS or any other slow volume.
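The claim that more threads do not help can be made concrete with a simple throughput bound (an idealized model, not a measurement). Assume each event needs some parallelizable work w plus time s inside the synchronized persist call, that there are N worker threads, and that events arrive at rate λ. Because only one thread at a time can be inside persist, the processing rate μ is capped at 1/s, and the queue Q grows whenever λ exceeds that cap:

```latex
\mu(N) \le \min\left(\frac{N}{w + s},\ \frac{1}{s}\right),
\qquad
\frac{dQ}{dt} = \lambda - \mu(N) \ge \lambda - \frac{1}{s}
\quad \text{when } \lambda > \frac{1}{s}.
```

With illustrative numbers: if one write takes s = 10 ms, no more than 100 events/s can be processed for any N; at λ = 150 events/s the queue grows by at least 50 events/s, roughly 180,000 events per hour. A slow volume such as NFS raises s and lowers the cap further.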
The stack trace in the description is not the relevant one. Here is the stack trace of a blocked thread, followed by the stack trace of the thread holding the lock:
"Gerrit Worker EventThread_0" Id=62 Group=main BLOCKED on com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager@24996aa8 owned by "Gerrit Worker EventThread_1" Id=63
at com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager.persist(GerritMissedEventsPlaybackManager.java:455)
- blocked on com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager@24996aa8
at com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager.gerritEvent(GerritMissedEventsPlaybackManager.java:300)
at com.sonymobile.tools.gerrit.gerritevents.GerritHandler.notifyListener(GerritHandler.java:350)
at com.sonymobile.tools.gerrit.gerritevents.GerritHandler.notifyListeners(GerritHandler.java:317)
at com.sonyericsson.hudson.plugins.gerrit.trigger.JenkinsAwareGerritHandler.notifyListeners(JenkinsAwareGerritHandler.java:77)
at com.sonymobile.tools.gerrit.gerritevents.workers.AbstractGerritEventWork.perform(AbstractGerritEventWork.java:46)
at com.sonymobile.tools.gerrit.gerritevents.workers.AbstractJsonObjectWork.perform(AbstractJsonObjectWork.java:77)
at com.sonymobile.tools.gerrit.gerritevents.workers.StreamEventsStringWork.perform(StreamEventsStringWork.java:67)
at com.sonymobile.tools.gerrit.gerritevents.workers.EventThread.run(EventThread.java:66)
at com.sonyericsson.hudson.plugins.gerrit.trigger.SystemEventThread.run(SystemEventThread.java:66)
"Gerrit Worker EventThread_1" Id=63 Group=main RUNNABLE
at com.thoughtworks.xstream.mapper.MapperWrapper.getConverterFromItemType(MapperWrapper.java:88)
at com.thoughtworks.xstream.mapper.MapperWrapper.getConverterFromItemType(MapperWrapper.java:88)
at com.thoughtworks.xstream.mapper.MapperWrapper.getConverterFromItemType(MapperWrapper.java:88)
at com.thoughtworks.xstream.mapper.MapperWrapper.getConverterFromItemType(MapperWrapper.java:88)
at com.thoughtworks.xstream.mapper.MapperWrapper.getConverterFromItemType(MapperWrapper.java:88)
at com.thoughtworks.xstream.mapper.MapperWrapper.getConverterFromItemType(MapperWrapper.java:88)
at com.thoughtworks.xstream.mapper.MapperWrapper.getConverterFromItemType(MapperWrapper.java:88)
at com.thoughtworks.xstream.mapper.MapperWrapper.getConverterFromItemType(MapperWrapper.java:88)
at hudson.util.xstream.MapperDelegate.getConverterFromItemType(MapperDelegate.java:103)
at com.thoughtworks.xstream.mapper.MapperWrapper.getConverterFromItemType(MapperWrapper.java:88)
at hudson.util.RobustReflectionConverter$1.visit(RobustReflectionConverter.java:195)
at com.thoughtworks.xstream.converters.reflection.PureJavaReflectionProvider.visitSerializableFields(PureJavaReflectionProvider.java:138)
at hudson.util.RobustReflectionConverter.doMarshal(RobustReflectionConverter.java:191)
at hudson.util.RobustReflectionConverter.marshal(RobustReflectionConverter.java:150)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:84)
at hudson.util.RobustReflectionConverter.marshallField(RobustReflectionConverter.java:265)
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:252)
at hudson.util.RobustReflectionConverter$2.visit(RobustReflectionConverter.java:224)
at com.thoughtworks.xstream.converters.reflection.PureJavaReflectionProvider.visitSerializableFields(PureJavaReflectionProvider.java:138)
at hudson.util.RobustReflectionConverter.doMarshal(RobustReflectionConverter.java:209)
at hudson.util.RobustReflectionConverter.marshal(RobustReflectionConverter.java:150)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:88)
at com.thoughtworks.xstream.converters.collections.AbstractCollectionConverter.writeItem(AbstractCollectionConverter.java:64)
at com.thoughtworks.xstream.converters.collections.CollectionConverter.marshal(CollectionConverter.java:74)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:88)
at com.thoughtworks.xstream.converters.reflection.SerializableConverter$1.defaultWriteObject(SerializableConverter.java:214)
at com.thoughtworks.xstream.core.util.CustomObjectOutputStream.defaultWriteObject(CustomObjectOutputStream.java:80)
at java.util.Collections$SynchronizedCollection.writeObject(Collections.java:2081)
- locked java.util.Collections$SynchronizedList@424b9a87
at sun.reflect.GeneratedMethodAccessor151.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.thoughtworks.xstream.converters.reflection.SerializationMethodInvoker.callWriteObject(SerializationMethodInvoker.java:135)
at com.thoughtworks.xstream.converters.reflection.SerializableConverter.doMarshal(SerializableConverter.java:259)
at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:83)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:88)
at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.marshal(AbstractReflectionConverter.java:81)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller$1.convertAnother(AbstractReferenceMarshaller.java:84)
at hudson.util.RobustReflectionConverter.marshallField(RobustReflectionConverter.java:265)
at hudson.util.RobustReflectionConverter$2.writeField(RobustReflectionConverter.java:252)
at hudson.util.RobustReflectionConverter$2.visit(RobustReflectionConverter.java:224)
at com.thoughtworks.xstream.converters.reflection.PureJavaReflectionProvider.visitSerializableFields(PureJavaReflectionProvider.java:138)
at hudson.util.RobustReflectionConverter.doMarshal(RobustReflectionConverter.java:209)
at hudson.util.RobustReflectionConverter.marshal(RobustReflectionConverter.java:150)
at com.thoughtworks.xstream.core.AbstractReferenceMarshaller.convert(AbstractReferenceMarshaller.java:69)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:58)
at com.thoughtworks.xstream.core.TreeMarshaller.convertAnother(TreeMarshaller.java:43)
at com.thoughtworks.xstream.core.TreeMarshaller.start(TreeMarshaller.java:82)
at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.marshal(AbstractTreeMarshallingStrategy.java:37)
at com.thoughtworks.xstream.XStream.marshal(XStream.java:1026)
at com.thoughtworks.xstream.XStream.marshal(XStream.java:1015)
at com.thoughtworks.xstream.XStream.toXML(XStream.java:988)
at hudson.XmlFile.write(XmlFile.java:193)
at com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager.persist(GerritMissedEventsPlaybackManager.java:492)
- locked com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager@24996aa8
at com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager.gerritEvent(GerritMissedEventsPlaybackManager.java:300)
at com.sonymobile.tools.gerrit.gerritevents.GerritHandler.notifyListener(GerritHandler.java:350)
at com.sonymobile.tools.gerrit.gerritevents.GerritHandler.notifyListeners(GerritHandler.java:317)
at com.sonyericsson.hudson.plugins.gerrit.trigger.JenkinsAwareGerritHandler.notifyListeners(JenkinsAwareGerritHandler.java:77)
at com.sonymobile.tools.gerrit.gerritevents.workers.AbstractGerritEventWork.perform(AbstractGerritEventWork.java:46)
at com.sonymobile.tools.gerrit.gerritevents.workers.AbstractJsonObjectWork.perform(AbstractJsonObjectWork.java:77)
at com.sonymobile.tools.gerrit.gerritevents.workers.StreamEventsStringWork.perform(StreamEventsStringWork.java:67)
at com.sonymobile.tools.gerrit.gerritevents.workers.EventThread.run(EventThread.java:66)
at com.sonyericsson.hudson.plugins.gerrit.trigger.SystemEventThread.run(SystemEventThread.java:66)
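The lock contention visible in these two traces can be reproduced in miniature. The sketch below is a hypothetical, simplified stand-in (`SynchronizedPersistStore` and its methods are invented for illustration and are not the plugin's code): every worker calls a `synchronized` method that rewrites a small state file, and an instrumentation counter records the highest number of threads that were ever inside that method at once. It assumes only the JDK 8 standard library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a "persist the last event on every event" design;
// it is NOT the plugin's GerritMissedEventsPlaybackManager.
class SynchronizedPersistStore {
    private final Path file;
    private long lastEventTimestamp = Long.MIN_VALUE;
    private final AtomicInteger inside = new AtomicInteger();  // threads currently in persist()
    final AtomicInteger maxConcurrent = new AtomicInteger();    // highest value ever seen
    final AtomicInteger writes = new AtomicInteger();           // completed disk writes

    SynchronizedPersistStore(Path file) {
        this.file = file;
    }

    // Called by every worker after each event. The monitor lock is held for the
    // whole disk write, so workers can only pass through here one at a time.
    synchronized void persist(long eventTimestamp) {
        maxConcurrent.accumulateAndGet(inside.incrementAndGet(), Math::max);
        try {
            lastEventTimestamp = Math.max(lastEventTimestamp, eventTimestamp);
            Files.write(file, Long.toString(lastEventTimestamp).getBytes(StandardCharsets.UTF_8));
            writes.incrementAndGet();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            inside.decrementAndGet();
        }
    }

    String readBack() {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts `threads` workers that each persist `eventsPerThread` events, then waits for them.
    static SynchronizedPersistStore simulate(int threads, int eventsPerThread) {
        try {
            SynchronizedPersistStore store =
                    new SynchronizedPersistStore(Files.createTempFile("playback-sketch", ".txt"));
            List<Thread> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final long base = t * 1000L;
                Thread worker = new Thread(() -> {
                    for (int i = 0; i < eventsPerThread; i++) {
                        store.persist(base + i);
                    }
                });
                workers.add(worker);
                worker.start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            return store;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With `simulate(8, 5)`, `maxConcurrent` stays at 1 however many threads are started, all 40 writes happen one after another, and the file ends up holding 7004, the largest timestamp. Adding threads does not widen a section that only one thread may enter.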
In the situation above, we have found that the vast majority of the Gerrit events relate to "git notes" in the Git repos. They are not required for triggering Jenkins jobs. A plugin configuration option that allows certain events to be ignored would be useful.
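A minimal sketch of such an option, assuming events can be dropped by their `type` value or by a ref-name prefix such as `refs/notes/` (where git notes live) before they are queued for the worker threads. `StreamEvent` and `UninterestingEventFilter` are hypothetical names, and the event type strings used below are only examples; the real plugin parses the JSON emitted by `gerrit stream-events`, and any filter it ships may work differently.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified event: only the fields the filter inspects.
final class StreamEvent {
    final String type;     // value of the "type" field, e.g. "patchset-created"
    final String refName;  // ref name for ref-update style events, otherwise null

    StreamEvent(String type, String refName) {
        this.type = type;
        this.refName = refName;
    }
}

// Hypothetical filter applied to the raw stream before events reach the worker queue.
final class UninterestingEventFilter {
    private final Set<String> ignoredTypes;
    private final List<String> ignoredRefPrefixes;

    UninterestingEventFilter(Collection<String> ignoredTypes, Collection<String> ignoredRefPrefixes) {
        this.ignoredTypes = new HashSet<>(ignoredTypes);
        this.ignoredRefPrefixes = new ArrayList<>(ignoredRefPrefixes);
    }

    // An event is dropped if its type is ignored or its ref starts with an ignored prefix.
    boolean isInteresting(StreamEvent event) {
        if (ignoredTypes.contains(event.type)) {
            return false;
        }
        if (event.refName != null) {
            for (String prefix : ignoredRefPrefixes) {
                if (event.refName.startsWith(prefix)) {
                    return false;
                }
            }
        }
        return true;
    }

    // Keeps only the events that should be handed to the workers, preserving order.
    List<StreamEvent> keepInteresting(List<StreamEvent> events) {
        List<StreamEvent> kept = new ArrayList<>();
        for (StreamEvent event : events) {
            if (isInteresting(event)) {
                kept.add(event);
            }
        }
        return kept;
    }
}
```

Filtering this early means ignored events never reach the synchronized persist path at all, so they cost only a set lookup and a few string comparisons.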
scoheb, can you please let us know whether by any chance you have looked into this?
Thx scoheb!
Btw, a colleague of ours recently noticed [1], in case it helps; there might be other related PRs too:
[1] https://github.com/jenkinsci/gerrit-trigger-plugin/pull/363
Note: not sure how close [1] is to hugares' well-detailed findings above, though (did not check myself).
We're having the same issue. Running on a machine with 32 cores, we have a ton of events queued up, and the trigger is barely able to keep up with them. Sometimes Jenkins appears to drop builds because of this, and some of the requests are not honored.
Our updated thread pool size doesn't appear to take effect (although that should be filed as a separate issue).
Update: we've been debugging this issue, and it seems that the replication plugin may (somehow) be interfering with the missed-events playback/event trigger.
We're not entirely certain, but it almost looks like the event trigger (that sends builds to Jenkins) is receiving replication events and gets stuck trying to understand what to do with them.
We will continue investigating.
Disabling the replication plugin reduces our backlog queue to 0 and events are perfectly streamed to Jenkins.
Our team has confirmed that Gerrit Trigger listens to all Gerrit events, including ones it cannot act upon. It takes an inordinate amount of time to determine whether it can act on them, in the end only throwing a `NoSuchMethodException` while trying to reflect into an object to see whether the event has what it is looking for.
This wasn't really an issue before, when all data was stored in a SQL database. NoteDB stores all data (events included) in refs under refs/changes/*, and updates to those refs generate events that Jenkins listens for.
We started using the replication plugin and automatically adding reviewers when a review was posted, and our Jenkins server was inundated almost immediately.
As it stands, this makes the gerrit-trigger plugin unusable for anybody using the replication plugin, and, as Gerrit generates more metadata over time, it will make the plugin unusable in general unless it listens only to relevant events.
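The `NoSuchMethodException` in the description comes from a constructor lookup with a fallback, as the FINE log line ("Constructor with JSONObject as parameter missing, trying default constructor.") indicates. Below is a minimal sketch of that pattern, reconstructed from the log message rather than from the plugin source: a `Map` stands in for `net.sf.json.JSONObject`, and `JsonPopulated`, `ReflectiveEventFactory` and both event classes are hypothetical. In this pattern the exception is created and caught on every call for a class that lacks the one-argument constructor.

```java
import java.util.Map;

// Hypothetical hook for event classes that are populated after construction.
interface JsonPopulated {
    void fromJson(Map<String, Object> json);
}

// Hypothetical event class that takes the parsed JSON in its constructor.
class EventWithJsonConstructor {
    final Map<String, Object> json;

    public EventWithJsonConstructor(Map<String, Object> json) {
        this.json = json;
    }
}

// Hypothetical event class with only a default constructor (as the log reports for RefUpdated).
class EventWithDefaultConstructor implements JsonPopulated {
    Map<String, Object> json;

    public EventWithDefaultConstructor() {
    }

    @Override
    public void fromJson(Map<String, Object> json) {
        this.json = json;
    }
}

final class ReflectiveEventFactory {
    int fallbacks; // how often the NoSuchMethodException path was taken

    Object create(Class<?> eventClass, Map<String, Object> json) {
        try {
            // First attempt: a public constructor that takes the JSON object.
            return eventClass.getConstructor(Map.class).newInstance(json);
        } catch (NoSuchMethodException e) {
            // Fallback: default constructor, then populate the event from the JSON.
            fallbacks++;
            return createWithDefaultConstructor(eventClass, json);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Object createWithDefaultConstructor(Class<?> eventClass, Map<String, Object> json) {
        try {
            Object event = eventClass.getConstructor().newInstance();
            if (event instanceof JsonPopulated) {
                ((JsonPopulated) event).fromJson(json);
            }
            return event;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Creating one event of each kind takes the fallback path exactly once, for the class without the `Map` constructor.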
We've found that this Gerrit trigger implemented with pipelines does not have the issue: https://github.com/jenkinsci/gerrit-code-review-plugin
We will likely switch to it in the future.
I have two PRs related to this issue.
https://github.com/jenkinsci/gerrit-trigger-plugin/pull/397
https://github.com/jenkinsci/gerrit-trigger-plugin/pull/398
The first one adds an event filter to the Gerrit event stream, and the second one delegates disk writing to a separate thread instead of holding up the worker threads.
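A minimal sketch of the second PR's idea, assuming only the newest event timestamp needs to be on disk: workers record the timestamp in an `AtomicLong` and hand the write to a single background thread, and writes are coalesced so that a burst of events produces few disk writes. `AsyncPersister` and its methods are hypothetical names; this is not the code from the PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: worker threads never block on disk I/O.
final class AsyncPersister {
    private final Path file;
    private final AtomicLong latest = new AtomicLong(Long.MIN_VALUE);
    private final AtomicBoolean writePending = new AtomicBoolean(false);
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    final AtomicInteger diskWrites = new AtomicInteger();

    AsyncPersister(Path file) {
        this.file = file;
    }

    // Called by worker threads: record the timestamp and schedule a write only if none is pending.
    void eventProcessed(long timestamp) {
        latest.accumulateAndGet(timestamp, Math::max);
        if (writePending.compareAndSet(false, true)) {
            writer.execute(this::flush);
        }
    }

    // Runs on the single writer thread. Clearing the flag before reading `latest` means an event
    // recorded after this point schedules a fresh write, and one recorded before it is covered here.
    private void flush() {
        writePending.set(false);
        try {
            Files.write(file, Long.toString(latest.get()).getBytes(StandardCharsets.UTF_8));
            diskWrites.incrementAndGet();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lets already-queued writes finish, then stops the writer thread.
    void close() {
        writer.shutdown();
        try {
            writer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    String readBack() {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Starts `threads` workers that each report `eventsPerThread` events, then closes the persister.
    static AsyncPersister simulate(int threads, int eventsPerThread) {
        try {
            AsyncPersister persister = new AsyncPersister(Files.createTempFile("async-playback-sketch", ".txt"));
            List<Thread> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final long base = t * 1000L;
                Thread worker = new Thread(() -> {
                    for (int i = 0; i < eventsPerThread; i++) {
                        persister.eventProcessed(base + i);
                    }
                });
                workers.add(worker);
                worker.start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            persister.close();
            return persister;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Under the stated assumption, `diskWrites` can be far lower than the number of events, because events that arrive while a write is queued are folded into it. The trade-off is that after a crash the file may lag slightly behind the last processed event.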
Hi Christoffer,
That is good news. When can we expect it to be released?
Any status on this? There's a chance that my team just encountered this.
It doesn't look like there's been any action on the PR yet, although it mostly looks good to go from here.
In the meantime, I have a patched version we're using at the company I'm at. You're free to use it while we wait: https://github.com/DarrienG/gerrit-trigger-plugin/releases/tag/2.31.0-uninterested
It's basically HEAD of the official repo plus a change that ignores irrelevant events. We have probably 3000+ builds a day and haven't seen any issues with it so far. Jenkins was unusable for us otherwise.
We tried bumping to 2.29.0 during a system upgrade, and the queue started wildly accumulating without actually starting any builds. We reverted the plugin to 2.27.5 and things appear to be rolling again.
Any status on this? There's also a chance that my team just encountered this recently.
Version 2.30.0 was recently released and includes two changes meant to reduce queue load: one reduces disk writing when playback is enabled, and the other lets you filter out unnecessary Gerrit messages from the main settings panel, under Advanced. Some users may still experience delays and queue build-ups, though.
To add more info: the issue started happening after several plugins were upgraded (Pipeline, Script Security, JUnit...).
The list of installed plugins is enclosed.
The issue was also encountered on other Jenkins instances running different Jenkins core and gerrit-trigger plugin versions: core 1.651.3 and gerrit-trigger 2.18.3.