Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-44794

Committing too often? Builds held up by other builds doing a commit for long periods of time

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Minor Minor
    • lucene-search-plugin
    • None
    • Jenkins 2.64, Lucene Search 4.4

      Many of our builds were taking a long time to finish up work even though they didn't seem to be doing anything. After doing some stack dumps, it turned out that many builds were held up waiting for a lock on an IndexWriter. I saw the same dumps after three separate stack dumps across 2 minutes.

       

      "Executor #0 for M-Hisui : executing abbyy/label_exp=macosx #402" #47014 daemon prio=5 os_prio=0 tid=0x00007f10ecbd5000 nid=0x4f9f waiting for monitor entry [0x00007f1043
      045000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2966)
      - waiting to lock <0x00000004d6f01548> (a java.lang.Object)
      at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2944)
      at org.jenkinsci.plugins.lucene.search.databackend.LuceneSearchBackend.updateReader(LuceneSearchBackend.java:177)
      at org.jenkinsci.plugins.lucene.search.databackend.LuceneSearchBackend.storeBuild(LuceneSearchBackend.java:314)
      at org.jenkinsci.plugins.lucene.search.databackend.LuceneSearchBackend.storeBuild(LuceneSearchBackend.java:75)
      at org.jenkinsci.plugins.lucene.search.databackend.SearchBackendManager.storeBuild(SearchBackendManager.java:110)
      at org.jenkinsci.plugins.lucene.search.FreeTextRunListener.onCompleted(FreeTextRunListener.java:24)
      at hudson.model.listeners.RunListener.fireCompleted(RunListener.java:211)
      
      "Executor #0 for U-MainBuildSlave01 : executing trunk-compile2 #1046" #131 daemon prio=5 os_prio=0 tid=0x00007f0f12d58800 nid=0x5bb1 waiting for monitor entry [0x00007f102f082000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2966)
      - waiting to lock <0x00000004d6f01548> (a java.lang.Object)
      at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2944)
      at org.jenkinsci.plugins.lucene.search.databackend.LuceneSearchBackend.updateReader(LuceneSearchBackend.java:177)
      at org.jenkinsci.plugins.lucene.search.databackend.LuceneSearchBackend.removeBuild(LuceneSearchBackend.java:328)
      at org.jenkinsci.plugins.lucene.search.databackend.LuceneSearchBackend.removeBuild(LuceneSearchBackend.java:75)
      at org.jenkinsci.plugins.lucene.search.databackend.SearchBackendManager.removeBuild(SearchBackendManager.java:106)
      at org.jenkinsci.plugins.lucene.search.FreeTextRunListener.onDeleted(FreeTextRunListener.java:32)
      at hudson.model.listeners.RunListener.fireDeleted(RunListener.java:271)
      
      

       

      Looking around for who had the lock, it was some other executor committing to the index, as one might expect:

      "Executor #0 for U-Wottingers : executing jai-imageio-core #479" #47008 daemon prio=5 os_prio=0 tid=0x00007f114de60800 nid=0x4f9a runnable [0x00007f10fe49c000]
      java.lang.Thread.State: RUNNABLE
      at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
      at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
      at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:388)
      at org.apache.lucene.util.IOUtils.fsync(IOUtils.java:396)
      at org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:281)
      at org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:226)
      at org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:62)
      at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4456)
      at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2874)
      - locked <0x00000004d6f01548> (a java.lang.Object)
      at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2977)
      - locked <0x00000004d6f01548> (a java.lang.Object)
      at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2944)
      at org.jenkinsci.plugins.lucene.search.databackend.LuceneSearchBackend.updateReader(LuceneSearchBackend.java:177)
      at org.jenkinsci.plugins.lucene.search.databackend.LuceneSearchBackend.storeBuild(LuceneSearchBackend.java:314)
      at org.jenkinsci.plugins.lucene.search.databackend.LuceneSearchBackend.storeBuild(LuceneSearchBackend.java:75)
      at org.jenkinsci.plugins.lucene.search.databackend.SearchBackendManager.storeBuild(SearchBackendManager.java:110)
      at org.jenkinsci.plugins.lucene.search.FreeTextRunListener.onCompleted(FreeTextRunListener.java:24)
      at hudson.model.listeners.RunListener.fireCompleted(RunListener.java:211)
      

       

      Thoughts:

      1. Lucene's commit() is an expensive operation, and perhaps it is being called too often. If multiple builds finish at around the same time, it seems like this would commit multiple times, when it could delay until they're all in the index.
      2. It would be nice if indexing jobs didn't hold up build jobs. Perhaps they could be pushed off to a pool of threads which just indexes stuff and immediately return control to the thread to complete the build job.
      3. Lucene these days has other options for getting your results as soon as they're in the index which don't require committing quite so frequently (NRT search), which could be useful for this sort of application.

       

            tobias123 Tobias Olsson
            trejkaz trejkaz
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: