
locks-and-latches not release job and infinitely waits for unlock.

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Critical
    • None
    • rhel6,2, jenkins-1.463

      locks-and-latches periodically fails to release jobs. I do not know how to debug this or how to reproduce it manually, but I hit this bug regularly. When I log in to Jenkins I see one job that cycles with "waiting for 1 minute" for hours...
      I have 3 build executors and some jobs waiting in the queue. One lock is used in 3 jobs...
      Could you advise? How can I debug this? I can install a plugin build with debug info and collect data the next time the problem reproduces.

          [JENKINS-13828] locks-and-latches not release job and infinitely waits for unlock.

          Bernard Miegemolle added a comment - - edited

          This also happens to me sometimes when I cancel a build of a job that holds a lock (but not every time).

          For example, the following build cancellation went well. As mentioned in the log, the lock was released, and the following builds could be executed.

          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD FAILURE
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 3:22.655s
          [INFO] Finished at: Wed Apr 10 18:56:34 CEST 2013
          [INFO] channel stopped
          [locks-and-latches] Releasing all the locks
          [locks-and-latches] All the locks released
          Build was aborted
          Aborted by bmiegemolle
          Final Memory: 76M/551M
          [INFO] ------------------------------------------------------------------------
          Finished: ABORTED

          But sometimes when I cancel a build, nothing about the lock release is displayed. See the following build log, for example:

          Uglifying file: /srv/jenkins/platform/jobs/trunk/workspace/target/classes/web/resources-built/vendor/jquery.fileupload/jquery.fileupload.js
          Uglifying file: /srv/jenkins/platform/jobs/trunk/workspace/target/classes/web/resources-built/vendor/jquery.fileupload/jquery.iframe-transport.js
          Uglifying file: /srv/jenkins/platform/jobs/trunk/workspace/target/classes/web/resources-built/vendor/handlebars/handlebars-1.0.rc.1.js
          Uglifying file: /srv/jenkins/platform/jobs/trunk/workspace/target/classes/web/resources-built/vendor/handlebars/handlebars-1.0.0.beta.6.js
          Uglifying file: /srv/jenkins/platform/jobs/trunk/workspace/target/classes/web/resources-built/vendor/select2/select2.js
          Uglifying file: /srv/jenkins/platform/jobs/trunk/workspace/target/classes/web/resources-built/vendor/jquery.storage/jquery.Storage.js
          Uglifying file: /srv/jenkins/platform/jobs/trunk/workspace/target/classes/web/resources-built/vendor/humane/humane.js
          Uglifying file: /srv/jenkins/platform/jobs/trunk/workspace/target/classes/web/resources-built/vendor/highcharts/highcharts.js
          Build was aborted
          Aborted by bmiegemolle
          Finished: ABORTED

          The following builds remained blocked right after the svn update:

          [locks-and-latches] Checking to see if we really have the locks
          [locks-and-latches] Could not get all the locks... sleeping for 1 minute
          [locks-and-latches] Could not get all the locks... sleeping for 1 minute
          [locks-and-latches] Could not get all the locks... sleeping for 1 minute
          [locks-and-latches] Could not get all the locks... sleeping for 1 minute

          Restarting Jenkins makes things work again. I have no clue about the conditions that lead to this issue.
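The log above shows the waiting build retrying once a minute with no visible upper bound. As a minimal sketch of a bounded alternative (this is not the plugin's code; the class `BoundedLockWait`, its method names, and the attempt limit are all hypothetical), a waiter could take all locks or none and give up after a fixed number of attempts, so that a leaked lock surfaces as a failure instead of an endless wait:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; class and method names are illustrative, not the plugin's.
class BoundedLockWait {

    /** Takes every lock or none: on the first failure, releases what was taken. */
    static boolean tryAcquireAll(List<ReentrantLock> locks) {
        List<ReentrantLock> taken = new ArrayList<>();
        for (ReentrantLock lock : locks) {
            if (!lock.tryLock()) {
                for (ReentrantLock held : taken) {
                    held.unlock();
                }
                return false;
            }
            taken.add(lock);
        }
        return true;
    }

    /** Retries up to maxAttempts times, sleeping between attempts; false means it gave up. */
    static boolean acquireAllOrGiveUp(List<ReentrantLock> locks, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryAcquireAll(locks)) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the abort visible to callers
                    return false;
                }
            }
        }
        return false;
    }

    /** Simulates a leaked lock: taken by a thread that exits without unlocking. */
    static ReentrantLock leakedLock() {
        ReentrantLock lock = new ReentrantLock();
        Thread holder = new Thread(lock::lock);
        holder.start();
        try {
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return lock;
    }

    public static void main(String[] args) {
        ReentrantLock free = new ReentrantLock();
        List<ReentrantLock> locks = List.of(free, leakedLock());
        System.out.println("acquired = " + acquireAllOrGiveUp(locks, 3, 10));
        System.out.println("free lock left held = " + free.isLocked());
    }
}
```

With one leaked lock in the list, the sketch gives up after three attempts and leaves the free lock released, so other waiters are not blocked by a partial acquisition.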


          Jan Sochna added a comment -

          Hi, I can confirm that this bug is happening in our environment as well. It usually starts with an aborted job that holds a lock. The job is aborted either by a user or by a build timeout (Jenkins plugin).
          Is it possible that the plugin ignores a notification about the cancelled build and therefore keeps the lock?
          We use:
          Locks and Latches plugin v0.6
          Jenkins 1.509.4
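Whether the plugin really misses the cancellation is not established in this thread. Purely as an illustration of how that kind of leak can arise with `java.util.concurrent.locks.ReentrantLock`, the sketch below (hypothetical class `AbortLeakDemo`, not the plugin's source) interrupts a "build" thread that holds a lock, once with the release on the normal path only and once with the release in a `finally` block:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; this is NOT the plugin's source code.
class AbortLeakDemo {

    /** Stand-in for a long build step that an abort interrupts. */
    static void buildStep() throws InterruptedException {
        Thread.sleep(60_000);
    }

    /**
     * Starts a "build" thread that takes the lock, aborts it with an
     * interrupt, and reports whether the lock is still held afterwards.
     */
    static boolean lockedAfterAbort(boolean releaseInFinally) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch acquired = new CountDownLatch(1);

        Thread build = new Thread(() -> {
            lock.lock();
            acquired.countDown();
            if (releaseInFinally) {
                try {
                    buildStep();
                } catch (InterruptedException e) {
                    // aborted; the finally block still releases the lock
                } finally {
                    lock.unlock(); // runs on every exit path
                }
            } else {
                try {
                    buildStep();
                    lock.unlock(); // only reached when the step finishes normally
                } catch (InterruptedException e) {
                    // aborted: the release above was skipped, so the lock leaks
                }
            }
        });

        build.start();
        try {
            acquired.await();   // wait until the build thread owns the lock
            build.interrupt();  // simulate "Build was aborted"
            build.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return lock.isLocked();
    }

    public static void main(String[] args) {
        System.out.println("release on normal path only, still locked: " + lockedAfterAbort(false));
        System.out.println("release in finally, still locked: " + lockedAfterAbort(true));
    }
}
```

In the first variant the lock stays held after the thread has exited, which would match the symptom of later builds waiting forever; in the second it is released on abort.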


          Arnaud Nauwynck added a comment - - edited

          Hi,

          I also encountered this bug, and I have implemented a workaround that avoids restarting my Jenkins master.
          The workaround is a Groovy script to be executed as admin on the web page "https://<<jenkins>>/script"

          ForceUnlockLatch.groovy
          import hudson.plugins.locksandlatches.LockWrapper;
          import java.util.concurrent.locks.ReentrantLock;
          import java.util.concurrent.locks.AbstractOwnableSynchronizer;
          import java.util.concurrent.locks.AbstractQueuedSynchronizer;
          import java.lang.reflect.Field;
          import java.lang.reflect.Method;
          
          String lockName = "xxx-lock-name";
          
          String text = "";
          LockWrapper.DescriptorImpl descr = LockWrapper.DESCRIPTOR;
          // read the private field "backupLocks" via reflection
          Field backupLocksField = LockWrapper.DescriptorImpl.class.getDeclaredField("backupLocks");
          backupLocksField.setAccessible(true);
          Map<String,Object> backupLocks = (Map<String,Object>) backupLocksField.get(descr);
          
          ReentrantLock lockObj = (ReentrantLock) backupLocks.get(lockName);
          
          if (lockObj == null) {
            text += "NULL : lock not found";
          } else if (lockObj.isLocked()) {
            text += "*** before Unlock: " + lockObj;
          
            if (! lockObj.isHeldByCurrentThread()) {
              // unlock() from a thread that does not own the lock throws java.lang.IllegalMonitorStateException,
              // so first make the current thread the owner: lockObj.sync.setExclusiveOwnerThread(currentThread)
              Field syncField = ReentrantLock.class.getDeclaredField("sync");
              syncField.setAccessible(true);
              AbstractOwnableSynchronizer lockSync = (AbstractOwnableSynchronizer) syncField.get(lockObj);
            
              Thread currentThread = Thread.currentThread();
              Method setExclusiveOwnerThreadMethod = AbstractOwnableSynchronizer.class.getDeclaredMethod("setExclusiveOwnerThread", Thread.class);
              setExclusiveOwnerThreadMethod.setAccessible(true);
              setExclusiveOwnerThreadMethod.invoke(lockSync, currentThread);
            }
          
            // *** do unlock ***
            lockObj.unlock();
          
            text += "\n *** after unlock:" + lockObj;
          } else {
            text += "NOT locked : " + lockObj;
          }
          
          text;
          
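The script above changes the lock's owner before calling `unlock()` because `ReentrantLock` refuses a release from any thread other than the owner. A small standalone Java sketch of that behaviour (the class `ForeignUnlockDemo` and its methods are hypothetical and unrelated to the plugin):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; not part of the plugin.
class ForeignUnlockDemo {

    /** Locks on a helper thread that then exits, and returns the still-held lock. */
    static ReentrantLock lockOnThreadThatExits() {
        ReentrantLock lock = new ReentrantLock();
        Thread holder = new Thread(lock::lock); // acquires and never unlocks
        holder.start();
        try {
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return lock;
    }

    /** Returns true if unlock() from the calling (non-owner) thread is rejected. */
    static boolean unlockRejected(ReentrantLock lock) {
        try {
            lock.unlock();
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = lockOnThreadThatExits();
        System.out.println("isLocked = " + lock.isLocked());
        System.out.println("unlock rejected = " + unlockRejected(lock));
        System.out.println("still locked = " + lock.isLocked());
    }
}
```

The lock remains held even though its owner thread has finished, and a plain `unlock()` from another thread cannot clear it, which is why the workaround resorts to reflection.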


          rogerdpack added a comment -

          Just ran into something similar--no other jobs going, but when I start one, it says

          [locks-and-latches] Could not get all the locks... sleeping for 1 minute

          infinitely.

          Possibly had something to do with interrupting a job?


          Nirav Shah added a comment -

          I too found two different jobs stuck at releasing locks and latches. Both show the tests as passed, but they report a failure when the machine is restarted the next day.

          Send and Verify Raw bytes using Verify COM Packet :: Currently we ... [locks-and-latches] Releasing all the locks
          06:00:32 [locks-and-latches] All the locks released
          06:00:32 FATAL: java.io.IOException: An existing connection was forcibly closed by the remote host
          06:00:33 hudson.remoting.RequestAbortedException: java.io.IOException: An existing connection was forcibly closed by the remote host
          06:00:33 at hudson.remoting.Request.abort(Request.java:295)
          06:00:33 at hudson.remoting.Channel.terminate(Channel.java:814)
          06:00:33 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
          06:00:33 at ......remote call to Flo(Native Method)
          06:00:33 at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1356)
          06:00:33 at hudson.remoting.Request.call(Request.java:171)
          06:00:33 at hudson.remoting.Channel.call(Channel.java:751)
          06:00:33 at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:179)
          06:00:33 at com.sun.proxy.$Proxy47.join(Unknown Source)
          06:00:33 at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:979)
          06:00:33 at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:137)
          06:00:33 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:97)
          06:00:33 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
          06:00:33 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
          06:00:33 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:770)
          06:00:33 at hudson.model.Build$BuildExecution.build(Build.java:199)
          06:00:33 at hudson.model.Build$BuildExecution.doRun(Build.java:160)
          06:00:33 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:533)
          06:00:33 at hudson.model.Run.execute(Run.java:1759)
          06:00:33 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
          06:00:33 at hudson.model.ResourceController.execute(ResourceController.java:89)
          06:00:33 at hudson.model.Executor.run(Executor.java:240)
          06:00:33 Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
          06:00:33 at sun.nio.ch.SocketDispatcher.read0(Native Method)
          06:00:33 at sun.nio.ch.SocketDispatcher.read(Unknown Source)
          06:00:33 at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
          06:00:33 at sun.nio.ch.IOUtil.read(Unknown Source)
          06:00:33 at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
          06:00:33 at hudson.remoting.SocketChannelStream$1.read(SocketChannelStream.java:35)
          06:00:33 at sun.nio.ch.ChannelInputStream.read(Unknown Source)
          06:00:33 at sun.nio.ch.ChannelInputStream.read(Unknown Source)
          06:00:33 at sun.nio.ch.ChannelInputStream.read(Unknown Source)
          06:00:33 at java.io.InputStream.read(Unknown Source)
          06:00:33 at sun.nio.ch.ChannelInputStream.read(Unknown Source)
          06:00:33 at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          06:00:33 at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
          06:00:33 at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
          06:00:33 at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
          06:00:33 at java.io.ObjectInputStream.readObject0(Unknown Source)
          06:00:33 at java.io.ObjectInputStream.readObject(Unknown Source)
          06:00:33 at hudson.remoting.Command.readFrom(Command.java:92)
          06:00:33 at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:70)
          06:00:33 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)


          Mark Waite added a comment -

          The Hudson locks and latches plugin is not distributed by the Jenkins update center. It was proposed for deprecation in 2015 and has known blocking issues. It prevents the saving of Jenkins job configurations in Jenkins 2.277.1 and later.

          If someone adopted the plugin, updated it to work with Jenkins 2.277.1 and later, and released a new version, then this issue report could be reopened.


            Assignee: Unassigned
            Reporter: Kanstantsin Shautsou (integer)
            Votes: 6
            Watchers: 11
