Type: Improvement
Resolution: Unresolved
Priority: Minor
Labels: None
Environment: Jenkins 2.x (several x > 200), lockable-resources-plugin 2.13 and earlier
I maintain some pipelines where lockable resources let us dedicate and isolate SUT appliances, and sometimes developers see a test break and want to reserve the SUT for post-mortem investigation before some other test deletes everything. Interactively this is addressed by the Reassign/Steal Lock button from https://github.com/jenkinsci/lockable-resources-plugin/pull/144 (otherwise, unlocking and then reserving with the original plugin code risked that someone from the queue would grab the resource first); but when we tried to automate similar behavior with pipeline code (e.g. to handle test failures and leave SUTs reserved for investigation if a build argument says to do so), things became funny.
Essentially, we can set a reservation on a resource received in the lock step (directly via the LockableResource object, since the LRM forbids reserving an already-locked resource), and expect that the reserved resource will not be handed out by the LockableResourcesManager when it is later seen as a suitable candidate for another request. The typical code structure is:
```groovy
def LOCK_OBJ = null
def LRM = org.jenkins.plugins.lockableresources.LockableResourcesManager.get()
lock(label: "SUTtype", variable: 'LOCK_NAME') {
    LOCK_OBJ = LRM.fromName(env.LOCK_NAME)
    LOCK_OBJ.setReservedBy('keep after unlock')
}
// some other code uses the dedicated LOCK_OBJ

// Finally, later - maybe even in another job:
LRM.unreserve([LOCK_OBJ])
// or LOCK_OBJ.unReserve(), which until recently did not cut it well
```
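For the failure-handling automation mentioned above, the reservation would be made conditionally. A minimal sketch, assuming a hypothetical boolean build parameter RESERVE_ON_FAILURE and a hypothetical runTests() step:

```groovy
def LRM = org.jenkins.plugins.lockableresources.LockableResourcesManager.get()
lock(label: 'SUTtype', variable: 'LOCK_NAME') {
    try {
        runTests()  // hypothetical: the actual test logic running on the dedicated SUT
    } catch (err) {
        // keep the SUT reserved for post-mortem if the build argument says so
        if (params.RESERVE_ON_FAILURE) {
            LRM.fromName(env.LOCK_NAME).setReservedBy("kept for investigation by ${env.BUILD_URL}")
        }
        throw err
    }
}
```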
In practice, however, we found that:
- sometimes the lockable resource was grabbed by another `lock()` request even though it remained reserved (see the sketch below)
- sometimes, after un-reserving, the resource showed as available yet the jobs waiting for it to become free did not proceed; manually locking and unlocking that resource made it actually usable for those jobs
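To illustrate the first symptom, a minimal sketch (with hypothetical resource labels) of the race as we observed it on plugin 2.13 and earlier: a second `lock()` request queued on the same label could be handed the resource at unlock time even though it was still reserved:

```groovy
// Job A: reserves the resource inside lock(), expecting to keep it after unlock
def LRM = org.jenkins.plugins.lockableresources.LockableResourcesManager.get()
lock(label: 'SUTtype', variable: 'LOCK_NAME') {
    LRM.fromName(env.LOCK_NAME).setReservedBy('post-mortem')
}   // at this point a queued Job B could be handed the still-reserved resource

// Job B (queued while Job A held the lock):
lock(label: 'SUTtype') {
    // with the buggy behavior, this could start on the resource reserved by Job A
}
```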
Investigation led into the plugin code, and so a PR was born: https://github.com/jenkinsci/lockable-resources-plugin/pull/279
Notable points include:
- the original checkResourcesAvailability() method did not consider whether a resource isReserved() as long as it was listed among lockedResourcesAboutToBeUnlocked, and so it led to immediate re-use of reserved resources if (and only if) there was already a queued request waiting which such a resource matched
- the method also behaved identically for un-locking and un-reserving resources (it had no way to differentiate them), so a quick fix to just consider isReserved(), while it did help against "losing" the reservation during unlock(), did not help to "un-stick" the waiting jobs when the resource was finally un-reserved as well
- LockableResource methods such as unReserve(), reset() and setBuild(null) (for the "unlock()" effect) reasonably changed only fields of the resource instance and did not involve the LRM for the bigger picture => recycle() methods were added in both the LR and LRM classes to announce that the resource may be re-used instantly (see the sketch after this list)
- the fixes are confirmed by added tests (in TDD style: initially the tests reproduced and confirmed the bugs I was hunting)
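With the PR in place, the end of the earlier example could use the new method instead; a sketch based on the PR description (exact signatures may differ):

```groovy
// Instead of only flipping fields on the resource instance,
// announce to the LRM that the resource may be re-used instantly:
LOCK_OBJ.recycle()
// or, via the manager, for several resources at once (assumed to take a
// list, like LRM.unreserve() above):
LRM.recycle([LOCK_OBJ])
```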
On a side note, our work might in fact benefit from https://github.com/jenkinsci/lockable-resources-plugin/pull/64, since we do what people in that PR argued against: "sharing" a lock between several consumers. I agree this is not safe for the state machine of lockable resources, which aims to leave no loose ends, but it does help achieve real-life practical goals - which do involve leaving resources locked/reserved after the job ends, using the same resource in different stages or even jobs, etc. - all at the end-user's risk and discretion.
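For completeness, the cross-job half of that pattern looks roughly like this (a sketch; the resource name would be handed over out-of-band, e.g. via a hypothetical build parameter SUT_NAME):

```groovy
// Job "investigate-SUT": works on a resource an earlier test job left reserved
def LRM = org.jenkins.plugins.lockableresources.LockableResourcesManager.get()
def res = LRM.fromName(params.SUT_NAME)
// ... post-mortem work on the still-reserved SUT appliance ...
LRM.unreserve([res])  // finally release it so waiting jobs can proceed
```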