Within pipeline builds, shell steps randomly fail with a nonspecific java.lang.InterruptedException; a full stack trace is listed below.
Unfortunately, this happens often enough to be a major issue in our development process: negative build results cannot be trusted, and multi-hour builds may have to be retriggered several times.
Since we cannot reliably trigger the issue, I cannot provide a minimal example for reproduction. This is especially painful because all debugging has to happen in production.
- Our slaves are started dynamically using the swarm plugin
- The orchestration of these slaves is handled by a shared library, the respective step is [available on github]
- We've only seen the exception occur on shell steps; other steps do not seem to throw (although not many were tested)
- Only the first shell step might throw; if it succeeds, the others will be fine
- The master can catch the exception and continue with error handling
Complete stack trace: