I can reliably reproduce this issue with Jenkins 2.249.3.
It seems to occur when there is more work than the available executors can handle.
When an SCM change triggers a new build, the pipeline starts but then waits for a free executor. After one minute (the polling interval), the change still counts as unbuilt, so polling triggers another build, which also starts its pipeline and then waits for an executor before it can run any of its steps, including the checkout.
With several builds running in parallel, the load rises, so each build takes longer to complete and the wait for an executor grows. The problem therefore gets worse over time as queued builds accumulate.
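To illustrate the accumulation described above, here is a toy model in Python. It is only a sketch of the behaviour as I observe it, not of Jenkins internals: the function name, the one-minute tick, and the assumption that a change counts as built only once some build has done its checkout on an executor are all mine.

```python
from collections import deque


def count_triggered_builds(executor_free_at, build_minutes, horizon):
    """Count the builds a single commit at minute 0 triggers (toy model).

    executor_free_at: minute at which each executor first becomes free.
    build_minutes:    how long one build occupies an executor.
    horizon:          number of one-minute polling ticks to simulate.
    """
    free_at = list(executor_free_at)
    queue = deque()
    checked_out = False  # assumption: set once any build has done its checkout
    triggered = 0
    for minute in range(horizon):
        # Free executors take queued builds; the checkout only happens here.
        for i, free_minute in enumerate(free_at):
            if free_minute <= minute and queue:
                queue.popleft()
                checked_out = True
                free_at[i] = minute + build_minutes
        # SCM polling: the change still looks new until it has been checked out.
        if not checked_out:
            queue.append(minute)
            triggered += 1
    return triggered


if __name__ == "__main__":
    print(count_triggered_builds([0], build_minutes=10, horizon=30))
    print(count_triggered_builds([5], build_minutes=10, horizon=30))
```

In this model, one idle executor gives one build for the commit, while an executor that stays busy for five minutes gives five builds for the same commit, and the four duplicates then keep the executor busy for later commits.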
Increasing the number of executors can lead to out-of-memory errors (with the job I run), so I cannot recommend it as a workaround.
I have tested a quiet period of 300 seconds. It reduces the rate at which duplicate builds accumulate for the same check-in, but it delays building my changes and getting fast feedback about broken builds. I set the quiet period in the global settings, so it affects all jobs, including those that do not have this problem.
A quiet period longer than the build time might help as a weak workaround.
Possible solutions could be:
- Allow the checkout in a pipeline even when no executor is available for normal build steps, perhaps configured as an additional executor allowance for SCM actions.
- Do not start a pipeline when no executor is available and ready to execute the first step. The executor must be bound to the pipeline when it starts, so that it does not build something else.
- Let one executor build the full pipeline, not only parts of it.
- Limit the number of parallel builds allowed per branch or for the whole project.
- Allow another build pipeline to start only when the previously started pipeline has reached a specific build step or has finished building.
Limiting the number of queued entries could also help: when the limit is reached, enqueuing another build could fail that build instead of congesting the queue. This would not solve the issue, but it might help.
Limiting the time a build may wait in the queue, and failing it on timeout, might also help against congestion. Again, this would not solve the issue, but it might help.
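As a rough sketch of what a cap on queued entries would do, the toy model below rejects a newly triggered build when the queue is already full. The function name, the `max_queued` parameter, and the one-minute polling tick are hypothetical and only illustrate the idea; none of this is an existing Jenkins setting.

```python
from collections import deque


def count_builds_capped(executor_free_at, build_minutes, horizon, max_queued):
    """Return (accepted, rejected) builds for a single commit at minute 0.

    max_queued is a hypothetical cap on waiting builds; a build triggered
    while the queue is full is rejected instead of being enqueued.
    """
    free_at = list(executor_free_at)
    queue = deque()
    checked_out = False  # assumption: set once any build has done its checkout
    accepted = rejected = 0
    for minute in range(horizon):
        # Free executors take queued builds and perform the checkout.
        for i, free_minute in enumerate(free_at):
            if free_minute <= minute and queue:
                queue.popleft()
                checked_out = True
                free_at[i] = minute + build_minutes
        # Polling still sees the change as new until it has been checked out.
        if not checked_out:
            if len(queue) < max_queued:
                queue.append(minute)
                accepted += 1
            else:
                rejected += 1
    return accepted, rejected


if __name__ == "__main__":
    print(count_builds_capped([5], build_minutes=10, horizon=30, max_queued=1))
```

With one executor busy for five minutes and a cap of one queued build, only the first trigger is accepted and the four later ones are rejected, so the commit is built once. The duplicates are still triggered, which is why this only eases the congestion rather than fixing the cause.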