From what I've read, the incorrect counting of the runnable workload is what's causing this issue - it may well be that the fix for JENKINS-27034 will mitigate this issue, or perhaps resolve it entirely.
i.e. this issue may just be a symptom of JENKINS-27034.
Also, I would not consider time spent fixing Queue/Cloud/NodeProvisioner as time wasted - that's all core cloud functionality used by every cloud plugin to provide executors (e.g. we use docker, vSphere and OpenStack; there are others).
I appreciate that dockerNode is useful, but pipeline-specified one-shot nodes aren't the answer to everything. When it takes a long time for a node to start up (e.g. fully featured VMs rather than lightweight containers), it's important to have clouds configured to supply nodes (with a retention strategy that is not "one shot") in order to maintain build throughput.
FYI I didn't encounter this issue via the docker-plugin; I noticed this because the Jenkins core was asking the vsphere-plugin for new nodes (where dockerNode isn't a viable replacement) and I was monitoring my vSphere cloud at the time. There may well have been OpenStack and Docker nodes being created as well (but I wasn't monitoring those at the time).
Changing NodeProvisioner would create a deadlock with NonBlockingTasks, such as a Matrix build - their slaves might never get created. I think it would be more appropriate to modify the behaviour of countBuildable*() in Queue so that it only counts tasks that are not blocked by shutdown.
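To illustrate the idea, here's a minimal sketch of the kind of filtering I mean. The names (`Task`, `blockedByShutdown`, `countBuildable`) are hypothetical stand-ins - the real Jenkins Queue API (`Queue.BuildableItem` etc.) is richer than this:

```java
import java.util.List;

public class QueueCountSketch {
    // Hypothetical stand-in for a queued buildable item; the real
    // Jenkins Queue.BuildableItem carries far more state.
    record Task(String name, boolean blockedByShutdown) {}

    // Proposed behaviour: only count tasks that are NOT held back by
    // the instance quieting down for shutdown, so NodeProvisioner
    // isn't asked to provision nodes for work that will never run.
    static long countBuildable(List<Task> queue) {
        return queue.stream()
                .filter(t -> !t.blockedByShutdown())
                .count();
    }

    public static void main(String[] args) {
        List<Task> queue = List.of(
                new Task("build-1", false),
                new Task("build-2", true),   // held back by shutdown
                new Task("build-3", false));
        System.out.println(countBuildable(queue)); // prints 2
    }
}
```

The point is that the count NodeProvisioner sees would then reflect only work that can actually be scheduled, rather than changing NodeProvisioner itself.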
I have another pull request that manipulates countBuildable; I may open a pull request for this change once that one is accepted.