When many nodes are requested at once, provisioning locks the Queue and prevents the regular calls to maintain from running. Jobs are stuck for minutes.
In our case we have jobs which request 100+ nodes of different types from k8s. Each REST call to the k8s API takes ~2 sec. All of them are executed within one withLock, which basically blocks everything else from happening on Jenkins for that time; if the calls run one after another, 100 of them would hold the lock for roughly 200 seconds. To make it worse, it seems that it then recurses and does the same again.
It gets even worse when the cluster is under high load and pods can no longer be scheduled; it seems that waiting for the pod startup timeout then adds to the time as well.
As soon as the nodes are available or the load decreases, calls to maintain return to normal levels.
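
To illustrate the locking pattern described above, here is a minimal sketch. It is not the plugin's actual code: a plain ReentrantLock stands in for the Queue lock, createPod is a hypothetical stand-in for the REST call to the k8s API, and all class and method names are made up for this example. It compares holding the lock across every slow call with taking the lock only to register each result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class ProvisioningLockSketch {
    // Hypothetical stand-in for the Jenkins Queue lock.
    static final ReentrantLock queueLock = new ReentrantLock();
    static final List<String> registeredNodes = new ArrayList<>();

    // Hypothetical stand-in for one slow REST call to the k8s API.
    static String createPod(int index, long callMillis) {
        try {
            Thread.sleep(callMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "pod-" + index;
    }

    // Pattern from the report: every slow call runs inside one lock.
    // Returns how long the lock was held, in milliseconds.
    static long provisionHoldingLock(int nodes, long callMillis) {
        queueLock.lock();
        long start = System.nanoTime();
        try {
            for (int i = 0; i < nodes; i++) {
                registeredNodes.add(createPod(i, callMillis));
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            queueLock.unlock();
        }
    }

    // Possible alternative: slow calls run outside the lock, which is
    // taken only to register each result.
    // Returns the longest single lock hold, in milliseconds.
    static long provisionOutsideLock(int nodes, long callMillis) {
        long longestHold = 0;
        for (int i = 0; i < nodes; i++) {
            String pod = createPod(i, callMillis);
            queueLock.lock();
            long start = System.nanoTime();
            try {
                registeredNodes.add(pod);
            } finally {
                longestHold = Math.max(longestHold,
                        (System.nanoTime() - start) / 1_000_000);
                queueLock.unlock();
            }
        }
        return longestHold;
    }

    public static void main(String[] args) {
        // Small numbers so the demo finishes quickly; the report has 100+ nodes at ~2 sec each.
        System.out.println("lock held (ms), all calls inside: "
                + provisionHoldingLock(5, 20));
        System.out.println("longest hold (ms), calls outside: "
                + provisionOutsideLock(5, 20));
    }
}
```

In the first variant, anything else that needs the lock (such as the regular calls to maintain) has to wait for the full duration of all calls. In the second, it only competes with short registration steps. Whether the plugin can be restructured this way depends on what its code needs to protect with the lock, which this sketch does not model.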