I found a way to trigger a remoting problem using TCP fault injection with netem. I'm able to trigger this wait call at hudson.remoting.Request.call(Request.java:146):
When this wait is triggered, the running build is stuck and consumes an executor. The thread loops over and over on the wait.
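To illustrate the symptom, here is a minimal sketch of a timed wait loop of the kind described above. It is not the actual hudson.remoting.Request code: the class PendingRequest, its fields, and the method names are hypothetical, and it only assumes that the caller re-checks for a response after each timed wait. If the response is lost and the channel is never marked closed, the caller never leaves the loop.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WaitLoopSketch {
    /** Hypothetical stand-in for a pending request; NOT the real hudson.remoting.Request. */
    static class PendingRequest {
        private Object response;   // set when the reply arrives
        private boolean closed;    // set when the channel is known to be closed
        // Counts every return from wait(): timeouts, notifications, spurious wakeups.
        final AtomicInteger wakeups = new AtomicInteger();

        /** Blocks until a response arrives or the channel is flagged as closed. */
        synchronized Object call(long pollMillis) throws InterruptedException {
            while (response == null && !closed) {
                wait(pollMillis);          // wakes up periodically, then re-checks
                wakeups.incrementAndGet();
            }
            return response;
        }

        synchronized void onResponse(Object r) { response = r; notifyAll(); }

        synchronized void onClosed() { closed = true; notifyAll(); }
    }

    public static void main(String[] args) throws Exception {
        PendingRequest req = new PendingRequest();
        Thread caller = new Thread(() -> {
            try {
                req.call(50);
            } catch (InterruptedException ignored) {
                // interrupted while waiting; nothing to clean up in this sketch
            }
        });
        caller.start();

        // Simulate a lost response: nobody calls onResponse() or onClosed().
        Thread.sleep(300);
        System.out.println("still waiting: " + caller.isAlive()
                + ", wakeups so far: " + req.wakeups.get());

        // Only an explicit close (or a response) lets the caller return.
        req.onClosed();
        caller.join(1000);
        System.out.println("alive after close: " + caller.isAlive());
    }
}
```

In this sketch the loop exits only through onResponse() or onClosed(), which is consistent with a build that stays stuck for as long as the connection is degraded but never reported as closed.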
To reproduce, set up an SSH slave using the attached Dockerfile, and set up netem on the docker0 bridge like this:
Run the job once before configuring netem: netem settings are applied to all network streams, so the build could otherwise fail while downloading Maven dependencies. I just launched a Maven build of an example project to trigger the problem. It might be a Maven-specific problem...
To remove netem settings, just run tc qdisc del dev docker0 root.
I've attached the Dockerfile, the command I used to launch it, and a thread dump of the stuck Jenkins master.