We have a big Jenkins setup (1 master, 5 slaves, 16 executor slots each) and run a lot of independent tests (~800 tests, aka "nodes"). After upgrading to Jenkins 2 (LTS 2.46.1) we developed a Jenkinsfile for our continuous integration process.
Each test is structured similarly:
- copy files and the previously compiled application from the master to a slave
- run the application (.exe), redirecting its output to a file
- check the content of the output
- write a JUnit result XML file
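A minimal scripted-pipeline sketch of one such test; the node label, tool names, and file names (`test-slave`, `app.exe`, `check.exe`) are assumptions, not our actual code:

```groovy
// Hypothetical sketch of a single test "node"; labels and file
// names are assumptions standing in for the real setup.
def runSingleTest(String testName) {
    node('test-slave') {
        // copy files and the previously compiled application from the master
        unstash 'compiled-app'
        // run the application, redirecting its output to a file
        bat "app.exe > ${testName}.out"
        // check the content of the output and write a JUnit result XML
        // (check.exe is an assumed helper tool)
        bat "check.exe ${testName}.out ${testName}-result.xml"
        // record the JUnit result
        junit "${testName}-result.xml"
    }
}
```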
If we split the 800 tests into four stages (200 tests each), the build works in general, but the four stages introduce other obvious performance problems (e.g. each stage has to wait for the end of the previous one before it can start). With this setup the build takes approx. 4 h.
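The four-stage variant can be sketched like this; `testNames` and `runSingleTest` are assumed placeholders (a list of test names and a helper that runs one test as described above):

```groovy
// Hypothetical sketch of the four-stage setup: split the test list into
// chunks of 200 and run each chunk as its own sequential stage.
def chunks = testNames.collate(200)   // Groovy: split list into sublists of 200
chunks.eachWithIndex { chunk, i ->
    stage("Tests ${i + 1}") {
        def branches = [:]
        chunk.each { name ->
            branches[name] = { runSingleTest(name) }
        }
        // each stage must fully finish before the next one starts,
        // which is the performance problem described above
        parallel branches
    }
}
```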
To avoid these performance issues we tried running all tests (aka "nodes") in parallel instead of splitting them into stages. This results in a big build queue of about 700 tests. With this long queue the build starts much more slowly and slows down exponentially over its duration, until a single pipeline step (e.g. opening a new declarative node) takes 1 minute. Something within the "master process" that controls the pipeline seems to be the crux of the matter: the slaves and their executors appear to wait until the master process gives them a new command, and this takes extremely long. With this setup the build starts more slowly than the four-stage setup and doesn't even finish after 40 h.
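The all-parallel variant, sketched with the same assumed placeholders (`testNames`, `runSingleTest`):

```groovy
// Hypothetical sketch: one big parallel block with all ~800 branches.
def branches = [:]
testNames.each { name ->
    branches[name] = { runSingleTest(name) }
}
// All branches are submitted at once; the slaves pick them up as
// executor slots become free, which produces the long build queue
// that the master process then has to manage.
parallel branches
```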
In our opinion the extremely slow pipeline is caused by an unknown overload of the master process. There seems to be a direct correlation between "nodes per stage" and "becoming slower", or between "nodes in queue" and "becoming slower". What we have tried so far:
- more heap for the JVM
- different GC settings (G1)
- reduced stash operations to a minimum
- reduced output/logging to a minimum
- sorted the build queue (long tests first, short tests last)
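The queue-sorting attempt can be sketched as follows; `expectedDuration`, `testNames`, and `runSingleTest` are assumed placeholders, and note that the order in which branches are added only influences, not guarantees, the order in which executors pick them up:

```groovy
// Hypothetical sketch of the queue-sorting attempt: order the parallel
// branches by an assumed expectedDuration map, longest tests first.
def sortedNames = testNames.sort { a, b ->
    expectedDuration[b] <=> expectedDuration[a]   // descending duration
}
def branches = [:]
sortedNames.each { name ->
    branches[name] = { runSingleTest(name) }
}
parallel branches
```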
So far, all without success: Jenkins still nearly hangs after a while.