For my nightly integration test Jenkins environment, I have a "driver" job that spawns about 50 instances of a "runner" job, each with set parameters, one per test. The "runner" job is configured to run concurrently on two nodes, one with 5 executors and the other with 3.
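For context, since the runner is triggered remotely with an authentication token, the driver's fan-out presumably amounts to something like the following sketch. The URL, job name, token value, and parameter name are all placeholders, not my actual configuration:

```shell
# Hypothetical driver step: fire one parameterized "runner" build per test
# via Jenkins's remote-trigger (buildWithParameters) endpoint.
JENKINS_URL="https://jenkins.example.com"
TOKEN="RUNNER_TOKEN"

trigger_runner() {
  # Construct the remote-trigger URL for a single test.
  local test_name="$1"
  echo "$JENKINS_URL/job/runner/buildWithParameters?token=$TOKEN&TEST_NAME=$test_name"
  # The real driver would POST this URL, e.g.:
  #   curl -fsS -X POST "$JENKINS_URL/job/runner/buildWithParameters?token=$TOKEN&TEST_NAME=$test_name"
}

trigger_runner "smoke-login"
```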
The runner job has the following components worth noting:
- Injects environment properties based on <node>.<executor>, which the job uses to determine its own database
- Uses a label to distribute the runner job across the two specific nodes
- Mercurial SCM
- Triggers builds remotely using an authentication token
- Deletes the workspace before the build starts
- Pre-build Actions
  - Trigger another project (blocking): this job sets up the determined schema
- Maven Build (default Maven build project step): clean install -U -C -P<profile> -pl <module> -D<params>
- Post-build Actions
  - Groovy Postbuild: sets some build short text
  - Publish HTML Reports
  - Editable e-mail notification
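For reference, the per-slot database selection in the first bullet boils down to something like the sketch below. The naming convention and the itest_ prefix are assumptions for illustration; only NODE_NAME and EXECUTOR_NUMBER are standard Jenkins build variables:

```python
import os

def database_for_slot(node_name: str, executor_number: str) -> str:
    """Map a <node>.<executor> slot to a dedicated test database name.

    Hypothetical convention: one schema per executor slot, so concurrent
    runner builds never share a database.
    """
    # e.g. node "linux-a", executor "2" -> "itest_linux_a_2"
    safe_node = node_name.replace("-", "_").lower()
    return f"itest_{safe_node}_{executor_number}"

# Jenkins exposes NODE_NAME and EXECUTOR_NUMBER to every build.
db = database_for_slot(os.environ.get("NODE_NAME", "linux-a"),
                       os.environ.get("EXECUTOR_NUMBER", "0"))
print(db)
```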
The "runner" jobs can take anywhere from 45 to 75 minutes, depending on what each test is doing and how long the blocking import takes (whether it needs to rebuild the schema or just reset and import data).
I have noticed that intermittently, but pretty much at least once per night, one execution of the "runner" job reports as unstable even though the console output and reports show success. Drilling into the job console output, it shows success. Drilling into the module console output, however, it shows the output of a different execution of the same job (in one example, from about 10 minutes earlier, judging by the console output timestamps).