- Improvement
- Resolution: Unresolved
- Major
- None
All . . .
We have some 20 Jenkins instances, each with 1 to 400 jobs; the total number of jobs is 1000 to 1500.
Each of the jobs is set up to poll Perforce at "H/5 * * * *".
The other thing these jobs all have in common is the same "Poll Exclude File(s)" setting:
//source/qcom/qct/modem/arch/hub/...
This setup has been in place for some two years, but the number of systems and jobs is constantly growing.
What appears to have happened today is that this usually quiet but important location had four check-ins within about 10 minutes.
So each job, at its own time, polled Perforce and as part of that ran a "describe" on those four changelists to see whether they could be ignored. Of course they could be, so nothing happened.
The result was a rush of traffic: in the course of 5 minutes, 1500 polls and 6000 describes.
Worse yet, 98%+ of those jobs never kicked off a build, because the changes matched the exclude.
To add insult to injury, during the next five-minute cycle it all happened again: 1500 polls plus 6000 describes.
Our Perforce server eventually sank under the weight of the request load.
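For a rough sense of scale, the numbers above can be worked through as below. This is only back-of-the-envelope arithmetic using the upper job estimate from this report; the variable names are illustrative, and the per-second figure is an average that assumes "H/5" spreads the jobs evenly over the five minutes.

```python
# Figures taken from the report above (upper estimate of the job count).
jobs = 1500
excluded_changelists = 4        # check-ins under the excluded path
poll_interval_s = 5 * 60        # "H/5 * * * *" means one poll per job per 5 minutes

polls_per_cycle = jobs
describes_per_cycle = jobs * excluded_changelists
requests_per_cycle = polls_per_cycle + describes_per_cycle
avg_requests_per_second = requests_per_cycle / poll_interval_s

print(polls_per_cycle, describes_per_cycle, requests_per_cycle, avg_requests_per_second)
```

Because none of the excluded changelists triggers a build, this same load repeats on every cycle.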
What would be nice is, having described a changelist once, to cache that information so that on the next polling cycle you don't have to ask about the same four changelists again and again every five minutes until the job builds.
Seems to me the cache could be really simple: CL#1234 is not interesting to this job. You don't need to save all the other information, just the result that the changelist can be safely ignored.
Yes, implementing such a cache makes sense.
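A minimal sketch of the kind of cache proposed above, written in Python purely for illustration (it is not the plugin's code). The class name `IgnoredChangelistCache`, its methods, and the `is_excluded` callback are all hypothetical; the callback stands in for whatever "describe plus exclude match" check the polling code actually performs.

```python
from typing import Callable, Iterable, List, Set


class IgnoredChangelistCache:
    """Per-job record of changelist numbers already found to be ignorable."""

    def __init__(self) -> None:
        self._ignored: Set[int] = set()

    def filter_interesting(
        self, pending: Iterable[int], is_excluded: Callable[[int], bool]
    ) -> List[int]:
        """Return the pending changelists that should trigger a build.

        is_excluded is only called for changelists with no cached verdict.
        """
        interesting = []
        for cl in pending:
            if cl in self._ignored:
                continue  # verdict already known, no describe needed
            if is_excluded(cl):
                self._ignored.add(cl)  # store only the verdict, nothing else
            else:
                interesting.append(cl)
        return interesting

    def prune(self, last_built: int) -> None:
        """Forget entries at or below the last built changelist."""
        self._ignored = {cl for cl in self._ignored if cl > last_built}


# Demo: one job polling three times while four excluded changelists are pending.
described = []  # records every simulated "describe" request


def is_excluded(cl: int) -> bool:
    described.append(cl)
    return True  # assumption for the demo: every changelist matches the exclude


cache = IgnoredChangelistCache()
pending = [1234, 1235, 1236, 1237]
for _ in range(3):
    to_build = cache.filter_interesting(pending, is_excluded)

print(len(described), to_build)
```

In this sketch the four changelists are described once rather than on every cycle, so three polls cost 4 describes instead of 12. Pruning once the job builds keeps the set small, which matches the idea that only the "can be ignored" verdict needs to be kept.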