Improvement
Resolution: Unresolved
Minor
None
When we face performance issues in Jenkins, we usually collect a set of thread dumps. However, when the performance issue is intermittent, or when the Jenkins instance is restarted automatically (as on Kubernetes), collecting this data is not easy. It would therefore be useful to collect a set of thread dumps automatically during the outage or performance episode and include them in the support bundle.
The support-core-plugin already has a feature that captures slow requests (by default, requests that take more than 10 seconds). We could reuse this feature to trigger thread dump collection, on the assumption that several slow-request records appearing at the same time indicate an ongoing performance issue.
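A thread dump can be taken from inside the JVM with the standard `java.lang.management` API. The sketch below is only an illustration of that API, not the support-core-plugin's existing thread dump code; the class and method names (`ThreadDumpCapture`, `captureThreadDump`) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: produces a jstack-like text dump of all live JVM threads.
final class ThreadDumpCapture {
    private ThreadDumpCapture() {}

    static String captureThreadDump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Report locked monitors/synchronizers only if this JVM supports it.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" id=").append(info.getThreadId())
              .append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            sb.append('\n');
            // ThreadInfo.toString() truncates stacks to 8 frames, so print every frame here.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Printing the frames explicitly matters for performance analysis, because the default `ThreadInfo.toString()` output cuts deep stacks short.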
The new feature should not affect system performance (or its impact should be minimal), and it should meet the following requirements:
- Thread dump collection must not run concurrently. While one collection is in progress, finding more slow requests must not start another one.
- To avoid causing performance issues itself, the feature should limit how often a collection can run (for example, at most once every 10 minutes).
- This interval should be configurable, as should other parameters, such as how many thread dumps are taken during one collection and how long to wait between them.
- It should be possible to disable the feature at runtime.
- The collected thread dumps should be downloadable in the support bundle, with a checkbox to include or exclude them.
- The number of slow requests that triggers a collection should be configurable.
- The number of retained thread dump files should be limited to avoid storage problems.
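The concurrency, rate-limit, threshold, and on/off requirements can be combined in a single gate that the slow-request check consults before starting a collection. The Java sketch below assumes a hypothetical `SlowRequestDumpGate` class (not part of the support-core-plugin), measures the recurrence period from the start of the previous collection, and takes timestamps as parameters so the logic can be tested deterministically.

```java
// Hypothetical gate consulted by the slow-request checker before starting a collection.
final class SlowRequestDumpGate {
    private final long recurrencePeriodMillis;
    private final int minimalSlowRequestCount;
    private volatile boolean disabled;   // may be toggled at runtime
    private boolean running;             // guarded by this
    private boolean hasRun;              // guarded by this
    private long lastStartMillis;        // guarded by this

    SlowRequestDumpGate(long recurrencePeriodMillis, int minimalSlowRequestCount) {
        this.recurrencePeriodMillis = recurrencePeriodMillis;
        this.minimalSlowRequestCount = minimalSlowRequestCount;
    }

    void setDisabled(boolean disabled) {
        this.disabled = disabled;
    }

    /**
     * Returns true if the caller may start a collection; the caller must then call finish().
     * The check-and-set is atomic, so two concurrent callers cannot both get true.
     */
    synchronized boolean tryStart(long nowMillis, int slowRequestCount) {
        if (disabled || running || slowRequestCount < minimalSlowRequestCount) {
            return false;
        }
        // Rate limit: at most one collection start per recurrence period.
        if (hasRun && nowMillis - lastStartMillis < recurrencePeriodMillis) {
            return false;
        }
        running = true;
        hasRun = true;
        lastStartMillis = nowMillis;
        return true;
    }

    synchronized void finish() {
        running = false;
    }
}
```

With a 10-minute period and a threshold of 5 slow requests, the gate would be built as `new SlowRequestDumpGate(10 * 60_000L, 5)` and called with `System.currentTimeMillis()`. Using `synchronized` keeps the check-and-set atomic; since the gate is only consulted when slow requests are detected, the locking cost should be negligible.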
Potential default values for these parameters:
- RECURRENCE_PERIOD_MIN = 10 (minimum interval, in minutes, between two thread dump collections triggered by slow requests)
- MINIMAL_SLOW_REQUEST_COUNT = 5 (minimum number of slow requests found at the same time, i.e. within the last 3 seconds, needed to trigger thread dump generation)
- TOTAL_ITERATIONS = 4 (number of thread dumps generated during one slow-request collection)
- FREQUENCY_SEC = 5 (time in seconds to wait between consecutive thread dumps within the same collection)
- SLOW_REQUEST_THREAD_DUMPS_TO_RETAIN = 40 (maximum number of thread dumps retained for slow-request scenarios)
- DISABLED = false (disables the slow request checker)
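With these defaults, one collection would take 4 thread dumps over roughly 15 seconds (three 5-second waits), and the retention limit would keep about 10 collections' worth of files (40 / 4). The sketch below shows one possible way to write the dumps and enforce that limit; `SlowRequestDumpSettings`, `SlowRequestDumpCollector`, the `threadDump-` file-name prefix, and the on-disk layout are all hypothetical, and `dumper` stands for any function that returns the dump text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Defaults proposed in this issue (names mirror the parameter list above).
final class SlowRequestDumpSettings {
    static final int RECURRENCE_PERIOD_MIN = 10;
    static final int MINIMAL_SLOW_REQUEST_COUNT = 5;
    static final int TOTAL_ITERATIONS = 4;
    static final int FREQUENCY_SEC = 5;
    static final int SLOW_REQUEST_THREAD_DUMPS_TO_RETAIN = 40;
    static final boolean DISABLED = false;

    private SlowRequestDumpSettings() {}
}

// Hypothetical collector: writes a series of dumps to disk, then enforces the retention limit.
final class SlowRequestDumpCollector {
    private static final String PREFIX = "threadDump-";

    private SlowRequestDumpCollector() {}

    static List<Path> collectWithDefaults(Path dir, Supplier<String> dumper)
            throws IOException, InterruptedException {
        return collect(dir, dumper,
                SlowRequestDumpSettings.TOTAL_ITERATIONS,
                SlowRequestDumpSettings.FREQUENCY_SEC * 1000L,
                SlowRequestDumpSettings.SLOW_REQUEST_THREAD_DUMPS_TO_RETAIN,
                System.currentTimeMillis());
    }

    static List<Path> collect(Path dir, Supplier<String> dumper, int iterations,
                              long frequencyMillis, int retain, long startMillis)
            throws IOException, InterruptedException {
        Files.createDirectories(dir);
        List<Path> written = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            if (i > 0) {
                Thread.sleep(frequencyMillis); // wait between consecutive dumps
            }
            // Zero-padded names make lexical order match chronological order.
            Path file = dir.resolve(String.format("%s%013d-%02d.txt", PREFIX, startMillis, i));
            Files.writeString(file, dumper.get());
            written.add(file);
        }
        prune(dir, retain);
        return written;
    }

    // Deletes the oldest dump files so that at most `retain` remain.
    static void prune(Path dir, int retain) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            files = s.filter(p -> p.getFileName().toString().startsWith(PREFIX))
                     .sorted()
                     .collect(Collectors.toList());
        }
        for (int i = 0; i < files.size() - retain; i++) {
            Files.delete(files.get(i));
        }
    }
}
```

Because `collect` sleeps between dumps, it would presumably run on a background executor so that the request thread that detected the slow requests is not blocked.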