Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-68707

Cron Triggers Being Missed During Quiet-down Mode

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Major Major
    • core
    • None

      We're using Jenkins as something like a central scheduler. On a busy night, it might queue and execute around 800 jobs per hour. During maintenance, we put Jenkins into quiet-down and let the jobs queue up while we complete the maintenance.

      Unfortunately, it seems like we're starting to miss dozens of jobs during those quiet-down periods. Some jobs get queued up and placed in queue.xml, but an unknown number of other jobs never make it into the queue and are missed entirely. In the jenkins.log, we see it littered with Young GCs and messages like this:

      2022-05-13T19:39:05.214-0500: [GC (Allocation Failure) [PSYoungGen: 8368320K->15447K(8371200K)] 24305413K->15952724K(31439872K), 0.0728277 secs] [Times: user=0.51 sys=1.16, real=0.07 secs] 
      2022-05-13T19:39:35.244-0500: [GC (Allocation Failure) [PSYoungGen: 8369239K->14256K(8371200K)] 24306516K->15951893K(31439872K), 0.0919080 secs] [Times: user=0.70 sys=1.43, real=0.09 secs] 
      2022-05-13T19:40:04.589-0500: [GC (Allocation Failure) [PSYoungGen: 8368048K->14064K(8371200K)] 24305685K->15952197K(31439872K), 0.1147281 secs] [Times: user=1.54 sys=1.07, real=0.12 secs] 
      2022-05-14 00:40:27.195+0000 [id=123]    WARNING    hudson.triggers.Trigger#checkTriggers: Trigger hudson.triggers.TimerTrigger.run() triggered by hudson.model.FreeStyleProject@560953a3[XXX-REDACTED-XXX] spent too much time (31 sec) in its execution, other timers can be affected
      2022-05-13T19:40:33.942-0500: [GC (Allocation Failure) [PSYoungGen: 8367856K->13920K(8371712K)] 24305989K->15952533K(31440384K), 0.0810411 secs] [Times: user=0.49 sys=1.36, real=0.08 secs] 
      2022-05-13T19:41:03.582-0500: [GC (Allocation Failure) [PSYoungGen: 8368224K->14048K(8371200K)] 24306837K->15952837K(31439872K), 0.0920661 secs] [Times: user=0.45 sys=1.52, real=0.10 secs] 
      2022-05-14 00:41:22.769+0000 [id=123]    WARNING    hudson.triggers.Trigger#checkTriggers: Trigger hudson.triggers.TimerTrigger.run() triggered by hudson.model.FreeStyleProject@7a59040a[XXX-REDACTED-XXX] spent too much time (33 sec) in its execution, other timers can be affected
      2022-05-13T19:41:32.099-0500: [GC (Allocation Failure) [PSYoungGen: 8368352K->14000K(8372224K)] 24307141K->15952989K(31440896K), 0.0732050 secs] [Times: user=0.47 sys=1.19, real=0.08 secs] 
      2022-05-14 00:41:56.047+0000 [id=123]    WARNING    hudson.triggers.Trigger#checkTriggers: Trigger hudson.triggers.TimerTrigger.run() triggered by hudson.model.FreeStyleProject@6651d4a9[XXX-REDACTED-XXX] spent too much time (33 sec) in its execution, other timers can be affected
      2022-05-13T19:42:02.710-0500: [GC (Allocation Failure) [PSYoungGen: 8369328K->15110K(8371712K)] 24308317K->15954251K(31440384K), 0.0830675 secs] [Times: user=0.61 sys=1.30, real=0.09 secs] 
      2022-05-13T19:42:31.207-0500: [GC (Allocation Failure) [PSYoungGen: 8370438K->14624K(8371712K)] 24309579K->15953853K(31440384K), 0.0494647 secs] [Times: user=0.30 sys=0.80, real=0.05 secs] 
      2022-05-13T19:43:00.400-0500: [GC (Allocation Failure) [PSYoungGen: 8369440K->14864K(8371712K)] 24308669K->15954245K(31440384K), 0.0970107 secs] [Times: user=0.50 sys=1.61, real=0.10 secs] 
      2022-05-13T19:43:30.702-0500: [GC (Allocation Failure) [PSYoungGen: 8369680K->15424K(8371712K)] 24309061K->15954949K(31440384K), 0.1055369 secs] [Times: user=0.48 sys=1.96, real=0.10 secs] 
      2022-05-13T19:44:02.834-0500: [GC (Allocation Failure) [PSYoungGen: 8370240K->14368K(8371712K)] 24309765K->15954333K(31440384K), 0.0893574 secs] [Times: user=0.47 sys=1.55, real=0.09 secs] 
      2022-05-13T19:44:34.046-0500: [GC (Allocation Failure) [PSYoungGen: 8369184K->14592K(8371712K)] 24309149K->15954790K(31440384K), 0.0817069 secs] [Times: user=0.58 sys=1.29, real=0.08 secs] 
      2022-05-13T19:45:02.639-0500: [GC (Allocation Failure) [PSYoungGen: 8369408K->12896K(8371712K)] 24309606K->15954397K(31440384K), 0.0787259 secs] [Times: user=0.48 sys=1.28, real=0.08 secs] 
      2022-05-13T19:45:32.443-0500: [GC (Allocation Failure) [PSYoungGen: 8367712K->13334K(8371712K)] 24309213K->15955131K(31440384K), 0.1156263 secs] [Times: user=1.06 sys=1.61, real=0.11 secs] 

      This started happening after only 40 minutes or so in quiet-down and about 600 jobs in the queue.

      I wish I had more logging for you on this, but it's the lack of any message in this situation that is the tricky part – we simply miss cron timers and never receive job results.

      Any idea on what might be going on here?

            Unassigned Unassigned
            esnewmanium Kevin Newman
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: