• Type: Bug
    • Resolution: Won't Fix
    • Priority: Critical
    • Component: git-plugin
    • FreeBSD 10.1
      Jenkins 1.598
      git 2.3.5
      git-client 1.16.1
      scm-api 0.2

      Our cron-based polling does not work. The UI seems to correctly report when the next and previous polling runs will be (or would have been), but the polling log isn't updated. Sometimes it's more than 3 days behind.

      I found this on the /threadDump page:

      Jenkins cron thread
      
      "Jenkins cron thread" Id=25 Group=main WAITING on java.util.TaskQueue@1c75ef13
      	at java.lang.Object.wait(Native Method)
      	-  waiting on java.util.TaskQueue@1c75ef13
      	at java.lang.Object.wait(Object.java:503)
      	at java.util.TimerThread.mainLoop(Timer.java:526)
      	at java.util.TimerThread.run(Timer.java:505)
      
      jenkins.util.Timer [#10]
      
      "jenkins.util.Timer [#10]" Id=78 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f92c0bf
      	at sun.misc.Unsafe.park(Native Method)
      	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f92c0bf
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1085)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
      	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      jenkins.util.Timer [#1]
      
      "jenkins.util.Timer [#1]" Id=26 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f92c0bf
      	at sun.misc.Unsafe.park(Native Method)
      	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f92c0bf
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1085)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
      	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      jenkins.util.Timer [#2]
      
      "jenkins.util.Timer [#2]" Id=63 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f92c0bf
      	at sun.misc.Unsafe.park(Native Method)
      	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f92c0bf
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1085)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
      	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      jenkins.util.Timer [#3]
      
      "jenkins.util.Timer [#3]" Id=66 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f92c0bf
      	at sun.misc.Unsafe.park(Native Method)
      	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f92c0bf
      	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
      	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      

          [JENKINS-27677] SCM Polling does not run

          Mark Waite added a comment -

          Can you provide more details describing the conditions that may be causing the problem?

          Does the problem persist if you switch to the long-term support Jenkins version (currently 1.596.3)?

          Does the problem persist on a new installation of the latest Jenkins version?

          What is the JDK version running on your FreeBSD system?


          Frank van Gemeren added a comment -

          It ran twice on a Saturday, even though that's not possible according to one of the cron specs:

          H/30 7-22 * * 1-5
          

          The server time is correct.

          This is the whole System Information page: https://gist.github.com/frvge/28884e6755a2610ec2cb (removed a few IPs).

          We upgraded to the version with the new build layout, so I'd prefer not to downgrade. With more than 500 jobs, I don't want to risk something going wrong. I can do it, but only as a last resort.

          Jenkins is running in a screen session. Maybe that has something to do with it?
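
          For readers unfamiliar with the `H` token in the cron spec above: `H/30 7-22 * * 1-5` asks Jenkins to poll every 30 minutes, between 07:00 and 22:00, Monday through Friday, with a per-job offset derived from a hash of the job name so that hundreds of jobs don't all poll at minute 0. The sketch below illustrates the idea only; Jenkins' real implementation (hudson.scheduler.CronTab) uses its own hashing, and the function names here are hypothetical.

```python
# Conceptual sketch of how Jenkins' "H" cron token spreads polling load.
# Illustration only -- not Jenkins' actual hashing.
import hashlib


def h_offset(job_name: str, step: int) -> int:
    """Derive a stable per-job minute offset in [0, step)."""
    digest = hashlib.md5(job_name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % step


def fire_minutes(job_name: str, step: int) -> list[int]:
    """Minutes within the hour at which 'H/step' would fire for this job."""
    start = h_offset(job_name, step)
    return list(range(start, 60, step))


# Two jobs get different but stable offsets, so 500 jobs polling
# "every 30 minutes" are spread across the hour rather than stacked.
print(fire_minutes("project-a", 30))
print(fire_minutes("project-b", 30))
```

          The key property is that the offset is deterministic per job name, so a job's schedule never drifts between restarts.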


          Frank van Gemeren added a comment -

          Update:
          We just did maintenance on our GitLab server, including a stop/start of the service, and suddenly all the polling triggers started working again. This produced a very large queue (which failed, because we couldn't check out the code, of course).

          The interesting part is that a timed job, configured like

          58 23 * * *
          

          also ran again automatically. Its run times are also inconsistent with that schedule:

          Failed > Console Output#16 Apr 1, 2015 6:21 PM
          Failed > Console Output#15 Apr 1, 2015 6:05 PM
          Failed > Console Output#14 Apr 1, 2015 6:04 PM
          Success > Console Output#13 Apr 1, 2015 7:38 AM
          Success > Console Output#12 Mar 30, 2015 10:24 PM
          Success > Console Output#11 Mar 29, 2015 11:22 AM
          Success > Console Output#10 Mar 27, 2015 9:43 PM
          Success > Console Output#9 Mar 26, 2015 11:18 AM
          Success > Console Output#8 Mar 25, 2015 4:22 AM
          Success > Console Output#7 Mar 23, 2015 7:16 PM
          Success > Console Output#6 Mar 22, 2015 10:23 AM
          Success > Console Output#5 Mar 21, 2015 1:57 AM
          Success > Console Output#4 Mar 20, 2015 12:59 PM
          Success > Console Output#3 Mar 19, 2015 2:40 AM
          Success > Console Output#2 Mar 17, 2015 8:45 PM
          Success > Console Output#1 Mar 16, 2015 6:05 PM
          


          Mark Waite added a comment -

          I wonder if that hints that polling has started but not completed. The calls to command-line git from the git plugin attempt to ensure that command-line git never blocks, but there seem to be more ways to block than there are techniques to prevent blocking.

          You could check the polling log of the jobs to see if polling is in progress. Alternatively, you could check the processes on the master server to see whether there are many git processes that have been running for a long time.


          Frank van Gemeren added a comment -

          There are a few long-running git processes. Example:

          jenkins 74321    0.0  0.0    52568     4484  1  I+   16Mar15       0:00.03 ssh git@git.company.com git-upload-pack 'repoowner/reponame.git'
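
          One way to spot such stuck pollers systematically is to flag git/ssh processes whose elapsed time exceeds a threshold. A minimal sketch, assuming the `[[dd-]hh:]mm:ss` elapsed-time format printed by `ps -o etime`; the helper names below are illustrative, not part of any Jenkins tooling:

```python
# Flag long-running git/ssh polling processes from `ps` elapsed-time fields.
# Input could be collected with e.g.: ps -U jenkins -o pid=,etime=,command=

def etime_seconds(etime: str) -> int:
    """Convert a ps etime field like '3-01:02:03', '1:02:03' or '02:03' to seconds."""
    days = 0
    if "-" in etime:
        d, etime = etime.split("-", 1)
        days = int(d)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad missing hour/minute fields
        parts.insert(0, 0)
    h, m, s = parts
    return ((days * 24 + h) * 60 + m) * 60 + s


def stuck(processes: dict[int, str], threshold_s: int = 3600) -> list[int]:
    """Return PIDs whose elapsed time exceeds the threshold (default 1 hour)."""
    return [pid for pid, et in processes.items() if etime_seconds(et) > threshold_s]


# Example: a 16-day-old ssh git-upload-pack (like the one above) vs. a fresh one.
procs = {74321: "16-04:12:09", 81002: "00:42"}
print(stuck(procs))  # [74321]
```

          Any PID this flags is a candidate for the "polling started but never completed" scenario Mark describes.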
          


          Frank van Gemeren added a comment -

          I've updated to 1.607 and we've changed some cron specs to a simpler

          */15 * * *
          

          I'll let you know.


          Frank van Gemeren added a comment -

          It works on some projects now, but it's still far below what we expect. Our main project, with a lot of commits, hasn't run since last Friday.

          Might be related to: https://issues.jenkins-ci.org/browse/JENKINS-26208 . I'll upgrade to 1.608.


          Frank van Gemeren added a comment -

          Is there a way to check the output of "isClogged()" in a log? Like https://github.com/jenkinsci/jenkins/blob/608517e187cb5bd1566b1c3728a4df0f7ac4dd5c/core/src/main/java/hudson/triggers/SCMTrigger.java#L225

          Frank van Gemeren added a comment -

          I just saw this in the log. It was for a merge request; we use the Gitlab Merge Request Builder plugin for those.

          Apr 07, 2015 3:59:42 PM hudson.triggers.Trigger checkTriggers
          WARNING: org.jenkinsci.plugins.gitlab.GitlabBuildTrigger.run() failed for hudson.model.FreeStyleProject@225cf060[myproject]
          java.lang.NullPointerException
                  at org.jenkinsci.plugins.gitlab.GitlabBuildTrigger.run(GitlabBuildTrigger.java:100)
                  at hudson.triggers.Trigger.checkTriggers(Trigger.java:265)
                  at hudson.triggers.Trigger$Cron.doRun(Trigger.java:214)
                  at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
                  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                  at java.lang.Thread.run(Thread.java:745)
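
          Note the hudson.triggers.SafeTimerTask frame in the trace above: Jenkins wraps each scheduled trigger so that one trigger's exception is logged as a WARNING instead of killing the shared timer thread, which is why this NullPointerException produces a log entry rather than stopping all polling. A rough sketch of that pattern (in Python, purely as an illustration of the design):

```python
# Sketch of the "safe timer task" pattern seen in the stack trace:
# wrap each scheduled task so a failure is logged, not propagated,
# and the shared scheduler thread survives.
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("triggers")


def safe(task, name):
    def run():
        try:
            task()
        except Exception:
            # Logged, not re-raised: other triggers keep running.
            log.exception("%s failed", name)
            return False
        return True
    return run


def broken_trigger():
    raise AttributeError("simulating the GitlabBuildTrigger NPE")


wrapped = safe(broken_trigger, "GitlabBuildTrigger.run()")
print(wrapped())  # False, and the scheduler would keep running
```

          The flip side of this design is exactly what this ticket shows: a trigger can fail silently on every cycle, and the only evidence is a WARNING line in the system log.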
          


          Frank van Gemeren added a comment -

          We're now at 1.609. Our slaves have also been updated to the latest slave.jar. The issue persists. I'd like to be able to see whether the polling thread is clogged.


          Frank van Gemeren added a comment -

          Is there an update on this?

          We're upgrading our Jenkins-Job-Builder setup soon. Some of the issues fixed there had to do with polling. Even if they turn out to be the cause, which won't be certain until we've tested it, I'd like to see a list of the "clogged" polls in Jenkins.


          Mark Waite added a comment -

          I am not doing any investigation or work on this question.


          Frank van Gemeren added a comment -

          What can I do to change that? It's rather frustrating when we're moving to continuous integration and only a handful of our polling jobs work (and even then not as often as they should).


          Mark Waite added a comment -

          Unfortunately, I don't think there is much you can do to change that (at least for me). Sorry about that. I think the investigation work will need to come from you.

          I can't duplicate your problem.

          I don't have a FreeBSD system in my git plugin / git client plugin test environment (only Windows 7 x86, Windows 7 x64, Windows 8.1 x64, Windows Home Server 2011 x64, Debian 6 x86, Debian 7 x64, Debian 8 x64, Ubuntu 14.04 x64, CentOS 6 x86, and CentOS 7 x64).

          I don't have a GitLab server in my test environment (only GitHub, gitweb, bitbucket, sourceforge, and a few others).

          I'm a volunteer who contributes to the plugins on my personal time, so you'd need to persuade me to use my personal time to work on a problem which I can't duplicate, and which has hints that it may be related to a local configuration problem. Some of the things I find most persuasive are:

          • Expand the understanding of the platforms where the bug happens and does not happen, so that its impact to the larger community is more clear. If, for example, the problem also appears when using GitHub or bitbucket or a git protocol server on CentOS or Debian, that makes it more interesting to me
          • Expand the understanding of the versions where the bug happens and does not happen, so that its impact to the larger community is more clear. If, for example, the problem also appears when using the most recent long term support release, that is more interesting to me, since I'm a user of the long term support release
          • Expand the understanding of the conditions where the problem happens and does not happen, so that its impact to the larger community is more clear. If, for example, the problem does not appear when using git plugin 2.3.4 and does appear in 2.3.5, then that is very interesting, since it hints that there was a plugin version which behaved more the way you wanted

          If you'd rather have a more direct form of persuasion, you might be able to persuade others to investigate the problem by paying them to investigate. There is also freedomsponsors.org which provides a way to offer to fund a bug fix or investigation. I don't think my employer will allow me to take money for my personal development activities, so neither of those techniques will persuade me. It's not a personal thing, nor is it me saying that the problem you've found is not a bug.

          I hope that my description is not viewed as an attempt to offend or rebuff your concern. It is not. I'm trying to help you see which things might persuade others to help, and which will not.


          Frank van Gemeren added a comment -

          Hi Mark,

          Thank you for your answer. I understand your position. I'll see if I can get approval for using the freedomsponsors option. We also have an in-house Java team so maybe I can use them to debug this further.

          In the meantime I have another, related question: I just noticed there's a "hidden" hudson.triggers.SCMTrigger.starvationThreshold setting whose default is 1 hour (its description reads "Milliseconds waiting for polling executor before trigger reports it is clogged"). We do a lot of polling, so could extending it to 2 hours or more potentially fix this? If so, maybe it makes sense to use a separate polling thread in the Jenkins core?
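
          The starvation check referred to here is, roughly, "has any queued polling task been waiting longer than the threshold?". A minimal conceptual sketch (the real logic lives in hudson.triggers.SCMTrigger; the function below is illustrative, not the actual implementation):

```python
import time

# Default matches the description quoted above: 1 hour, in milliseconds.
STARVATION_THRESHOLD_MS = 60 * 60 * 1000


def is_clogged(queued_since_ms: list[float], now_ms: float,
               threshold_ms: int = STARVATION_THRESHOLD_MS) -> bool:
    """True if any queued polling task has waited longer than the threshold."""
    return any(now_ms - t > threshold_ms for t in queued_since_ms)


now = time.time() * 1000
# A task queued 90 minutes ago is "clogged" under the default 1-hour threshold.
print(is_clogged([now - 90 * 60 * 1000], now))  # True
# Raising the threshold to 2 hours (e.g. starting Jenkins with
# -Dhudson.triggers.SCMTrigger.starvationThreshold=7200000) changes the verdict.
print(is_clogged([now - 90 * 60 * 1000], now,
                 threshold_ms=2 * 60 * 60 * 1000))  # False
```

          Note that raising the threshold only suppresses the "clogged" report; it doesn't make the underlying polling executor drain any faster.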


          Mark Waite added a comment -

          frvge I don't know the code related to the starvation threshold, so I can't help with that at all. If polling frequency is the concern, you may want to read the "polling must die" blog post by Kohsuke Kawaguchi. He describes a technique that reduces polling dramatically and also reduces the time between a change being pushed and the related jobs starting.


          Frank van Gemeren added a comment -

          Hi Mark, we currently prefer not to run via webhooks, because our tests (and deployments) take 20 minutes and we get a few pushes per minute. In other words, we didn't want to overload our slaves (16 executors in total).

          We will re-evaluate a webhook-like system and also look into Gerrit and Zuul to limit the runtime.

          Thanks for your time. The ticket can be suspended.


          Mark Waite added a comment -

          I don't plan to fix this report; I can't duplicate it.


            Assignee: ndeloof (Nicolas De Loof)
            Reporter: frvge (Frank van Gemeren)
            Votes: 0
            Watchers: 2
