
Git polling shouldn't need a workspace on a slave.

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • git-plugin
    • None

      What happened:

      I have two slaves (1 & 2). I took slave 1 down for maintenance and a bunch of old, rarely updated, git builds kicked off. When I checked the git polling log, I saw a message (it's gone now, darn it) that said it was rebuilding to get workspace for git polling.

      A workspace shouldn't be needed.

      I'm unclear what git needs for this, but if you're tracking only a single branch (master) then you just need the HEAD and can compare the SHA1s.
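As a rough illustration of the SHA1 comparison described above, here is a minimal Python sketch. The function name `needs_build` and the handling of a job that has never been built are assumptions made for illustration; this is not the git plugin's code.

```python
from typing import Optional


def needs_build(last_built_sha: Optional[str], remote_head_sha: Optional[str]) -> bool:
    """Decide whether to build by comparing two SHA1s; no workspace is involved."""
    if remote_head_sha is None:
        # The branch was not found on the remote, so there is nothing to build.
        return False
    if last_built_sha is None:
        # Assumption: a job that has never been built should build on first poll.
        return True
    # SHA1s are hex strings; compare case-insensitively to be safe.
    return last_built_sha.lower() != remote_head_sha.lower()
```

Only two strings are compared, so a check like this could run on the master without touching any slave.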

      If it technically really really needs a git checkout, then I'd prefer if they were kept on an assigned host (the jenkins server in my case, master) instead of using the workspaces for this check. I'd want this for a couple reasons:

      • Slaves come and go. Rebuilding all my projects because a slave went down is unproductive.
      • Occasionally, I need to go into a slave and monkey with a workspace to troubleshoot a weird problem. I don't want that to impact polling.
      • It's a waste of space on the slaves. I'd rather control where the space is wasted.

      Ciao!

          [JENKINS-10131] Git polling shouldn't need a workspace on a slave.

          Christian Höltje created issue -

          Christian Höltje added a comment -

          Okay! I just figured out that this bug is what causes massive numbers of builds to kick off whenever a slave goes down. It is also why all builds kick off when I restart Jenkins!

          Let's assume we have a set of jobs, J, that require slaves of label L. If all slaves of label L go down, then there are no workspaces available for J, so the git plugin kicks off builds for all jobs in J to get a workspace.

          In addition, it seems that sometimes on restart the git plugin kicks off before the slaves come online, causing all jobs on the system to kick off!

          Finally, this also breaks the "only bring up slave when needed" functionality. I imagine it breaks the "provision a slave automatically" feature as well.

          This really needs to be fixed. It means any maintenance task keeps the build system busy for hours (we have hundreds of builds, some of which are very long). It also bumps version numbers for no obvious reason, which causes confusion.

          Andrew Herron added a comment -

          I've been experimenting with a master/slave Jenkins setup to combine our 4 existing Hudson servers into one, but this issue is going to make it untenable. I don't have an easy way to count but there'd easily be more than 50 builds.

          In my case, the Jenkins master has 0 executors. Any git project, no matter what the slave label target is, hits this problem on my server.

          Interestingly the projects still in SVN don't have this issue; they appear on the build executor list to poll, but when there are no changes the build doesn't trigger. Looks like the git plugin doesn't check for changes until after the build has begun, meaning it would have to cancel if there are no changes.


          Nicolas De Loof added a comment -

          Partially fixed by https://github.com/jenkinsci/git-plugin/commit/c750b63ecd527a5721d9d8ee426bff97dd0e7a75
          using git ls-remote, thanks to onemanbucket's contribution
          (only supports simple cases: single branch, single repo)
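For readers unfamiliar with git ls-remote, the sketch below shows one way to read a branch head from its output in Python. It is an illustration only, not the code from the commit above; the helper names `parse_ls_remote`, `head_revision` and `fetch_head_revision` are hypothetical.

```python
import subprocess
from typing import Dict, Optional


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map each ref name to its SHA1; each output line is '<sha1><TAB><ref>'."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            sha, ref = parts
            refs[ref.strip()] = sha
    return refs


def head_revision(output: str, branch: str) -> Optional[str]:
    """Return the SHA1 of refs/heads/<branch>, or None if it is not listed."""
    # Look up the exact ref: ls-remote patterns can also match refs such as
    # refs/heads/foo/master when asked for 'master'.
    return parse_ls_remote(output).get("refs/heads/" + branch)


def fetch_head_revision(url: str, branch: str) -> Optional[str]:
    """Run git ls-remote against the remote; needs git and network access."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", url, branch],
        capture_output=True, text=True, check=True,
    )
    return head_revision(result.stdout, branch)
```

`fetch_head_revision` needs no local clone, which is why this style of polling does not depend on a workspace.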

          Andrew Herron added a comment -

          Cool!

          I thought the git plugin only supported single-branch, single-repo cases in general, though.


          Kohsuke Kawaguchi added a comment -

          No, it lets you specify a wildcard to pick branches to build.

          In a few releases, we should get rid of the checkbox, too. Excerpt from the IRC log:

          (02:31:03 PM) kohsuke: Any reason this needs to be a checkbox?
          (02:31:16 PM) One-Man-Bucket: i was thinking it could be the default behaviour in the simple use case
          (02:31:18 PM) abayer: kohsuke: I'd like it to not be the default behavior quite yet.
          (02:31:30 PM) kohsuke: I mean, shouldn't it just work automatically?
          (02:31:57 PM) abayer: Actually, I'd be good with having the readResolve make it off and the default for new projects to be on.
          (02:32:05 PM) One-Man-Bucket: as i wrote in the pull request, jgit has ls-remote since a couple of months
          (02:32:22 PM) kohsuke: I think we already have enough switches and knobs in Git configuration
          (02:32:28 PM) kohsuke: IMHO, that is.
          (02:32:45 PM) abayer: One-Man-Bucket: I'm actually meeting with ksawicki of github tomorrow to start talking about gutting/rewriting the git plugin from the ground up, to mainly use jgit among other things. =)
          (02:32:59 PM) One-Man-Bucket: abayer: oh, cool
          (02:33:41 PM) abayer: I'm just wary of potential regressions in upgrades due to having introduced gobs of them myself. =)
          (02:34:13 PM) One-Man-Bucket: abayer: :)
          (02:34:42 PM) kohsuke: OK, I suppose you can take away the checkbox later.
          (02:35:04 PM) One-Man-Bucket: one perk with the checkbox is that you get feedback if the configuration won't support remote poll
          (02:35:14 PM) One-Man-Bucket: which is important for our usecase
          (02:37:06 PM) kohsuke: One-Man-Bucket: by "it just works" I mean switching to ls-remote when and only when you can.
          (02:37:39 PM) kohsuke: IIUC, provided that it works, there's no downside in using ls-remote.
          (02:37:57 PM) One-Man-Bucket: kohsuke: fair enough
          (02:38:38 PM) kohsuke: And also, IIUC, there's no inherent limitation in git ls-remote that you can't support multiple branches, right?
          (02:38:49 PM) kohsuke: It's just that we haven't implemented that yet.
          (02:39:13 PM) One-Man-Bucket: right, it should work for multiple branches as well
          (02:39:37 PM) kohsuke: Anyway, I guess we aren't eliminating options --- as long as folks are willing to drop this checkbox at a later point, I'm happy.
          (02:39:39 PM) abayer: Would it work with branch wildcards?
          (02:39:58 PM) kohsuke: Yeah, you just list all the remote branches and check against the wildcard you got
          (02:40:12 PM) abayer: kohsuke: That's fine by me. One release to make sure it doesn't barf, and in the meantime have an escape hatch.
          (02:40:26 PM) kohsuke: Yes.
          (02:40:57 PM) kohsuke: ("eliminating options" is kind of funny here because we are trying not to eliminate the option of eliminating the option (= the checkbox))
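The wildcard idea from the log ("list all the remote branches and check against the wildcard") could look roughly like this in Python. This is a sketch under assumptions: it uses Python's `fnmatch` rules, where `*` also matches `/`, and the git plugin's own wildcard semantics may well differ. The function name `matching_branches` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict

HEADS_PREFIX = "refs/heads/"


def matching_branches(remote_refs: Dict[str, str], pattern: str) -> Dict[str, str]:
    """Return {branch: sha1} for remote branches whose name matches the wildcard."""
    matches: Dict[str, str] = {}
    for ref, sha in remote_refs.items():
        if not ref.startswith(HEADS_PREFIX):
            continue  # skip tags and other non-branch refs
        branch = ref[len(HEADS_PREFIX):]
        # fnmatchcase is case-sensitive on every platform, like git branch names.
        if fnmatchcase(branch, pattern):
            matches[branch] = sha
    return matches
```

Each matching branch head could then be compared with the revision last built for that branch.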

          Andrew Herron added a comment -

          I've just updated to the new version of the git plugin, and I can confirm the fix works. I shut down all of my slaves and no jobs were queued.

          I really would've preferred it to be on by default, but we have our job configs in SCM so it was a simple search/replace to turn it on for every build.
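A search/replace like the one mentioned above might be scripted as follows. This is a hedged sketch: it assumes the checkbox is stored in each job's config.xml as a `<remotePoll>` element, which should be verified against your own job configs first. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Assumption: the option is serialized as <remotePoll>false</remotePoll>.
_REMOTE_POLL_OFF = re.compile(r"<remotePoll>\s*false\s*</remotePoll>")


def enable_remote_poll(config_xml: str) -> str:
    """Return the config text with the remote poll flag switched on."""
    return _REMOTE_POLL_OFF.sub("<remotePoll>true</remotePoll>", config_xml)


def enable_in_tree(root: str) -> List[str]:
    """Rewrite every config.xml under root; return the paths that changed."""
    changed: List[str] = []
    for path in sorted(Path(root).rglob("config.xml")):
        original = path.read_text(encoding="utf-8")
        updated = enable_remote_poll(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(str(path))
    return changed
```

Jobs whose config has no such element are left untouched, so they would still need the option set by hand.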


          Andrew Herron added a comment -

          I'm seeing some strange errors in my SCM polling logs now... These errors don't show in the UI, and don't seem to impact builds, but they're still concerning.

          I have 3 long builds showing an error while they are building - this doesn't happen on other active builds though:

          (server name removed)

          [poll] Last Built Revision: Revision a03190f9886de492a997a72c6525d65424618641 (origin/master)
          ERROR: Failed to record SCM polling
          hudson.plugins.git.GitException: Error performing command: git ls-remote -h git@<server> master
          Command "git ls-remote -h git@<server> master" returned status code 128: error: cannot run ssh: No such file or directory
          fatal: unable to fork
          
          	at hudson.plugins.git.GitAPI.launchCommandIn(GitAPI.java:764)
          	at hudson.plugins.git.GitAPI.launchCommand(GitAPI.java:729)
          	at hudson.plugins.git.GitAPI.getHeadRev(GitAPI.java:1043)
          	at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:624)
          	at hudson.scm.SCM._compareRemoteRevisionWith(SCM.java:355)
          	at hudson.scm.SCM.poll(SCM.java:372)
          	at hudson.model.AbstractProject.poll(AbstractProject.java:1324)
          	at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:420)
          	at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:449)
          	at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
          	at java.lang.Thread.run(Thread.java:636)
          Caused by: hudson.plugins.git.GitException: Command "git ls-remote -h git@<server> master" returned status code 128: error: cannot run ssh: No such file or directory
          fatal: unable to fork
          
          	at hudson.plugins.git.GitAPI.launchCommandIn(GitAPI.java:759)
          	... 15 more
          

          Builds that used the ** branch specifier (we want all of our builds to be master-only; this has exposed a few that aren't) have this error in their polling log:

          Started on 09/09/2011 5:10:18 PM
          Using strategy: Default
          [poll] Last Build : #38
          [poll] Last Built Revision: Revision b9af602f4250b74fbf026a92716083d9b294121e (origin/master)
          ERROR: Failed to record SCM polling
          java.lang.NullPointerException
          	at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:657)
          	at hudson.scm.SCM._compareRemoteRevisionWith(SCM.java:355)
          	at hudson.scm.SCM.poll(SCM.java:372)
          	at hudson.model.AbstractProject.poll(AbstractProject.java:1324)
          	at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:420)
          	at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:449)
          	at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
          	at java.lang.Thread.run(Thread.java:636)
          

          Happy to help track these down if you need me to.
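Since the fix is described earlier in this issue as covering only the single-branch, single-repo case, a guard along these lines could decide when remote polling applies and when to fall back to workspace polling. This is a Python sketch, not the plugin's logic, and it makes no claim about the cause of the NullPointerException; `supports_remote_poll` is a hypothetical name.

```python
from typing import Sequence


def supports_remote_poll(repo_urls: Sequence[str], branch_specs: Sequence[str]) -> bool:
    """True only for one repository and one concrete (non-wildcard) branch."""
    if len(repo_urls) != 1 or len(branch_specs) != 1:
        return False
    spec = branch_specs[0]
    # Assumption: an empty spec or one containing * or ? may match several
    # branches, so it is treated as unsupported here.
    return bool(spec) and not any(ch in spec for ch in "*?")
```

A caller could log the reason and use the older workspace-based poll whenever this returns False.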


          Andrew Herron added a comment - - edited

          OK, the "cannot run ssh: No such file or directory" error isn't related to build length; it is still happening after the builds finished.

          It looks like it's related to the slave node OS: it happens on all builds that are set to use only our Windows slaves. Turning off the fast remote polling for those builds stops the error.
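The "cannot run ssh" message means git could not launch an ssh executable wherever the polling command ran. One way to check for that is a small PATH lookup like the Python sketch below, run on the affected node. It is a diagnostic idea only, not a confirmed root cause for this report; `missing_tools` is a hypothetical helper.

```python
import shutil
from typing import List, Optional, Sequence


def missing_tools(programs: Sequence[str], path: Optional[str] = None) -> List[str]:
    """Return the programs that cannot be found on PATH (or on the given path)."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in programs if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Prints the tools the polling command needs but cannot find, if any.
    print(missing_tools(["git", "ssh"]))
```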


          Uwe Stuehler added a comment -

          I think this issue is related to JENKINS-9596, at least on a high level.

          Uwe Stuehler made changes -
          Link New: This issue is related to JENKINS-9596 [ JENKINS-9596 ]

            abayer Andrew Bayer
            docwhat Christian Höltje
            Votes: 22
            Watchers: 27