  1. Jenkins
  2. JENKINS-73304

Sudden SCM checkout break on all agents with java.lang.NoSuchMethodError for gitclient


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: git-client-plugin
    • LTS Version: 2.426.3
      OpenJDK Version: 17.0.6+10-0ubuntu1~20.04.1
      Git plugin Version: 5.2.1
      Controller OS: Debian Bookworm
      Agents (ephemeral docker based): Ubuntu Focal

      Greetings. This issue looks similar in presentation to https://issues.jenkins.io/browse/JENKINS-38072. It has happened to us twice in the past 6 weeks, and both times it appeared suddenly, with builds on all agents unable to get past the checkout scm stage.

      00:00:02.569  [GitHub Checks] GitHub check (name: Jenkins, status: in_progress) has been published.
      00:00:03.252  [Pipeline] Start of Pipeline
      00:00:05.278  [Pipeline] node
      00:00:05.609  Running on docker-nodename-f4a88e28-2870-11ef-8344-da94012290d4 in /workspace/jenkins-agent/workspace/ring_reponame-stack_PR-2300
      00:00:05.622  [Pipeline] {
      00:00:06.350  [Pipeline] stage
      00:00:06.372  [Pipeline] { (Declarative: Checkout SCM)
      00:00:07.247  [Pipeline] checkout
      00:00:07.285  The recommended git tool is: git
      00:00:07.898  [Pipeline] }
      00:00:08.011  [Pipeline] // stage
      00:00:08.121  [Pipeline] }
      00:00:08.511  [Pipeline] // node
      00:00:08.777  [Pipeline] End of Pipeline
      00:00:09.026  Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from hostname.of.agent.pool/10.xxx.xxx.xxx:44604
      00:00:09.026          at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1787)
      00:00:09.026          at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
      00:00:09.026          at hudson.remoting.Channel.call(Channel.java:1003)
      00:00:09.026          at hudson.FilePath.act(FilePath.java:1230)
      00:00:09.026          at hudson.FilePath.act(FilePath.java:1219)
      00:00:09.026          at org.jenkinsci.plugins.gitclient.Git.getClient(Git.java:138)
      00:00:09.026          at hudson.plugins.git.GitSCM.createClient(GitSCM.java:916)
      00:00:09.026          at hudson.plugins.git.GitSCM.createClient(GitSCM.java:847)
      00:00:09.026          at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1294)
      00:00:09.026          at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:129)
      00:00:09.026          at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:97)
      00:00:09.026          at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:84)
      00:00:09.026          at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
      00:00:09.026          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
      00:00:09.026          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      00:00:09.026          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      00:00:09.026          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      00:00:09.026          at java.base/java.lang.Thread.run(Thread.java:840)
      00:00:09.026  java.lang.NoSuchMethodError: 'void hudson.plugins.git.GitAPI.setHostKeyFactory(org.jenkinsci.plugins.gitclient.verifier.HostKeyVerifierFactory)'
      00:00:09.026      at org.jenkinsci.plugins.gitclient.Git$GitAPIMasterToSlaveFileCallable.invoke(Git.java:208)
      00:00:09.026      at org.jenkinsci.plugins.gitclient.Git$GitAPIMasterToSlaveFileCallable.invoke(Git.java:176)
      00:00:09.026      at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3616)
      00:00:09.026      at hudson.remoting.UserRequest.perform(UserRequest.java:211)
      00:00:09.026      at hudson.remoting.UserRequest.perform(UserRequest.java:54)
      00:00:09.026      at hudson.remoting.Request$2.run(Request.java:377)
      00:00:09.026      at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
      00:00:09.026      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      00:00:09.026      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      00:00:09.026      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      00:00:09.026      at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:125)
      00:00:09.026      at java.base/java.lang.Thread.run(Thread.java:833)
      00:00:09.026  Also:   org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: ed8f4997-9fb2-44fa-9777-32a06a529b03
      00:00:09.026  Caused: java.io.IOException: Remote call on JNLP4-connect connection from hostname.of.agent.pool/10.xxx.xxx.xxx:44604 failed
      00:00:09.026      at hudson.remoting.Channel.call(Channel.java:1007)
      00:00:09.026      at hudson.FilePath.act(FilePath.java:1230)
      00:00:09.026      at hudson.FilePath.act(FilePath.java:1219)
      00:00:09.026      at org.jenkinsci.plugins.gitclient.Git.getClient(Git.java:138)
      00:00:09.026      at hudson.plugins.git.GitSCM.createClient(GitSCM.java:916)
      00:00:09.026      at hudson.plugins.git.GitSCM.createClient(GitSCM.java:847)
      00:00:09.026      at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1294)
      00:00:09.026      at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:129)
      00:00:09.026      at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:97)
      00:00:09.026      at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:84)
      00:00:09.026      at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
      00:00:09.026      at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
      00:00:09.026      at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      00:00:09.026      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      00:00:09.026      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      00:00:09.026      at java.base/java.lang.Thread.run(Thread.java:840)
      00:00:09.604  [GitHub Checks] GitHub check (name: Jenkins, status: completed) has been published.
      00:00:10.408
      00:00:10.408  GitHub has been notified of this commit’s build result
      00:00:10.408
      00:00:10.408  Finished: FAILURE
       

      No recent changes have been deployed to our controller images, agent images, or CasC to explain it.

      It was resolved by restarting the controller. Because this is the second occurrence, we would like to get to the bottom of it and find, ideally, a preventative action or, failing that, a corrective action that is quicker than a controller restart, which can take an hour before the UI is fully operational again.

      This issue also does not trigger a healthcheck failure on either the controller or the agent instances, so we can accumulate a few hours of service interruption in the time it takes to be notified of build failures, investigate and identify the errors, and restart the controller.
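
Since the existing healthchecks do not catch this failure, one stopgap could be to scan recent build console output for the error signature itself. The following is only a minimal sketch under stated assumptions: the function names, the alert threshold, and the idea of feeding it console text fetched by some external job are all hypothetical and not part of our current setup or of any Jenkins API.

```python
import re

# Matches the error line seen in the console log of this report.
# Leading indentation and the HH:MM:SS.mmm timestamp prefix are optional,
# so plain and timestamped console output are both handled.
NO_SUCH_METHOD = re.compile(
    r"^[ \t]*(?:\d{2}:\d{2}:\d{2}\.\d{3}[ \t]+)?"
    r"(java\.lang\.NoSuchMethodError): (.*)$",
    re.MULTILINE,
)


def find_no_such_method_errors(console_text: str) -> list[tuple[str, str]]:
    """Return (exception class, message) pairs found in one console log."""
    return [
        (m.group(1), m.group(2).strip())
        for m in NO_SUCH_METHOD.finditer(console_text)
    ]


def should_alert(recent_logs: list[str], threshold: int = 3) -> bool:
    """Hypothetical policy: alert once `threshold` recent builds show the error."""
    hits = sum(1 for log in recent_logs if find_no_such_method_errors(log))
    return hits >= threshold
```

A check like this would only shorten detection time; it does not address the underlying cause, and the threshold would need tuning so that a single unrelated build failure does not page anyone.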

      Our agents are all ephemeral, Docker-based agents connecting over JNLP. After a build finishes on an agent, the agent is automatically removed as a node from the controller, which causes the agent instance to restart. When an instance starts up again, it starts from an identical image with no filesystem persistence, so it is a clean slate.
      It connects to the controller to register a node under a new name derived from the task id, and then launches the agent Java process to connect to the controller. Custom code, in the form of a Groovy plugin and an agent-side Jenkins API client, is responsible for this process, including removing the old agents.

      With our ephemeral setup in mind, are there any ideas as to what could be causing this, and whether there are steps we could take to prevent recurrence or to correct an occasional recurrence without a full controller restart?
      Controller uptime was between 12 and 36 hours at the time, agent lifetime ranges from a couple of minutes to a couple of hours at most, and no changes to the controller or agent images had been deployed recently, nor had the CasC been modified.

      Let me know if there is any more information I should provide.

       

       

            Assignee: Unassigned
            Reporter: krzwrd Christopher