Jenkins / JENKINS-62249

Unreliable authentication when using the GitHub App credentials


    Details

    • Released As:
      github-branch-source-2.8.2, github-branch-source-2.9.0

      Description

      github-branch-source-plugin v2.7.1 adds the ability to configure Jenkins as a GitHub App and to authenticate to GitHub with GitHub App credentials.

      After upgrading to github-branch-source-plugin v2.7.1, I followed the GitHub App authentication guide to connect Jenkins to GitHub.

      Most of the time I can successfully clone our repositories using these new credentials, but I also observe intermittent authentication failures.

      I have some builds that manage to successfully execute `checkout scm` using the new credentials 10 times in a row, but the 11th time, it fails with:

      using GIT_ASKPASS to set credentials Jenkins as a GitHub App for the my-org organization
       > git fetch --tags --progress -- https://github.com/my-org/my-repo +refs/heads/*:refs/remotes/origin/* # timeout=10
      ERROR: Error fetching remote repo 'origin'
      hudson.plugins.git.GitException: Failed to fetch from https://github.com/my-org/my-repo
      	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:909)
      	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1131)
      	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1167)
      	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:125)
      	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:93)
      	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:80)
      	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: hudson.plugins.git.GitException: Command "git fetch --tags --progress -- https://github.com/my-org/my-repo +refs/heads/*:refs/remotes/origin/*" returned status code 128:
      stdout: 
      stderr: 
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2430)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:2044)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$500(CliGitAPIImpl.java:81)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:569)
      	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:161)
      	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:154)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:211)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
      	at hudson.remoting.Request$2.run(Request.java:369)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
      	... 1 more
      

      (I don't have more details)

      Then, subsequent builds fail with a slightly better error:

      using GIT_ASKPASS to set credentials Jenkins as a GitHub App for the my-org organization
       > git fetch --tags --progress -- https://github.com/my-org/my-repo +refs/heads/*:refs/remotes/origin/* # timeout=10
      ERROR: Error fetching remote repo 'origin'
      hudson.plugins.git.GitException: Failed to fetch from https://github.com/my-org/my-repo
      	at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:909)
      	at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1131)
      	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1167)
      	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:125)
      	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:93)
      	at org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:80)
      	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: hudson.plugins.git.GitException: Command "git fetch --tags --progress -- https://github.com/my-org/my-repo +refs/heads/*:refs/remotes/origin/*" returned status code 128:
      stdout: 
      stderr: remote: Repository not found.
      fatal: Authentication failed for 'https://github.com/my-org/my-repo/'
      
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2430)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:2044)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$500(CliGitAPIImpl.java:81)
      	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:569)
      	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:161)
      	at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$GitCommandMasterToSlaveCallable.call(RemoteGitImpl.java:154)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:211)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
      	at hudson.remoting.Request$2.run(Request.java:369)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
      	... 1 more
      

      Usually, this lasts until we destroy the Jenkins worker used for this build and create a brand new one.

      There were no changes, either on Jenkins or on GitHub, between that 10th build (which was successful) and the subsequent builds (which failed). It was easily reproducible last week (it was failing after ~5 or 10 builds). I don't have any suspicious logs about these failures anymore.
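
      For reference, the failing step is a plain `checkout scm` in a Pipeline job. A minimal sketch of that kind of pipeline is shown below; the stage name and agent selection are illustrative, not taken from the affected jobs.

      // Minimal sketch only; in a multibranch Pipeline, "scm" resolves to the SCM
      // configured by github-branch-source, i.e. the GitHub App credentials above.
      pipeline {
          agent any
          stages {
              stage('Checkout') {
                  steps {
                      checkout scm
                  }
              }
          }
      }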

        Attachments

          Issue Links

            Activity

            allan_burdajewicz Allan BURDAJEWICZ added a comment -

            Does anybody have a reproducible scenario/environment? If so, could you enable an ALL log recorder for "org.jenkinsci.plugins.github_branch_source.GitHubAppCredentials" under "Manage Jenkins > System Log" while reproducing the problem, and share it here together with the time when the issue occurred? (Maybe also use the Timestamper plugin to get the exact time during the build when the issue occurred.)

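
            A minimal Script Console sketch for raising the logger level is below; it only changes the java.util.logging level, and a log recorder under "Manage Jenkins > System Log" (as described above) is still the convenient place to capture and read the output.

            // Sketch only: raise the java.util.logging level for the GitHub App
            // credentials class from the Jenkins Script Console. A log recorder is
            // still needed to collect and browse the resulting messages.
            import java.util.logging.Level
            import java.util.logging.Logger

            Logger.getLogger('org.jenkinsci.plugins.github_branch_source.GitHubAppCredentials').setLevel(Level.ALL)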
            brialius Denis Bel added a comment -

            Hi Allan BURDAJEWICZ,

            I provided some logs in the attached screenshots. (There is not much related information in them.)

            As for a reproducible scenario/environment, I can see one repeatable pattern: the build immediately after a failure always succeeds.

            We have several jobs running every 3 hours on a number of servers, and the issue can happen about once a week. I also suspect that builds outside the regular schedule fail more often.

            mindaugas Mindaugas Laganeckas added a comment - edited

            We have experienced 2 different errors:

            status code 128:
            [2020-10-20T06:43:23.731Z] stdout: 
            [2020-10-20T06:43:23.731Z] stderr: remote: Invalid username or password.
            [2020-10-20T06:43:23.731Z] fatal: Authentication failed for

            And

            System.InvalidOperationException: Could not find a 'develop' or 'master' branch, neither locally nor remotely.

            Since we enabled the SSH checkout trait in our Job DSL, both problems have vanished. We pinged the developers in our organization 5 days after enabling the SSH checkout trait, and none of them have seen these two errors since then. We will keep an eye on the issue and report back if anything changes.

            Here is an excerpt from our Job DSL:

            organizations {
                github {
                    traits {
                        gitHubSshCheckout {
                            credentialsId('github_ssh_key')
                        }
                    }
                }
            }
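
            For orientation, a fuller Job DSL sketch of where this trait sits in an organization folder definition might look like the block below. Only the gitHubSshCheckout block is taken from the excerpt above; the folder name, repoOwner, and credential IDs are illustrative assumptions.

            // Hypothetical Job DSL sketch; only the gitHubSshCheckout trait comes from
            // the excerpt above. Folder name, repoOwner, and credential IDs are placeholders.
            organizationFolder('my-org') {
                organizations {
                    github {
                        repoOwner('my-org')                      // organization being scanned
                        credentialsId('github-app-credentials')  // GitHub App credentials used for the API/scanning
                        traits {
                            gitHubSshCheckout {
                                credentialsId('github_ssh_key')  // SSH key used for the actual git fetch
                            }
                        }
                    }
                }
            }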

            The Jenkins Docker image: jenkins/jenkins:2.249.2-lts-alpine

            github-branch-source plugin: 2.9.1

            bitwiseman Liam Newman added a comment -

            Mindaugas Laganeckas
            Using SSH checkout means you're not using the GitHub App credentials to fetch from GitHub. It is good to know that this works around the issue, but it is also not what most people want to do. The whole point of the GitHub App credentials is to avoid passing SSH keys around.

            Hans Koster Denis Bel
            Thank you for confirming that the change in 2.9.0 reduces the frequency of occurrences of this error. I could create a custom build of the plugin that exposes these settings and let you try out different combinations.

            Also, James Nord asked: I'm wondering whether all those who still see it are performing the builds on the master/controller or on an agent? Can you comment?

            I'm grasping at straws, but do you have the system time on your Jenkins controller and agents properly synced to UTC? Crypto and tokens can be sensitive to time inconsistencies, right? I saw someone recently report their GitHub App ID not working, and the cause turned out to be a wrong system clock: https://gitter.im/jenkinsci/github-branch-source-plugin?at=5f96ea06eb82301c1a4e5b7b

            That was from the API and it gave a meaningful error, but from the git command the same issue might report a less friendly error. Maybe?
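
            A quick, hedged way to check for clock skew from the Script Console is sketched below; it simply compares the local UTC time with the Date header returned by the GitHub API (api.github.com is assumed; substitute your GitHub Enterprise host if applicable).

            // Sketch: rough clock-skew check. Compares this JVM's UTC time with the
            // Date header reported by the GitHub API. Run on the controller and on agents.
            def conn = new URL('https://api.github.com').openConnection()
            conn.connect()
            println "GitHub 'Date' header: ${conn.getHeaderField('Date')}"
            println "Local UTC time      : ${new Date().toInstant()}"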

            bitwiseman Liam Newman added a comment - edited

            If anyone wants to help fix this bug, more data collection is needed.

            Right now my best theory is that there is a time/replication issue on GitHub.

            How often are you seeing it?

            In github-branch-source-plugin v2.9.3, feature flags have been added that can help us analyze this (https://github.com/jenkinsci/github-branch-source-plugin/pull/363).

            Possible tests (do not use on production systems; one way to set these flags is sketched after the list):
            1. Add a logger for "org.jenkinsci.plugins.github_branch_source.GitHubAppCredentials" at FINEST logging level and leave it on long enough to see at least a couple of repros of the issue.

            2. Set NOT_STALE_MINIMUM_SECONDS=1 and STALE_BEFORE_EXPIRATION_SECONDS=3600. This will almost completely disable caching of tokens. I expect this will cause the frequency of errors to increase significantly. Capture logs of this occurring, including agent logs, which are on a different part of the Jenkins Log page.

            3. Add AFTER_TOKEN_GENERATION_DELAY_SECONDS=5. This introduces a 5-second delay whenever a new token is generated. If used together with the settings above, this may make jobs run more slowly due to the wait after generating a token. Verify in your logs that the delay is turned on, and see whether the error frequency decreases.

            Be sure to remove the flags when done.
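
            One way to set these flags from the Script Console is sketched below. It assumes the flags added in v2.9.3 (PR 363) are exposed as settable static fields on GitHubAppCredentials; verify that against the installed plugin version before relying on it.

            // Hedged sketch: assumes the diagnostic flags are settable static fields on
            // GitHubAppCredentials. Verify against your installed plugin version.
            import org.jenkinsci.plugins.github_branch_source.GitHubAppCredentials

            GitHubAppCredentials.NOT_STALE_MINIMUM_SECONDS = 1             // test 2: nearly disable token caching
            GitHubAppCredentials.STALE_BEFORE_EXPIRATION_SECONDS = 3600    // test 2
            GitHubAppCredentials.AFTER_TOKEN_GENERATION_DELAY_SECONDS = 5  // test 3: delay after each new token

            // Restart Jenkins (or set the fields back to their previous values) when done.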


              People

              Assignee:
              bitwiseman Liam Newman
              Reporter:
              multani Jonathan Ballet
              Votes:
              12
              Watchers:
              24
