JENKINS-72047

scm-filter-jervis gives up after 1 API request to GitHub which can lead to missed webhooks

    • 2.0-66.vc21d0c1d936d

      Bug description

      Webhooks get received by Jenkins but do not create jobs or start builds. This only happens sometimes.

      Other info

      I noticed clock drift on GitHub servers, but it turned out not to be a factor. The notes below record the original investigation.

      I verified GitHub API servers have about a 12 second clock drift currently compared to time.gov.

      We've been having several webhook issues and I'm suspicious about the clock differences (I haven't nailed down a specific bug in the code yet).

      For example, GitHub will send a webhook at 22:07:04 and Jenkins will process the hook payload with signature verification at 22:07:03. No build is triggered when the clocks differ like this, and the event is missing from the multibranch pipeline event log.

      However, if I close and re-open the pull request to trigger another webhook, its timestamps are in chronological order and the build succeeds. Is it possible there's a clock drift bug in the code? I'm still struggling to track it down with traces.

      Custom loggers

      I installed the support-core plugin and created a custom logger named "GitHub webhooks debugging".

      I have logging enabled for the following classes currently (level ALL):

      com.cloudbees.jenkins.GitHubWebHook
      org.jenkinsci.plugins.github.webhook.WebhookManager
      org.jenkinsci.plugins.github.admin.GitHubHookRegisterProblemMonitor
      org.jenkinsci.plugins.github.webhook.subscriber.DefaultPushGHEventSubscriber
      org.jenkinsci.plugins.github.webhook.subscriber.PingGHEventSubscriber
      org.jenkinsci.plugins.github.webhook.GHEventHeader$PayloadHandler
      org.jenkinsci.plugins.github.webhook.GHEventPayload$PayloadHandler
      org.jenkinsci.plugins.github.webhook.GHWebhookSignature
      org.jenkinsci.plugins.github.webhook.RequirePostWithGHHookPayload$Processor
      org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty
      org.jenkinsci.plugins.github_branch_source.GitHubRepositoryEventSubscriber
      org.jenkinsci.plugins.github_branch_source.PushGHEventSubscriber
      org.jenkinsci.plugins.github_branch_source.PullRequestGHEventSubscriber
      org.jenkinsci.plugins.workflow.multibranch.WorkflowMultiBranchProject
      jenkins.branch.buildstrategies.basic.TagBuildStrategyImpl
      jenkins.branch.buildstrategies.basic.ChangeRequestBuildStrategyImpl
      jenkins.scm.api.SCMHeadEvent
      jenkins.branch.MultiBranchProject
      

      I'm able to trace webhook events from GitHub to Jenkins and inside of Jenkins: pull request event, payload received, signature verification succeeded.

      However, the trail stops at signature verification and there's no entry in the multibranch pipeline event log. If I retry, it goes through all of the above and an event shows up in the multibranch pipeline event log with a build being started.

      Sample job

      See attachment sample-job.xml

      Jenkins war and plugin versions

      See dependencies.gradle and the companion comment "How to reproduce" in the comments section of this issue.
       


          Sam Gleske created issue -

          Sam Gleske added a comment - - edited

          Example

          curl -sI https://api.github.com/meta | grep -F date:; date
          
          date: Thu, 21 Sep 2023 00:04:33 GMT
          Wed Sep 20 20:04:53 EDT 2023
          

          The second timestamp matches time.gov
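
          As a rough check of the arithmetic, the two timestamps from the example output can be compared directly. This is only a sketch: it hard-codes the values shown above and assumes EDT means UTC-4.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Values copied from the example output above.
github_date_header = "Thu, 21 Sep 2023 00:04:33 GMT"
local_date_output = "Wed Sep 20 20:04:53 EDT 2023"

# The HTTP Date header uses a format the standard library parses directly.
github_time = parsedate_to_datetime(github_date_header)

# strptime does not reliably resolve the "EDT" abbreviation, so it is matched
# as literal text and the UTC-4 offset is attached by hand (assumption).
EDT = timezone(timedelta(hours=-4), "EDT")
local_time = datetime.strptime(
    local_date_output, "%a %b %d %H:%M:%S EDT %Y"
).replace(tzinfo=EDT)

drift_seconds = (local_time - github_time).total_seconds()
print(f"GitHub is {drift_seconds:.0f}s behind the local clock")  # prints 20s
```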

          It's worth noting that some GitHub server timestamps are wildly different and do not appear to be synchronized on GitHub's end (while others do). I filed a support ticket for this with GitHub already, but the dropped builds are a bug concern, which is why I filed this ticket.


          Sam Gleske added a comment -

          Clock drift between GitHub and NIST was up to 20 seconds last night when I was investigating this.


          Sam Gleske added a comment -

          Updated my logger in description


          Sam Gleske added a comment -

          I have verified this bug; I'll post details soon


          Sam Gleske added a comment -

          I have attached sample-job.xml (which is a config.xml) that reproduces the bug so you know what configurations are at play.

          github-branch-source@1704.ias_vd5a_2b_29c6cdc.1 plugin is a custom fork of 1703.vd5a_2b_29c6cdc with patch applied from https://github.com/jenkinsci/github-branch-source-plugin/pull/653

          I will also attach a list of plugins I'm using.

          Sam Gleske made changes -
          Attachment New: sample-job.xml [ 61173 ]

          Sam Gleske added a comment -

          The temporary workaround

          Before I dive into details: I found a temporary workaround. Because GitHub clocks are out of sync, I had to add a delay between payload processing and triggering multibranch pipeline builds. I did this with the following system property.

          -Dorg.jenkinsci.plugins.github_branch_source.GitHubSCMSource.eventDelaySeconds=22
          

          I had to restart Jenkins for it to take effect. I would like to change how this property is read (specifically the static method getEventDelaySeconds()) so that it returns the current system property value, or falls back to the static value, and can therefore be changed at runtime without a restart.
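
          The idea of reading the setting on every call, rather than once at startup, can be sketched as follows. This is a language-neutral illustration and not the plugin's code: the key name, the default of 5, and the mapping used as a stand-in for JVM system properties are all hypothetical.

```python
import os

# Hypothetical names: a stand-in key and a placeholder default,
# not the plugin's real property name or value.
DELAY_KEY = "EVENT_DELAY_SECONDS"
DEFAULT_DELAY_SECONDS = 5


def get_event_delay_seconds(settings=os.environ):
    """Look the delay up on every call so a changed setting applies immediately."""
    raw = settings.get(DELAY_KEY)
    if raw is None:
        return DEFAULT_DELAY_SECONDS
    try:
        # Negative delays make no sense, so clamp at zero.
        return max(0, int(raw))
    except ValueError:
        # An unparsable value falls back to the default instead of failing.
        return DEFAULT_DELAY_SECONDS


settings = {}
print(get_event_delay_seconds(settings))  # default, nothing configured
settings[DELAY_KEY] = "22"
print(get_event_delay_seconds(settings))  # new value picked up without a restart
```

          Because nothing is cached, changing the setting while the process is running affects the next call.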

          Why does it work?

          GitHub servers were out of sync. Judging by the timestamps, Jenkins processed multibranch events BEFORE GitHub sent the webhook payloads. This triggered a bug (I have yet to find it in the source, but now I have an idea).

          By forcing a delay, the Jenkins controller system clock has a chance to catch up to the payload event so that multibranch pipeline events are processed AFTER the hook payload timestamp.
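
          The ordering argument can be checked with the 22:07:04 and 22:07:03 pair from the issue description and the 22 second delay. This is only a sketch of the hypothesis stated in this comment; the function and variable names are made up for illustration.

```python
from datetime import datetime, timedelta


def parse_clock(hms):
    # Time of day only; strptime fills in a fixed default date,
    # which is fine for comparing two times on the same day.
    return datetime.strptime(hms, "%H:%M:%S")


payload_stamp = parse_clock("22:07:04")  # timestamp GitHub put on the webhook
handled_at = parse_clock("22:07:03")     # Jenkins clock when the payload was handled


def processed_after_payload(delay_seconds):
    """True if the multibranch event would be processed after the payload timestamp."""
    return handled_at + timedelta(seconds=delay_seconds) > payload_stamp


print(processed_after_payload(0))   # False: processing appears to precede the payload
print(processed_after_payload(22))  # True: the delay restores chronological order
```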


          Sam Gleske added a comment -

          Sample logs

          I've narrowed down the issue to branch matchers. This gives me a specific area of the source code to review.

          Failed webhook trace log

          [Fri Sep 22 14:41:34 GMT 2023] Received Push event for tag 1.0.146 in repository **REDACTED ORG**/**REDACTED REPO** CREATED event from **REDACTED IP ADDRESS** ⇒ https://jenkins-webhooks.REDACTED.net/github-webhook/ with timestamp Fri Sep 22 14:41:29 GMT 2023
          

          This means:

          • GitHub sent webhook.
          • Jenkins received webhook.
          • Jenkins processed payload and successfully verified the signature to return 200 status to GitHub.
          • Jenkins multibranch pipeline processed the event.
          • Nothing happened. No jobs created or builds started.

          Successful trace log after replaying the webhook

          [Fri Sep 22 16:20:03 GMT 2023] Received Push event for tag 1.0.146 in repository **REDACTED ORG**/**REDACTED REPO** CREATED event from **REDACTED IP ADDRESS** ⇒ https://jenkins-webhooks.REDACTED.net/github-webhook/ with timestamp Fri Sep 22 16:19:58 GMT 2023
          Found match against Reporting-Platform/fantastic-signals-sso (new branch 1.0.146)
          
          [Fri Sep 22 16:20:05 GMT 2023] Finished processing Push event for tag 1.0.146 in repository **REDACTED ORG**/**REDACTED REPO** CREATED event from **REDACTED IP ADDRESS** ⇒ https://jenkins-webhooks.REDACTED.net/github-webhook/ with timestamp Fri Sep 22 16:19:58 GMT 2023, processed in 1746ms. Matched 1.
          

          This means:

          • GitHub sent webhook.
          • Jenkins received webhook.
          • Jenkins processed payload and successfully verified the signature to return 200 status to GitHub.
          • Jenkins central multibranch pipeline processed the event. Found a match.
          • Jenkins central multibranch pipeline processed the event and notified the multibranch pipeline job.
          • Jenkins multibranch pipeline job successfully processed the event against branch matchers and created a Jenkins job for a GitHub tag.
          • Jenkins automatically started a build for the GitHub tag.
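
          To compare the two traces, the gap between the log line's own timestamp and the payload timestamp can be pulled out of each "Received" line. The following is a minimal sketch that assumes the line layout shown in the logs above; the helper names are hypothetical.

```python
import re
from datetime import datetime

# One timestamp in the form used by the trace log, e.g. "Fri Sep 22 14:41:34 GMT 2023".
STAMP = r"\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} GMT \d{4}"
LINE = re.compile(rf"^\[({STAMP})\] Received .* with timestamp ({STAMP})")


def parse_stamp(stamp):
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S GMT %Y")


def receive_lag_seconds(line):
    """Seconds between the payload timestamp and the log entry, or None if no match."""
    match = LINE.search(line)
    if match is None:
        return None
    logged, payload = (parse_stamp(s) for s in match.groups())
    return (logged - payload).total_seconds()


failed = (
    "[Fri Sep 22 14:41:34 GMT 2023] Received Push event for tag 1.0.146 in repository "
    "**REDACTED ORG**/**REDACTED REPO** CREATED event from **REDACTED IP ADDRESS** ⇒ "
    "https://jenkins-webhooks.REDACTED.net/github-webhook/ with timestamp "
    "Fri Sep 22 14:41:29 GMT 2023"
)
replayed = (
    "[Fri Sep 22 16:20:03 GMT 2023] Received Push event for tag 1.0.146 in repository "
    "**REDACTED ORG**/**REDACTED REPO** CREATED event from **REDACTED IP ADDRESS** ⇒ "
    "https://jenkins-webhooks.REDACTED.net/github-webhook/ with timestamp "
    "Fri Sep 22 16:19:58 GMT 2023"
)

print(receive_lag_seconds(failed))    # 5.0
print(receive_lag_seconds(replayed))  # 5.0
```

          For these two samples the gap is the same in the failed and the successful trace, so the "Received" lines alone do not distinguish them; the difference only shows up in whether a "Found match" line follows.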


            sag47 Sam Gleske