JENKINS-64579

First build on multibranch pipeline job has no SCM information

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Labels:
      None

      Description

      Hi all,

      I’ve been trying to configure Jenkins for static code analysis, using the Warnings NG and Git Forensics plugins to run the analysis inside a Docker container (with Checkstyle, PMD and other utilities, as well as their rules, bundled in). The idea is to use multibranch pipelines, so that when a developer creates a pull request in Bitbucket, Jenkins automatically runs a job for that branch, checks for any new issues reported by the code analysis tools, and then informs Bitbucket whether everything is okay and the pull request can be merged. Pretty standard stuff.

      Reproduction steps:

      1. Run a fresh Jenkins instance and install the required plugins (Warnings NG, Git Forensics)
      2. Add new multibranch pipeline connected to Bitbucket Cloud repository (Jenkinsfile in the attachments)
      3. Scan multibranch pipeline to run the builds for all the branches

      However, I’ve come across behavior that seems wrong, and I haven’t been able to find a workaround for it (hence the major priority). For the very first build executed for a branch or pull request, the Forensics API and Warnings NG, which uses it, cannot figure out which SCM is in use and cannot extract information such as who introduced the new code smells:

      [CheckStyle] Creating SCM blamer to obtain author and commit information for affected files 
      [CheckStyle] SCM 'hudson.scm.NullSCM' is not of type GitSCM 
      [CheckStyle] -> Git blamer could not be created for SCM 'hudson.scm.NullSCM@71e85a6f' in working tree '/var/jenkins_home/workspace/Statistics_-_Static_Analysis_fix'

      Then, when trying to mine the repository with mine repository forensics step, I get the following output:

      [Forensics] SCM 'hudson.scm.NullSCM' is not of type GitSCM 
      [Forensics] -> Git miner could not be created for SCM 'hudson.scm.NullSCM@239301ef' in working tree '/var/jenkins_home/workspace/Statistics_-_Static_Analysis_fix'

      However, on the second run of the very same job, everything works as expected:

      [CheckStyle] Creating SCM blamer to obtain author and commit information for affected files 
      [CheckStyle] -> Git blamer successfully created in working tree '/var/jenkins_home/workspace/Statistics_-_Static_Analysis_fix' 
      [CheckStyle] Creating SCM miner to obtain statistics for affected repository files 
      [CheckStyle] -> Git miner successfully created in working tree '/var/jenkins_home/workspace/Statistics_-_Static_Analysis_fix' 
      [CheckStyle] Resolving file names for all issues in source directory '/var/jenkins_home/workspace/Statistics_-_Static_Analysis_fix' 
      [CheckStyle] -> resolved paths in source directory (27 found, 0 not found) 
      
      ... 
      
      [Forensics] Creating SCM miner to obtain statistics for affected repository files 
      [Forensics] -> Git miner successfully created in working tree '/var/jenkins_home/workspace/Statistics_-_Static_Analysis_fix' 
      [Forensics] Analyzing the commit log of the Git repository '/var/jenkins_home/workspace/Statistics_-_Static_Analysis_fix' 
      [Forensics] Invoking Git miner to create statistics for all available files 
      [Forensics] Git working tree = '/var/jenkins_home/workspace/Statistics_-_Static_Analysis_fix' 
      [Forensics] -> created statistics for 355 files 
      [Forensics] -> created report for 355 files in 20 seconds

      The issue occurs for multibranch pipelines not only when Bitbucket Cloud is selected as the SCM but also with generic Git. It doesn’t happen when a standard (non-multibranch) pipeline is used; in that case the SCM is resolved correctly even during the first build of the job. Unfortunately, my use case is not one where a standard pipeline could be used.

      I’ve checked the source code of Jenkins and its plugins, debugged it a bit, and here is what I found to be the cause. To obtain the miner and blamer, Forensics API uses a helper class called ScmResolver (https://github.com/jenkinsci/forensics-api-plugin/blob/master/src/main/java/io/jenkins/plugins/forensics/util/ScmResolver.java). It has separate ways of figuring out the SCM for freestyle projects (the extractFromProject method) and for pipelines (the extractFromPipeline method). The latter gets the SCMs from the job with the getSCMs method of the SCMTriggerItem interface. In my case, the instance is actually of type WorkflowJob, and its implementation of getSCMs looks as follows:

      @Override public Collection<? extends SCM> getSCMs() {
        WorkflowRun b = getLastSuccessfulBuild();
        if (b == null) {
          b = getLastCompletedBuild();
        }
        if (b == null) {
          return Collections.emptySet();
        }
        Map<String,SCM> scms = new LinkedHashMap<>();
        for (WorkflowRun.SCMCheckout co : b.checkouts(null)) {
          scms.put(co.scm.getKey(), co.scm);
        }
        return scms.values();
      }
      

      I don’t know why it is implemented this way, but the SCM is determined from previously executed builds. This explains why the very first build cannot determine the SCM, but couldn’t there be a different way to get that information? I attached a debugger to the Jenkins instance and stepped through this snippet in the workflow-job plugin. The result: WorkflowJob returns an empty set of SCMs because no builds have completed yet.
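
      The first-build behavior can be reproduced in isolation with a small plain-Java model. This is only a sketch: HistoryBasedJob and recordCompletedBuild are hypothetical names, the SCMs are represented by plain strings, and the model ignores the distinction between the last successful and the last completed build. It is not the real WorkflowJob.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

/** Simplified stand-in for WorkflowJob; NOT the real Jenkins class. */
class HistoryBasedJob {
    /** Each entry holds the SCM keys checked out by one completed build (hypothetical model). */
    private final List<List<String>> completedBuilds = new ArrayList<>();

    /** Records that a build finished after checking out the given SCMs. */
    void recordCompletedBuild(String... scmKeys) {
        completedBuilds.add(Arrays.asList(scmKeys));
    }

    /** Mirrors the shape of getSCMs(): the answer comes only from build history. */
    Collection<String> getSCMs() {
        if (completedBuilds.isEmpty()) {
            return Collections.emptySet(); // no finished build yet, so nothing to report
        }
        List<String> last = completedBuilds.get(completedBuilds.size() - 1);
        return new LinkedHashSet<>(last); // de-duplicate while keeping checkout order
    }

    public static void main(String[] args) {
        HistoryBasedJob job = new HistoryBasedJob();
        // While the first build is still running, no build has completed.
        System.out.println("during first build: " + job.getSCMs());
        job.recordCompletedBuild("git https://example.invalid/repo.git");
        // From the second build on, the history supplies the SCM.
        System.out.println("during second build: " + job.getSCMs());
    }
}
```

      In this model, any caller that asks for the SCMs during the first build gets an empty collection, which matches the point at which the plugins fall back to NullSCM in the logs above.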

      On the other hand, looking at the contents of the WorkflowJob instance, there is a GitSCM instance held in its properties.

      It’s in the instance of the BranchJobProperty class (https://github.com/jenkinsci/workflow-multibranch-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/multibranch/BranchJobProperty.java) from the workflow-multibranch plugin.

      This makes it impossible to simply apply a solution of the form “if-multibranch-then-get-from-property” without diving into a reflection mess, because the workflow-multibranch plugin uses workflow-job as a library and has access to its classes, but not vice versa.

      What I think could help is either changing how the Forensics plugin’s ScmResolver finds the multibranch pipeline’s SCM, or modifying WorkflowJob so that it does not give up when no builds have been executed before and obtains that information some other way. Although both solutions could be implemented without affecting existing behavior, I don’t know which one would be the more reasonable place to make the change.

        Attachments

          Activity

          drulli Ulli Hafner added a comment - - edited

          Thanks for creating such a detailed report!

          Yes, I marked it as a TODO to use the branch API if possible in ReferenceRecorder. The same idea could help here as well.

          I think what is needed is something like

          SCMSource.SourceByItem.findSource(job);
          

          (We used that approach in the GitHub checks API plugin https://github.com/jenkinsci/github-checks-plugin/blob/master/src/main/java/io/jenkins/plugins/checks/github/SCMFacade.java)

          konpon96 Konrad Ponichtera added a comment - - edited

          Thank you for the tip, the mention of branch API and the SCMSource existence was exactly what I needed!

          I’ve checked your code from the GitHub checks API plugin and used it as the basis for the workaround below. Because other plugins still work with Hudson's SCM class and I’m not familiar with the refactorings going on (it’s my first dive into Jenkins’ code), I don’t want to risk turning the code base upside down. Instead, I’ve come up with the following snippet in the workflow-job plugin’s WorkflowJob, which could mitigate the issue until the plugin is fully migrated to the branch API:

          @Override public Collection<? extends SCM> getSCMs() {
              WorkflowRun b = getLastSuccessfulBuild();
              if (b == null) {
                  b = getLastCompletedBuild();
              }
          
              Map<String,SCM> scms = new LinkedHashMap<>();
          
              if (b != null) {
                  for (WorkflowRun.SCMCheckout co : b.checkouts(null)) {
                      scms.put(co.scm.getKey(), co.scm);
                  }
              } else {
                  SCMSource source = SCMSource.SourceByItem.findSource(this);
                  SCMHead head = SCMHead.HeadByItem.findHead(this);
          
                  if (source != null && head != null) {
                      SCM scm = source.build(head);
                      scms.put(scm.getKey(), scm);
                  }
              }
          
              return scms.values();
          }
          

          The behavior is as I described in the issue description: instead of giving up on the SCM search when no builds are found, it uses the branch API to find the SCMSource and SCMHead, which are then used to build an SCM instance. I used the SCMSource.build(SCMHead) method instead of SCMSource.build(SCMHead, SCMRevision) because I couldn’t figure out how to obtain the revision object.
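
          The decision order of this workaround (build history first, branch configuration second, otherwise empty) can be illustrated without any Jenkins dependency. The following is a minimal sketch under stated assumptions: FallbackScmLookup, BranchConfig and resolve are hypothetical names, SCMs are modeled as strings, and buildScm only plays the role that SCMSource.build(SCMHead) has in the snippet above.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

/** Plain-Java model of the fallback idea; the types here are hypothetical, not Jenkins classes. */
class FallbackScmLookup {
    /** Stand-in for the source and head that a multibranch job is configured with. */
    static final class BranchConfig {
        final String sourceUrl;
        final String headName;

        BranchConfig(String sourceUrl, String headName) {
            this.sourceUrl = sourceUrl;
            this.headName = headName;
        }

        /** Plays the role of SCMSource.build(SCMHead): derive an SCM from configuration alone. */
        String buildScm() {
            return sourceUrl + "@" + headName;
        }
    }

    /**
     * @param lastBuildCheckouts SCM keys from the last finished build, or null if there is none
     * @param branch configured branch information, or null for a non-multibranch job
     */
    static Collection<String> resolve(List<String> lastBuildCheckouts, BranchConfig branch) {
        if (lastBuildCheckouts != null) {
            // Existing behavior: trust what the previous build actually checked out.
            return new LinkedHashSet<>(lastBuildCheckouts);
        }
        if (branch != null) {
            // New fallback: no history yet, so build the SCM from the branch configuration.
            return Collections.singleton(branch.buildScm());
        }
        return Collections.emptySet(); // nothing known, same result as before the change
    }

    public static void main(String[] args) {
        BranchConfig branch = new BranchConfig("https://example.invalid/repo.git", "feature/fix");
        System.out.println("first build, multibranch: " + resolve(null, branch));
        System.out.println("first build, plain job:   " + resolve(null, null));
    }
}
```

          Because the fallback is only reached when there is no build history, jobs that already have builds behave exactly as before in this model.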

          The other change I had to make was to replace the test dependency on scm-api with a compile dependency on branch-api by removing this:

          <dependency>            
            <groupId>org.jenkins-ci.plugins</groupId>            
            <artifactId>scm-api</artifactId>            
            <scope>test</scope>        
          </dependency>

          And adding the following one:

          <dependency>            
            <groupId>org.jenkins-ci.plugins</groupId>            
            <artifactId>branch-api</artifactId>        
          </dependency>

          The branch API already includes scm-api as its dependency, so all of its classes are available.

          On the other hand, I see that the SCMFacade class from the GitHub checks plugin is in large part a copy of ScmResolver from the Forensics API plugin, which suggests that I should modify the Forensics API plugin rather than the workflow-job one. This is especially so because development of the workflow-job plugin looks stalled: the last commit I can see that changed anything other than pom.xml or test classes dates back to September 2020, which suggests it is not a component that is meant to be altered.

          I have two questions:

          1. Should the quickfix be implemented on the Workflow Job plugin side or rather in the Forensics API?
          2. Is it fine to replace the scm-api testing dependency with the branch-api compile one?
          konpon96 Konrad Ponichtera added a comment -

          Okay, never mind, I just noticed that on Thursday you made a fix to Forensics API's ScmResolver and released version 0.8.1 with it. I've already tested it briefly and can confirm that the problem no longer occurs for the mentioned use case: after scanning the multibranch pipeline and creating a new job for a newly created branch, the very first build succeeds and everything that uses Forensics API (Checkstyle blaming, repository mining) passes without ending up with NullSCM.

          Before the fix, the last step, where the mining was supposed to happen, lasted 80 ms, which was basically just reporting that NullSCM cannot be used.

          After the fix, mining now takes 7 minutes, as it should (it actually takes longer than it did before the Forensics API 0.8.0 update, but if that's an issue, it is unrelated to this one).

          I will do further testing tomorrow, thank you for taking care of this problem so fast!

          drulli Ulli Hafner added a comment - - edited

          Ah, good to see that the other change also helps in your case. I actually wasn't sure whether the run gets the SCM attached for branch sources as well.

          konpon96 Konrad Ponichtera added a comment -

          Okay, I did some more testing and didn't stumble upon anything that looks even remotely broken.

          I think that this issue can be closed.


            People

            Assignee:
            drulli Ulli Hafner
            Reporter:
            konpon96 Konrad Ponichtera
            Votes:
            0
            Watchers:
            4