Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-36547

Queue.Task.getFullDisplayName is a poor choice of key for LoadBalancer.CONSISTENT_HASH

    • jenkins 2.145, workflow-durable-task-step 2.29

      The default LoadBalancer, CONSISTENT_HASH, tries to reassign the same node to repeated builds of a given project, so as to reuse workspaces whenever possible. However to identify the project it uses Queue.Task.getFullDisplayName as seen here. For an AbstractProject this works OK, since this method will be implemented in AbstractItem and remains pretty stable across buildsā€”at least so long as you do not rename the project or a containing folder!

      For Pipeline builds it does not work at all, since the PlaceholderTask implementation encodes the build number. Thus, the key to the load balancer changes with every build, and there is no affinity.

      Queue.Task.getName used to be documented to serve this function but it is no different in the case of PlaceholderTask, and worse in the case of AbstractProject (since it throws out the project path, though it is more resilient to display name changes). getUrl sounds like a better key, but again for PlaceholderTask it is intentionally sensitive to the Run's build number, so that you can click on a queue item and see which Pipeline build it was coming from, not just which job. And Task has no getFullName, oddly.

      The load balancer could perhaps call getOwnerTask before using getFullDisplayName. This would have no effect on AbstractProject, but for PlaceholderTask would let it delegate to WorkflowJob, so the affinity would work tolerably well, at least so long as your build has only a single node blockā€”for multiple blocks it will tend to break down. There is also some risk of mangling affinity calculations for other SubTask implementations with different needs.

      What we really need is an extended API for a Task which can specify a dedicated affinity key directly, not overloaded by UI elements. AbstractProject should implement it to return getFullName. PlaceholderTask could implement this at a minimum to return WorkflowJob.getFullName, but could also do better by looking up an enclosing label for the node as in JENKINS-26132 and concatenating that with the job name, so that for example a script like

      stage('Build') {
        node('unix') {
          git '.../main-sources'
          sh './build.sh'
          stash 'binaries'
        }
      }
      stage('Test') {
        parallel unixTests: {
          node('unix') {
            git '.../tests'
            unstash 'binaries'
            sh './test.sh'
          }
        }, windowsTests: {
          node('windows') {
            git '.../tests'
            unstash 'binaries'
            bat 'test.bat'
          }
        }
      }
      stage('Deploy') {
        node('unix') {
          git '.../tools'
          unstash 'binaries'
          sh './deploy.sh'
        }
      }
      

      would result in four affinity keys being used: prjname#Build, prhname#unixTests, prjname#windowsTests, prjname#Deploy, each resulting in distinct persistent workspaces.

          [JENKINS-36547] Queue.Task.getFullDisplayName is a poor choice of key for LoadBalancer.CONSISTENT_HASH

          John Calsbeek added a comment -

          I'll throw in a +1 for this! We similarly had a "regression" switching to pipeline because of the lack of stable assignment.

          John Calsbeek added a comment - I'll throw in a +1 for this! We similarly had a "regression" switching to pipeline because of the lack of stable assignment.

          Paul van der Ende added a comment - - edited

          Same issue here. Besides that our incremental builds suffer from poor locality because the pipeline always start on a different node then last time.

          Paul van der Ende added a comment - - edited Same issue here. Besides that our incremental builds suffer from poor locality because the pipeline always start on a different node then last time.

          Jesse Glick added a comment -

          As mentioned parenthetically in the PR for JENKINS-48348, the implementation of getFullDisplayName may also be more expensive than we would want for calls from Queue.maintain. A dedicated method can be optimized to distinct constraints.

          Jesse Glick added a comment - As mentioned parenthetically in the PR for  JENKINS-48348 , the implementation of getFullDisplayName may also be more expensive than we would want for calls from Queue.maintain . A dedicated method can be optimized to distinct constraints.

          jglick: Thanks for mentioning. We can see in our instance allocating a node takes between 10 - 30s this may be caused by this. I'd like to give it a try - however I'm unsure how to improve the algorithm for speed. Should I open a separate ticket for this?

          Maybe in a first step there could be the fix for the affinity and in the seconds step there could be a performance improvement.

          Joerg Schwaerzler added a comment - jglick : Thanks for mentioning. We can see in our instance allocating a node takes between 10 - 30s this may be caused by this. I'd like to give it a try - however I'm unsure how to improve the algorithm for speed. Should I open a separate ticket for this? Maybe in a first step there could be the fix for the affinity and in the seconds step there could be a performance improvement.

          For backward compatibility of the jenkins core to older plugins I'd propose to do a default implementation of - let's say - getAffinityKey() in Queue.Task.

          Joerg Schwaerzler added a comment - For backward compatibility of the jenkins core to older plugins I'd propose to do a default implementation of - let's say -  getAffinityKey() in Queue.Task.

          Oleg Nenashev added a comment -

          Core part was released in Jenkins 2.145

          Oleg Nenashev added a comment - Core part was released in Jenkins 2.145

          Joerg Schwaerzler added a comment - - edited

          PR against workflow-durable-task-step plugin: https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/78

          Joerg Schwaerzler added a comment - - edited PR against workflow-durable-task-step plugin: https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/78

          Devin Nusbaum added a comment -

          The Pipeline side of this was just released in version 2.29 of the Pipeline Nodes and Processes Plugin.

          Devin Nusbaum added a comment - The Pipeline side of this was just released in version 2.29 of the Pipeline Nodes and Processes Plugin.

            macdrega Joerg Schwaerzler
            jglick Jesse Glick
            Votes:
            7 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: