JENKINS-24790: Matrix plugin - Label expression picks invalid node

      Description

      I'm seeing strange behaviour when using a matrix job with label expressions.
      I have the following setup:
      4 nodes with the following labels:

      • testnode0 - 4gb maintestpool
      • testnode1 - 4gb maintestpool
      • testnode3 - 8gb maintestpool
      • testnode4 - 8gb maintestpool

      All nodes have only 1 executor.
      In the matrix job (with concurrent builds enabled) I have specified the label expression as:
      Name - node
      Label Expressions:

      • !under_test&&4gb&&maintestpool
      • !under_test&&8gb&&maintestpool

      My understanding of these expressions is: pick the next two nodes that don't have the under_test label, where one has the label 4gb, the other has the label 8gb, and both have the label maintestpool.
      I then append the label 'under_test' to each of the chosen nodes. This is because I reboot the nodes as part of a downstream multijob test run, and hopefully this will stop a pending job from taking the node midway through a test run.
      The downstream multijob is triggered with a string parameter $NODE_NAME (a Jenkins env var), because everything in the build section and beyond of the matrix job is run on the node chosen by the label expression. Each of the multijob's sub-jobs is passed the $NODE_NAME parameter as a Label param.
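
      The matching rule described above can be sketched as follows. This is only an illustration of the expected semantics for conjunction-only expressions such as the two used here, not the parser Jenkins itself uses; the function names `matches` and `eligible` are hypothetical.

```python
def matches(expression, node_labels):
    """Evaluate a conjunction-only expression such as '!under_test&&4gb'.

    Assumption: only '&&' and a leading '!' are handled; real label
    expressions support more operators than this sketch does.
    """
    for term in expression.split("&&"):
        term = term.strip()
        if term.startswith("!"):
            # Negated term: the node must NOT carry this label.
            if term[1:] in node_labels:
                return False
        elif term not in node_labels:
            # Plain term: the node must carry this label.
            return False
    return True


# The four nodes and their labels, as listed in the report.
nodes = {
    "testnode0": {"4gb", "maintestpool"},
    "testnode1": {"4gb", "maintestpool"},
    "testnode3": {"8gb", "maintestpool"},
    "testnode4": {"8gb", "maintestpool"},
}


def eligible(expression, nodes):
    """Return the sorted names of all nodes satisfying the expression."""
    return sorted(name for name, labels in nodes.items()
                  if matches(expression, labels))


print(eligible("!under_test&&4gb&&maintestpool", nodes))
print(eligible("!under_test&&8gb&&maintestpool", nodes))
```

      Under this reading, once 'under_test' is added to testnode0 only testnode1 should remain eligible for the 4gb expression.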

      Now the problem is that if I kick off 3 matrix jobs, the last job should just queue until the downstream job has finished (which then removes the under_test label).
      However, what is happening is that all 3 jobs are pushed onto the first two nodes matching the expressions.
      Job 1 on !under_test&&4gb&&maintestpool has the following output:

      23:33:15 Building remotely on testnode0(4gb maintestpool) in workspace /Users/builduk/Jenkins/Home/DSL_osx_tests_martix/node/!under_test&&4gb&&maintestpool
      23:33:15 [!under_test&&4gb&&maintestpool] $ /bin/sh -xe /var/folders/88/kjb_0d_x7sx525zv6bz2c4w80000gn/T/hudson619443788463271298.sh
      23:33:15 + echo '!under_test&&4gb&&maintestpool'
      23:33:15 !under_test&&4gb&&maintestpool
      23:33:15 Adding under_test label
      23:33:15 hudson.slaves.DumbSlave@7ecde659
      23:33:15 Test node is: testnode0
      23:33:15 Original Labels: 10.9.4 4gb maintestpool
      23:33:15 Adding under test label
      23:33:15 Setting labels: 10.9.4 4gb maintestpool under_test
      

      Which is fine, exactly what I want. Then the job is kicked off again:

      23:33:19 Building remotely on testnode0 (4gb under_test maintestpool) in workspace /Users/builduk/Jenkins/Home/DSL_osx_tests_martix/node/!under_test&&4gb&&maintestpool
      23:33:19 [!under_test&&4gb&&maintestpool] $ /bin/sh -xe /var/folders/88/kjb_0d_x7sx525zv6bz2c4w80000gn/T/hudson2089316416162620658.sh
      23:33:19 + echo '!under_test&&4gb&&maintestpool'
      23:33:19 !under_test&&4gb&&maintestpool
      23:33:19 Adding under_test label
      23:33:19 hudson.slaves.DumbSlave@7ecde659
      23:33:19 Test node is: norfolk
      23:33:19 Original Labels: 10.9.4 4gb verificationpool office i5 ssd maintestpool under_test
      23:33:19 Script returned: 1
      

      Which is not what I want. As you can see from the first line, the 'under_test' label already exists, so the matrix job should not have chosen this node; it should have moved on to testnode1. You can see the script returned 1. This is because my script that appends the label 'under_test' checks whether it is already present in the label list, which it clearly is. The same happens for the third job.
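
      The labelling script itself is not shown in this report. A minimal sketch of the check it describes, assuming labels are handled as a space-separated string as in the log output, might look like this; `add_under_test` and its return convention are hypothetical.

```python
def add_under_test(label_string, marker="under_test"):
    """Append the marker label unless it is already present.

    Returns (exit_code, labels): exit code 1 mirrors the
    'Script returned: 1' line in the log when the label already exists.
    Hypothetical helper; the reporter's real script is not shown.
    """
    labels = label_string.split()
    if marker in labels:
        # Node is already marked: refuse and leave the labels unchanged.
        return 1, label_string
    return 0, " ".join(labels + [marker])


print(add_under_test("10.9.4 4gb maintestpool"))
print(add_under_test("10.9.4 4gb maintestpool under_test"))
```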

      Does anyone know what I'm doing wrong?

      Thanks - Apologies for the length of the post
      satpal

          Activity

          Daniel Beck added a comment:

          Jira is not a support site. Ask questions on the jenkinsci-users list instead of claiming everything you don't understand is a critical bug in the software.

          Jenkins seems to assign both job runs at the same time to the node in question, while no run has yet assigned the new label in its script execution. But even if that's not the case, Jenkins generally doesn't support use of labels like this, as there's caching going on in the background. Use something better suited for this, e.g. https://wiki.jenkins-ci.org/display/JENKINS/Throttle+Concurrent+Builds+Plugin
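
          The ordering problem described in this comment can be illustrated with a toy model. It assumes that node selection happens for both runs before either run's script has added the label; it ignores executors, the build queue, and everything else Jenkins actually does, and all names in it (`Node`, `pick_node`, `schedule_then_mark`, `select_and_mark_together`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: set = field(default_factory=set)


def pick_node(nodes, required, forbidden):
    """Return the first node that has all required labels and lacks the forbidden one."""
    for node in nodes:
        if required <= node.labels and forbidden not in node.labels:
            return node
    return None


def schedule_then_mark(nodes, runs):
    """Assign every run first; the label is only added later, by each run's script."""
    assigned = [pick_node(nodes, {"4gb", "maintestpool"}, "under_test")
                for _ in range(runs)]
    for node in assigned:
        if node is not None:
            node.labels.add("under_test")  # too late to affect selection
    return [node.name if node else None for node in assigned]


def select_and_mark_together(nodes, runs):
    """Contrast case: selecting and marking happen as one step per run."""
    assigned = []
    for _ in range(runs):
        node = pick_node(nodes, {"4gb", "maintestpool"}, "under_test")
        if node is not None:
            node.labels.add("under_test")
        assigned.append(node.name if node else None)
    return assigned


def make_nodes():
    return [Node("testnode0", {"4gb", "maintestpool"}),
            Node("testnode1", {"4gb", "maintestpool"})]


print(schedule_then_mark(make_nodes(), 2))        # both runs land on testnode0
print(select_and_mark_together(make_nodes(), 3))  # third run finds no free node
```

          In the first function both runs see testnode0 as unmarked, which matches the symptom in the report. The second function only shows why a mechanism that reserves the node as part of scheduling avoids the problem; it does not describe how any particular plugin is implemented.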

          Satpal Chander added a comment (edited):

          I'm not sure how this isn't a bug. If we look at the second run of the matrix job, we have:

          23:33:19 Building remotely on testnode0 (4gb under_test maintestpool) in workspace /Users/builduk/Jenkins/Home/DSL_osx_tests_martix/node/!under_test&&4gb&&maintestpool
          23:33:19 [!under_test&&4gb&&maintestpool] $ /bin/sh -xe /var/folders/88/kjb_0d_x7sx525zv6bz2c4w80000gn/T/hudson2089316416162620658.sh
          23:33:19 + echo '!under_test&&4gb&&maintestpool'
          

          Jenkins knows what the labels are on the node, and the plugin does not seem to check them. The first line in the log above is Jenkins saying what the labels are on the node that it has picked, or am I misunderstanding what that log line is actually telling me? I have assumed that these are the labels on the node because in the job's first run the line reads as

          23:33:15 Building remotely on testnode0(4gb maintestpool) in workspace /Users/builduk/Jenkins/Home/DSL_osx_tests_martix/node/!under_test&&4gb&&maintestpool
          

          Please let me know if I'm reading the log lines incorrectly.

          Daniel Beck added a comment:

          Jenkins caches the labels-to-node assignment to save time (as labels are meant to be fairly static, or at least not supposed to need millisecond precision), see trimLabels() in jenkins.model.Jenkins and getNodes() in hudson.model.Label.

          You're checking in the other direction (node-to-labels), which isn't relevant for the queue (see e.g. how hudson.model.Queue's makeBuildable calls Label.contains(Node) which accesses the cached list of nodes).
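
          The difference between the two lookup directions can be shown with a simplified model of a cached label-to-nodes index. This is not the Jenkins implementation (see the classes named above for that); `LabelCache`, `refresh`, and `nodes_for` are hypothetical names, and the point at which a real cache is rebuilt is not modelled.

```python
class LabelCache:
    """Toy label -> nodes index that is rebuilt only when refresh() is called."""

    def __init__(self, node_labels):
        self.node_labels = node_labels  # live node -> labels mapping
        self._index = {}
        self.refresh()

    def refresh(self):
        """Rebuild the label -> nodes index from the live mapping."""
        index = {}
        for node, labels in self.node_labels.items():
            for label in labels:
                index.setdefault(label, set()).add(node)
        self._index = index

    def nodes_for(self, label):
        """Label-to-nodes lookup: answers from the cached index, which may be stale."""
        return set(self._index.get(label, set()))


node_labels = {"testnode0": {"4gb", "maintestpool"}}
cache = LabelCache(node_labels)

# A build script adds a label directly on the node.
node_labels["testnode0"].add("under_test")

print("under_test" in node_labels["testnode0"])  # node-to-labels: sees the change
print(cache.nodes_for("under_test"))             # label-to-nodes: still stale
cache.refresh()
print(cache.nodes_for("under_test"))             # up to date after a rebuild
```

          In this model the node reports the new label immediately, while a scheduler that consults only the cached index keeps treating the node as unmarked until the index is rebuilt.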


            People

            Assignee: Unassigned
            Reporter: Satpal Chander