• Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • matrix-project-plugin
    • None
    • jenkins 2.72
      matrix project plugin 1.11
      installed from debian package - running directly
      java-8-openjdk
      13 executors labelled for the failing job, label is 'builder'

      When building a matrix job with a large number of dynamic axis values, some of the axis builds suddenly abort. The log output shows:

      11:10:23 kernel-single-defconfig-builder » multi_v7_defconfig+CONFIG_RANDOMIZE_BASE=y,builder appears to be cancelled
      11:10:23 kernel-single-defconfig-builder » multi_v7_defconfig+CONFIG_RANDOMIZE_BASE=y,builder completed with result ABORTED
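For anyone trying to count how many axis builds are affected, a minimal sketch that extracts the aborted combinations from a saved console log might look like the following. The pattern is inferred only from the two log lines quoted above, and the helper name `aborted_combinations` is hypothetical; other Jenkins versions may word these messages differently.

```python
import re

# Pattern inferred from the log lines quoted in this report (an assumption,
# not a documented Jenkins log format).
ABORTED_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) (?P<job>.+?) » (?P<combination>.+) "
    r"completed with result ABORTED$"
)


def aborted_combinations(log_text):
    """Return (job, combination) pairs for each axis build logged as ABORTED."""
    found = []
    for line in log_text.splitlines():
        match = ABORTED_LINE.match(line.strip())
        if match:
            found.append((match.group("job"), match.group("combination")))
    return found
```

Passing the parent build's console text to `aborted_combinations` returns one pair per aborted axis build; the "appears to be cancelled" lines are deliberately ignored so each build is counted once.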

      It only seems to happen to builds of a "new" axis combination, or one that hasn't been built recently (its old builds possibly cleaned out by discarding old builds). Clicking on the hyperlink in the log gives a 404, as if the new axis combination doesn't get set up correctly. I had a look on the filesystem, and the aborted configurations have no data in their axis-label directories:

       

      root@machine:/var/lib/jenkins/jobs/kernel-single-defconfig-builder/configurations/axis-defconfig# ls -alh *RANDOMIZE*/axis-label/
       defconfig+CONFIG_RANDOMIZE_BASE=y/axis-label/:
       total 12K
       drwxr-xr-x 3 jenkins jenkins 4.0K Jul 19 13:54 .
       drwxr-xr-x 3 jenkins jenkins 4.0K Mar 27 15:21 ..
       drwxr-xr-x 3 jenkins jenkins 4.0K Aug 25 11:08 builder
      multi_v7_defconfig+CONFIG_RANDOMIZE_BASE=y/axis-label/:
       total 8.0K
       drwxr-xr-x 2 jenkins jenkins 4.0K Aug 24 18:03 .
       drwxr-xr-x 3 jenkins jenkins 4.0K Aug 24 18:03 ..
      multi_v7_defconfig+CONFIG_THUMB2_KERNEL=y+CONFIG_RANDOMIZE_BASE=y/axis-label/:
       total 8.0K
       drwxr-xr-x 2 jenkins jenkins 4.0K Aug 24 18:03 .
       drwxr-xr-x 3 jenkins jenkins 4.0K Aug 24 18:03 ..
      omap2plus_defconfig+CONFIG_RANDOMIZE_BASE=y/axis-label/:
       total 8.0K
       drwxr-xr-x 2 jenkins jenkins 4.0K Aug 24 18:03 .
       drwxr-xr-x 3 jenkins jenkins 4.0K Aug 24 18:03 ..
      

       

      The label used is 'builder'. The first axis listed above, whose axis-label directory contains a 'builder' subdirectory, built fine; the others, whose axis-label directories are empty, were aborted.
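The manual `ls` check above can be automated. The following is a minimal sketch, assuming the directory layout shown in the listing (`<axis value>/axis-label/<label>`); the function name `empty_axis_label_dirs` is hypothetical, and the script only reports directories without modifying anything.

```python
from pathlib import Path


def empty_axis_label_dirs(axis_root):
    """Yield axis-label directories under axis_root that contain no entries."""
    for label_dir in sorted(Path(axis_root).glob("*/axis-label")):
        # An empty axis-label directory matches the aborted builds seen above.
        if label_dir.is_dir() and not any(label_dir.iterdir()):
            yield label_dir


if __name__ == "__main__":
    # Path taken from the shell prompt in this report; adjust for other jobs.
    root = "/var/lib/jenkins/jobs/kernel-single-defconfig-builder/configurations/axis-defconfig"
    for path in empty_axis_label_dirs(root):
        print(path)
```

Run as a user that can read the Jenkins home directory, this prints one line per axis value whose axis-label directory is empty.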

      This seems to be a repeat of JENKINS-13972, which was closed as fixed without any apparent code change or resolution.
      I also found this Google presentation, which seems to discuss the same issue:
      https://docs.google.com/presentation/d/1ybtB-Bhkb4c3dhb5ZMArr4prtEZ-pjLqH9Vk7yhdZTg/edit#slide=id.g2c21d8fdc_00

       

      Note that I have a 'staging' setup with the same plugin and Jenkins versions doing the same build, and it does not show this problem. However, it has only 1 builder, so the axis builds are not executed concurrently.

       

          [JENKINS-46453] matrix job with dynamic axis aborting builds

          pjdarton added a comment -

          We're also seeing this same issue on Jenkins version 2.150.2 ... and I suspect we've been experiencing it (but not been aware that this was the cause) for some time.


          Martin added a comment -

          Hello there, do we have any update on this?

          I think we have the same issue on large matrix builds (>300 and >1500), when one build has an additional subbuild compared to the following build. We run them in parallel.


          pjdarton added a comment -

          My guess is that this is unlikely to be fixed.
          AFAICT the general focus of Jenkins development is now on pipeline builds; the general advice for anyone encountering issues with matrix builds seems to be to re-code them as pipeline builds - pipeline builds have much more flexible functionality and are "the future" whereas matrix builds seem to be viewed as "legacy".


          Matt Hart added a comment -

          Indeed, since I opened this in 2017 and got absolutely no response, the only way we found to fix this was to migrate our entire setup to Pipeline.


            kohsuke Kohsuke Kawaguchi
            mattface Matt Hart
            Votes: 1
            Watchers: 4
