Jenkins / JENKINS-19208

build flow takes a long time to detect that all started jobs have finished successfully

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: build-flow-plugin
    • Labels: parallel retry
    • Environment:

      Description

      The following flow frequently takes a long time to detect that all the builds it spawned have completed.

      build.setDisplayName("#${params['pullrequest']}@${params['sha'].substring(0,6)}>${params['mergebranch']} [${build.number}]")
      
      // full, parallelized, Scala PR validation
      // scala-distpack should run fastdistpack-maven-opt, a new task that skips generating docs
      parallel (
        { retry (2, { build(params, "pr-scala-rangepos") }) },
      
        { distpack = build(params, "pr-scala-distpack")
          downParams = params + [ 'distpack_build' : "<SpecificBuildSelector><buildNumber>${distpack.build.number}</buildNumber></SpecificBuildSelector>" ]
          // has to run on same node as distpack for stability test
          // TODO: have separate job for stability (skip locker in distpack)
          testParams = downParams + [ 'node' : distpack.build.builtOnStr ]
      
          parallel (
            { retry(2, { build(testParams, "pr-scala-test") }) },
            { retry(2, { build(downParams, "pr-scala-integrate-ide") }) }, // cannot figure out how to add an ignore wrapper here..
            { retry(2, { build(downParams, "pr-scala-integrate-partest") }) }
          )
        }
      )
      
      

      Here are close to 30 examples:
      https://scala-webapps.epfl.ch/jenkins/view/pr-validators/job/pr-scala/1113/
      ...
      https://scala-webapps.epfl.ch/jenkins/view/pr-validators/job/pr-scala/1135/
      https://scala-webapps.epfl.ch/jenkins/view/pr-validators/job/pr-scala/1145/
      https://scala-webapps.epfl.ch/jenkins/view/pr-validators/job/pr-scala/1146/
      https://scala-webapps.epfl.ch/jenkins/view/pr-validators/job/pr-scala/1147/
      https://scala-webapps.epfl.ch/jenkins/view/pr-validators/job/pr-scala/1150/

      PS: the ignore combinator causes even worse breakage. I wanted to ignore a build in a parallel block, but had to abandon the attempt because it caused what looks like races.
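
      For reference, a minimal, untested sketch of the wrapper this postscript refers to, written for the inner parallel block of the flow above. It assumes the DSL's ignore(FAILURE) combinator accepts its closure as a trailing argument in the same way retry does in this script. This is the kind of wrapping that appeared to cause races, so it illustrates the intent and is not a working fix:

      parallel (
        { retry(2, { build(testParams, "pr-scala-test") }) },
        // assumption: ignore(FAILURE, ...) keeps a failed IDE integration build from failing the whole flow
        { ignore(FAILURE, { retry(2, { build(downParams, "pr-scala-integrate-ide") }) }) },
        { retry(2, { build(downParams, "pr-scala-integrate-partest") }) }
      )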

          Activity

          adriaanm Adriaan Moors created issue -
          adriaanm Adriaan Moors made changes -
          Field Original Value New Value
          Description: the original value ends mid-postscript at "(I wanted to ignore"; the new value completes it: "ps: the ignore combinator causes even worse breakage (I wanted to ignore a build in a parallel block, but had to abandon because it caused similar races)". The opening sentence ('The following flow frequently "deadlocks" -- it does not realize the jobs it started have ended.'), the flow script, and the example links are the same in both values.
          adriaanm Adriaan Moors made changes -
          Description: removed the remark "flow plugin seems fundamentally broken" from the comment on the pr-scala-integrate-ide line, which now ends at "cannot figure out how to add an ignore wrapper here..". The rest of the description is unchanged.
          adriaanm Adriaan Moors added a comment -

          I'd love to hear about workarounds. We're drowning in spurious PR validation failures.

          adriaanm Adriaan Moors added a comment -

          Apologies, it seems it didn't hang. It was just very slow to realize all jobs had ended. I'm not sure whether I misconfigured Jenkins somehow. Any suggestions on where to look?

          adriaanm Adriaan Moors made changes -
          Priority: Critical [ 2 ] -> Major [ 3 ]
          Summary: "deadlock: all started jobs finished successfully, flow hangs" -> "build flow takes a long time to detect that all started jobs have finished successfully"
          adriaanm Adriaan Moors made changes -
          Description: the opening sentence changed from 'The following flow frequently "deadlocks" -- it does not realize the jobs it started have ended.' to "The following flow frequently takes a long time before detecting all spawned builds have completed.", and the postscript now says "what looks like races" instead of "similar races". The flow script and example links are unchanged.
          adriaanm Adriaan Moors made changes -
          Labels: hang parallel retry -> parallel retry
          batmat Baptiste Mathus made changes -
          Assignee: Nicolas De Loof [ ndeloof ] -> Damien Nozay [ dnozay ]
          dnozay Damien Nozay made changes -
          Assignee: Damien Nozay [ dnozay ] -> (none)
          rtyler R. Tyler Croy made changes -
          Workflow: JNJira [ 150656 ] -> JNJira + In-Review [ 177724 ]

            People

            Assignee: Unassigned
            Reporter: adriaanm Adriaan Moors
            Votes: 0
            Watchers: 1

              Dates

              Created:
              Updated: