• Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • Component: pipeline
    • Labels: None

      The issue occurs if you have a "Multibranch Pipeline" job that takes some time, such as:

      pipeline {
          agent any

          stages {
              stage('only') {
                  steps {
                      checkout scm
                      sh 'sleep 300'
                  }
              }
          }
      }

          
      This gets automatically detected by the pipeline and executed. It can also be kicked off manually by going into the job and clicking the run button beside the branch (e.g. 'master'). If this button is pushed twice in succession (or two branches are committed at the same time), two instances of the job will run. If there is a single build executor that these jobs can run on (I have only seen this when the executor is separate from the "master" node), the first will start running. The second will do some pipeline work to identify what needs to be run, but then wait in the queue for the first job to complete. Once the first job has completed, the second job will run to completion.

      For the above job, the first build will show as taking the expected five minutes. However, the second will show as taking ten minutes: it includes the time it was waiting for an executor! If this is done with many jobs, the recorded time will include all the time each job had to wait in the queue. When looking at the historical builds, it will look like the build took ten minutes to execute even though it was five minutes of waiting and five of executing. This also skews the projected build time for the next run.

      This also shows "10 minutes building on an executor" if the metrics plugin is installed.

      The solution is to exclude the time spent waiting for an executor from the recorded build time. I believe the matrix jobs had a similar issue that was fixed in Issue #8112.
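      For comparison, the time actually spent on an executor can be measured from inside the build itself. Below is a minimal scripted-pipeline sketch of that idea (the 'worker' label is a placeholder, and it assumes the script sandbox permits System.currentTimeMillis()): the timestamp is taken only after the node block has acquired an executor, so queue time is excluded.

      node('worker') {
          // Timestamp taken only after an executor has been acquired,
          // so time spent in the queue is excluded from this measurement.
          def started = System.currentTimeMillis()
          try {
              checkout scm
              sh 'sleep 300'
          } finally {
              def onExecutorMillis = System.currentTimeMillis() - started
              echo "Time on executor: ${onExecutorMillis / 1000} seconds"
          }
      }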

          [JENKINS-46569] Job execution time includes waiting time

          Thomas Kent created issue -

          pleemann added a comment -

          This problem also appears when there are more jobs running than executors available. After each stage, the job is added to the build queue again. Including the waiting time makes the recorded time quite useless.

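          A minimal scripted-pipeline sketch of one way this pattern arises (label and scripts are placeholders): each node block acquires an executor separately, so when there are more running jobs than executors the build can go back into the queue between node blocks, and all of that waiting is counted in its recorded duration.

          node('worker') {
              stage('build') {
                  sh './build.sh'   // placeholder build step
              }
          }
          // The executor is released here; the build may queue again before
          // the next node block starts, and that wait counts in its duration.
          node('worker') {
              stage('test') {
                  sh './test.sh'    // placeholder test step
              }
          }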

          Joe Kimmel added a comment -

          +1, please! I don't understand how anyone gets past this. I have a large codebase and I want to queue up many orthogonal unit test suites to run in parallel on a limited set of workers, but the "time limit" then needs to cover the total time the jobs might spend in the queue! What if there's a different run queued in front of them? I just want a time limit that will kill hung jobs, e.g. I should be able to say "no single unit test suite will ever run for more than 5 minutes including repo checkout, build time, and running" and then kill the job if there's an infinite loop or an infinite hang on some network communication, where "infinite" means "more than 5 minutes since the job started".

           

          However, with the current implementation I have to set the timeout to something large like 30 minutes or 1 hour, and there will always be some pathological case where many jobs are queued and some fail due to the timeout!

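          One partial mitigation, sketched below for a declarative pipeline (the 'worker' label and test script are placeholders): applying the timeout step inside the steps block means the timer only starts once an executor has actually been acquired, so queue time does not eat into the limit. It does not fix the reported build duration, only the timeout behaviour.

          pipeline {
              agent { label 'worker' }              // placeholder label
              stages {
                  stage('unit-tests') {
                      steps {
                          // Timer starts here, after the agent is allocated,
                          // so time spent in the queue does not count.
                          timeout(time: 5, unit: 'MINUTES') {
                              checkout scm
                              sh './run-tests.sh'   // placeholder test script
                          }
                      }
                  }
              }
          }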

          Daniel Daehler added a comment (edited) -

          This problem also occurs with pipeline definitions that specify an agent on the pipeline itself. For reasons specific to our setup we cannot execute more than one build per agent, so each agent capable of executing the label in question is restricted to one executor and concurrent builds are disabled. If our devs commit to different branches of the project more or less simultaneously, jobs are started for each branch but some of them just idle until an executor is available. The time spent waiting for an initial executor should not be added to the execution time, or the job should be shown as pending until an executor is available.

          Any known workarounds for this issue?

          Pipeline

          pipeline {
              agent {
                  node {
                      label 'client-env'
                  }
              }
              options {
                  disableConcurrentBuilds()
              }
              triggers {
                  pollSCM 'H/3 * * * *'
              }
              stages {
                  stage('Build') {
                      steps {
                          echo 'Long running build'
                      }
                  }
              }
          }

           

          Daniel Daehler made changes -
          Issue Type: Improvement → Bug

          Marty S added a comment -

          The priority of this should be higher than "Minor", since the additional waiting time will count towards possible timeouts.

          So if I define a timeout of one hour and the job waits for 30 minutes, it will be cancelled after 30 minutes of "real" execution time.

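          To illustrate the interaction Marty S describes, here is a minimal declarative sketch (label and build script are placeholders): a pipeline-level timeout in the options block starts counting when the build starts, before an agent is assigned, so any time spent queued is subtracted from the effective execution budget.

          pipeline {
              agent { label 'worker' }           // placeholder label
              options {
                  // Starts counting at build start, i.e. while still queued
                  // for an executor, so 30 minutes of queue time leaves only
                  // 30 minutes for the actual build.
                  timeout(time: 1, unit: 'HOURS')
              }
              stages {
                  stage('build') {
                      steps {
                          sh './build.sh'        // placeholder build step
                      }
                  }
              }
          }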

          Josh Wand added a comment -

          I partially get around this by putting the timeout block inside the node block (using scripted pipeline):

          stage('stage 1') {
            node {
              timeout(30) {
                // do stuff
              }
            }
          }

          But the build times are still wrong, even for a single stage: the total time reported still includes the time spent waiting for an executor.


          Aaron D. Marasco added a comment -

          joshwand the problem with that workaround is that each stage/node needs its own timeout. :-/
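          A small scripted-pipeline sketch of one way to avoid repeating that boilerplate; the helper name and 'worker' label are hypothetical. Each node still gets its own timeout, it just isn't written out every time.

          // Hypothetical wrapper: acquire a node, then start the timeout only
          // once the executor is held, so queue time is not charged against it.
          def timedNode(String label, int minutes, Closure body) {
              node(label) {
                  timeout(time: minutes, unit: 'MINUTES') {
                      body()
                  }
              }
          }

          timedNode('worker', 30) {
              stage('stage 1') {
                  // do stuff
              }
          }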

          David Resnick added a comment -

          I agree, this is definitely not minor. Big problem for us, a shame that it is not higher priority.

          Thomas de Grenier de Latour made changes -
          Description edited

            Assignee: Unassigned
            Reporter: Thomas Kent (teeks99)
            Votes: 60
            Watchers: 61