[JENKINS-24445] Retriggering builds via the "Manual Trigger" feature of the Gerrit Trigger Plugin causes wrong verification

Type: Bug
Resolution: Fixed
Priority: Major
Component/s: gerrit-trigger-plugin, OBSOLETE-gerrit-plugin
Labels:
Environment:
Operating System independent (verified on Linux as well as Windows)
Tested in 3 latest Jenkins LTS releases (1.509, 1.532 and 1.554)

URL:
https://github.com/jenkinsci/gerrit-trigger-plugin/pull/172

There is a serious issue with the Gerrit trigger plugin and its "retrigger" functionality.

Problem

This issue causes the plugin to get confused about how many and which builds it started for a particular changeset/patchset combination.

This behaviour leads to inconsistent verification results in Gerrit, because it fails to wait for all tests to complete.

Replication instructions

1.) Start a blank, vanilla Jenkins server of an arbitrary version on your local host (the jobs below assume that a BASH shell is available).
2.) Install the gerrit-trigger plugin (arbitrary version, as all are affected).
3.) Increase the executor count of the "master" node to at least 6.
4.) Restart the server and create a connection to a Gerrit service.
5.) Shut down the server and extract the two jobs (Test_1 and Test_2) into the "jobs" directory of the Jenkins server.
    - Test_1 will always fail, Test_2 will always succeed. Both will create a lock-file named "LOCKFILE-${JOB_NAME}-${BUILD_NUMBER}" in "${JENKINS_HOME}/workspace/Central" and wait for it to be deleted, before succeeding or failing.
6.) (Optional) Alter the Test_1 and Test_2 Gerrit Trigger settings to listen to a specific repository. By default, both repo and branch are set to "**".

After this set-up is done, upload a patchset to the Gerrit service that you configured Jenkins to listen to.

This will cause a build of "Test_1" and "Test_2" to be started.
Now, while they are running (remember that they will wait until you delete their lock-file), go into the "manual trigger" interface:
http://[jenkins]/gerrit_manual_trigger/?

Search for the changeset and issue a retrigger. This will cause two additional builds to spawn, one for "Test_1" and one for "Test_2".

Now that 4 tests are running, the correct behaviour of the Gerrit trigger plugin is to ignore the first two runs that have been started by the "Patchset Created" event.

Instead, it should wait until BOTH of the retriggered builds have finished and then upload a "Verified -1" message to Gerrit.

To test this, first delete the two original lock files:

${JENKINS_HOME}/workspace/Central/LOCKFILE-Test_1-1
${JENKINS_HOME}/workspace/Central/LOCKFILE-Test_2-1

This exercises the first half of the scenario, which works fine: after both original builds have finished, the Gerrit Trigger plugin still waits for the two retriggered builds to finish.
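The two deletions above can also be scripted. The following is only a sketch: the function name `release_build` is hypothetical and not part of the attached jobs, and it assumes that JENKINS_HOME is set and that the lock files live in the "Central" workspace as described in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of jobs.zip): let one waiting build continue
# by deleting its lock file.
# Assumption: JENKINS_HOME is set and the jobs write their lock files to
# ${JENKINS_HOME}/workspace/Central, as described in this report.
release_build() {
    local job="$1" number="$2"
    local lck="${JENKINS_HOME}/workspace/Central/LOCKFILE-${job}-${number}"

    if [[ -e "$lck" ]]; then
        rm -f -- "$lck"
        echo "released ${job} #${number}"
    else
        # Nothing to delete: the build is not waiting (or the path differs).
        echo "no lock file for ${job} #${number}" >&2
        return 1
    fi
}
```

Under these assumptions, `release_build Test_1 1` followed by `release_build Test_2 1` releases the two original builds, and `release_build Test_2 2` performs the single deletion that exposes the bug.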

Now, to expose the bug, ONLY delete the lock file for the retriggered Test_2 job:
${JENKINS_HOME}/workspace/Central/LOCKFILE-Test_2-2

This will let Test_2 finish successfully (as Test_2 always succeeds).
The bug now shows itself: the trigger plugin uploads a "Verified +1" message, DESPITE "Test_1 #2" not yet being finished.

If you then also let "Test_1" finish, it uploads yet another verification, this time "Verified -1".

This behaviour is absolutely wrong, as it sends a verification state without waiting for all jobs to complete, thus leading to inconsistent results in Gerrit.

Example output in Gerrit, when error occurs

Here's an excerpt of what this behaviour looks like in Gerrit:

##############################################################################
Jenkins
Patch Set 1: -Verified
Build Started Test_2/1/ (1/2)
-------------------------------------------------------
Jenkins
Patch Set 1:
Build Started Test_1/1/ (2/2)
-------------------------------------------------------
Jenkins
Patch Set 1:
Build Started Test_2/2/ (2/2)
-------------------------------------------------------
Jenkins
Patch Set 1:
Build Started Test_1/2/ (2/2)
-------------------------------------------------------
Jenkins
Patch Set 1: Verified+1
Build Successful 
Test_2/2/ : SUCCESS
-------------------------------------------------------
Jenkins
Patch Set 1: Verified-1
Build Failed 
Test_1/2/ : FAILURE
##############################################################################

You can see the two bugs outlined above at work here:

1.) The plugin gets confused when counting the number of jobs started
2.) The plugin does not wait for (Test_1 #2) after (Test_2 #2) has finished.
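For comparison, the expected behaviour can be written down as a small toy model. This is only a sketch and NOT the plugin's actual implementation: the function name `expected_verdict`, the "job:result" input format and the WAIT marker are all made up for illustration. It assumes one argument per build belonging to the latest trigger event.

```shell
#!/usr/bin/env bash
# Toy model of the EXPECTED aggregation; not the plugin's real code.
# Each argument is "job:result" for one build of the latest trigger event,
# where result is SUCCESS, FAILURE or RUNNING (illustrative names).
expected_verdict() {
    local pair result verdict="Verified+1"

    # No builds known yet: there is nothing to report.
    if (( $# == 0 )); then
        echo "WAIT"
        return 0
    fi

    for pair in "$@"; do
        result="${pair#*:}"
        case "$result" in
            RUNNING)
                # At least one build is still running: post nothing yet.
                echo "WAIT"
                return 0
                ;;
            SUCCESS)
                ;;
            *)
                # Any non-successful build makes the whole set fail.
                verdict="Verified-1"
                ;;
        esac
    done

    echo "$verdict"
}
```

In this model, `expected_verdict Test_1:RUNNING Test_2:SUCCESS` yields WAIT, which is the point at which the plugin instead posted "Verified+1" in the excerpt above. Once Test_1 #2 has failed, `expected_verdict Test_1:FAILURE Test_2:SUCCESS` yields the single expected "Verified-1".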

[EDIT: Improved text formatting. Now the report is more than just a wall of text]


Attachment: jobs.zip (2 kB, 2014-08-26 12:59)

Martin Schröder created issue - 2014-08-26 12:59

rin_ne added a comment - 2014-08-27 10:47

Why do you manipulate the lock file from outside the Jenkins instance? It is handled by Jenkins core, so your operation would affect Jenkins itself rather than the plugin.

I think no plugin is responsible for this, because it is an illegal operation.

If you want the build to take long enough before it completes, you can use "sleep SEC" in a shell step. (For Windows, "timeout /T SEC")


Martin Schröder added a comment - 2014-08-27 12:31 - edited

Hi Rinrin.

The lock file has nothing to do with Jenkins. It is merely a file that the Jenkins Bash build step waits for, precisely so that it does NOT need a fixed sleep time.

This lets specific builds of "Test_1" and "Test_2" finish in the desired order, by deleting the "lock files" in the appropriate order. The Bash build step looks like this:

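#!/usr/bin/env bash

# Create a lock file, then block until someone deletes it.
lck="LOCKFILE-${JOB_NAME}-${BUILD_NUMBER}"
echo "Creating file and waiting for deletion of it: $lck"
touch "$lck"

# Poll once per second until the lock file is gone.
while [[ -e "$lck" ]]; do
    sleep 1
done

# A non-zero exit status fails the build (this is the always-failing Test_1).
exit 1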

It is easy to see that this build step looks like a totally normal build to both Jenkins and the Gerrit Trigger plugin. It's just a shell build step that takes some user-defined amount of time.

Thus, the only difference from a normal "sleep <time>" is that you, as the debugger, can select the exact order in which these build steps return.

You can easily change this to a fixed timeout, but then you either need to be quick about retriggering the builds, or make the timeout so long that you're needlessly waiting for the tests to end.

In any case, the bug is wholly independent of what the build step does. All that matters is the order in which the builds return, and that the retriggering happens WHILE the previously triggered tests are still running.
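A middle ground between the two approaches is to keep the lock file but give up after a maximum wait. The following is a sketch only, not the script from jobs.zip: the function name `wait_for_release` and the default of 300 seconds are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the build step (not the script in jobs.zip):
# wait for the lock file to be deleted, but stop waiting after a timeout.
#   $1 = path of the lock file
#   $2 = maximum wait in seconds (assumed default: 300)
wait_for_release() {
    local lck="$1" timeout="${2:-300}" waited=0

    touch "$lck"
    while [[ -e "$lck" ]]; do
        if (( waited >= timeout )); then
            # Timed out: clean up and continue as if the file had been deleted.
            rm -f -- "$lck"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

A build step could then call `wait_for_release "LOCKFILE-${JOB_NAME}-${BUILD_NUMBER}" 600` and finish with `exit 1` or `exit 0`, depending on whether the job is meant to fail or succeed.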


Martin Schröder made changes - 2014-08-27 12:52

Environment

Original: Operating System independent (verified on Linux as well as Unix)
Tested in 3 latest Jenkins LTS releases (1.509, 1.532 and 1.554)

New: Operating System independent (verified on Linux as well as Windows)
Tested in 3 latest Jenkins LTS releases (1.509, 1.532 and 1.554)

Martin Schröder made changes - 2014-08-27 13:08

Description


Martin Schröder made changes - 2014-08-27 13:13

Description


Martin Schröder made changes - 2014-08-27 13:17

Description

Original: There is a serious issue with the Gerrit trigger plugin and its "retrigger" functionality.

h4. Problem

This issue causes the plugin to get confused about how many and which builds it started for a particular changeset/patchset combination.

This behaviour leads to inconsistent verification results in Gerrit, because *it fails wait for all tests to complete.*

h4. Replication instructions

# Start a blank, vanilla Jenkins server of an arbitrary version on your local host (the jobs below assume that a BASH shell is available).
# Install the gerrit-trigger plugin (arbitrary version, as all are affected)
# Increase the executor count of the "master" node to at least 6.
# Restart the server and create a connection to a Gerrit service.
# Shut down the server and extract the two jobs (Test_1 and Test_2) into the "jobs" directory of the Jenkins server.
#* Test_1 will always fail, Test_2 will always succeed. Both will create a lock-file named "LOCKFILE-$\{JOB_NAME}-$\{BUILD_NUMBER}" in "$\{JENKINS_HOME}/workspace/Central" and wait for it to be deleted, before succeeding or failing.
# (Optional) Alter the Test_1 and Test_2 Gerrit Trigger settings to listen to a specific repository. By default, both repo and branch are set to "**".
\\

After this set-up is done, upload a patchset to the Gerrit service that you configured Jenkins to listen to.

This will cause a build of "Test_1" and "Test_2" to be started.
Now, *while they are running* (remember that they will wait until you delete their lock-file), go into the "manual trigger" interface:
http://[jenkins]/gerrit_manual_trigger/?

Search for the changeset and issue a *retrigger*. This will cause two additional builds to spawn, one for "Test_1" and one for "Test_2".

Now that 4 tests are running, the correct behaviour of the Gerrit trigger plugin is to ignore the first two runs that have been started by the "Patchset Created" event.

Instead, it should wait until BOTH of the retriggered builds have finished and then upload a "Verified -1" message to Gerrit.

To test this, first delete the two original lock files:

$\{JENKINS_HOME}/workspace/Central/LOCKFILE-Test_1-1
$\{JENKINS_HOME}/workspace/Central/LOCKFILE-Test_2-1

This tests the first half of the problem. This works fine. After both have finished, the Gerrit Trigger plugin will still wait for the other two jobs to finish.

Now, to expose the bug, *ONLY* delete the lockfile for the re-triggered TEST_2 job:
$\{JENKINS_HOME}/workspace/Central/LOCKFILE-Test_2-2

This will let Test_2 finish successfully (as Test_2 always succeeds).
*The bug that happens now, is that the trigger plugin uploads a "Verified +1" message, DESPITE "Test_1 #2" not yet being finished.*

If you then also let "Test_1" finish, it uploads yet another verification, this time "Verified -1".

This behaviour is _absolutely_ wrong, as it sends a verification state without waiting for all jobs to complete, thus leading to *inconsistent results* in Gerrit.

h4. Example output in Gerrit, when error occurs

Here's an excerpt of how this behaviour looks like in Gerrit:

{noformat}
##############################################################################
Jenkins
Patch Set 1: -Verified
Build Started Test_2/1/ (1/2)
-------------------------------------------------------
Jenkins
Patch Set 1:
Build Started Test_1/1/ (2/2)
-------------------------------------------------------
Jenkins
Patch Set 1:
Build Started Test_2/2/ (2/2)
-------------------------------------------------------
Jenkins
Patch Set 1:
Build Started Test_1/2/ (2/2)
-------------------------------------------------------
Jenkins
Patch Set 1: Verified+1
Build Successful
Test_2/2/ : SUCCESS
-------------------------------------------------------
Jenkins
Patch Set 1: Verified-1
Build Failed
Test_1/2/ : FAILURE
##############################################################################
{noformat}

You can see two bugs outline above at work here:

1.) The plugin gets confused when counting the number of jobs started
2.) The plugin does not wait for (Test_1 #2) after (Test_2 #2) has finished.

[EDIT: Improved text formatting. Now the report is more than just a wall of text]

New: There is a serious issue with the Gerrit trigger plugin and its "retrigger" functionality.

h4. Problem

This issue causes the plugin to get confused about how many and which builds it started for a particular changeset/patchset combination.

This behaviour leads to inconsistent verification results in Gerrit, because *it fails to wait for all tests to complete.*

h4. Replication instructions

# Start a blank, vanilla Jenkins server of an arbitrary version on your local host (the jobs below assume that a BASH shell is available).
# Install the gerrit-trigger plugin (arbitrary version, as all are affected)
# Increase the executor count of the "master" node to at least 6.
# Restart the server and create a connection to a Gerrit service.
# Shut down the server and extract the two jobs (Test_1 and Test_2) into the "jobs" directory of the Jenkins server.
#* Test_1 will always fail, Test_2 will always succeed. Both will create a lock-file named "LOCKFILE-$\{JOB_NAME}-$\{BUILD_NUMBER}" in "$\{JENKINS_HOME}/workspace/Central" and wait for it to be deleted, before succeeding or failing.
# (Optional) Alter the Test_1 and Test_2 Gerrit Trigger settings to listen to a specific repository. By default, both repo and branch are set to "**".
\\

After this set-up is done, upload a patchset to the Gerrit service that you configured Jenkins to listen to.

This will cause a build of "Test_1" and "Test_2" to be started.
Now, *while they are running* (remember that they will wait until you delete their lock-file), go into the "manual trigger" interface:
http://[jenkins]/gerrit_manual_trigger/?

Search for the changeset and issue a *retrigger*. This will cause two additional builds to spawn, one for "Test_1" and one for "Test_2".

Now that 4 tests are running, the correct behaviour of the Gerrit trigger plugin is to ignore the first two runs that have been started by the "Patchset Created" event.

Instead, it should wait until BOTH of the retriggered builds have finished and then upload a "Verified -1" message to Gerrit.

To test this, first delete the two original lock files:

$\{JENKINS_HOME}/workspace/Central/LOCKFILE-Test_1-1
$\{JENKINS_HOME}/workspace/Central/LOCKFILE-Test_2-1

This tests the first half of the problem. This works fine. After both have finished, the Gerrit Trigger plugin will still wait for the other two jobs to finish.

Now, to expose the bug, *ONLY* delete the lockfile for the re-triggered TEST_2 job:
$\{JENKINS_HOME}/workspace/Central/LOCKFILE-Test_2-2

This will let Test_2 finish successfully (as Test_2 always succeeds).
*The bug is that the trigger plugin now uploads a "Verified +1" message, DESPITE "Test_1 #2" not having finished yet.*

If you then also let "Test_1" finish, it uploads yet another verification, this time "Verified -1".

This behaviour is _absolutely_ wrong, as it sends a verification state without waiting for all jobs to complete, thus leading to *inconsistent results* in Gerrit.

h4. Example output in Gerrit, when error occurs

Here's an excerpt of what this behaviour looks like in Gerrit:

{noformat}
##############################################################################
Jenkins
Patch Set 1: -Verified
Build Started Test_2/1/ (1/2)
-------------------------------------------------------
Jenkins
Patch Set 1:
Build Started Test_1/1/ (2/2)
-------------------------------------------------------
Jenkins
Patch Set 1:
Build Started Test_2/2/ (2/2)
-------------------------------------------------------
Jenkins
Patch Set 1:
Build Started Test_1/2/ (2/2)
-------------------------------------------------------
Jenkins
Patch Set 1: Verified+1
Build Successful
Test_2/2/ : SUCCESS
-------------------------------------------------------
Jenkins
Patch Set 1: Verified-1
Build Failed
Test_1/2/ : FAILURE
##############################################################################
{noformat}

You can see the two bugs outlined above at work here:

1.) The plugin gets confused when counting the number of jobs started
2.) The plugin does not wait for (Test_1 #2) after (Test_2 #2) has finished.

[EDIT: Improved text formatting. Now the report is more than just a wall of text]

rin_ne added a comment - 2014-08-28 07:28

Sorry, I missed the attachment; now I fully understand.

BTW, it seems that this plugin writes some useful logs around the related logic.
So it would help if you could provide the logs, if you have any.

rin_ne added a comment - 2014-08-28 08:24 - edited

I reproduced this on the latest plugin code, but there is a slight difference from what you described.

UPDATE:
The results below were obtained under this condition: each job already has one completed build, "#1".
So "#2" is the build triggered by the event, and "#3" is the build triggered by the manual trigger.

{quote}
This will let Test_2 finish successfully (as Test_2 always succeeds).
The bug that happens now, is that the trigger plugin uploads a "Verified +1" message, DESPITE "Test_1 #2" not yet being finished.
{quote}

When LOCKFILE_Test_2-3 was removed, no message was uploaded.

Next, LOCKFILE_Test_1-2 was removed, and a "Verified -1" message was uploaded for "Test_1 #2". Likewise, a "Verified +1" message was uploaded when LOCKFILE_Test_2-2 was removed.

Finally, LOCKFILE_Test_1-3 was removed, and a "Verified -1" message was uploaded containing the verdicts of both "Test_1 #3" and "Test_2 #3". This means that the message for "Test_2 #3" was held back until "Test_1 #3" had completed.

In other words:

The running builds started by the Gerrit event were detached from the combined verdict logic by the manually triggered ones.
But each of them still reported its result as an individually triggered build.

So I think the issue that should be addressed is #2, right?

rin_ne added a comment - 2014-08-28 09:19 - edited

Sorry, I missed one condition, so here is an update.

When the jobs have no previous builds (that is, the builds start as #1), an NPE occurs:

{noformat}
INFO: Test_1 #1 main build action completed: FAILURE
Aug 28, 2014 6:13:06 PM com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.ToGerritRunListener onCompleted
INFO: Obtained failure message: null
Aug 28, 2014 6:13:06 PM com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.model.BuildMemory setEntryFailureMessage
INFO: Recording unsuccessful message for PatchsetCreated: Change: 7 PatchSet: 28: null
Aug 28, 2014 6:13:06 PM com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.ToGerritRunListener allBuildsCompleted
INFO: All Builds are completed for cause: GerritCause: PatchsetCreated: Change: 7 PatchSet: 28 silent: false
Aug 28, 2014 6:13:06 PM com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.GerritNotifier buildCompleted
SEVERE: Could not complete BuildCompleted notification!
java.lang.NullPointerException
	at com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.ParameterExpander.createBuildsStats(ParameterExpander.java:554)
	at com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.ParameterExpander.getBuildCompletedCommand(ParameterExpander.java:485)
	at com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.GerritNotifier.buildCompleted(GerritNotifier.java:113)
	at com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.job.ssh.BuildCompletedCommandJob.run(BuildCompletedCommandJob.java:64)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

But I think the issue described in my previous comment may be what underlies this NPE.

Assignee: rsandell
Reporter: Martin Schröder
Votes: 1
Watchers: 4
Created: 2014-08-26 12:59
Updated: 2014-09-04 07:35
Resolved: 2014-09-04 07:35
