-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Jenkins 2.289.3
Hi,
Jenkins is deployed on EKS.
Every time Jenkins is redeployed, past jobs show up dated "1 Jan 1970".
All past jobs appear to be hung, and each time they have to be aborted from the script console.
[JENKINS-66328] pipeline in Jenkins are showing like executed back to 1 Jan 1970
The build.xml for 207 shown above is:
<?xml version='1.1' encoding='UTF-8'?>
<flow-build plugin="workflow-job@1385.vb_58b_86ea_fff1">
<actions/>
<queueId>-1</queueId>
<timestamp>0</timestamp>
<startTime>0</startTime>
<result>FAILURE</result>
<duration>0</duration>
<keepLog>false</keepLog>
<completed>true</completed>
</flow-build>
This job definitely succeeded on the 17th. The log showing that is right there in the directory.
A restart of Jenkins on the 23rd has rewritten this file.
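For anyone trying to find out how many builds were hit, here is a minimal sketch of a scan for build.xml files that carry the same reset values as the one quoted above. It assumes the default `<JENKINS_HOME>/jobs/.../builds/<n>/build.xml` layout; the function names and the choice of marker fields are illustrative, not part of Jenkins. It uses a plain regex rather than an XML parser because the files declare XML 1.1, which not every parser accepts.

```python
import re
from pathlib import Path
from typing import Optional

# Fields that were reset in the build.xml quoted above (assumed to be a
# good enough fingerprint; adjust if your affected files look different).
RESET_MARKERS = {"queueId": "-1", "timestamp": "0", "startTime": "0"}


def field(text: str, tag: str) -> Optional[str]:
    """Return the text of the first <tag>...</tag> element, or None."""
    match = re.search(rf"<{tag}>([^<]*)</{tag}>", text)
    return match.group(1).strip() if match else None


def looks_reset(text: str) -> bool:
    """True when every marker field carries its reset value."""
    return all(field(text, tag) == value for tag, value in RESET_MARKERS.items())


def find_reset_builds(jenkins_home: Path) -> list[Path]:
    """List build.xml files under <jenkins_home>/jobs that look reset."""
    hits = []
    for path in (jenkins_home / "jobs").rglob("build.xml"):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if looks_reset(text):
            hits.append(path)
    return sorted(hits)
```

Calling `find_reset_builds(Path("/var/jenkins_home"))` (the path is an assumption) would return the affected files so they can be compared against a backup.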
Can each of those that are affected by the problem please provide the detailed list of installed plugins and their versions?
https://github.com/jenkins-infra/helpdesk/issues/3934#issuecomment-1932685147 indicates that the datadog plugin 6.0.0 may be the cause of some of the issues, though it is not yet clear if it is the cause of all of the issues.
jeffrey_cai I have already provided that information. I install every update when it comes out. Since the last list I have removed the GitHub and the ShiningPanda plugins. I do not have the Datadog plugin.
There is some race condition, possibly related to Bitbucket Branch, that corrupts jobs rarely and at random after a restart: build.xml gets overwritten on disk with an "empty" version. I have to back them up and restore them manually. You will probably need a large history of many built jobs in order to reproduce it.
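The manual backup and restore could be scripted along these lines. This is only a sketch: the function names are hypothetical, it assumes the default `<JENKINS_HOME>/jobs` layout, and it assumes the backup directory lives outside the Jenkins home.

```python
import shutil
from pathlib import Path


def backup_build_records(jenkins_home: Path, backup_root: Path) -> int:
    """Copy every build.xml under <jenkins_home>/jobs into backup_root.

    Relative paths are kept, so jobs/x/builds/7/build.xml ends up at
    backup_root/jobs/x/builds/7/build.xml. Returns the number of files copied.
    """
    count = 0
    for src in (jenkins_home / "jobs").rglob("build.xml"):
        dest = backup_root / src.relative_to(jenkins_home)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also keeps the modification time
        count += 1
    return count


def restore_build_record(jenkins_home: Path, backup_root: Path, relative: Path) -> None:
    """Put one backed-up build.xml back in place.

    Assumption: Jenkins is stopped, or its configuration is reloaded from
    disk afterwards; otherwise the running instance may not notice.
    """
    shutil.copy2(backup_root / relative, jenkins_home / relative)
```

Running the backup before each planned restart would at least make the restore step a single call per affected build.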
Hi jameshowe, thanks for the info. This issue is a headache. As you mentioned below:
The very next restart (for plugin updates) it happened again.
May I ask whether you have the plugin diff for that restart, so that I can cross-check on our side?
jeffrey_cai it's not the plugin changes in the restart that cause it. Please read previous comments. Restarting with no changes also triggers it, but not every time.
But if you insist, it was lockable-resources, ldap, and json-path-api.
Restarting with no changes also triggers it
Sorry, I missed this point before. Thank you very much; let me check on our side.
Hi James, we noticed the "lockable-resources" plugin in your list (we use it too).
Their release notes mention a timestamp-related change (we are not sure whether it is related); see:
- https://github.com/jenkinsci/lockable-resources-plugin/releases
- fix version: 1245.vb_05f8a_4e28db_, Set the "reserved" timestamp when stealing a lock as well (#637)
May I ask which version of this plugin you are running? Is it older than the fix version above? If so, do you think upgrading to that version could help?
jeffrey_cai it would save us both time if you read the previous comments. All plugins are the latest version. That change item has nothing to do with the issue.
Thank you James, and sorry for the duplicate question. I saw you were using "Lockable Resources plugin (lockable-resources): 1232.v512d6c434eb_d". My guess (and it is only a guess) is that bumping to `1245.vb_05f8a_4e28db_` could help.
I will update here after more testing on our side. Thanks again.
Hi all. Jumping in because we've just come across this issue too, also at the plugin-update stage. I hope a somewhat longer description of what we saw is alright.
We updated the plugins, restarted Jenkins (version 2.456), and then found that some (or maybe even all?) jobs had a number of their recent builds shown in a "running" state and dated 1970, as per the attached picture. We looked into a few of those builds and their logs: they weren't actually running, and the logs were normal and from their actual runs. We did this on the 1st of May, and the affected builds were those from the 4th of April, around the same time of day, onwards.
The builds weren't affected at the moment of the restart itself, but as soon as there was even a slight attempt to access them, even just loading a view page such as `/view/staging`. As long as I don't click through to another view, I can check the build.xml of a build belonging to a job in that other view from the terminal, and it is intact. Once I switch to that view, the build.xml files for the builds from April 4th onwards get rewritten and the builds appear as "running" in the UI.
The build.xml really does change. I kept copies of a couple, and one, for example, shrank from 291 lines to 57. Within the <action> section, all the plugin lines appear to be removed; the <execution> and <checkouts> sections are deleted from the file entirely; and the basic fields such as queueId, timestamp, startTime, result, duration, charset, etc. are reduced to these four:
<queueId>-1</queueId> <timestamp>0</timestamp> <startTime>0</startTime> <duration>0</duration>
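A before/after comparison like the one described here can be automated. The sketch below lists the element names that are present in a saved copy of build.xml but missing from the rewritten one. It is regex-based and deliberately crude (it ignores nesting and attributes), and the helper names are made up for illustration.

```python
import re


def element_names(xml_text: str) -> set[str]:
    """Collect the names of all opening or self-closing tags in the text.

    Closing tags (</x>) and the <?xml ...?> declaration are not matched,
    because the character after '<' must start a name.
    """
    return set(re.findall(r"<([A-Za-z_][\w.\-]*)", xml_text))


def dropped_elements(before: str, after: str) -> list[str]:
    """Element names that occur in `before` but no longer in `after`."""
    return sorted(element_names(before) - element_names(after))
```

Feeding it the contents of the saved copy and of the rewritten file gives a quick list of the sections that disappeared, which may help when comparing reports from different installations.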
After some time, or after another restart, the affected builds switched to showing as failed, and they stayed that way.
In our case, though, the culprit was the Datadog plugin - https://github.com/jenkinsci/datadog-plugin - and upgrading it from 6.0.2 to 7.0.0.
Initially we upgraded all plugins at once and the above issue happened, so we rolled back (we had a pre-upgrade snapshot). I then did a little research into the plugins until I had narrowed it down to a few names, which I tried updating one by one on a non-production instance. The Datadog plugin was the one that made the problem happen again.
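To narrow down candidates like this, it can help to diff the plugin list from before and after the upgrade. A minimal sketch, assuming each list is plain text with one `name:version` entry per line (that input format and the function names are assumptions):

```python
from typing import Optional


def parse_plugins(text: str) -> dict[str, str]:
    """Parse 'name:version' lines; blank lines and '#' comments are ignored."""
    plugins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, version = line.partition(":")
        plugins[name.strip()] = version.strip()
    return plugins


def changed_plugins(
    before: dict[str, str], after: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Map each added, removed, or re-versioned plugin to (old, new).

    None stands for 'not installed' on that side.
    """
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }
```

The resulting dictionary is the set of plugins worth updating one at a time on a test instance.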
I did consider the Lockable Resources plugin that was also mentioned in the comments here, but neither version of it caused any issue.
The Datadog changes between the 6.0.2 version of the plugin and the current: https://github.com/jenkinsci/datadog-plugin/compare/728ed26b1e7a360fc1fb8674b3cbf40088fd84b0..master
One of the fixes in 7.0.0 is actually "Fix queue time calculation for builds and pipeline steps.", so it could be related. As of 6.0.3, the version in the <scm> section of their pom.xml also changes from 6.0.2 to 3.0.0, which is a bit confusing. (This 3.0.0 seems to show up between versions every time... a bit strange, but okay?)
These changes alone could be the culprit in our case, or it could be their interaction with something else that goes wrong.
Our solution will probably be to simply remove the Datadog plugin, because we don't really use it. It seemed like a good idea to have at the time, and then we never got around to using it...
Thank you for sharing your troubleshooting details. We stumbled on this bug as well after upgrading core from 2.387.3 to 2.440.3. We also upgraded about 90 plugins as part of that; I can provide the list.
We isolated the issue to the allure-plugin 2.30.1, which wasn't upgraded. We also tried installing the latest version, which didn't help. Within the same job, we saw the bug on builds that used the plugin and did not see it on builds that didn't. It seems there is some incompatibility between Jenkins core 2.440.3 and some plugins that interfere with build.xml.
I wanted to share this in case it helps to reproduce or bisect the bug.
jeffrey_cai, James had stated earlier that he does not have the Datadog plugin installed.