Bug
Resolution: Unresolved
Major
jenkins-1.509.3
We submit about 120 parallel builds, each with a 15-second execution time, so the issue may be caused by frequent log rotation running without locks.
The jobs seem to be OK, but there are a lot of SEVERE messages in the Jenkins log.
SEVERE: Executor threw an exception
java.util.NoSuchElementException
at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:154)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1067)
at java.util.AbstractMap$2$1.next(AbstractMap.java:385)
at hudson.util.RunList.subList(RunList.java:143)
at hudson.tasks.LogRotator.perform(LogRotator.java:119)
at hudson.model.Job.logRotate(Job.java:404)
at hudson.model.Run.execute(Run.java:1655)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
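The trace above would be consistent with a sublist being built by stepping an iterator up to a size that was read earlier, while another build's rotation removes entries in between. The following is a minimal sketch of that failure mode under this assumption; it is not the actual `RunList` code, and every class and method name in it is hypothetical. The interleaving is simulated in a single thread so that it is deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration only; none of these names exist in Jenkins.
class SubListRaceSketch {

    // Copies elements [from, to) by walking an iterator, trusting that
    // 'to' (computed earlier from size()) still matches the collection.
    static <T> List<T> subListByIteration(Iterable<T> source, int from, int to) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = source.iterator();
        for (int i = 0; i < to; i++) {
            T item = it.next(); // throws NoSuchElementException if the source shrank
            if (i >= from) {
                out.add(item);
            }
        }
        return out;
    }

    // Simulates the suspected interleaving: the size is read, another
    // rotation deletes a build, and only then does the iteration start.
    static boolean reproducesFailure() {
        ConcurrentSkipListMap<Integer, String> builds = new ConcurrentSkipListMap<>();
        for (int n = 1; n <= 5; n++) {
            builds.put(n, "build #" + n);
        }
        int staleSize = builds.size(); // rotation A reads the size
        builds.remove(1);              // rotation B deletes the oldest build
        try {
            subListByIteration(builds.values(), 2, staleSize);
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale size caused NoSuchElementException: " + reproducesFailure());
    }
}
```

If this is what happens, the exception is a symptom of two rotations running over the same job at once rather than of corrupt build data.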
[JENKINS-19743] Massive parallel builds sometimes cause errors in LogRotation
We've found that the following source changes to Jenkins version 1.606 work around the issue.
1. Adaptation of the delete() method in Run.java
    /**
     * Deletes this build and its entire log
     *
     * @throws IOException
     *      if we fail to delete.
     */
    public void delete() throws IOException {
        File rootDir = getRootDir();
        if (!rootDir.isDirectory()) {
            LOGGER.log(Level.WARNING, "IOException: " + rootDir + " looks to have already been deleted; siblings: " + Arrays.toString(project.getBuildDir().list()));
            //throw new IOException(this + ": " + rootDir + " looks to have already been deleted; siblings: " + Arrays.toString(project.getBuildDir().list()));
        }

        RunListener.fireDeleted(this);

        synchronized (this) { // avoid holding a lock while calling plugin impls of onDeleted
            File tmp = new File(rootDir.getParentFile(),'.'+rootDir.getName());

            if (tmp.exists()) {
                Util.deleteRecursive(tmp);
            }
            // TODO on Java 7 prefer: Files.move(rootDir.toPath(), tmp.toPath(), StandardCopyOption.ATOMIC_MOVE)
            boolean renamingSucceeded = rootDir.renameTo(tmp);
            Util.deleteRecursive(tmp);
            // some user reported that they see some left-over .xyz files in the workspace,
            // so just to make sure we've really deleted it, schedule the deletion on VM exit, too.
            if(tmp.exists())
                tmp.deleteOnExit();

            if(!renamingSucceeded) {
                LOGGER.log(Level.WARNING, rootDir+" is in use");
                //throw new IOException(rootDir+" is in use");
            }
            LOGGER.log(FINE, "{0}: {1} successfully deleted", new Object[] {this, rootDir});

            removeRunFromParent();
        }
    }
2. Adaptation of the perform(Job<?,?> job) method in LogRotator.java
    public void perform(Job<?,?> job) throws IOException, InterruptedException {
        LOGGER.log(FINE, "Running the log rotation for {0} with numToKeep={1} daysToKeep={2} artifactNumToKeep={3} artifactDaysToKeep={4}", new Object[] {job, numToKeep, daysToKeep, artifactNumToKeep, artifactDaysToKeep});

        // always keep the last successful and the last stable builds
        Run lsb = job.getLastSuccessfulBuild();
        Run lstb = job.getLastStableBuild();

        if(numToKeep!=-1) {
            // Note that RunList.size is deprecated, and indeed here we are loading all the builds of the job.
            // However we would need to load the first numToKeep anyway, just to skip over them;
            // and we would need to load the rest anyway, to delete them.
            // (Using RunMap.headMap would not suffice, since we do not know if some recent builds have been deleted for other reasons,
            // so simply subtracting numToKeep from the currently last build number might cause us to delete too many.)
            try {
                List<? extends Run<?,?>> builds = job.getBuilds();
                for (Run r : copy(builds.subList(Math.min(builds.size(), numToKeep), builds.size()))) {
                    if (shouldKeepRun(r, lsb, lstb)) {
                        continue;
                    }
                    LOGGER.log(FINE, "{0} is to be removed", r);
                    r.delete();
                }
            } catch(Exception e) {
                LOGGER.log(FINE, "subList creating failed", e);
            }
        }
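The workaround above hides the symptoms by logging instead of throwing. An alternative direction, sketched here only as an illustration, would be to make sure that at most one rotation runs per job at a time and to let later callers skip their turn. The class `RotationGuardSketch`, its method names and the job name used below are all hypothetical; nothing like this is claimed to exist in Jenkins, and the sketch deliberately has no Jenkins dependencies.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of Jenkins.
class RotationGuardSketch {

    // One lock per job name; entries are created lazily.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs 'rotation' unless another thread is already rotating the same job.
    // Returns true if the rotation ran, false if it was skipped.
    boolean rotateIfIdle(String jobName, Runnable rotation) {
        ReentrantLock lock = locks.computeIfAbsent(jobName, k -> new ReentrantLock());
        if (!lock.tryLock()) {
            return false; // another build is rotating this job right now
        }
        try {
            rotation.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Shows that a second thread is turned away while the first holds the lock.
    static boolean secondCallerSkipped() {
        RotationGuardSketch guard = new RotationGuardSketch();
        AtomicBoolean secondRan = new AtomicBoolean(true);
        guard.rotateIfIdle("my-job", () -> {
            Thread other = new Thread(() ->
                    secondRan.set(guard.rotateIfIdle("my-job", () -> { })));
            other.start();
            try {
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return !secondRan.get();
    }

    public static void main(String[] args) {
        System.out.println("second caller skipped: " + secondCallerSkipped());
    }
}
```

Skipping rather than waiting seems reasonable for this use case because the rotation that is already running will remove the same old builds anyway, and the next finished build triggers another rotation.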
Hi Daniel,
this is not a fix, only a working workaround for the issue. See the attached Run.txt and LogRotator.txt files for the diff patches.
Using Jenkins ver. 1.636, we are seeing the following exception on a job allowing concurrent builds and having lots of short jobs:
hudson.model.Run execute
SEVERE: Failed to rotate log
java.util.NoSuchElementException
at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.next(LazyLoadRunMapEntrySet.java:76)
at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.next(LazyLoadRunMapEntrySet.java:63)
at java.util.AbstractMap$2$1.next(AbstractMap.java:396)
at hudson.util.RunList.subList(RunList.java:139)
at hudson.tasks.LogRotator.perform(LogRotator.java:125)
at hudson.model.Job.logRotate(Job.java:467)
at hudson.model.Run.execute(Run.java:1805)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:410)
We are also seeing this one (reported above as well). So it seems log rotation for concurrent jobs has a race condition.
Nov 06, 2015 6:01:49 AM hudson.model.Run execute
SEVERE: Failed to rotate log
java.io.IOException: Redacted #173213: /redacted/builds/173213 looks to have already been deleted; siblings: [....lots of job ids....]
at hudson.model.Run.delete(Run.java:1483)
at hudson.tasks.LogRotator.perform(LogRotator.java:144)
at hudson.model.Job.logRotate(Job.java:467)
at hudson.model.Run.execute(Run.java:1805)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:410)
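If two rotations race to delete the same build directory, the loser finds the directory already gone, which is what the "looks to have already been deleted" message describes. One way to make such a delete tolerant of losing the race is to treat "already gone" as a normal outcome. The sketch below uses only `java.nio.file` and is an assumption about how this could be handled, not what `Run.delete()` does; `TolerantDeleteSketch` and `deleteIfPresent` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Jenkins.
class TolerantDeleteSketch {

    // Deletes a directory tree. Returns true if the directory existed when
    // this call started, false if it was already gone (for example because
    // a concurrent rotation deleted it first).
    static boolean deleteIfPresent(Path buildDir) throws IOException {
        if (!Files.exists(buildDir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(buildDir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        } catch (NoSuchFileException e) {
            return false; // vanished between the check and the walk
        } catch (UncheckedIOException e) {
            if (e.getCause() instanceof NoSuchFileException) {
                return false; // vanished while walking
            }
            throw e;
        }
        for (Path p : paths) {
            Files.deleteIfExists(p); // no error if someone else removed it first
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("build");
        Files.createFile(dir.resolve("log"));
        System.out.println("first call deleted: " + deleteIfPresent(dir));
        System.out.println("second call deleted: " + deleteIfPresent(dir));
    }
}
```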
I am seeing the same (SEVERE: Failed to rotate log), on 1.625.3 (most recent LTS).
Once the SEVERE log entries start, Jenkins service starts to use all CPU it can get and becomes very slow/unresponsive.
We see this on 1.611. We too have 100s of parallel jobs running. This pollutes the logs with scary messages. Not sure if it actually affects us apart from logs, but it would be nice to see this resolved in case it is the cause of the occasional unexplained problem.
I am getting these errors on the machine we build pull requests on:
Nov 09, 2017 10:11:53 AM SEVERE hudson.model.Run execute
Failed to rotate log
java.io.IOException: My-Build #1000: /var/lib/jenkins/jobs/My-Build/builds/1000 looks to have already been deleted; siblings: [1021, 1027, 1047, 1091, 1010, 1034, 1040, 1013, 1041, 1083, 1016, 1049, 1030, 1070, 1004, 1099, 1073, 1024, 1009, 1084, 1039, 1001, 1094, 1100, 1057, 1003, 1007, 1052, 1065, lastSuccessfulBuild, 1045, 1026, 1022, 1061, 1054, 1044, 1093, .1000, 1087, 1063, 1072, 1018, 1096, 1074, 1019, lastUnstableBuild, 1092, 1031, 1033, 1005, 1051, 1043, 1068, 1075, 1095, 1079, 1036, 1032, 1029, 1048, 1042, legacyIds, lastFailedBuild, lastUnsuccessfulBuild, 1025, 1078, 1080, 1090, 1046, 1069, 999, 1014, 1020, lastStableBuild, 1067, 1053, 1028, 1002, 1064, 1059, 1082, 1056, 1017, 1071, 1077, 1097, 1037, 1086, 1076, 1008, 1006, 1081, 1088, 1058, 998, 1023, 1060, 1050, 1012, 1062, 1066, 1038, 1098, 1085, 1055, 1015, 1011, 1089, 1035]
at hudson.model.Run.delete(Run.java:1483)
at hudson.tasks.LogRotator.perform(LogRotator.java:131)
at hudson.model.Job.logRotate(Job.java:474)
at hudson.model.Run.execute(Run.java:1784)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:404)
Nov 09, 2017 10:11:53 AM SEVERE hudson.model.Run execute
Failed to rotate log
java.io.IOException: /var/lib/jenkins/jobs/My-Build/builds/1000 is in use
at hudson.model.Run.delete(Run.java:1503)
at hudson.tasks.LogRotator.perform(LogRotator.java:131)
at hudson.model.Job.logRotate(Job.java:474)
at hudson.model.Run.execute(Run.java:1784)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:404)
I have the same problem:
2021-03-28 03:25:07.984+0000 [id=1870887] SEVERE hudson.model.Run#execute: Failed to rotate log
java.util.NoSuchElementException
at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.next(LazyLoadRunMapEntrySet.java:76)
at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.next(LazyLoadRunMapEntrySet.java:63)
at java.util.AbstractMap$2$1.next(AbstractMap.java:418)
at hudson.util.RunList.subList(RunList.java:154)
at hudson.tasks.LogRotator.perform(LogRotator.java:160)
at hudson.model.Job.logRotate(Job.java:469)
at hudson.model.Run.execute(Run.java:1971)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
The statistics in /monitoring (JavaMelody) say:
12 hits/min on 15 errors, 500k same errors per month
Debian 9, Jenkins 2.263.4
Maybe there is a way to at least disable this type of error?
As the number of saved logs increases, I get a high CPU load (I do not know whether that is connected with this issue or not).
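On the question of silencing the message: the log lines in this thread look like `java.util.logging` output, so one possibility is a `Filter` on the logger that emits them. This is only a sketch under assumptions: the logger name `hudson.model.Run` is inferred from the quoted log lines, the match is on the literal text "Failed to rotate log", and how such code would be loaded into a running Jenkins is not covered here. The class and method names are hypothetical.

```java
import java.util.logging.Filter;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Jenkins.
class RotationNoiseFilterSketch {

    // Assumed logger name, inferred from the log lines quoted in this thread.
    static final String LOGGER_NAME = "hudson.model.Run";
    // Message text copied from the quoted SEVERE entries.
    static final String NOISY_MESSAGE = "Failed to rotate log";

    // Strong reference so the configured logger is not garbage collected.
    private static Logger pinned;

    // Rejects only SEVERE records whose message is exactly the rotation failure.
    static Filter dropRotationFailures() {
        return record -> !(record.getLevel() == Level.SEVERE
                && NOISY_MESSAGE.equals(record.getMessage()));
    }

    // Installs the filter on the assumed logger.
    static void install() {
        pinned = Logger.getLogger(LOGGER_NAME);
        pinned.setFilter(dropRotationFailures());
    }

    public static void main(String[] args) {
        install();
        pinned.log(Level.SEVERE, NOISY_MESSAGE);     // suppressed by the filter
        pinned.log(Level.SEVERE, "some other error"); // still logged
    }
}
```

A filter like this hides every rotation failure, including ones caused by real disk or permission problems, so it only treats the log noise and not the race itself.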
We've built a Jenkins test project in which we can reproduce the issue. The following requirements and project setup describe how the error can be reproduced deterministically.
Software requirements:
Test environment:
Jenkins projects
We have two Jenkins Build Flow projects:
Both are executing the same StressTest_SubJob.
Sample command code for the build section (Windows batch):
See also the attached C# script BuildFlowPluginTest.cs code.