I'll re-post the relevant part of IRC-woes log below:
[09:31:06] <jimklimov> huh, swarm agents say: "No valid crumb was included in the request" and refuse to start, what gives? :\ 2.239 weekly and 3.21 swarm
[09:37:27] <jimklimov> uh, i'm relieved - after server restart completed, agents connected
[09:37:41] <jimklimov> except those older ones not yet updated :\
[11:44:58] <jimklimov> hm, after that upgrade there is another problem with agents: the static SSH agent lines blinked up and down in the JENKINS_URL/computer page and ultimately disappeared - both from the details listing (free space, clock, etc.) and from the left column (builds running or idling)
[11:45:35] <jimklimov> If I ask to make a New Node as a copy of existing one, those AWOL SSH agents are listed as something to copy.
[11:46:55] <jimklimov> however a JENKINS_URL/computer/AGENTNAME returns an HTTP-404
[11:50:01] <jimklimov> in the jenkins.log I see several dozen of such entries for 10 minutes following Jenkins JVM startup:
[11:50:01] <jimklimov> 2020-06-05 07:31:56.010+0000 [id=30] WARNING hudson.model.AbstractCIBase#updateComputer: Error updating node master-worker-old, continuing
[11:50:01] <jimklimov> java.lang.IllegalStateException: The class jenkins.security.QueueItemAuthenticatorConfiguration was not found, potentially not yet loaded
[11:52:27] <jimklimov> amounts vary, from about 40 to 200 per agent name that I checked
[12:01:18] <jimklimov> this class is also mentioned in other cases:
[12:01:18] <jimklimov> 2020-06-05 09:52:00.616+0000 [id=320664] WARNING j.util.AtmostOneTaskExecutor$1#call
[12:01:18] <jimklimov> java.lang.IllegalStateException: The class jenkins.security.QueueItemAuthenticatorConfiguration was not found, potentially not yet loaded
[12:01:18] <jimklimov> and a lot of (probably from UI driven status refreshes?) same class in these combos of lines: https://pastebin.com/MFvQukr3
[12:10:32] <jimklimov> for the agents' connections, the fuller stack traces are here : https://pastebin.com/siJbMdWP
[12:10:59] <jimklimov> but overall the common culprit seems to be "java.lang.IllegalStateException: The class jenkins.security.QueueItemAuthenticatorConfiguration was not found, potentially not yet loaded" in different cases
[12:54:08] <jimklimov> back to (missing) jenkins.security.QueueItemAuthenticatorConfiguration issues above - I wonder if it is also linked to our detected workers still doing nothing, and queue growing? it says that required labels for jobs are not served by any agent... but the list in that comment does not include the actual workers that do serve the label. Some other jobs are "waiting for available executor on X" and X has dozens free executors. Actually all of them since nothing runs.
[13:07:34] <jimklimov> and back to our usual programming: I restarted this instance with plugins again updated (RBAC) and see new logged errors I did not notice before: lots of the likes of:
[13:07:34] <jimklimov> 2020-06-05 11:04:41.529+0000 [id=20] WARNING h.ExtensionFinder$GuiceFinder$FaultTolerantScope$1#error: Failed to instantiate Key[type=io.jenkins.plugins.analysis.warnings.Pmd$Descriptor, annotation=[none]]; skipping this component
[13:07:34] <jimklimov> java.lang.NoClassDefFoundError: Could not initialize class net.sourceforge.pmd.lang.LanguageRegistry
[13:07:34] <jimklimov> and
[13:07:34] <jimklimov> 2020-06-05 11:04:41.526+0000 [id=20] WARNING h.ExtensionFinder$GuiceFinder$FaultTolerantScope$1#error: Failed to instantiate Key[type=jenkins.security.QueueItemAuthenticatorConfiguration, annotation=[none]]; skipping this component
[13:07:34] <jimklimov> java.lang.IllegalStateException: Singleton is called recursively returning different results
[13:08:00] <jimklimov> I wonder if the latter is why that class won't load
[13:08:24] <jimklimov> wild guess: can it be that different plugins and/or core deliver actively conflicting versions of some libs?
[13:09:54] <jimklimov> a discussion at https://issues.jenkins-ci.org/browse/JENKINS-61990 looks inconclusive, but relevant
[13:09:55] <jenkins-admin> JENKINS-61990:StackOverflowError on boot related to QueueItemAuthenticatorConfiguration (Open) https://issues.jenkins-ci.org/browse/JENKINS-61990
[13:28:32] <jimklimov> removing queue.xml and restarting... saw some newbies in the mile-long stacktraces...
[13:28:32] <jimklimov> 2020-06-05 11:25:53.700+0000 [id=98] WARNING h.i.i.InstallUncaughtExceptionHandler#handleException: Caught unhandled exception with ID 3dffb911-e981-4a52-b2d0-be0cd7bf6c80
[13:28:32] <jimklimov> java.lang.AssertionError: class hudson.security.csrf.DefaultCrumbIssuer is missing its descriptor
[13:28:32] <jimklimov> and a more detailed
[13:28:32] <jimklimov> at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
[13:28:32] <jimklimov> Caused: com.google.inject.ProvisionException: Unable to provision, see the following errors:
[13:28:32] <jimklimov> 1) Error injecting constructor, java.lang.NoClassDefFoundError: Could not initialize class net.sourceforge.pmd.lang.LanguageRegistry
[13:28:33] <jimklimov> at io.jenkins.plugins.analysis.warnings.Pmd$Descriptor.<init>(Pmd.java:61)
[13:28:33] <jimklimov> 1 error
[13:28:34] <jimklimov> at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:52)
[13:28:34] <jimklimov> at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:145)
[13:30:03] <jimklimov> oh, wow: 2020-06-05 11:29:17.842+0000 [id=164] SEVERE h.triggers.SCMTrigger$Runner#runPolling: Failed to record SCM polling for org.jenkinsci.plugins.workflow.job.WorkflowJob at 110d8469[42ity-github/czmq/v3 dot 0.2-FTY] // java.lang.NoClassDefFoundError: Could not initialize class hudson.FilePath
[13:52:33] <jimklimov> what the snap is happening there? :\
[13:52:33] <jimklimov> java.lang.IllegalStateException: Jenkins.instance is missing. Read the documentation of Jenkins.getInstanceOrNull to see what you are doing wrong.
[14:07:22] <jimklimov> another try goes by... moved away org.jenkinsci.plugins.workflow.flow.FlowExecutionList.xml.bak-20200605 org.jenkinsci.plugins.pipeline.milestone.MilestoneStep.xml.bak-20200605 queue.xml.bak-20200605 and shut down the reverse proxy that exposes Jenkins so it is not confused by the clients and agents until ready...
[14:09:23] <jimklimov> also in jenkins.security.QueueItemAuthenticatorConfiguration.xml set all the strategies to "true" (flipped org.jenkinsci.plugins.authorizeproject.strategy.AnonymousAuthorizationStrategy and org.jenkinsci.plugins.authorizeproject.strategy.SpecificUsersAuthorizationStrategy from the "false" they had)
[14:10:58] <jimklimov> and recovered a config.xml - something chopped a few XML tags from it between reboots
[14:12:22] <jimklimov> and it's up, and a lot faster than usual
[14:41:51] <jimklimov> so that good start was with a reverted 2.238 war... activated the 2.239 - and hell broke loose again, reverted to 2.238 - still broken. Removed the newly appeared queue and workflow xmls, and 2.239 seems to be starting well. So something is seriously borked with that queue persistence :\
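
For anyone wanting to repeat the "move the queue persistence files away" step from the log above, here is a minimal sketch of it. The three file names and the `.bak-YYYYMMDD` suffix come from the log; the helper name `move_queue_state_aside`, its parameters, and the assumption that Jenkins is stopped while the files are renamed are mine, not something confirmed as an official recovery procedure.

```python
import shutil
from datetime import date
from pathlib import Path

# File names as mentioned in the IRC log; everything else here is an
# illustrative assumption, not an official Jenkins recovery procedure.
QUEUE_STATE_FILES = (
    "queue.xml",
    "org.jenkinsci.plugins.workflow.flow.FlowExecutionList.xml",
    "org.jenkinsci.plugins.pipeline.milestone.MilestoneStep.xml",
)


def move_queue_state_aside(jenkins_home, stamp=None):
    """Rename persisted queue/workflow state files to NAME.bak-STAMP.

    Intended to be run while Jenkins is stopped (assumption).
    Returns the list of new paths; files that do not exist are skipped.
    """
    home = Path(jenkins_home)
    # Default suffix mirrors the ".bak-20200605" style used in the log.
    stamp = stamp or date.today().strftime("%Y%m%d")
    moved = []
    for name in QUEUE_STATE_FILES:
        src = home / name
        if not src.is_file():
            continue  # nothing persisted under this name
        dst = home / f"{name}.bak-{stamp}"
        if dst.exists():
            # Keep an earlier backup intact rather than overwrite it.
            raise FileExistsError(f"refusing to overwrite {dst}")
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

Usage would be something like `move_queue_state_aside("/var/lib/jenkins")`, where the path is a placeholder for the actual JENKINS_HOME. The files are renamed rather than deleted so they can be restored or attached to the issue later.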