Similar to the well-known problem in ssh-slaves, but affecting the Jenkins SSH server. I noticed that workflow-plugin functional tests were running very slowly with low CPU usage, and saw the following in thread dumps:

      "Executing testDeleteSubFolder(org.jenkinsci.plugins.workflow.steps.DeleteDirStepTest)" ...
      java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.Object.wait(Object.java:502)
      	at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:267)
      	- locked <...> (a jenkins.model.Jenkins$7)
      	at jenkins.InitReactorRunner.run(InitReactorRunner.java:44)
      	at jenkins.model.Jenkins.executeReactor(Jenkins.java:915)
      	at jenkins.model.Jenkins.<init>(Jenkins.java:814)
      	at hudson.model.Hudson.<init>(Hudson.java:83)
      	at org.jvnet.hudson.test.JenkinsRule.newHudson(JenkinsRule.java:559)
      	at org.jvnet.hudson.test.JenkinsRule.before(JenkinsRule.java:346)
      	at ...
      "SSHD.init" ...
         java.lang.Thread.State: RUNNABLE
      	at java.io.FileInputStream.readBytes(Native Method)
      	at java.io.FileInputStream.read(FileInputStream.java:255)
      	at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:410)
      	at sun.security.provider.NativePRNG$RandomIO.implGenerateSeed(NativePRNG.java:427)
      	- locked <...> (a java.lang.Object)
      	at sun.security.provider.NativePRNG$RandomIO.access$500(NativePRNG.java:329)
      	at sun.security.provider.NativePRNG.engineGenerateSeed(NativePRNG.java:224)
      	at java.security.SecureRandom.generateSeed(SecureRandom.java:533)
      	at org.apache.sshd.common.random.BouncyCastleRandom.<init>(BouncyCastleRandom.java:57)
      	at org.apache.sshd.common.random.BouncyCastleRandom$Factory.create(BouncyCastleRandom.java:48)
      	at org.apache.sshd.common.random.BouncyCastleRandom$Factory.create(BouncyCastleRandom.java:41)
      	at org.apache.sshd.common.random.SingletonRandomFactory.<init>(SingletonRandomFactory.java:37)
      	at org.apache.sshd.SshServer.setUpDefaultServer(SshServer.java:452)
      	at org.jenkinsci.main.modules.sshd.SSHD.start(SSHD.java:83)
      	- locked <...> (a org.jenkinsci.main.modules.sshd.SSHD)
      	at org.jenkinsci.main.modules.sshd.SSHD.init(SSHD.java:146)
      	at sun.reflect.GeneratedMethodAccessor149.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:497)
      	at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:106)
      	at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:176)
      	at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282)
      	at jenkins.model.Jenkins$7.runTask(Jenkins.java:904)
      	at ...
      

      The bundled server should be switched to not rely on NativePRNG, which can be quite slow.

      sudo apt-get install haveged seems to have improved test speed in my case, but I should not have to do this.

          [JENKINS-32510] SSHD relying on NativePRNG

          Reinhold Füreder added a comment - - edited

          I guess this is what caused a range of old/existing tests to fail in PR builds (e.g. https://github.com/jenkinsci/email-ext-plugin/pull/170 for the email-ext plugin, currently maintained by davidvanlaatum) by running into the test timeout:

          • https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Femail-ext-plugin/detail/PR-170/5/tests
            • ScriptContentTest.testWhenTemplateNotFoundThrowFileNotFoundException
              Error Message
              
              test timed out after 180 seconds
              
              Stacktrace
              org.junit.runners.model.TestTimedOutException: test timed out after 180 seconds
              	at java.lang.Object.wait(Native Method)
              	at java.lang.Object.wait(Object.java:502)
              	at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:267)
              	at jenkins.InitReactorRunner.run(InitReactorRunner.java:45)
              	at jenkins.model.Jenkins.executeReactor(Jenkins.java:1009)
              	at jenkins.model.Jenkins.<init>(Jenkins.java:877)
              	at hudson.model.Hudson.<init>(Hudson.java:85)
              	at org.jvnet.hudson.test.JenkinsRule.newHudson(JenkinsRule.java:617)
              	at org.jvnet.hudson.test.JenkinsRule.before(JenkinsRule.java:390)
              	at hudson.plugins.emailext.plugins.content.ScriptContentTest$1.before(ScriptContentTest.java:62)
              	at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:543)
              	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
              	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
              	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              	at java.lang.Thread.run(Thread.java:748)
              Standard Output
              === Starting testWhenTemplateNotFoundThrowFileNotFoundException(hudson.plugins.emailext.plugins.content.ScriptContentTest)
              
              Standard Error
                 0.009 [id=1157]	INFO	o.jvnet.hudson.test.JenkinsRule#createWebServer: Running on http://localhost:40893/jenkins/
                 0.047 [id=1163]	INFO	jenkins.InitReactorRunner$1#onAttained: Started initialization
                 0.049 [id=1175]	INFO	jenkins.InitReactorRunner$1#onAttained: Listed all plugins
                 0.087 [id=1163]	WARNING	o.j.h.test.TestExtensionLoader#isActive: ignoring class org.jenkinsci.plugins.workflow.support.actions.LogActionImplTest$ChattyStep$DescriptorImpl compiled against jenkins-test-harness older than 2.16
                 0.088 [id=1163]	WARNING	o.j.h.test.TestExtensionLoader#isActive: ignoring class org.jenkinsci.plugins.workflow.steps.AbstractStepImplRestartTest$BlockStep$DescriptorImpl compiled against jenkins-test-harness older than 2.16
                 0.120 [id=1163]	INFO	jenkins.InitReactorRunner$1#onAttained: Prepared all plugins
                 0.120 [id=1176]	INFO	jenkins.InitReactorRunner$1#onAttained: Started all plugins
                 0.121 [id=1168]	INFO	jenkins.InitReactorRunner$1#onAttained: Augmented all extensions
                 0.292 [id=1169]	INFO	jenkins.InitReactorRunner$1#onAttained: Loaded all jobs
               180.002 [id=1]	WARNING	o.j.hudson.test.JenkinsRule$2#evaluate: Test timed out (after 180 seconds).
              
              ...
              
              "SSHD.init" Id=1172 Group=FailOnTimeoutGroup RUNNABLE (in native)
              	at java.io.FileInputStream.readBytes(Native Method)
              	at java.io.FileInputStream.read(FileInputStream.java:255)
              	at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:424)
              	at sun.security.provider.NativePRNG$RandomIO.implGenerateSeed(NativePRNG.java:441)
              	-  locked java.lang.Object@6deee370
              	at sun.security.provider.NativePRNG$RandomIO.access$500(NativePRNG.java:331)
              	at sun.security.provider.NativePRNG.engineGenerateSeed(NativePRNG.java:226)
              	at java.security.SecureRandom.generateSeed(SecureRandom.java:533)
              	at org.apache.sshd.common.random.BouncyCastleRandom.<init>(BouncyCastleRandom.java:57)
              	at org.apache.sshd.common.random.BouncyCastleRandom$Factory.create(BouncyCastleRandom.java:48)
              	at org.apache.sshd.common.random.BouncyCastleRandom$Factory.create(BouncyCastleRandom.java:41)
              	at org.apache.sshd.common.random.SingletonRandomFactory.<init>(SingletonRandomFactory.java:37)
              	at org.apache.sshd.SshServer.setUpDefaultServer(SshServer.java:452)
              	at org.jenkinsci.main.modules.sshd.SSHD.start(SSHD.java:83)
              	-  locked org.jenkinsci.main.modules.sshd.SSHD@49c17ba4
              	at org.jenkinsci.main.modules.sshd.SSHD.init(SSHD.java:146)
              	at sun.reflect.GeneratedMethodAccessor284.invoke(Unknown Source)
              	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              	at java.lang.reflect.Method.invoke(Method.java:498)
              	at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
              	at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
              	at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282)
              	at jenkins.model.Jenkins$7.runTask(Jenkins.java:998)
              	at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:210)
              	at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
              	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              	at java.lang.Thread.run(Thread.java:748)

              	Number of locked synchronizers = 1
              	- java.util.concurrent.ThreadPoolExecutor$Worker@5414b9f

          • https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Femail-ext-plugin/detail/PR-170/1/tests/
            • AbstractScriptTriggerTest.beforeBuildTriggerMigration
            • ScriptContentSecureTest.testWhenTemplateNotFoundThrowFileNotFoundException
          • https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Femail-ext-plugin/detail/PR-170/6/tests
            • ScriptContentSecureTest.testShouldFindTemplateOnClassPath

          Sam Van Oort added a comment -

          dnusbaum Could I get you to take a look? Normally one wouldn't stress too much about test flakes, but this ties to a cluster of actual issues we've seen reported in the past and some of our colleagues have seen.


          Devin Nusbaum added a comment - - edited

          SSHD's BouncyCastleRandom class only uses the SecureRandom instance to generate an 8-byte seed using SecureRandom#generateSeed for a pure-Java PRNG from BouncyCastle, and the constructed BouncyCastleRandom instance is used as a singleton in SSHD, so that call should only block once. The most likely explanation is that it is using /dev/random, which is blocking occasionally when running tests.

          We could change the SecureRandom instance to use the NativePRNGNonBlocking provider, which will use /dev/urandom for engineGenerateSeed.

          Notably, it looks like SHA1PRNG's generateSeed and nextBytes methods use either /dev/random or /dev/urandom as their source based on the values of java.security.egd and securerandom.source, which we don't want to modify, so SHA1PRNG seems like a bad choice here. We could still use SHA1PRNG's engineNextBytes method, since the seed that it uses to initialize the state is computed statically, and so may already be ready by the time we want to use it if someone else initialized it, but we can't guarantee that it won't block.
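To see which values those two settings currently have on a given JVM, without modifying them, a small read-only check along these lines might help. This is only a sketch: the class name EntropySourceSettings and the describe helper are made up for illustration, and either value may be null or empty if it is not set.

```java
import java.security.Security;

public class EntropySourceSettings {

    /**
     * Reads (never writes) the two settings mentioned above.
     * java.security.egd is a system property; securerandom.source is a
     * security property, normally defined in the JRE's java.security file.
     */
    static String describe() {
        String egd = System.getProperty("java.security.egd");
        String source = Security.getProperty("securerandom.source");
        return "java.security.egd=" + egd + ", securerandom.source=" + source;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```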

          Since NativePRNGNonBlocking always uses /dev/urandom, and unlike in JENKINS-20108 we don't care about SHA1PRNG's increased throughput, I think switching to NativePRNGNonBlocking is the best option.
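A minimal sketch of what requesting that algorithm could look like, in isolation from SSHD. The class and method names are hypothetical, and this is not the actual change to the sshd module or to BouncyCastleRandom; it only shows obtaining an 8-byte seed from NativePRNGNonBlocking. The fallback is an assumption for platforms where the algorithm is not registered (e.g. Windows), and that fallback may still block.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

public class NonBlockingSeed {

    /**
     * Returns a SecureRandom using the NativePRNGNonBlocking algorithm when
     * the JVM provides it, otherwise the platform default.
     */
    static SecureRandom nonBlockingRandom() {
        try {
            return SecureRandom.getInstance("NativePRNGNonBlocking");
        } catch (NoSuchAlgorithmException e) {
            // Assumption: falling back is acceptable; the default
            // implementation may read from a blocking source.
            return new SecureRandom();
        }
    }

    /** Generates a seed of the requested size, e.g. 8 bytes as in BouncyCastleRandom. */
    static byte[] seed(int numBytes) {
        return nonBlockingRandom().generateSeed(numBytes);
    }

    public static void main(String[] args) {
        SecureRandom random = nonBlockingRandom();
        byte[] seed = random.generateSeed(8);
        System.out.println(random.getAlgorithm() + " produced " + seed.length + " seed bytes");
    }
}
```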


            Unassigned Unassigned
            jglick Jesse Glick