JENKINS-20108

SSH slaves can block for a long time in NativePRNG

      Description

      I have encountered some reports of slow slave performance on a Unix master with many slaves. In these cases the thread dumps show all but one slave connection thread waiting for a single lock, which is held by a thread that looks like this:

      "Pipe writer thread: ..." - Thread ...
         java.lang.Thread.State: RUNNABLE
          at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:255)
          - locked <598aec0c> (a java.lang.Object)
          at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
          at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
          at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
          - locked <329129da> (a java.security.SecureRandom)
          at java.security.SecureRandom.next(SecureRandom.java:455)
          at java.util.Random.nextInt(Random.java:189)
          at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:154)
      

      From what I can tell, neither the Jenkins SSH Slaves plugin nor the Trilead SSH library is to blame, as they create a separate SecureRandom instance for each slave. Rather, the culprit is NativePRNG (the default implementation on typical Linux installations, among others), which uses a global lock to synchronize access to /dev/random and /dev/urandom, and /dev/random can block while waiting for sufficient entropy to accumulate.
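
      To check whether a given JVM is affected, it may help to see which algorithm a default SecureRandom resolves to. The following is only a minimal sketch: the class name PrngCheck and its method are hypothetical, and nothing here is taken from the plugin or from Trilead.

```java
import java.security.SecureRandom;

// Hypothetical helper for illustration; not part of Jenkins or Trilead.
public class PrngCheck {

    // Returns the algorithm name backing a default-constructed SecureRandom.
    static String defaultAlgorithm() {
        return new SecureRandom().getAlgorithm();
    }

    public static void main(String[] args) {
        // If this reports NativePRNG, the JVM is using the implementation
        // that appears in the stack trace above.
        System.out.println("Default SecureRandom algorithm: " + defaultAlgorithm());
    }
}
```

      Running this with the same JVM and options as the master shows what each per-slave SecureRandom instance would actually be backed by.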

      It might help for the SSH Slaves plugin to offer a java.security.SecureRandom based on sun.security.provider.SecureRandom, which does not acquire a global lock to process connection data. (It may take longer to set up a connection, since it needs to seed the random-number generator based on thread activity.)
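
      As a rough illustration of that idea, the sketch below requests the SHA1PRNG algorithm from the SUN provider, which on Oracle/OpenJDK runtimes is implemented by sun.security.provider.SecureRandom. The class name PerConnectionRandom and the fallback behavior are assumptions made for this example; this is not the plugin's actual code, and whether it avoids the contention described here is untested.

```java
import java.security.NoSuchAlgorithmException;
import java.security.NoSuchProviderException;
import java.security.SecureRandom;

// Hypothetical factory for illustration; not the SSH Slaves plugin's code.
public class PerConnectionRandom {

    // Tries to obtain the SUN provider's SHA1PRNG, falling back to the
    // platform default if that provider or algorithm is unavailable.
    static SecureRandom create() {
        try {
            return SecureRandom.getInstance("SHA1PRNG", "SUN");
        } catch (NoSuchAlgorithmException | NoSuchProviderException e) {
            // Assumption: falling back is preferable to failing the connection.
            return new SecureRandom();
        }
    }

    public static void main(String[] args) {
        SecureRandom random = create();
        System.out.println(random.getAlgorithm() + " from provider "
                + random.getProvider().getName());
    }
}
```

      Each connection would call create() once and keep its own instance, mirroring the one-instance-per-slave behavior already noted for the plugin and Trilead.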

      Unconfirmed workarounds:

      • Edit the JRE's $JAVA_HOME/lib/security/java.security to comment out the line securerandom.source=file:/dev/urandom (this should switch back to the generic implementation).
      • Run with -Djava.security.egd=file:/dev/./urandom (this should force the use of /dev/urandom, which is supposed to be nonblocking).
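
      Both workarounds change settings that can be read back at runtime, which makes it possible to confirm that an edit or a command-line flag actually took effect. This is a minimal sketch; the class name EntropySettings is hypothetical, and it only reports the two settings named above without judging whether they resolve the blocking.

```java
import java.security.Security;

// Hypothetical diagnostic for illustration only.
public class EntropySettings {

    // Reports the security property from java.security and the
    // java.security.egd system property; either may be null if unset.
    static String describe() {
        String source = Security.getProperty("securerandom.source");
        String egd = System.getProperty("java.security.egd");
        return "securerandom.source=" + source + ", java.security.egd=" + egd;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

      For example, launching it as java -Djava.security.egd=file:/dev/./urandom EntropySettings should echo that value back in the second field.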

            Activity

            jglick Jesse Glick created issue -
            jglick Jesse Glick made changes -
            Field: Original Value -> New Value
            Assignee: Kohsuke Kawaguchi [ kohsuke ]
            scm_issue_link SCM/JIRA link daemon made changes -
            Resolution: Fixed [ 1 ]
            Status: Open [ 1 ] -> Resolved [ 5 ]
            jglick Jesse Glick made changes -
            Labels: performance -> lts-candidate performance
            jglick Jesse Glick made changes -
            Assignee: Kohsuke Kawaguchi [ kohsuke ]
            jglick Jesse Glick made changes -
            Link: This issue depends on JENKINS-25241 [ JENKINS-25241 ]
            olivergondza Oliver Gondža made changes -
            Labels: lts-candidate performance -> 1.580.2-fixed performance
            olivergondza Oliver Gondža made changes -
            Labels: 1.580.2-fixed performance -> 1.580.2-rejected performance
            olivergondza Oliver Gondža made changes -
            Labels: 1.580.2-rejected performance -> 1.580.2-rejected lts-candidate performance
            jglick Jesse Glick made changes -
            Link: This issue is blocking JENKINS-25241 [ JENKINS-25241 ]
            jglick Jesse Glick made changes -
            Link: This issue depends on JENKINS-25241 [ JENKINS-25241 ]
            olivergondza Oliver Gondža made changes -
            Labels: 1.580.2-rejected lts-candidate performance -> 1.580.2-rejected performance
            olivergondza Oliver Gondža made changes -
            Labels: 1.580.2-rejected performance -> 1.580.2-rejected 1.580.3-rejected performance
            jglick Jesse Glick made changes -
            Link: This issue is related to JENKINS-32510 [ JENKINS-32510 ]
            rtyler R. Tyler Croy made changes -
            Workflow: JNJira [ 151630 ] -> JNJira + In-Review [ 194015 ]
            cloudbees CloudBees Inc. made changes -
            Remote Link: This issue links to "CloudBees Internal CJP-2555 (Web Link)" [ 19185 ]

              People

              Assignee:
              kohsuke Kohsuke Kawaguchi
              Reporter:
              jglick Jesse Glick
              Votes:
              1
              Watchers:
              12
