Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-54894

Server becomes compute bound and unresponsive to http requests after org.jenkinsci.plugins.workflow.support.concurrent.Timeout lambda$ping$0

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Critical Critical
    • core
    • None

      For the last two months we have been experiencing intermittent situations where Jenkins becomes compute bound with the CPU at 100%.  Clients don't get a response but viewing the server log on the machine seems to show that the server is still running.  It also seems to lose connectivity with slaves.  Not all slaves, just some.  We have found that restarting the Jenkins service (running on Linux) isn't sufficient, and we have to restart the machine the server is running on.  Then we have to go through all builds that are running and find ones that seem to be hung up or not communicating with their slaves and forcibly kill them via /kill.

       

      It happened again today and I saw this in the log file, which I have seen before:

      Nov 27, 2018 8:08:06 AM org.jenkinsci.plugins.workflow.support.concurrent.Timeout lambda$ping$0Nov 27, 2018 8:08:06 AM org.jenkinsci.plugins.workflow.support.concurrent.Timeout lambda$ping$0INFO: org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#8314] unresponsive for 1 min 0 secNov 27, 2018 8:09:26 AM hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider parseMembersWARNING: JENKINS-42687 Might be more members for user  CN=vra_api,CN=Users,DC=dev,DC=cninc,DC=comjavax.naming.PartialResultException [Root exception is javax.naming.NamingException: LDAP response read timed out, timeout used:60000ms. [Root exception is com.sun.jndi.ldap.LdapReferralException: Continuation Reference; remaining name '']; remaining name ''] at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreImpl(AbstractLdapNamingEnumeration.java:237) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreReferrals(AbstractLdapNamingEnumeration.java:347) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreImpl(AbstractLdapNamingEnumeration.java:227) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreReferrals(AbstractLdapNamingEnumeration.java:347) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreImpl(AbstractLdapNamingEnumeration.java:227) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreReferrals(AbstractLdapNamingEnumeration.java:347) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreImpl(AbstractLdapNamingEnumeration.java:227) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreReferrals(AbstractLdapNamingEnumeration.java:347) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreImpl(AbstractLdapNamingEnumeration.java:227) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMore(AbstractLdapNamingEnumeration.java:189) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.parseMembers(ActiveDirectoryUnixAuthenticationProvider.java:803) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.chainGroupLookup(ActiveDirectoryUnixAuthenticationProvider.java:746) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.resolveGroups(ActiveDirectoryUnixAuthenticationProvider.java:721) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.access$500(ActiveDirectoryUnixAuthenticationProvider.java:86) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider$1.call(ActiveDirectoryUnixAuthenticationProvider.java:421) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider$1.call(ActiveDirectoryUnixAuthenticationProvider.java:341) at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228) at com.google.common.cache.LocalCache.get(LocalCache.java:3965) at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.retrieveUser(ActiveDirectoryUnixAuthenticationProvider.java:341) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.retrieveUser(ActiveDirectoryUnixAuthenticationProvider.java:304) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.retrieveUser(ActiveDirectoryUnixAuthenticationProvider.java:226) at hudson.plugins.active_directory.AbstractActiveDirectoryAuthenticationProvider.loadUserByUsername(AbstractActiveDirectoryAuthenticationProvider.java:55) at jenkins.security.ImpersonatingUserDetailsService.loadUserByUsername(ImpersonatingUserDetailsService.java:32) at hudson.model.User.getUserDetailsForImpersonation(User.java:354) at jenkins.security.BasicHeaderApiTokenAuthenticator.authenticate(BasicHeaderApiTokenAuthenticator.java:35) at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:79) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87) at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249) at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87) at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:90) at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.Server.handle(Server.java:503) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:411) at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:305) at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683) at java.lang.Thread.run(Thread.java:748)Caused by: javax.naming.NamingException: LDAP response read timed out, timeout used:60000ms. [Root exception is com.sun.jndi.ldap.LdapReferralException: Continuation Reference; remaining name '']; remaining name '' at com.sun.jndi.ldap.Connection.readReply(Connection.java:507) at com.sun.jndi.ldap.LdapClient.getSearchReply(LdapClient.java:638) at com.sun.jndi.ldap.LdapClient.search(LdapClient.java:561) at com.sun.jndi.ldap.LdapCtx.doSearch(LdapCtx.java:1985) at com.sun.jndi.ldap.LdapCtx.searchAux(LdapCtx.java:1844) at com.sun.jndi.ldap.LdapCtx.c_search(LdapCtx.java:1769) at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search(ComponentDirContext.java:392) at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(PartialCompositeDirContext.java:358) at com.sun.jndi.ldap.LdapReferralContext.search(LdapReferralContext.java:672) at com.sun.jndi.ldap.LdapSearchEnumeration.getReferredResults(LdapSearchEnumeration.java:205) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreReferrals(AbstractLdapNamingEnumeration.java:330) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreImpl(AbstractLdapNamingEnumeration.java:227) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMore(AbstractLdapNamingEnumeration.java:189) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.chainGroupLookup(ActiveDirectoryUnixAuthenticationProvider.java:740) ... 65 moreCaused by: com.sun.jndi.ldap.LdapReferralException: Continuation Reference; remaining name '' at com.sun.jndi.ldap.LdapReferralContext.search(LdapReferralContext.java:706) at com.sun.jndi.ldap.LdapSearchEnumeration.getReferredResults(LdapSearchEnumeration.java:205) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreReferrals(AbstractLdapNamingEnumeration.java:330) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreImpl(AbstractLdapNamingEnumeration.java:227) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreReferrals(AbstractLdapNamingEnumeration.java:347) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMoreImpl(AbstractLdapNamingEnumeration.java:227) at com.sun.jndi.ldap.AbstractLdapNamingEnumeration.hasMore(AbstractLdapNamingEnumeration.java:189) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.parseMembers(ActiveDirectoryUnixAuthenticationProvider.java:803) at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.chainGroupLookup(ActiveDirectoryUnixAuthenticationProvider.java:746) ... 65 more 

      Shortly after, we then see a cascade of communication related problems with the server losing connectivity to slaves.

      Nov 27, 2018 8:20:43 AM hudson.slaves.ChannelPinger$1 onDead
      INFO: Ping failed. Terminating the channel JNLP4-connect connection from slv-win-prd-13.dev.cninc.com/10.13.1.99:55672.
      java.util.concurrent.TimeoutException: Ping started at 1543324466010 hasn't completed by 1543324724447
      	at hudson.remoting.PingThread.ping(PingThread.java:134)
      	at hudson.remoting.PingThread.run(PingThread.java:90)
      
      Nov 27, 2018 8:23:03 AM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
      WARNING: Computer.threadPoolForRemoting [#32306] for slv-win-prd-13 terminated
      java.nio.channels.ClosedChannelException
      	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209)
      	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
      	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:784)
      	at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
      	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:314)
      	at hudson.remoting.Channel.close(Channel.java:1450)
      	at hudson.remoting.Channel.close(Channel.java:1403)
      	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:821)
      	at hudson.slaves.SlaveComputer.access$800(SlaveComputer.java:105)
      	at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:737)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      
      Nov 27, 2018 8:23:03 AM hudson.slaves.ChannelPinger$1 onDead
      INFO: Ping failed. Terminating the channel JNLP4-connect connection from slv-win-prd-02.dev.cninc.com/10.13.0.179:54403.
      java.util.concurrent.TimeoutException: Ping started at 1543324664159 hasn't completed by 1543324983601
      	at hudson.remoting.PingThread.ping(PingThread.java:134)
      	at hudson.remoting.PingThread.run(PingThread.java:90)
      
      Nov 27, 2018 8:23:03 AM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
      WARNING: Computer.threadPoolForRemoting [#32289] for slv-win-prd-02 terminated
      java.nio.channels.ClosedChannelException
      	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209)
      	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
      	at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:784)
      	at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:173)
      	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:314)
      	at hudson.remoting.Channel.close(Channel.java:1450)
      	at hudson.remoting.Channel.close(Channel.java:1403)
      	at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:821)
      	at hudson.slaves.SlaveComputer.access$800(SlaveComputer.java:105)
      	at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:737)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      After this we are unable to communicate with the server and end up having to reboot the VM it resides on.

      We have had this happen on 3 different VMs, all with the same behavior. I believe we are running into a bug internally.

            Unassigned Unassigned
            jrioux Joe Rioux
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated: