Jenkins / JENKINS-44634

JNLP Rejected Agent Connections Leak Socket Handles

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: windows-slaves-plugin

    Description

      If a JNLP agent repeatedly tries to connect while there is a valid connection already established, Jenkins will open a socket but not close it after rejecting the connection. This pattern can continue to repeat until the Jenkins process hits an open file limit imposed by the operating system, leading to the disconnection of all agents.

      The leaked sockets can be seen with:

      $ lsof -u $JENKINS_USER
      ...
      java 29334 jenkins-ci 12u sock 0,6 0t0 284498333 can't identify protocol
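For a view from inside the JVM rather than from lsof, the following sketch reads the process's open and maximum file-descriptor counts. It assumes a HotSpot/OpenJDK JVM on a Unix-like OS, where the com.sun.management.UnixOperatingSystemMXBean extension is available; the FdUsage class is hypothetical and not part of Jenkins. Because sockets consume descriptors, a count that keeps rising while an agent retries its connection would be consistent with the leak described above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

/** Hypothetical helper (not a Jenkins API) for watching descriptor usage. */
public class FdUsage {
    /**
     * Returns {open, max} file-descriptor counts for this JVM process,
     * or null if the JVM/OS does not expose the Unix MXBean extension.
     */
    static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = fdUsage();
        if (usage == null) {
            System.out.println("descriptor counts are not available on this JVM/OS");
            return;
        }
        // Sockets count toward the same limit as regular files.
        System.out.printf("open file descriptors: %d of %d (%.1f%%)%n",
                usage[0], usage[1], 100.0 * usage[0] / usage[1]);
    }
}
```

Sampling this periodically, or on each rejected connection, shows how close the process is getting to the limit that eventually disconnects all agents.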
      

      This issue can be resolved by restarting Jenkins or the OS networking subsystem, or possibly by attaching gdb to the process and manually closing the open sockets.
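A minimal sketch of the kind of fix the description implies, assuming the rejection happens after the listener has already accepted the socket. This is not Jenkins' actual inbound-agent (JNLP) code: RejectAndClose, its CONNECTED registry, and passing the agent name in directly (instead of reading it from the handshake) are hypothetical simplifications. The rule it illustrates is that every path that does not hand the socket off to a live connection must close it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RejectAndClose {
    // Hypothetical registry: agent name -> socket of its live connection.
    static final Map<String, Socket> CONNECTED = new ConcurrentHashMap<>();

    /**
     * Registers the socket for agentName, or rejects it if that agent is
     * already connected. Returns true if accepted. Any socket not handed off
     * to the registry (including when registration throws) is closed, so a
     * rejection releases its file descriptor.
     */
    static boolean handle(Socket socket, String agentName) {
        boolean handedOff = false;
        try {
            handedOff = CONNECTED.putIfAbsent(agentName, socket) == null;
            return handedOff;
        } finally {
            if (!handedOff) {
                closeQuietly(socket); // the step the report describes as missing
            }
        }
    }

    private static void closeQuietly(Socket socket) {
        try {
            socket.close();
        } catch (IOException ignored) {
            // Best effort: nothing more can be done for a socket being discarded.
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket client1 = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket client2 = new Socket(server.getInetAddress(), server.getLocalPort())) {
            Socket first = server.accept();
            Socket second = server.accept(); // the same agent reconnecting
            System.out.println("first accepted: " + handle(first, "agent-1"));
            System.out.println("second accepted: " + handle(second, "agent-1"));
            System.out.println("second closed: " + second.isClosed());
            first.close();
        }
    }
}
```

Running main prints that the first connection is accepted, the duplicate is rejected, and the rejected socket reports isClosed() as true. In a real listener the agent name is only known after reading the handshake, so the same close-on-every-non-handoff-path rule would also have to cover handshake failures.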

        Activity

          oleg_nenashev Oleg Nenashev added a comment -

          The issue is valid. I need to double-check whether there are things to fix in the current baseline. Jenkins 2.46.1 included the fix for JENKINS-42371, but evidently it was not enough.

          Just in case, could you please provide the output of the File Leak Detector (http://file-leak-detector.kohsuke.org/)?

          oleg_nenashev Oleg Nenashev added a comment -

          bramwelt ping

          dave_pierce Dave Pierce added a comment - - edited

          Hi!

          We are experiencing something very similar (identical symptoms, though perhaps not the same root cause) with Jenkins 2.85, Ubuntu 14.04.5 and OpenJDK 1.8.0_141.

          I am doing more research...

          File Leak Detector is unable to connect to the Jenkins JVM.

          oleg_nenashev Oleg Nenashev added a comment -

          You need to run it as a Java agent (loaded with the JVM's -javaagent option).

          dave_pierce Dave Pierce added a comment - - edited

          Thanks; I will have to try that after hours. (It's production.)

          Is the file leak detector plugin likely to be of any use? https://wiki.jenkins.io/display/JENKINS/File+Leak+Detector+Plugin

          dave_pierce Dave Pierce added a comment -

          I ran it and got many entries like this one:

          #2 socket channel by thread:Computer.threadPoolForRemoting 356 on Wed Dec 13 15:08:38 CST 2017
              at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:108)
              at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
              at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
              at java.nio.channels.SocketChannel.open(SocketChannel.java:187)
              at org.jinterop.dcom.transport.JIComTransport.attach(JIComTransport.java:98)
              at rpc.Stub.attach(Stub.java:104)
              at org.jinterop.dcom.core.JIComServer.call(JIComServer.java:860)
              at org.jinterop.dcom.core.JIComServer.call(JIComServer.java:825)
              at org.jinterop.dcom.core.JIComServer.addRef_ReleaseRef(JIComServer.java:909)
              at org.jinterop.dcom.core.JISession.releaseRef(JISession.java:730)
              at org.jinterop.dcom.core.JIComServer.createInstance(JIComServer.java:746)
              at org.jvnet.hudson.wmi.WMI.connect(WMI.java:61)
              at hudson.os.windows.ManagedWindowsServiceLauncher.launch(ManagedWindowsServiceLauncher.java:208)
              at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:285)
              at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)

          oleg_nenashev Oleg Nenashev added a comment -

          The code in the Windows Slaves plugin is definitely prone to connection leaks during launch() and afterDisconnect(). I am not sure this is the originally reported defect, but bramwelt also has the Windows Slaves plugin installed.

          Let's try to fix this defect in this ticket.
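The stack trace in this thread shows the leaked socket being opened by WMI.connect() inside ManagedWindowsServiceLauncher.launch(). A hedged sketch of the general pattern for launch() and afterDisconnect(), assuming the leak comes from a session that stays open when a later launch step fails: RemoteSession and Step are hypothetical stand-ins for the plugin's WMI/DCOM connection and its post-connect work, not the plugin's real API.

```java
import java.io.Closeable;
import java.io.IOException;

public class LaunchSketch {
    /** Hypothetical stand-in for a remote session such as a WMI/DCOM connection. */
    static final class RemoteSession implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true; // a real session would release its socket here
        }
    }

    /** Hypothetical launch step that runs after the session is open and may fail. */
    interface Step {
        void run(RemoteSession session) throws IOException;
    }

    /**
     * Opens a session and runs the remaining launch work. If that work throws,
     * the session is closed before the exception propagates, so a failed
     * launch does not leave an open socket behind.
     */
    static RemoteSession launch(Step afterConnect) throws IOException {
        RemoteSession session = new RemoteSession();
        try {
            afterConnect.run(session);
            return session; // on success the caller owns the session
        } catch (IOException | RuntimeException e) {
            session.close();
            throw e;
        }
    }

    /** Counterpart for afterDisconnect(): always release the session if one exists. */
    static void afterDisconnect(RemoteSession session) {
        if (session != null) {
            session.close();
        }
    }

    public static void main(String[] args) {
        RemoteSession[] seen = new RemoteSession[1];
        try {
            launch(s -> {
                seen[0] = s;
                throw new IOException("simulated failure after connect");
            });
        } catch (IOException e) {
            System.out.println("launch failed; session closed: " + seen[0].closed);
        }
    }
}
```

The design choice is that ownership is explicit: launch() either returns an open session that the caller must later pass to afterDisconnect(), or it closes the session itself before failing, so no path leaves a connection without an owner.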


          People

            Assignee: Unassigned
            Reporter: bramwelt (Trevor Bramwell)
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated: