[JENKINS-48106] java.io.EOFException keeps occurring when using health checks with an AWS Network Load Balancer

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Minor
    • core
    • Jenkins ver. 2.90

      Hi, we are currently using an AWS Network Load Balancer (http://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) for auto-discovery of the Jenkins JNLP service port.

      However, we keep getting these exceptions:

      Nov 20, 2017 10:55:50 AM FINE hudson.TcpSlaveAgentListener
      Accepted connection #41,949 from /10.240.0.4:24748
      Nov 20, 2017 10:55:50 AM WARNING hudson.TcpSlaveAgentListener$ConnectionHandler run
      Connection #41949 failed
      java.io.EOFException
              at java.io.DataInputStream.readFully(DataInputStream.java:197)
              at java.io.DataInputStream.readFully(DataInputStream.java:169)
              at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:241)
      

      The slaves are still working normally, though.

          Chris Lee created issue -
          Oleg Nenashev made changes -
          Labels Original: jnlp slave New: jnlp remoting slave

          Owen Mehegan added a comment -

          Dupe of JENKINS-46893?

          Jeff Thompson added a comment -

          Are you still seeing this issue? Can you provide further information about what is happening?

          Jeff Thompson added a comment -

          Closing for lack of response providing sufficient reproduction or diagnostic information.

          Jeff Thompson made changes -
          Resolution New: Cannot Reproduce [ 5 ]
          Status Original: Open [ 1 ] New: Closed [ 6 ]
          Aaron Trout made changes -
          Attachment New: nlb-hc.pcap [ 45240 ]

          Aaron Trout added a comment - - edited

          I'm also hitting this. I am guessing that Jenkins is unhappy about the TCP health check probes that the NLB is sending it. 

          My limited understanding of Jenkins internals is that this JNLP / agent port serves multiple protocols (HTTP for JNLP protocol discovery, plus various JNLP versions). Since the load balancer is just L4, it does a simple TCP health check. I have attached a pcap of the health checks coming from the AWS NLB (nlb-hc.pcap); each check just sets up a TCP connection and immediately tears it down again without sending any data over the connection.

          During this time we are getting one instance of the below error for every new TCP connection received:

           

          Nov 21, 2018 11:27:52 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
           WARNING: Connection #2576 failed
           java.io.EOFException
           at java.io.DataInputStream.readFully(DataInputStream.java:197)
           at java.io.DataInputStream.readFully(DataInputStream.java:169)
           at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:242)
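
For anyone who wants to see the probe pattern described above without an NLB, here is a minimal Python sketch. It assumes the health check amounts to "connect, send nothing, disconnect". The names `tcp_probe`, `observe_one_connection` and `demo` are made up for illustration, and the listener is a local stand-in, not Jenkins.

```python
import socket
import threading


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> None:
    """Open a TCP connection and close it at once, sending no payload."""
    with socket.create_connection((host, port), timeout=timeout):
        pass  # nothing is written; leaving the block closes the socket


def observe_one_connection(listener: socket.socket, result: list) -> None:
    """Accept one connection and record what arrives before the peer closes."""
    conn, _addr = listener.accept()
    with conn:
        result.append(conn.recv(16))  # b"" means the peer closed without data


def demo() -> bytes:
    # Stand-in listener on an ephemeral local port (assumption: not a real agent port).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]
        received: list = []
        worker = threading.Thread(
            target=observe_one_connection, args=(listener, received)
        )
        worker.start()
        tcp_probe("127.0.0.1", port)
        worker.join(timeout=5)
        return received[0]


if __name__ == "__main__":
    print(demo())
```

The stand-in listener sees end-of-stream with zero bytes read. Pointing `tcp_probe` at a Jenkins agent port would presumably produce one such warning per call, but I have not verified that here.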
          

           

           I have a similar setup, running Jenkins in Kubernetes with a `Service` of type `LoadBalancer` and the annotation to make it use the new NLB (rather than ELB classic). Because of this, the problem gets worse the larger your cluster is, since the NLB will run the health check against every node in the cluster (and kube-proxy will forward all those requests to the single Jenkins instance).

          Aaron Trout added a comment - - edited

          By the way, a MUCH easier way to reproduce this is to just use netcat against the JNLP port:

          Run a local Jenkins:

          $ docker run -p 50000:50000 jenkins/jenkins:lts

          Wait for the "Jenkins is fully up and running" message, then in another terminal:

          $ echo | nc localhost 50000
          

          causes:

          WARNING: Connection #1 failed
          java.io.EOFException
                  at java.io.DataInputStream.readFully(DataInputStream.java:197)
                  at java.io.DataInputStream.readFully(DataInputStream.java:169)
                  at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:242)
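
A plausible reading of the stack trace is that the listener does a fixed-length read of a connection header, and the peer closes before that many bytes arrive. The Python sketch below illustrates that general pattern only; it is not the Jenkins implementation. `read_exactly`, `try_read_header` and the value of `HEADER_LEN` are assumptions for illustration.

```python
import io


def read_exactly(stream, n: int) -> bytes:
    """Read exactly n bytes or raise EOFError, similar in spirit to
    java.io.DataInputStream.readFully."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:  # stream ended before n bytes arrived
            raise EOFError(f"expected {n} bytes, got {len(buf)}")
        buf.extend(chunk)
    return bytes(buf)


HEADER_LEN = 10  # hypothetical header size, chosen for illustration only


def try_read_header(payload: bytes) -> str:
    """Report whether a peer that sent `payload` and then closed yields EOF."""
    try:
        read_exactly(io.BytesIO(payload), HEADER_LEN)
        return "ok"
    except EOFError:
        return "EOF"


if __name__ == "__main__":
    print(try_read_header(b""))    # peer sent nothing, like a bare TCP probe
    print(try_read_header(b"\n"))  # peer sent one newline, like `echo | nc`
```

Under that assumption, an empty probe and a single newline both end in the same EOF condition, which would match seeing the same exception for the NLB health check and for the netcat reproduction.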
          

          chris vest added a comment -

          I am seeing the same issue as aaron465

            Assignee: Jeff Thompson (jthompson)
            Reporter: Chris Lee (protosschris)