Huge amount of TcpSlaveAgentListener EOFException on Kubernetes

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • core, remoting
    • None
    • jenkins-2.60.2

      I set up a Jenkins master on Azure Container Service (Kubernetes). Ever since the master was set up, it has logged a large number of warnings, and they never stop:

      Sep 15, 2017 8:33:23 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
      WARNING: Connection #600 failed
      java.io.EOFException
              at java.io.DataInputStream.readFully(DataInputStream.java:197)
              at java.io.DataInputStream.readFully(DataInputStream.java:169)
              at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:213)
      Sep 15, 2017 8:33:23 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
      WARNING: Connection #602 failed
      java.io.EOFException
              at java.io.DataInputStream.readFully(DataInputStream.java:197)
              at java.io.DataInputStream.readFully(DataInputStream.java:169)
              at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:213)
      Sep 15, 2017 8:33:27 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
      WARNING: Connection #603 failed
      java.io.EOFException
              at java.io.DataInputStream.readFully(DataInputStream.java:197)
              at java.io.DataInputStream.readFully(DataInputStream.java:169)
              at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:213)
      

      Despite the many warnings, everything works. I can even use the Azure VM plugin to set up JNLP slaves.

      I think this issue is related to port 50000: if I expose port 50000 as a cluster port, there are no errors. However, if I expose port 50000 through a LoadBalancer (which we have to do), the errors shown above appear.

      Here are my Kubernetes manifests:

      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: azdisk
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: azuredisk
      ---
      kind: Deployment
      apiVersion: apps/v1beta1
      metadata:
        name: jenkins-1
      spec:
        replicas: 1
        template:
          metadata:
            name: jenkins-1
            labels:
              app: jenkins-1
          spec:
            containers:
            - name: jenkins-container
              image: zackliu1995/jenkins
              volumeMounts:
              - name: azure
                mountPath: /var/jenkins_home
              securityContext:
                privileged: true
              ports:
                - name: port8080
                  containerPort: 8080
                  protocol: TCP
                - name: port50000
                  containerPort: 50000
                  protocol: TCP
                - name: port22
                  containerPort: 22
            volumes:
              - name: azure
                persistentVolumeClaim:
                  claimName: azdisk
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: jenkins-srv
      spec:
        selector:
          app: jenkins-1
        ports:
          - name: http
            port: 80
            protocol: TCP
            targetPort: 8080
          - name: slave
            port: 50000
            protocol: TCP
            targetPort: 50000
          - name: ssh
            port: 22
            targetPort: 22
        type: LoadBalancer
      
      

      I also tried the official Jenkins image; it showed the same issue.

          [JENKINS-46893] Huge amount of TcpSlaveAgentListener EOFException on Kubernetes

          Oleg Nenashev added a comment -

          Why do you have to set up a port as a load balancer?

          If you want to set up high availability or load balancing for Jenkins, the recommended approach is to expose the ports of the underlying Jenkins master instances and let agents connect to them directly using the "hudson.TcpSlaveAgentListener.hostName" system property (its value is returned to agents in HTTP response headers).

          Chenyang Liu added a comment -

          Because we want to offer Jenkins as a service to different users.

          As we know, a LoadBalancer service automatically requests a new public IP in Azure, and each user can use that IP to reach their Jenkins.

          Do you know why there are so many warnings from the moment Jenkins starts running? In fact, there is no problem connecting slaves.

          Oleg Nenashev added a comment -

          I would guess there is a broken agent, or maybe the VMWare plugin forcefully disconnects the agent. That usually shows up as an onRecvClosed() warning, but this exception may also occur if a data transfer is in progress. To say more, I would need at least the agent logs.

          Chenyang Liu added a comment -

          Oh, you probably misunderstood the issue. There is no agent, only a Jenkins master. The warnings appeared from the very beginning, before I had even finished the setup wizard (I hadn't unlocked Jenkins or installed the suggested plugins), and of course I hadn't installed the VM agent plugin.

          So I think it may be caused by a Jenkins self-check... I don't know, it's strange.

          Oleg Nenashev added a comment -

          I can add more diagnostics, but there is no such self-check in Jenkins for sure.

          The errors may also be coming from the Remoting-based CLI or from non-terminated Jenkins Maven project runs on instances. There are also other components from vendors which may be trying to connect to the master via Remoting.

          Daniel Beck added a comment -

          Does a load balancer connect to the port in question as a health check of sorts?

          Perhaps rtyler or olblak can take a look at this as our resident Kubernetes on Azure experts.

          Chenyang Liu added a comment -

          That's probably the case. Azure Load Balancer does send health probes every 5 seconds by default.

          I will check it later on Monday.
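          One rough way to check this guess is to compare the spacing of the warnings with the probe interval. The sketch below is an illustration only, not part of the original report: it assumes the java.util.logging record format shown in the issue description, and the names `warning_times`, `gaps_in_seconds`, and `SAMPLE` are made up for the example. The three records quoted in the report are far too few to confirm any cadence; a longer log excerpt would be needed.

```python
import re
from datetime import datetime

# Timestamp prefix of a log record header, e.g. "Sep 15, 2017 8:33:23 AM ...".
# Assumes English month abbreviations, as in the log quoted in the report.
STAMP = re.compile(r"^\s*(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ")
WARNING = re.compile(r"WARNING: Connection #\d+ failed")


def warning_times(log_text):
    """Return the timestamp of every 'Connection #N failed' warning."""
    times = []
    last_stamp = None
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            # Remember the header timestamp; the WARNING line follows it.
            last_stamp = datetime.strptime(match.group(1), "%b %d, %Y %I:%M:%S %p")
        elif WARNING.search(line) and last_stamp is not None:
            times.append(last_stamp)
    return times


def gaps_in_seconds(times):
    """Seconds elapsed between each pair of consecutive warnings."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# The three records quoted in the issue description (stack frames omitted).
SAMPLE = """\
Sep 15, 2017 8:33:23 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
WARNING: Connection #600 failed
Sep 15, 2017 8:33:23 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
WARNING: Connection #602 failed
Sep 15, 2017 8:33:27 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
WARNING: Connection #603 failed
"""

if __name__ == "__main__":
    print(gaps_in_seconds(warning_times(SAMPLE)))
```

          If the gaps in a longer excerpt cluster around the configured probe interval (or a fraction of it when several probe sources are involved), that would support the health-probe explanation.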

          R. Tyler Croy added a comment -

          danielbeck, from my understanding this JIRA is not a support forum.

          Daniel Beck added a comment - - edited

          rtyler It is not, but so far it's unclear to me whether this is a bug or not. You saying my guess is right (or some other unreasonable load balancer behavior causes this) would make this Not A Defect.

          Oleg Nenashev added a comment -

          zackliu ping.

          Chenyang Liu added a comment -

          It's caused by the health probes; this issue can be closed.
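          To illustrate why a probe would produce this kind of warning, here is a minimal, self-contained sketch. It is not Jenkins code: `read_fully` only imitates the behaviour of `DataInputStream.readFully`, and `HANDSHAKE_BYTES`, `serve_once`, `probe`, and `run_demo` are hypothetical names chosen for the example. It assumes the probe is a plain TCP check that connects and then closes without sending any data.

```python
import socket
import threading

# Hypothetical size of the first chunk the listener expects from a real client.
HANDSHAKE_BYTES = 2


def read_fully(conn, n):
    """Read exactly n bytes or raise EOFError if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:  # empty read: the peer closed the connection
            raise EOFError(f"expected {n} bytes, got {len(buf)}")
        buf += chunk
    return buf


def serve_once(server, results):
    """Accept one connection and record whether the initial read succeeded."""
    conn, _ = server.accept()
    with conn:
        try:
            read_fully(conn, HANDSHAKE_BYTES)
            results.append("ok")
        except (EOFError, ConnectionResetError):
            # This is the situation the listener would log as a failed connection.
            results.append("eof")


def probe(port):
    """Mimic a TCP health probe: connect, send nothing, close."""
    with socket.create_connection(("127.0.0.1", port)):
        pass


def run_demo():
    results = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server, results))
        worker.start()
        probe(port)
        worker.join(timeout=5)
    return results


if __name__ == "__main__":
    print(run_demo())
```

          Every probe that completes the TCP handshake and then disconnects ends up in the `except` branch, which matches the pattern of one warning per connection seen in the log, with no agent involved.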

          Oleg Nenashev added a comment -

          Closing according to the response

          gp guan added a comment -

          I hit the same issue in my local k8s cluster, where the service type is ClusterIP. When the problem occurs, the logs of the Jenkins master and the Jenkins slave are as follows.
          jenkins-slave.txt

            Assignee: Unassigned
            Reporter: Chenyang Liu (zackliu)