Jenkins / JENKINS-69152

Remote call on JNLP4-connect connection from XXX failed

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: kubernetes-plugin, remoting
    • Labels: None
    • Environment: Jenkins 2.346.2 with Java java-11-openjdk-headless-11.0.15.0.9-2.el7_9.x86_64
      Kubernetes plugin 3670.v6ca_059233222
      Kubernetes client API plugin 5.12.2-193.v26a_6078f65a_9
      Kubernetes agents image inbound-agent:4.11-1-jdk11

      We're seeing an issue that started right after updating to Jenkins 2.346.2 (we were on 2.332.3). As part of this upgrade we did the required JRE update from 8 to 11. Since the update, some pipelines running on Kubernetes agents stall with the following error message in the console output:

      Cannot contact e2e-integrations-e2e-personal-e2e-april-494-x1rj1-wntgr-lf3s3: java.io.IOException: Remote call on JNLP4-connect connection from kubernetes_worker_hostname/X.X.X.X:42193 failed 
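For triage across many stalled builds, the agent name and remote endpoint can be pulled out of this console line. The following is a minimal sketch that assumes the message always has the shape quoted above; the function name and the regex group names are illustrative and not part of any Jenkins API.

```python
import re

# Pattern for the console line quoted in this report. The group names are
# illustrative only; the message format is assumed from the single example.
CANNOT_CONTACT = re.compile(
    r"Cannot contact (?P<agent>\S+): "
    r"java\.io\.IOException: Remote call on (?P<protocol>\S+) connection "
    r"from (?P<host>[^/\s]+)/(?P<ip>[^:\s]+):(?P<port>\d+) failed"
)


def parse_cannot_contact(line):
    """Return agent, protocol, host, ip and port from the line, or None."""
    match = CANNOT_CONTACT.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])  # port is the only numeric field
    return info
```

The extracted port is the one to look for in netstat or a packet capture when checking whether the connection is still alive.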

      No further output appears in the pipeline, no matter how long we leave it. To be clear, this error happens mid-run: we get some output, then it dies, usually in a similar spot but not always. Not all pipelines experience this issue. The error doesn't make a lot of sense for a few reasons:

      1. We still see an established TCP connection to Jenkins in netstat for this port/agent.
      2. When running tcpdump, we see data still being exchanged over this connection; packets flow constantly.
      3. There is no packet loss, drop, or delay issue; we have confirmed this with packet captures.

      Whatever is happening here, the network is not at fault, despite what the error message seems to indicate.

      When we take a look at the Kubernetes pod for this job, the JNLP container is running fine, and its last log message is:

      Jul 28, 2022 1:35:38 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected 

      Looking at our executor container in this pod, our test processes are still running fine:

      USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      e2euser        8  0.0  0.0   5996  3760 pts/1    Ss   01:46   0:00 bash
      e2euser      164  0.0  0.0   8596  3292 pts/1    R+   01:48   0:00  \_ ps auxf
      e2euser        1  0.0  0.0   2176   576 ?        Ss   01:46   0:00 /usr/bin/dumb-init cat
      e2euser        7  0.0  0.0   4364   512 pts/0    Ss+  01:46   0:00 cat
      e2euser       69  0.0  0.0   2432   116 ?        S    01:47   0:00 sh -c ({ while [ -d '/home/jenkins/agent/workspace/test1/e2e-personal/e2e/test-code@tmp/durable-ba81b591' -a \! -f '/home/jenkins/agent/wo
      e2euser       70  0.0  0.0   2432   156 ?        S    01:47   0:00  \_ sh -c ({ while [ -d '/home/jenkins/agent/workspace/test1/e2e-personal/e2e/test-code@tmp/durable-ba81b591' -a \! -f '/home/jenkins/agen
      e2euser      163  0.0  0.0   4236   588 ?        S    01:48   0:00  |   \_ sleep 3
      e2euser       72  0.0  0.0   2432   544 ?        S    01:47   0:00  \_ sh -xe /home/jenkins/agent/workspace/test1/e2e-personal/e2e-/test-code@tmp/durable-ba81b591/script.sh
      e2euser       73 20.0  1.5 4624652 4538752 ?     Sl   01:47   0:14      \_ /usr/local/bin/python /usr/local/bin/pytest --capture tee-sys -rA tests/test.py --log DEBUG --runslo
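When comparing many stalled pods, the same check (is the durable-task wrapper still alive?) can be automated by filtering `ps aux`-style output for commands that reference a `@tmp/durable-` directory. This is a rough sketch that assumes the standard 11-column `ps aux` layout; the function name is hypothetical.

```python
def durable_task_processes(ps_output):
    """Return (pid, stat, command) for rows that mention a durable-task dir."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # ps aux has 10 fixed columns; the 11th is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if "@tmp/durable-" in command:
            rows.append((pid, stat, command))
    return rows
```

If the returned list still contains the `script.sh` wrapper while the controller reports the agent as unreachable, the build step itself is alive and the stall lies somewhere between the agent and the controller.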

      There are no errors logged on the Jenkins server side beyond what is shown in the pipeline output pasted above.

      Downgrading Java or Jenkins at this point isn't really an option due to the sheer number of agents we have and the number of teams involved.

            Assignee: Unassigned
            Reporter: abrandel (Anthony)
            Votes: 2
            Watchers: 5
