[JENKINS-69152] Remote call on JNLP4-connect connection from XXX failed

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: kubernetes-plugin, remoting
    • Labels: None
    • Environment: Jenkins 2.346.2 with Java java-11-openjdk-headless-11.0.15.0.9-2.el7_9.x86_64
      Kubernetes plugin 3670.v6ca_059233222
      Kubernetes client API plugin 5.12.2-193.v26a_6078f65a_9
      Kubernetes agents image inbound-agent:4.11-1-jdk11

      We're seeing an issue that started right after updating to Jenkins 2.346.2 (we were on 2.332.3); as part of this upgrade we also did the required JRE update from 8 to 11. Since the update, some pipelines running on Kubernetes agents stall with this error message in the console output:

      Cannot contact e2e-integrations-e2e-personal-e2e-april-494-x1rj1-wntgr-lf3s3: java.io.IOException: Remote call on JNLP4-connect connection from kubernetes_worker_hostname/X.X.X.X:42193 failed 
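
      The agent name and the peer address in this message are what we matched against netstat and tcpdump. A minimal sketch of pulling them out of a console log line, assuming the message always has the shape quoted above (the function name and the regular expression are illustrative and not part of Jenkins):

```python
import re

# Assumption: the console message always follows the shape quoted in this report.
CANNOT_CONTACT = re.compile(
    r"Cannot contact (?P<agent>\S+): java\.io\.IOException: "
    r"Remote call on JNLP4-connect connection from "
    r"(?P<host>[^/\s]+)/(?P<ip>[^:\s]+):(?P<port>\d+) failed"
)


def parse_cannot_contact(line):
    """Return agent, host, ip and port from a 'Cannot contact' line, or None."""
    match = CANNOT_CONTACT.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])  # source port of the agent connection
    return info
```

      The returned port is the one to look for in netstat or a packet capture on the Jenkins side.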

      No further output appears in the pipeline, no matter how long we leave it. To be clear, this error happens mid-run: we get some output, then it dies, usually in a similar spot but not always. Not all pipelines experience this issue. The error doesn't make a lot of sense for a few reasons:

      1. We still see an established TCP connection to Jenkins in netstat for this port/agent.
      2. Running tcpdump, we see data still being exchanged over this connection; packets flow constantly.
      3. There is no packet loss, drop, or delay; we have confirmed this with packet captures.

      Whatever is happening here, the network is not at fault, even though the error message seems to suggest it is.
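
      For anyone who wants to repeat the netstat check, here is a rough sketch of filtering `netstat -tn` output on the Jenkins side for the agent's source port. It assumes the Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name is hypothetical.

```python
def established_peers(netstat_output, remote_port):
    """Return (local, foreign) address pairs that are ESTABLISHED
    and whose foreign (remote) port equals remote_port."""
    peers = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # The port is whatever follows the last colon (also works for IPv6 rows).
        if foreign.rsplit(":", 1)[-1] == str(remote_port):
            peers.append((local, foreign))
    return peers
```

      An empty result would point to a dropped connection; in our case the connection for the affected agent is still listed as established.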

      When we look at the Kubernetes pod for this job, the JNLP container is running fine, and its last log message is:

      Jul 28, 2022 1:35:38 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected 

      Looking at our executor container in this pod, our test processes are still running fine:

      USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      e2euser        8  0.0  0.0   5996  3760 pts/1    Ss   01:46   0:00 bash
      e2euser      164  0.0  0.0   8596  3292 pts/1    R+   01:48   0:00  \_ ps auxf
      e2euser        1  0.0  0.0   2176   576 ?        Ss   01:46   0:00 /usr/bin/dumb-init cat
      e2euser        7  0.0  0.0   4364   512 pts/0    Ss+  01:46   0:00 cat
      e2euser       69  0.0  0.0   2432   116 ?        S    01:47   0:00 sh -c ({ while [ -d '/home/jenkins/agent/workspace/test1/e2e-personal/e2e/test-code@tmp/durable-ba81b591' -a \! -f '/home/jenkins/agent/wo
      e2euser       70  0.0  0.0   2432   156 ?        S    01:47   0:00  \_ sh -c ({ while [ -d '/home/jenkins/agent/workspace/test1/e2e-personal/e2e/test-code@tmp/durable-ba81b591' -a \! -f '/home/jenkins/agen
      e2euser      163  0.0  0.0   4236   588 ?        S    01:48   0:00  |   \_ sleep 3
      e2euser       72  0.0  0.0   2432   544 ?        S    01:47   0:00  \_ sh -xe /home/jenkins/agent/workspace/test1/e2e-personal/e2e-/test-code@tmp/durable-ba81b591/script.sh
      e2euser       73 20.0  1.5 4624652 4538752 ?     Sl   01:47   0:14      \_ /usr/local/bin/python /usr/local/bin/pytest --capture tee-sys -rA tests/test.py --log DEBUG --runslo

      There are no errors logged on the Jenkins server side beyond what is shown in the pipeline output pasted above.

      Downgrading Java or Jenkins isn't really an option at this point, given the sheer number of agents we have and the number of teams involved.

          Anthony created issue -
          Anthony made changes -
          Component/s New: remoting [ 15489 ]
          Anthony made changes -
          Description: one sentence edited ("as the error messages to indicate" became "as the error messages seems to indicate"); the text is otherwise identical to the description above.

            Assignee: Unassigned
            Reporter: Anthony (abrandel)
            Votes: 2
            Watchers: 5