JENKINS-67062: Jenkins fails to resume builds during restarts when the Agent is connected with WebSockets

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • None
    • Environment:
      Jenkins: 2.303.3 JDK11 (latest LTS to date)
      Kubernetes Plugin: 1.30.6 (latest to date)
      jenkins/inbound-agent:4.11-1 (latest to date)
    • 2.338

      No error is shown in the Jenkins logs themselves, but the agent fails with:

      ❯ kubectl logs -f default-728vq
      Warning: SECRET is defined twice in command-line arguments and the environment variable
      Warning: AGENT_NAME is defined twice in command-line arguments and the environment variable
      Nov 04, 2021 7:45:59 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up agent: default-728vq
      Nov 04, 2021 7:45:59 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Nov 04, 2021 7:45:59 PM hudson.remoting.Engine startEngine
      INFO: Using Remoting version: 4.11
      Nov 04, 2021 7:45:59 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/jenkins/agent/remoting as a remoting work directory
      Nov 04, 2021 7:45:59 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
      INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
      Nov 04, 2021 7:46:00 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: WebSocket connection open
      Nov 04, 2021 7:46:00 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Write side closed
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Read side closed
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Read side closed
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Read side closed
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: http://jenkins.default.svc.cluster.local:8080/login is not ready: 503
      Nov 04, 2021 7:46:38 PM hudson.remoting.Engine lambda$new$1
      SEVERE: Uncaught exception in Engine thread Thread[Thread-0,5,main]
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:91)
              at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:54)
              at hudson.remoting.Engine.runWebSocket(Engine.java:687)
              at hudson.remoting.Engine.run(Engine.java:496)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
              at java.base/java.net.URLClassLoader.findClass(Unknown Source)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:215)
              at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
              at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
              ... 4 more
      

      The issue can be easily reproduced in any environment:

      $ kind create cluster
      
      $ helm repo add jenkins https://charts.jenkins.io
      
      $ helm repo update
      
      $ helm upgrade jenkins jenkins/jenkins --install --wait --debug -f- <<'EOF'
      controller:
        adminPassword: admin
        agentListenerEnabled: false
        # specifying plugins without version makes sure to use the latest
        installPlugins:
          - kubernetes
          - workflow-aggregator
          - git
          - configuration-as-code
          - job-dsl
          - saferestart
        JCasC:
          configScripts:
            my-jobs: |
              jobs:
                - script: |
                    pipelineJob('testjob') {
                      definition {
                        cps {
                          script("""\
                            pipeline {
                              agent any
                              stages {
                                stage ('test') {
                                  steps {
                                    sleep 1000
                                  }
                                }
                              }
                            }""".stripIndent())
                          sandbox()
                        }
                      }
                    }
      agent:
        websocket: true
        tag: 4.11-1
      EOF
      
      $ echo http://127.0.0.1:8080 && kubectl --namespace default port-forward svc/jenkins 8080:8080
      

      Then:

      1. Go to the Jenkins UI at http://127.0.0.1:8080.
      2. Log in with "admin" as both user and password.
      3. Trigger a build of "testjob".
      4. Wait for a pod to be assigned to the build and for the sleep step to start running.
      5. Start following the pod logs with kubectl logs -f <name-of-pod>.
      6. Go to the Jenkins home page, click Restart Safely, and confirm.
      7. Watch the pod logs: the agent fails with the stack trace shown above, and the build fails as well.
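
      The check in step 7 can be sketched as a small shell helper that scans the agent log for the NoClassDefFoundError line from the stack trace above. The function name has_restarter_failure is hypothetical and not part of the original report; it only matches that one log line and says nothing about why the class fails to load.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not from the report).
# Reads agent log text on stdin and exits 0 if it contains the
# NoClassDefFoundError line seen in the stack trace, non-zero otherwise.
has_restarter_failure() {
  # -q: print nothing, only set the exit status; -F: match a fixed string
  grep -qF 'java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller'
}

# Example use after the safe restart (substitute the real pod name):
#   kubectl logs <name-of-pod> | has_restarter_failure && echo "failure reproduced"
```

      Running it against a saved copy of the log works as well, for example has_restarter_failure < agent.log, where agent.log is an assumed file name.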


            Assignee: Unassigned
            Reporter: Felipe Santos (felipecassiors)
            Votes: 8
            Watchers: 8