JENKINS-67062 (Jenkins project)

Jenkins fails to resume builds during restarts when the Agent is connected with WebSockets


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Environment: Jenkins 2.303.3, JDK 11 (latest LTS at the time of reporting)
      Kubernetes Plugin 1.30.6 (latest at the time of reporting)
      jenkins/inbound-agent:4.11-1 (latest at the time of reporting)
    • Released As: 2.338

      No error is shown in the Jenkins controller logs, but the agent fails with:

      ❯ kubectl logs -f default-728vq
      Warning: SECRET is defined twice in command-line arguments and the environment variable
      Warning: AGENT_NAME is defined twice in command-line arguments and the environment variable
      Nov 04, 2021 7:45:59 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up agent: default-728vq
      Nov 04, 2021 7:45:59 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Nov 04, 2021 7:45:59 PM hudson.remoting.Engine startEngine
      INFO: Using Remoting version: 4.11
      Nov 04, 2021 7:45:59 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/jenkins/agent/remoting as a remoting work directory
      Nov 04, 2021 7:45:59 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
      INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
      Nov 04, 2021 7:46:00 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: WebSocket connection open
      Nov 04, 2021 7:46:00 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Write side closed
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Read side closed
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Read side closed
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Read side closed
      Nov 04, 2021 7:46:27 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: http://jenkins.default.svc.cluster.local:8080/login is not ready: 503
      Nov 04, 2021 7:46:38 PM hudson.remoting.Engine lambda$new$1
      SEVERE: Uncaught exception in Engine thread Thread[Thread-0,5,main]
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:91)
              at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:54)
              at hudson.remoting.Engine.runWebSocket(Engine.java:687)
              at hudson.remoting.Engine.run(Engine.java:496)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
              at java.base/java.net.URLClassLoader.findClass(Unknown Source)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:215)
              at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
              at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
              ... 4 more
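The decisive line in this log is the NoClassDefFoundError for jenkins/slaves/restarter/JnlpSlaveRestarterInstaller, thrown from onReconnect once the WebSocket connection has been closed and the agent tries to reconnect. A minimal sketch for spotting that signature in a saved agent log, assuming a plain text search for the exception message is enough; the function name has_restarter_failure is hypothetical and not part of Jenkins or the Kubernetes plugin:

```shell
#!/bin/sh
# Hypothetical helper: exits 0 when the agent log read from stdin contains the
# NoClassDefFoundError reported in this issue, and non-zero otherwise.
has_restarter_failure() {
  grep -q 'NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller'
}

# Example usage (pod name taken from the log above):
#   kubectl logs default-728vq | has_restarter_failure && echo "agent hit the reconnect failure"
```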
      

      The issue can be easily reproduced in any environment:

      $ kind create cluster
      
      $ helm repo add jenkins https://charts.jenkins.io
      
      $ helm repo update
      
      $ helm upgrade jenkins jenkins/jenkins --install --wait --debug -f- <<'EOF'
      controller:
        adminPassword: admin
        agentListenerEnabled: false
        # specifying plugins without version makes sure to use the latest
        installPlugins:
          - kubernetes
          - workflow-aggregator
          - git
          - configuration-as-code
          - job-dsl
          - saferestart
        JCasC:
          configScripts:
            my-jobs: |
              jobs:
                - script: |
                    pipelineJob('testjob') {
                      definition {
                        cps {
                          script("""\
                            pipeline {
                              agent any
                              stages {
                                stage ('test') {
                                  steps {
                                    sleep 1000
                                  }
                                }
                              }
                            }""".stripIndent())
                          sandbox()
                        }
                      }
                    }
      agent:
        websocket: true
        tag: 4.11-1
      EOF
      
      $ echo http://127.0.0.1:8080 && kubectl --namespace default port-forward svc/jenkins 8080:8080
      

      Then:

      1. Go to the Jenkins UI at http://127.0.0.1:8080
      2. Log in with "admin" as both the username and the password
      3. Trigger a build of "testjob"
      4. Wait for a pod to be assigned to the build and the sleep command to start running
      5. Start following the pod logs with kubectl logs -f <name-of-pod>
      6. Go to the Jenkins home page, click "Restart Safely", and confirm
      7. Watch the pod logs: the agent fails with the stack trace shown above, and the build fails as well.
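Steps 3 to 7 can also be driven from a terminal instead of the UI. The sketch below is a convenience under stated assumptions, not part of the original report: it uses the Jenkins CLI (jenkins-cli.jar, served by the controller at /jnlpJars/jenkins-cli.jar) with its build and safe-restart commands, reuses the admin/admin credentials from the Helm values above, and assumes agent pods are named default-<suffix> as in the agent log. The helper names jcli and pick_agent_pod, and the JENKINS_URL/JENKINS_AUTH variables, are hypothetical.

```shell
#!/bin/sh
# Assumptions: the port-forward above is running and jenkins-cli.jar has been
# downloaded into the current directory from the controller.
JENKINS_URL=${JENKINS_URL:-http://127.0.0.1:8080/}
JENKINS_AUTH=${JENKINS_AUTH:-admin:admin}

# Hypothetical wrapper: run one Jenkins CLI command against the controller.
jcli() {
  java -jar jenkins-cli.jar -s "$JENKINS_URL" -auth "$JENKINS_AUTH" "$@"
}

# Hypothetical filter: read `kubectl get pods -o name` output on stdin and print
# the first pod whose name starts with "default-" (the naming seen in the log).
pick_agent_pod() {
  grep '^pod/default-' | head -n 1 | sed 's|^pod/||'
}

# Example usage (not executed here):
#   curl -fsSLO http://127.0.0.1:8080/jnlpJars/jenkins-cli.jar
#   jcli build testjob                                               # step 3
#   kubectl logs -f "$(kubectl get pods -o name | pick_agent_pod)"   # step 5, once the pod runs
#   jcli safe-restart                                                # step 6
```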

            Assignee: Unassigned
            Reporter: Felipe Santos (felipecassiors)
            Votes: 8
            Watchers: 8