Jenkins controller restart causes EC2 clouds to fail to launch new agents

This issue is archived.


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component/s: ec2-plugin
    • Environment:
      Jenkins version: 2.452.2
      amazonEC2 plugin version: 1688.v8c07e01d657f
      Configuration as Code plugin version: 1810.v9b_c30a_249a_4c

      We have several clouds configured for amazonEC2 via the Configuration as Code plugin. Every time Jenkins restarts, we have to open each cloud's configuration and click Save. If we don't, Jenkins fails to launch any new agent on EC2. See the attached agent log output.

      This looks to be either related to, or the same as, https://issues.jenkins.io/browse/JENKINS-56066, although the stack trace is different.

      Example YAML config for one of our clouds:

      - "amazonEC2":
            "name": "spot-small"
            "noDelayProvisioning": true
            "region": "eu-west-1"
            "sshKeysCredentialsId": "ssh-private-key"
            "templates":
            - "ami": "ami-<redacted>"
              "amiType":
                "unixData":
                  "sshPort": "22"
              "associatePublicIp": false
              "connectBySSHProcess": true
              "connectionStrategy": "PRIVATE_IP"
              "customDeviceMapping": "/dev/sda1=:50"
              "deleteRootOnTermination": true
              "description": "spot-small"
              "ebsEncryptRootVolume": "DEFAULT"
              "ebsOptimized": false
              "hostKeyVerificationStrategy": "CHECK_NEW_HARD"
              "iamInstanceProfile": "<redacted>"
              "idleTerminationMinutes": 1
              "instanceCapStr": 10
              "javaPath": "java"
              "labelString": "spot-small"
              "maxTotalUses": 5
              "metadataEndpointEnabled": true
              "metadataHopsLimit": 1
              "metadataSupported": true
              "metadataTokensRequired": false
              "minimumNumberOfInstances": 0
              "minimumNumberOfSpareInstances": 0
              "mode": "EXCLUSIVE"
              "monitoring": true
              "numExecutors": 1
              "remoteAdmin": "<redacted>"
              "securityGroups": "<redacted>"
              "spotConfig":
                "spotMaxBidPrice": "0.020"
                "useBidPrice": true
              "stopOnTerminate": true
              "subnetId": "<redacted>,<redacted>,<redacted>"
              "t2Unlimited": false
              "tags":
              - "name": "Name"
                "value": "spot-small"
              "tenancy": "Default"
              "type": "T3Small"
              "useEphemeralDevices": false
            "useInstanceProfileForCredentials": false

      Jenkins is running in AWS ECS Fargate using the image jenkins/jenkins:lts-jdk17, with AWS EFS for persistent storage.

      CPU: 2 vCPU
      Memory: 4GB

      Note: on startup, the Jenkins controller logs contain stack traces with this pattern:

      WARNING jenkins.model.Nodes#load: could not load /var/jenkins_home/nodes/<redacted> (i-<redacted>)
          java.nio.file.NoSuchFileException: /var/jenkins_home/nodes/<redacted> (i-<redacted>)/config.xml
          at java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
          at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
          at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
          at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(Unknown Source)
          at java.base/java.nio.file.Files.newByteChannel(Unknown Source)
          at java.base/java.nio.file.Files.newByteChannel(Unknown Source)
          at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(Unknown Source)
          at java.base/java.nio.file.Files.newInputStream(Unknown Source)
          at hudson.XmlFile.read(XmlFile.java:164)
          at jenkins.model.Nodes.load(Nodes.java:393)
          at jenkins.model.Nodes.load(Nodes.java:340)
          at jenkins.model.Jenkins$12.run(Jenkins.java:3511)
          at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:177)
          at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305)
          at jenkins.model.Jenkins$5.runTask(Jenkins.java:1175)
          at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:221)
          at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:120)
          at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at java.base/java.lang.Thread.run(Unknown Source)
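
      To see which node entries trigger the NoSuchFileException above, a small script can list node directories that have no config.xml. This is a minimal, hypothetical diagnostic sketch (find_nodes_missing_config is not part of Jenkins or the ec2-plugin), assuming the on-disk layout shown in the log, /var/jenkins_home/nodes/<name>/config.xml. It only reports the affected entries; it does not fix the agent launch failure.

```python
import sys
from pathlib import Path


def find_nodes_missing_config(nodes_dir: str) -> list[str]:
    """Return names of node directories under nodes_dir lacking config.xml.

    Hypothetical helper: assumes each agent is stored as
    <nodes_dir>/<name>/config.xml, matching the path in the log above.
    Returns an empty list if nodes_dir does not exist.
    """
    root = Path(nodes_dir)
    if not root.is_dir():
        return []
    missing = []
    for entry in sorted(root.iterdir()):
        # Only directories represent node entries; stray files are skipped.
        if entry.is_dir() and not (entry / "config.xml").is_file():
            missing.append(entry.name)
    return missing


if __name__ == "__main__":
    # Default path is taken from the log above; pass another path as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/jenkins_home/nodes"
    for name in find_nodes_missing_config(target):
        print(f"missing config.xml: {name}")
```

      Running it against JENKINS_HOME right after a restart would show whether the directories named in the warnings are the same EC2 instances the clouds later fail to provision.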

    • Assignee: FABRIZIO MANFREDI
    • Reporter: Thomas
    • Archiver: Jenkins Service Account
