JENKINS-75890

When Controller restarts, Agent EC2 instances that are running (but idle) are terminated (deleted)


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: ec2-plugin
    • Labels: None
    • Environment: Jenkins 2.504.2, Amazon EC2 plugin Version 1949.vf7e0fa_c3b_5b_c

      Our EC2 Agents have an Idle termination time of 5 minutes.

      When I run a job on an Agent (EC2 instance) and the job finishes, the EC2 instance keeps running and Jenkins shows the Agent as idle. This is all as expected.

      After 5 minutes, Jenkins stops the EC2 instance, and Jenkins shows the Agent as stopped. This is all as expected.

      If I /safeRestart Jenkins, then when Jenkins comes back up, the EC2 instance remains there in a stopped state, and Jenkins shows the Agent as stopped. This is all as expected.

      However, if I do /safeRestart and an Agent is idle but still running (i.e. if the Idle termination time has not yet expired), then Jenkins terminates (deletes) the EC2 instance, and removes the Agent from its list.

      This behaviour is not expected. The EC2 instance should just stay available. For us, terminating the instance is a problem, because we have EBS volumes that we would like to keep.

      As far as I can remember, this has always been the behavior, i.e. I don't think it is a recent regression.

      From the plugin docs:

       Having the instances be stopped instead of terminated is useful when you are using EBS volumes and want to keep them mounted for the life of the instance and reuse the instance for long periods of time.

      Here is the Jenkins log showing what happens when the Controller is restarted. In this run, the i-0ee50c13c584084b1 instance was running and idle when Jenkins started up.

      2025-07-16 09:04:47.389+0000 [id=1]     INFO    hudson.WebAppMain#contextInitialized: Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
      2025-07-16 09:04:47.548+0000 [id=1]     INFO    o.e.j.s.handler.ContextHandler#doStart: Started oeje9n.ContextHandler$CoreContextHandler@66434cc8{Jenkins v2.504.3,/,b=file:///var/jenkins_home/war/,a=AVAILABLE,h=oeje9n.ContextHandler$CoreContextHandler$CoreToNestedHandler@42f22995{STARTED}}
      2025-07-16 09:04:47.573+0000 [id=1]     INFO    o.e.j.server.AbstractConnector#doStart: Started ServerConnector@baf1bb3{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
      2025-07-16 09:04:47.598+0000 [id=1]     INFO    org.eclipse.jetty.server.Server#doStart: Started oejs.Server@52719fb6{STARTING}[12.0.22,sto=0] @3068ms
      2025-07-16 09:04:47.604+0000 [id=31]    INFO    winstone.Logger#logInternal: Winstone Servlet Engine running: controlPort=disabled
      2025-07-16 09:04:47.865+0000 [id=30]    INFO    jenkins.model.Jenkins#<init>: Starting version 2.504.3
      2025-07-16 09:04:48.036+0000 [id=37]    INFO    jenkins.InitReactorRunner$1#onAttained: Started initialization
      2025-07-16 09:04:48.288+0000 [id=39]    INFO    jenkins.InitReactorRunner$1#onAttained: Listed all plugins
      2025-07-16 09:04:55.187+0000 [id=38]    INFO    jenkins.InitReactorRunner$1#onAttained: Prepared all plugins
      2025-07-16 09:04:55.269+0000 [id=38]    INFO    jenkins.InitReactorRunner$1#onAttained: Started all plugins
      2025-07-16 09:04:56.343+0000 [id=36]    INFO    h.p.b.g.GlobalTimeOutConfiguration#load: global timeout not set
      2025-07-16 09:04:56.481+0000 [id=37]    INFO    jenkins.InitReactorRunner$1#onAttained: Augmented all extensions
      2025-07-16 09:04:59.785+0000 [id=39]    INFO    h.p.ec2.EC2RetentionStrategy#start: Ignoring start request for EC2 (TS) - Standard(x86_64) (i-03557a73cf8ada490) during Jenkins startup due to EC2 instance state of STOPPED
      2025-07-16 09:04:59.923+0000 [id=39]    INFO    h.p.ec2.EC2RetentionStrategy#start: Start requested for EC2 (OED) - Pipeline(x86_64)-Matthew (i-0ee50c13c584084b1)
      2025-07-16 09:04:59.930+0000 [id=55]    INFO    hudson.plugins.ec2.EC2Cloud#log: Launching instance: i-0ee50c13c584084b1
      2025-07-16 09:05:00.007+0000 [id=55]    INFO    o.a.s.c.u.s.AbstractSecurityProviderRegistrar#getOrCreateProvider: getOrCreateProvider(EdDSA) created instance of net.i2p.crypto.eddsa.EdDSASecurityProvider
      2025-07-16 09:05:00.011+0000 [id=39]    INFO    h.p.ec2.EC2RetentionStrategy#start: Ignoring start request for EC2 (TS) - Standard(x86_64) (i-000d334fc37f7031a) during Jenkins startup due to EC2 instance state of STOPPED
      2025-07-16 09:05:00.121+0000 [id=39]    INFO    h.p.ec2.EC2RetentionStrategy#start: Ignoring start request for EC2 (TS) - Standard(x86_64) (i-0053488e76e41bcee) during Jenkins startup due to EC2 instance state of STOPPED
      2025-07-16 09:05:00.156+0000 [id=55]    INFO    hudson.plugins.ec2.EC2Cloud#log: bootstrap()
      2025-07-16 09:05:00.157+0000 [id=55]    INFO    hudson.plugins.ec2.EC2Cloud#log: Getting keypair...
      2025-07-16 09:05:00.222+0000 [id=39]    INFO    h.p.ec2.EC2RetentionStrategy#start: Ignoring start request for EC2 (TS) - Standard(x86_64) (i-0c948e0f1fe6c4ccb) during Jenkins startup due to EC2 instance state of STOPPED
      2025-07-16 09:05:00.283+0000 [id=37]    INFO    jenkins.InitReactorRunner$1#onAttained: System config loaded
      2025-07-16 09:05:01.051+0000 [id=36]    INFO    jenkins.InitReactorRunner$1#onAttained: System config adapted
      2025-07-16 09:05:01.111+0000 [id=36]    INFO    jenkins.InitReactorRunner$1#onAttained: Loaded all jobs
      2025-07-16 09:05:01.116+0000 [id=37]    INFO    jenkins.InitReactorRunner$1#onAttained: Configuration for all jobs updated
      2025-07-16 09:05:01.137+0000 [id=39]    INFO    j.util.groovy.GroovyHookScript#execute: Executing /var/jenkins_home/init.groovy.d/10-doQuietDown-on-restart.groovy
      2025-07-16 09:05:02.004+0000 [id=39]    INFO    j.util.groovy.GroovyHookScript#execute: Executing /var/jenkins_home/init.groovy.d/20-content-security-policy.groovy
      2025-07-16 09:05:02.033+0000 [id=39]    INFO    jenkins.InitReactorRunner$1#onAttained: Completed initialization
      2025-07-16 09:05:02.379+0000 [id=30]    INFO    h.p.ec2.EC2RetentionStrategy#internalCheck: Startup timeout of EC2 (OED) - Pipeline(x86_64)-Matthew (i-0ee50c13c584084b1) after 1196371 milliseconds (timeout: 360000 milliseconds), instance status: RUNNING
      2025-07-16 09:05:02.380+0000 [id=30]    INFO    h.plugins.ec2.EC2AbstractSlave#launchTimeout: EC2 instance failed to launch: i-0ee50c13c584084b1
      2025-07-16 09:05:02.444+0000 [id=55]    INFO    hudson.plugins.ec2.EC2Cloud#log: Using private key TransformService-DEV-ControllerToAgent (SHA-1 fingerprint 8f:d8:87:8f:d9:2f:3b:23:46:f9:81:08:3f:11:dd:09)
      2025-07-16 09:05:02.445+0000 [id=55]    INFO    hudson.plugins.ec2.EC2Cloud#log: Authenticating as jenkins
      2025-07-16 09:05:02.544+0000 [id=55]    INFO    hudson.plugins.ec2.EC2Cloud#log: Connecting to 10.10.19.175 on port 22, with timeout 10000.
      2025-07-16 09:05:02.591+0000 [id=55]    INFO    o.a.s.c.i.DefaultIoServiceFactoryFactory#getIoServiceProvider: Using MinaServiceFactoryFactory
      2025-07-16 09:05:02.616+0000 [id=55]    INFO    o.a.s.c.c.h.ConfigFileHostEntryResolver#resolveEffectiveResolver: resolveEffectiveResolver(jenkins@10.10.19.175:22/null) no configurationfile at /var/jenkins_home/.ssh/config
      2025-07-16 09:05:02.796+0000 [id=55]    INFO    hudson.plugins.ec2.EC2Cloud#log: Connected via SSH.
      2025-07-16 09:05:02.850+0000 [id=30]    INFO    o.j.p.g.j.JobFinderImpersonater$JobItemListener#onLoaded: Loaded 2 jobs in cache
      2025-07-16 09:05:02.851+0000 [id=30]    INFO    hudson.lifecycle.Lifecycle#onReady: Jenkins is fully up and running
      2025-07-16 09:05:02.886+0000 [id=65]    INFO    h.plugins.ec2.EC2OndemandSlave#lambda$terminate$0: Terminated EC2 instance (terminated): i-0ee50c13c584084b1
      2025-07-16 09:05:03.104+0000 [id=55]    WARNING hudson.plugins.ec2.EC2Cloud#log: Authentication failed. Trying again...
      2025-07-16 09:05:03.106+0000 [id=68]    WARNING o.a.s.c.u.logging.LoggingUtils#warn: exceptionCaught(ClientSessionImpl[jenkins@/10.10.19.175:22])[state=Opened] SshException: Server key did not validate
      2025-07-16 09:05:03.111+0000 [id=68]    INFO    o.a.s.c.s.helpers.SessionHelper#disconnect: Disconnecting(ClientSessionImpl[jenkins@/10.10.19.175:22]): SSH2_DISCONNECT_HOST_KEY_NOT_VERIFIABLE - Server key did not validate
      2025-07-16 09:05:03.419+0000 [id=65]    INFO    h.plugins.ec2.EC2OndemandSlave#lambda$terminate$0: Removed EC2 instance from jenkins controller: i-0ee50c13c584084b1
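
      A minimal sketch (plain Python, not part of the plugin) that works through the numbers in the "Startup timeout" line of the log above. The two log lines are copied verbatim from the log; the helper names log_time and timeout_details are hypothetical. It only does arithmetic on the logged values and assumes nothing about how the plugin computes the elapsed time.

```python
import re
from datetime import datetime, timedelta

# Two lines copied verbatim from the log above.
CONTROLLER_START = (
    "2025-07-16 09:04:47.865+0000 [id=30]    INFO    "
    "jenkins.model.Jenkins#<init>: Starting version 2.504.3"
)
TIMEOUT_LINE = (
    "2025-07-16 09:05:02.379+0000 [id=30]    INFO    "
    "h.p.ec2.EC2RetentionStrategy#internalCheck: Startup timeout of "
    "EC2 (OED) - Pipeline(x86_64)-Matthew (i-0ee50c13c584084b1) after "
    "1196371 milliseconds (timeout: 360000 milliseconds), "
    "instance status: RUNNING"
)


def log_time(line):
    """Parse the leading 28-character timestamp of a Jenkins log line."""
    return datetime.strptime(line[:28], "%Y-%m-%d %H:%M:%S.%f%z")


def timeout_details(line):
    """Return (elapsed_ms, limit_ms) from a 'Startup timeout' log line."""
    m = re.search(
        r"after (\d+) milliseconds \(timeout: (\d+) milliseconds\)", line
    )
    if m is None:
        raise ValueError("not a startup-timeout line")
    return int(m.group(1)), int(m.group(2))


elapsed_ms, limit_ms = timeout_details(TIMEOUT_LINE)

# Point in time that the logged elapsed value counts from.
reference_point = log_time(TIMEOUT_LINE) - timedelta(milliseconds=elapsed_ms)
controller_start = log_time(CONTROLLER_START)

print("timer reference point:     ", reference_point.isoformat())
print("controller started:        ", controller_start.isoformat())
print("reference predates restart:", reference_point < controller_start)
print("elapsed exceeds timeout:   ", elapsed_ms > limit_ms)
```

      The logged elapsed time (1196371 ms, roughly 20 minutes) is larger than the 360000 ms (6 minute) timeout, and the point it counts from falls before the Controller came back up. That would suggest the elapsed time is not measured from the restart, although this is only a reading of the log and has not been checked against the plugin source.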
      

            Assignee: FABRIZIO MANFREDI (thoulen)
            Reporter: Matthew Webber (mwebber)