Type: Bug
Resolution: Unresolved
Priority: Major
Component/s: ec2-plugin
Labels:
- issue-exported-to-github
Environment:
EC2 plugin 2043.v483cc2854116

When the controller starts after an outage or crash while orphaned agents are still registered, the build queue is completely blocked.

The impact scales linearly with the number of orphaned agents, at up to 25 seconds per agent: 10 agents add about 4 minutes (25 s × 10 = 250 s).

Steps to Reproduce

1. Create many EC2 agents via EC2 plugin
2. Stop Jenkins controller
3. Manually delete the EC2 instances from AWS console (instances no longer exist in AWS)
4. Start Jenkins controller

Actual Result

All Queue operations are blocked for up to 25 seconds × number of orphaned agents.
Liveness probes that check the queue status may time out, because every queue operation is blocked. The Jenkins instance is then reported as unhealthy, which can cause automation to restart it.
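
The arithmetic behind that figure can be written out as a small sketch. It uses only the 5 retries and the 5000 ms sleep listed in the call chain of this report. The class and method names are made up for illustration and are not part of the EC2 plugin.

```java
import java.time.Duration;

public class QueueBlockEstimate {
    // Figures taken from the call chain in this report.
    static final int RETRIES = 5;
    static final Duration SLEEP_PER_RETRY = Duration.ofMillis(5000);

    /** Worst-case time the queue lock is held during one retention pass. */
    static Duration worstCaseBlock(int orphanedAgents) {
        return SLEEP_PER_RETRY.multipliedBy((long) RETRIES * orphanedAgents);
    }

    public static void main(String[] args) {
        for (int agents : new int[] {1, 10, 50}) {
            Duration d = worstCaseBlock(agents);
            System.out.printf("%d orphaned agents -> %d s (%d min %d s)%n",
                    agents, d.getSeconds(), d.toMinutes(), d.getSeconds() % 60);
        }
    }
}
```

For 10 orphaned agents this gives 250 s, which is longer than many probe timeouts.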

Expected Result

Queue operations continue normally (negligible held time)
Orphaned agents are handled asynchronously or skipped
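
One possible shape for this behaviour, as a rough sketch and not the plugin's actual fix: run the slow instance lookups on a worker pool without the queue lock, then take the lock only for the quick in-memory removal. Everything here is a stand-in. `QUEUE_LOCK`, `instanceExists`, the `orphan-` naming rule and the scaled-down 10 ms sleep are assumptions made so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

public class AsyncOrphanCheckSketch {
    // Stand-in for the Jenkins queue lock (hypothetical).
    static final ReentrantLock QUEUE_LOCK = new ReentrantLock();
    static final int RETRIES = 5;
    static final long SLEEP_MS = 10; // scaled down from 5000 ms

    /** Hypothetical lookup: names starting with "orphan-" are never found. */
    static boolean instanceExists(String agentName) throws InterruptedException {
        if (!agentName.startsWith("orphan-")) return true;
        for (int i = 0; i < RETRIES; i++) Thread.sleep(SLEEP_MS);
        return false;
    }

    /** Runs the slow lookups on a worker pool, never touching the queue lock. */
    static List<String> findOrphans(List<String> agents) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> found = new ArrayList<>();
            for (String agent : agents) {
                Callable<Boolean> lookup = () -> instanceExists(agent);
                found.add(pool.submit(lookup));
            }
            List<String> orphans = new ArrayList<>();
            for (int i = 0; i < agents.size(); i++) {
                if (!found.get(i).get()) orphans.add(agents.get(i));
            }
            return orphans;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    /** Holds the lock only for the in-memory removal of known orphans. */
    static void removeUnderLock(List<String> nodes, List<String> orphans) {
        QUEUE_LOCK.lock();
        try {
            nodes.removeAll(orphans);
        } finally {
            QUEUE_LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>(List.of("orphan-1", "healthy-1", "orphan-2"));
        List<String> orphans = findOrphans(nodes);
        removeUnderLock(nodes, orphans);
        System.out.println("removed " + orphans + ", remaining " + nodes);
    }
}
```

The point of the split is that the lock hold time no longer depends on how long the cloud lookups take or on how many agents are orphaned.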

Root Cause

EC2RetentionStrategy.internalCheck() calls CloudHelper.getInstanceWithRetry(), which blocks for up to 25 seconds when the instance is not found (orphaned agent). This happens while the Queue lock is held.

Call chain:

ComputerRetentionWork.doAperiodicRun  runs every 1 minute
Queue.withLock()                      [LOCK]
EC2RetentionStrategy.check()
EC2RetentionStrategy.internalCheck()
EC2Computer.getState()                for each instance
CloudHelper.getInstanceWithRetry()
Thread.sleep(5000)                    × 5 retries = 25 seconds MAX
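
The effect of this call chain can be reproduced with a small self-contained simulation. It is only a sketch under assumptions. `QUEUE_LOCK` is a plain `ReentrantLock` standing in for the Jenkins queue lock, the method names are invented, and the sleep is scaled down from 5000 ms to 10 ms so the demo finishes quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class QueueLockBlockingDemo {
    // Stand-in for the Jenkins queue lock (hypothetical; not the real Queue.withLock()).
    static final ReentrantLock QUEUE_LOCK = new ReentrantLock();
    static final int RETRIES = 5;
    static final long SLEEP_MS = 10; // scaled down from the 5000 ms in the report

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Mimics a lookup that never finds the instance and sleeps on every retry. */
    static void lookupOrphanWithRetry() {
        for (int i = 0; i < RETRIES; i++) sleep(SLEEP_MS);
    }

    /** Mimics one retention pass: every agent is checked while the lock is held. */
    static void retentionPass(int orphanedAgents, CountDownLatch lockHeld) {
        QUEUE_LOCK.lock();
        try {
            lockHeld.countDown(); // signal that the lock is now taken
            for (int i = 0; i < orphanedAgents; i++) lookupOrphanWithRetry();
        } finally {
            QUEUE_LOCK.unlock();
        }
    }

    /** Returns how many ms an unrelated queue operation waits for the lock. */
    static long measureQueueWaitMillis(int orphanedAgents) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread retention = new Thread(() -> retentionPass(orphanedAgents, lockHeld));
        retention.start();
        try {
            lockHeld.await(); // wait until the retention pass owns the lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long start = System.nanoTime();
        QUEUE_LOCK.lock(); // blocks until the retention pass releases the lock
        try {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            QUEUE_LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("queue operation waited ~" + measureQueueWaitMillis(4) + " ms");
    }
}
```

With 4 simulated orphans the waiting thread is held for roughly 4 × 5 × 10 ms. At the real 5000 ms sleep the same structure yields the 25 seconds per agent described above.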

The issue was probably introduced in ~~JENKINS-54071~~ , PR , which started using retries in retention checks.

Workaround

1. Manually delete `$JENKINS_HOME/nodes/` before starting the controller after an outage
2. Enable `cleanUpOrphanedNodes: true` in the EC2 cloud config (only available in the latest versions). This reduces how often the blocking occurs but doesn't prevent it

is caused by

JENKINS-54071 EC2-plugin not spooling up stopped nodes

Closed

links to

CloudBees-internal issue

Assignee: FABRIZIO MANFREDI
Reporter: Albert

Created: 2025-12-05 10:40
Updated: 2025-12-06 19:25
Archived: 2025-12-06 19:25