I've observed that a few of our Jenkins jobs, especially those triggered on a cron schedule, occasionally get stuck. The cron-triggered jobs are affected more often, since they run more often; overall we run into this issue at least once a week. Here is what the console output looks like when the issue occurs:
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Declarative: Checkout SCM)
[Pipeline] checkout
The recommended git tool is: git
or, for some jobs, the last line is:
The recommended git tool is: NONE
Note: These jobs run perfectly fine almost all the time.
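For anyone trying to catch this automatically, here is a minimal sketch of how a hung build could be flagged from its console text. It assumes the console text and the time since the last log line have already been obtained by some other means; the function names and the 10-minute threshold are hypothetical, not something from our setup.

```python
# Marker taken from the console output shown above.
STUCK_MARKER = "The recommended git tool is:"


def last_nonempty_line(console_text: str) -> str:
    """Return the last non-blank line of the console text, stripped."""
    for line in reversed(console_text.splitlines()):
        if line.strip():
            return line.strip()
    return ""


def looks_stuck(console_text: str, idle_seconds: float,
                threshold_seconds: float = 600.0) -> bool:
    """Heuristic: the log ends on the git tool line and has been idle too long.

    threshold_seconds is an assumed value; tune it to the normal checkout time.
    """
    ends_on_marker = last_nonempty_line(console_text).startswith(STUCK_MARKER)
    return ends_on_marker and idle_seconds >= threshold_seconds
```

This only matches the checkout symptom shown above; a hang in a plain shell step would need a different marker.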
In our setup, dynamic workers are terminated after a certain number of runs (approximately 30-50). We're using:
- Jenkins: 2.263.4
- OpenJDK 11 on both the master and worker nodes, all running on Ubuntu 20.04.
Here's what I've noticed specifically with workers:
- When a new worker spins up, and the first job gets executed, if it hangs as described, the subsequent job (the second one) also gets stuck.
- Even when I attempt to terminate these stuck jobs, any further jobs assigned to that agent encounter similar issues. It's important to note that this problem doesn't seem exclusive to the git plugin; it also occurs with simple shell command jobs.
- A temporary fix seems to be manually disconnecting the agent and then reconnecting it.
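The disconnect/reconnect workaround could in principle be scripted rather than done by hand. The sketch below is untested against our instance and rests on assumptions: that the agent endpoints are `computer/<name>/doDisconnect` and `computer/<name>/launchSlaveAgent`, that both accept an empty POST, and that basic auth with a user API token is sufficient. The base URL, agent name, and credentials are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def agent_action_url(base_url: str, agent_name: str, action: str) -> str:
    """Build the URL for an action on one agent (endpoint layout is assumed)."""
    name = urllib.parse.quote(agent_name, safe="")
    return f"{base_url.rstrip('/')}/computer/{name}/{action}"


def post(url: str, user: str, api_token: str) -> int:
    """Send an empty POST with basic auth and return the HTTP status code."""
    token = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def reconnect_agent(base_url: str, agent_name: str,
                    user: str, api_token: str) -> None:
    """Disconnect the agent, then ask Jenkins to launch it again."""
    post(agent_action_url(base_url, agent_name, "doDisconnect"), user, api_token)
    post(agent_action_url(base_url, agent_name, "launchSlaveAgent"), user, api_token)
```

A call would look like `reconnect_agent("https://jenkins.example.com", "worker 1", "admin", "<api token>")`. This only automates the temporary fix; it does nothing about whatever goes wrong when the agent is created.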
This leads me to suspect that something might be going wrong during the creation of the agent.
I've looked for related bugs but haven't found anything that matches our issue.
The closest one I found has already been closed: https://issues.jenkins.io/browse/JENKINS-71759.
Some additional info:
Jenkins masters and workers are running on EC2, with no containers.
Jenkins is installed and configured using Ansible.
Agents are created dynamically using the EC2 cloud plugin.
The Jenkins masters create and terminate multiple workers/agents per day.
Plugins used:
amazon-ecr:1.6 ansible:1.1 ansicolor:1.0.0 artifactory:3.10.6 aws-device-farm:1.30 aws-secrets-manager-credentials-provider:0.5.3 blueocean:1.24.6 build-blocker-plugin:1.7.7 build-name-setter:2.1.0 build-timeout:1.20 build-user-vars-plugin:1.7 command-launcher:1.6 configuration-as-code:1.51 configurationslicing:1.52 convert-to-pipeline:1.0 copyartifact:1.46 cron_column:1.4 datadog:2.11.0 delivery-pipeline-plugin:1.4.2 discard-old-build:1.05 docker-build-publish:1.3.3 ec2:1.58 email-ext:2.82 enhanced-old-build-discarder:1.4 extended-choice-parameter:0.82 extended-read-permission:3.2 external-monitor-job:1.7 ez-templates:1.3.4 ghprb:1.42.2 git-parameter:0.9.13 google-login:1.6 groovy:2.3 hockeyapp:1.5.1 http_request:1.9.0 icon-shim:2.0.3 jenkins-multijob-plugin:1.36 job-dsl:1.77 job-restrictions:0.8 jobConfigHistory:2.27 jobgenerator:1.22 ldap:1.26 multiple-scms:0.6 notification:1.14 pagerduty:0.7.0 pam-auth:1.6 parameterized-scheduler:1.0 prometheus:2.0.10 promoted-builds:3.9.1 rake:1.8.0 rebuild:1.32 role-strategy:3.1.1 saml:1.1.7 shelve-project-plugin:3.1 simple-theme-plugin:0.6 slack:2.48 slave-setup:1.10 ssh-agent:1.22 ssh-slaves:1.31.5 ssh:2.6.1 terraform:1.0.10 trilead-api:1.0.13 uno-choice:2.5.6 windows-slaves:1.8 working-hours:1.1 ws-cleanup:0.39 xml-job-to-job-dsl:0.1.13