Details
Type: Bug
Status: In Progress
Priority: Minor
Resolution: Unresolved
Labels: None
-
Test env:
* Jenkins 2.138.2 LTS
* Jenkins master is a Docker image based on jenkins/jenkins:2.138.2-alpine
* Specific plugins are baked into the image using /usr/local/bin/install-plugins.sh and the plugins.txt shown below (see the sketch after the plugin list)
* Master runs on AWS ECS
* Slaves are AWS EC2 instances
plugins.txt:
active-directory:2.8
ansicolor:0.5.2
aws-cloudwatch-logs-publisher:1.2.0
build-timeout:1.19
cloudbees-folder:6.6
credentials-binding:1.16
command-launcher:1.2
docker-workflow:1.17
ec2:1.40.1
git-client:2.7.3
git:3.9.1
gitlab-plugin:1.5.10
htmlpublisher:1.17
jdk-tool:1.1
pipeline-utility-steps:2.1.0
role-strategy:2.9.0
ssh-agent:1.17
ssh-slaves:1.28.1
timestamper:1.8.10
workflow-aggregator:2.6
ws-cleanup:0.35
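For reference, a minimal sketch of how the plugins are baked in, following the pattern documented for the jenkins/jenkins image (the /usr/share/jenkins/ref path comes from that image's docs; our actual Dockerfile may differ):

# Runs in a Dockerfile RUN step; plugins.txt has been COPYed to
# /usr/share/jenkins/ref/plugins.txt beforehand.
/usr/local/bin/install-plugins.sh < /usr/share/jenkins/ref/plugins.txt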
Description
While testing Jenkins 2.138.2 LTS, pipelines that use the sh step intermittently print the following message in the console log …
sh: line 1: 4449 Terminated sleep 3
… and sometimes this …
sh: line 1: 13136 Terminated { while [ ( -d /proc/$pid -o ! -d /proc/$$ ) -a -d '/home/ec2-user/workspace/admin-smoke-test@tmp/durable-523481b0' -a ! -f '/home/ec2-user/workspace/admin-smoke-test@tmp/durable-523481b0/jenkins-result.txt' ]; do touch '/home/ec2-user/workspace/admin-smoke-test@tmp/durable-523481b0/jenkins-log.txt'; sleep 3; done; }
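For readability, the terminated command is the durable-task heartbeat wrapper, which corresponds roughly to the following shell (reconstructed from the log above; $tmpdir stands in for the durable-523481b0 directory, and the parentheses are escaped as they would be in an actual script):

# Heartbeat loop generated by durable-task. $pid is the user script's
# PID; $$ is the wrapper shell itself.
while [ \( -d /proc/$pid -o ! -d /proc/$$ \) \
        -a -d "$tmpdir" \
        -a ! -f "$tmpdir/jenkins-result.txt" ]; do
  touch "$tmpdir/jenkins-log.txt"   # heartbeat: shows the step is still alive
  sleep 3
done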
The Jenkins master runs from a Docker image based on jenkins/jenkins:2.138.2-alpine, with specific plugins baked into the image by /usr/local/bin/install-plugins.sh.
The message originates in durable-task-plugin, which is not in plugins.txt and so must be pulled in as a dependency of one of the listed plugins; one way to confirm which one is sketched below.
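A hedged way to check which bundled plugin declares durable-task as a dependency (run inside the container; assumes unzip is available in the image, and unfolds MANIFEST.MF continuation lines before grepping):

# List each bundled plugin whose Plugin-Dependencies mentions durable-task.
for p in /usr/share/jenkins/ref/plugins/*.jpi; do
  unzip -p "$p" META-INF/MANIFEST.MF 2>/dev/null \
    | tr -d '\r' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n //g' \
    | grep -q 'Plugin-Dependencies:.*durable-task' && basename "$p"
done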
Two important observations:
1) The issue does not occur when starting with the base jenkins/jenkins:2.138.2-alpine image and manually installing plugins via the UI. That might suggest the problem lies in how install-plugins.sh installs plugins and/or their dependencies.
2) The issue does not occur on our production image, which is also 2.138.2-alpine + plugins, built 2018-10-11. Rebuilding the same image from the same Dockerfile results in a different set of installed plugins, which makes me think install-plugins.sh is not deterministic; a check like the one sketched below can demonstrate this.
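A sketch of how one might demonstrate the non-determinism (the image tags are made up, and unzip being available in the image is an assumption):

# Build the same Dockerfile twice, then diff the resulting plugin versions.
docker build -t jenkins-test:a .
docker build --no-cache -t jenkins-test:b .
for tag in a b; do
  docker run --rm "jenkins-test:$tag" sh -c '
    for p in /usr/share/jenkins/ref/plugins/*.jpi; do
      echo "$(basename "$p") $(unzip -p "$p" META-INF/MANIFEST.MF \
        | tr -d "\r" | grep "^Plugin-Version:")"
    done' | sort > "/tmp/plugins-$tag.txt"
done
diff /tmp/plugins-a.txt /tmp/plugins-b.txt   # any output means the builds differ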
Issue Links
- is caused by: JENKINS-55867 sh step termination is never detected if the wrapper process is killed (Open)
-
Comments
The sleep 3 process was introduced when the heartbeat check feature was added in durable-task 1.16, which corresponds to workflow-durable-task-step 2.18 and workflow-job 2.27.
Unless I missed a comment, it looks like the pipeline behaves as expected apart from the "Terminated sleep 3" line.
At least in parogui's example, this might be a race condition between the sleep 3 heartbeat process and the moment the script completes. As dnusbaum mentioned earlier, the sleep 3 process is used to touch the output log file to show the script is still alive. However, when the script completes, it writes to a separate result file. A watcher service checks for that result file every 100ms; once it is found, the results are transmitted and everything related to that specific step's workspace is purged. It might be possible that the output file gets cleaned up right after the sleep 3 process checks that the file still exists, but before it gets touched again; a toy sketch of this window follows.
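A toy illustration of the suspected check-then-touch race (all names and timings here are made up; this mimics only the shape of the wrapper loop, not the plugin's actual watcher):

tmp=$(mktemp -d)
# Heartbeat, as in the wrapper: check dir and result file, then touch the log.
( while [ -d "$tmp" ] && [ ! -f "$tmp/jenkins-result.txt" ]; do
    touch "$tmp/jenkins-log.txt"
    sleep 3
  done ) &
hb=$!
sleep 1          # the "script" completes while the heartbeat is mid-sleep
rm -rf "$tmp"    # the watcher purges the step's tmp dir
kill $hb         # terminating the heartbeat mid-sleep triggers the job notice
wait $hb         # the shell then reports something like "<pid> Terminated"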
There is a new release of durable-task (1.31) that removes the sleep 3 process, so this line won't pop up anymore.
Update: I have not been able to reproduce this issue, so I can't say for certain whether it is resolved. Technically it should be, but it is possible that the new version merely changes the behavior of this bug.