- Bug
- Resolution: Duplicate
- Major
- None
Jenkins is running in Docker on an Ubuntu 18.04 host in GCP
Jenkins ver. 2.150.2
Google Compute Engine plugin 2.0.0 (latest available)
An instance template is used to spin up n1-standard-4 VMs with SSDs
The Jenkins master process is started with the following JVM options for faster response when workers are needed. (Still experimenting with the right values.)
JAVA_OPTS="-Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85"
The problem is that some job runs do not receive the parameters they should be getting, and they fail.
This seems to happen more often when 100 or more job runs are triggered concurrently by a single trigger, but sometimes with as few as 50.
The failed jobs seem to get an executor on a worker and start running, but they fail because the parameters they should have received aren't there. So the processing that expects valid parameter values fails.
To test this setup I set up 2 Pipeline (Groovy) jobs. The content of both jobs is attached.
Parent job
- runs on a worker provisioned by the GCE plugin
- is triggered with some parameters, including 1 that tells it how many instances of the child job to trigger
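
For readers without access to the attachments, the parent job's behaviour could be sketched roughly as follows. This is not the attached job: the job name `child-job`, the label `gce-worker`, and the parameter names `CHILD_COUNT`, `ARTIFACT_PATH`, and `RUN_INDEX` are all hypothetical, and it assumes a scripted Pipeline with the standard `build`, `parallel`, and `node` steps available.

```groovy
// Hypothetical sketch of the parent job; names and labels are assumptions,
// not taken from the attached job.
properties([
    parameters([
        string(name: 'CHILD_COUNT', defaultValue: '100',
               description: 'How many child job runs to trigger'),
        string(name: 'ARTIFACT_PATH', defaultValue: '',
               description: 'GCS object each child run should download')
    ])
])

node('gce-worker') {  // assumed label for workers provisioned by the GCE plugin
    int count = params.CHILD_COUNT.toInteger()
    def branches = [:]
    for (int i = 0; i < count; i++) {
        def index = i  // copy the loop variable so each closure sees its own value
        branches["child-${index}"] = {
            // wait: false queues the child run and returns immediately,
            // so all child runs are requested close together
            build job: 'child-job',
                  wait: false,
                  parameters: [
                      string(name: 'ARTIFACT_PATH', value: params.ARTIFACT_PATH),
                      string(name: 'RUN_INDEX', value: "${index}")
                  ]
        }
    }
    parallel branches
}
```

Passing a distinct value such as `RUN_INDEX` to every child run is one way to keep the queued runs from looking identical to each other, which may make it easier to tell which runs lost their parameters.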
Child job
- is triggered multiple times by the parent job and passed some parameters
- each run of this job requires its own worker
- does something reasonably simple
- verifies the worker it is running on is ready by checking for a file (this is just verifying that any necessary build caches like gradle, npm, pip, are on the worker)
- using the parameters passed in by the parent job, it tries to download a file from GCS
- then it sleeps for 15 mins, just to hold the worker
- When this job fails, it is because the parameters it should have received are missing and it can't do the download of an artifact from GCS
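
The child job steps listed above might look something like the sketch below. Again, this is an illustration and not the attached job: the label `gce-worker`, the marker file name `worker-ready`, and the parameter names are hypothetical, and it assumes `gsutil` is installed on the worker. The explicit parameter check at the top does not fix the underlying problem; it only makes a run with missing parameters fail with a clear message instead of failing later in the download.

```groovy
// Hypothetical sketch of the child job; names, label, and paths are assumptions.
properties([
    parameters([
        string(name: 'ARTIFACT_PATH', defaultValue: '',
               description: 'GCS object to download, passed in by the parent job'),
        string(name: 'RUN_INDEX', defaultValue: '',
               description: 'Index of this run, passed in by the parent job')
    ])
])

node('gce-worker') {  // each run takes its own worker
    stage('Check parameters') {
        // Fail fast when the expected parameter did not arrive
        if (!params.ARTIFACT_PATH?.trim()) {
            error 'ARTIFACT_PATH is empty: this run did not receive its parameters'
        }
    }
    stage('Verify worker') {
        // Assumed marker file indicating build caches (gradle, npm, pip) are present
        if (!fileExists('worker-ready')) {
            error 'Worker is not ready: marker file not found'
        }
    }
    stage('Download from GCS') {
        // Pass the value through the environment to avoid shell quoting issues
        withEnv(["ARTIFACT_PATH=${params.ARTIFACT_PATH}"]) {
            sh 'gsutil cp "$ARTIFACT_PATH" .'
        }
    }
    stage('Hold worker') {
        sleep time: 15, unit: 'MINUTES'
    }
}
```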
I have attached screenshots of what the parameters page looks like on a successful run as well as on an unsuccessful run.