[JENKINS-62158] Bad performance on EC2 instance for first build

Type: Bug
Resolution: Unresolved
Priority: Major
Component/s: ec2-plugin, pipeline
Environment: Jenkins ver. 2.220


I have a pipeline project that should run on an EC2 instance node.

I have configured an EC2 connection that starts t3.medium Windows 10 instances automatically. This all works fine.

However, the first build on an EC2 instance is always very slow. The next build on the same instance (without a reboot etc.) is much faster.

@Library('BMS-Libraries')
import static bms.mail.Email.*
import static bms.nexus.Nexus.*
import static bms.utils.Utils.*

node('AWS_VS2017') {
    stage('Cleanup Build Machine') {
        // Delete the current workspace directory
        deleteDir()
    }

    stage('Preparing Build machine...') {
        retrieveAndExtractBuildTools(this)
    }

    // Do some more .......
}

I attached a screenshot of the runtime of the different pipeline steps.

I connected to the instance via RDP during the first build, and Task Manager did not show high CPU or memory consumption.
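To narrow down whether the delay on a fresh instance comes from launching steps on the agent at all or from the tool extraction itself, each step can be timed individually. This is a minimal diagnostic sketch, not a fix: the `timed` helper and the extra `bat` probe are hypothetical additions, and it assumes the script-security sandbox permits `System.currentTimeMillis()` (an administrator may otherwise need to approve it).

```groovy
@Library('BMS-Libraries')
import static bms.mail.Email.*
import static bms.nexus.Nexus.*
import static bms.utils.Utils.*

// Hypothetical helper: runs the given closure and echoes its wall-clock duration,
// even if the closure throws.
def timed(String label, Closure body) {
    long start = System.currentTimeMillis()
    try {
        body()
    } finally {
        long elapsed = System.currentTimeMillis() - start
        echo "${label} took ${elapsed} ms"
    }
}

node('AWS_VS2017') {
    stage('Cleanup Build Machine') {
        timed('deleteDir') {
            deleteDir()
        }
    }

    stage('Preparing Build machine...') {
        // Hypothetical probe: a trivial batch step launched before the real work,
        // to measure the cost of starting any durable step on the new agent
        timed('bat probe') {
            bat 'echo agent is responsive'
        }
        // retrieveAndExtractBuildTools comes from the BMS-Libraries shared library
        timed('retrieveAndExtractBuildTools') {
            retrieveAndExtractBuildTools(this)
        }
    }
}
```

If the trivial `bat` probe already takes a long time on the first build but not on later ones, the overhead is more likely in starting steps on the fresh agent than in `retrieveAndExtractBuildTools` itself.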


Attachments:
- image-2020-10-16-12-40-47-178.png (30 kB, 2020-10-16 04:40)
- image-2020-10-07-18-15-21-103.png (25 kB, 2020-10-07 10:15)
- image-2020-10-07-18-14-18-176.png (39 kB, 2020-10-07 10:14)
- image-2020-10-07-18-11-16-079.png (68 kB, 2020-10-07 10:11)
- image-2020-10-07-18-09-09-475.png (83 kB, 2020-10-07 10:09)
- image-2020-10-07-18-06-28-059.png (54 kB, 2020-10-07 10:06)
- AgentPerformance.png (58 kB, 2020-05-11 21:38)
- 2020-05-04 16_31_11-Window.jpg (47 kB, 2020-05-04 14:33)

Daniel Hoerner created issue - 2020-05-04 14:36

Daniel Hoerner made changes - 2020-05-04 15:28


Ed Thorne made changes - 2020-05-11 21:38

Attachment

New: AgentPerformance.png [ 51181 ]

Ed Thorne added a comment - 2020-05-11 21:58

I'm not sure this is limited to the EC2 plugin. I'm seeing a similar issue with a simple Linux JNLP agent: the first job that runs on the agent takes considerably longer than it normally should. Here's an image that shows my results.

Builds 31 and 36 ran right after the agent was rebooted. Each build runs essentially the same steps:

- sh 'env'
- sh with a simple multi-line command (pwd, ls -al, a for loop with print/sleep)
- writeFile to save the multi-line command to disk as input for sshScript
- sshScript to a remote instance, executing the same multi-line command

The main difference is that the first two steps run on the master node while the third runs on a remote JNLP agent.

In builds 31 and 36, the execution timings show that it takes almost 20 seconds for an 'sh' step to be loaded and started. The 'sshScript' step that follows takes about three minutes from the moment the prior 'sh' step completes until any output is logged. Under normal circumstances, these operations log some form of activity within about two seconds.

Observing the output of 'top' and checking the CloudWatch metrics for the instance, I don't see high resource usage or anything else that would explain why the first job after a reboot suffers from such horrible performance.
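The four steps listed above can be reproduced with a pipeline along these lines, which may help others compare first-build timings on a freshly rebooted agent. This is a sketch under assumptions: the agent label `linux-jnlp`, the host `remote.example.com` and the credentials ID `remote-ssh-key` are hypothetical placeholders, running every stage on the agent is an assumption, and the `sshScript` and `sshUserPrivateKey` steps require the SSH Pipeline Steps and Credentials Binding plugins.

```groovy
// Hypothetical placeholders: replace with your own host and credentials ID
def remoteHost = 'remote.example.com'
def credsId = 'remote-ssh-key'

// Multi-line command used both by the local 'sh' step and by sshScript.
// Triple single quotes prevent Groovy from interpolating $i.
def cmd = '''
pwd
ls -al
for i in 1 2 3; do
    echo "iteration $i"
    sleep 1
done
'''

node('linux-jnlp') {
    stage('sh env') {
        sh 'env'
    }
    stage('sh multi-line') {
        sh cmd
    }
    stage('writeFile') {
        // Save the command in the workspace so sshScript can use it as a script file
        writeFile file: 'remote-script.sh', text: cmd
    }
    stage('sshScript') {
        withCredentials([sshUserPrivateKey(credentialsId: credsId,
                                           keyFileVariable: 'KEY_FILE',
                                           usernameVariable: 'SSH_USER')]) {
            def remote = [:]
            remote.name = 'remote'
            remote.host = remoteHost
            remote.user = env.SSH_USER
            remote.identityFile = env.KEY_FILE
            // Skips host key checking: acceptable for a throwaway test only
            remote.allowAnyHosts = true
            sshScript remote: remote, script: 'remote-script.sh'
        }
    }
}
```

Comparing the stage durations of the first run after a reboot with a second run on the same agent should show whether the extra time is spent before each step starts rather than inside it.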


Ed Thorne added a comment - 2020-05-12 13:30

I forgot to mention: this is Jenkins 2.234 with Pipeline 2.6 and SSH Pipeline Steps 2.0.0.


James Green added a comment - 2020-05-28 13:24

I'm not sure we are seeing the same bug, but recently (in the last couple of weeks) our EC2 builds have been taking a lot longer too. It is always the first build on an EC2 instance, never subsequent builds.

The big change was upgrading this plugin. According to the agent logs (accessible from the Jenkins web console), the Jenkins master now waits for the EC2 instance console output to print the SSH fingerprints so that it can verify the expected keys before connecting. This is acknowledged to potentially take minutes.

We'd love to know if there is a workaround for this but we're not familiar with the authentication system in use.

One way or another, I'm being approached by staff members using Jenkins complaining that this is now far too slow. I'm open to suggestions.


Ramon Leon added a comment - 2020-07-16 10:23

The first time Jenkins builds a job on an EC2 instance, there is a process that doesn't happen on subsequent connections:

- the instance has to be created by AWS
- the instance initializes
- Jenkins creates an init script
- Jenkins installs the JVM
- Jenkins installs the OpenSSH clients
- Jenkins copies the remote client library
- Jenkins launches the client on the instance

None of these steps are repeated on subsequent builds.

In the latest releases of the EC2 plugin, we've included a new security step to prevent man-in-the-middle (MitM) attacks. This step waits for the instance's console output (Linux instances only) to be ready, and the plugin reads the SSH key from it to guarantee that the machine it is connecting to is the expected one. This step adds some time to the initial setup. How much depends on how long the console takes to be ready, but it is usually around 5 minutes.

You can avoid this extra delay by lowering the security level to Accept New or Off. Neither of these strategies waits for the console to be ready, but both have security implications. We've provided a wide range of strategies so that every administrator can decide which one best fits their environment. All of this is documented in the plugin documentation: https://github.com/jenkinsci/ec2-plugin/#security


Ramon Leon made changes - 2020-07-16 10:23

Assignee

Original: FABRIZIO MANFREDI [ thoulen ]

New: Ramon Leon [ mramonleon ]

Ramon Leon made changes - 2020-07-16 10:24

Resolution

New: Not A Defect [ 7 ]

Status

Original: Open [ 1 ]

New: Resolved [ 5 ]

Daniel Hoerner added a comment - 2020-07-16 15:24 - edited

mramonleon, the issue is not the startup of the AWS instance. The build is slow after the instance has started (see screenshot 2020-05-04 16_31_11-Window.jpg). The first step is already running on the AWS agent instance; it is a little slower, but that is OK. The second step (Preparing Build machine), however, is much slower, and by that time all the steps you described have already been performed.


Assignee: Ramon Leon

Reporter: Daniel Hoerner

Votes: 2

Watchers: 11

Created: 2020-05-04 14:36

Updated: 2023-04-20 14:34
