
EC2 Plugin: High CPU in hudson.plugins.ec2.win.winrm.WindowsProcess

      A Jenkins master running the EC2 Plugin exhibits high CPU usage. High CPU analysis shows threads with a stack trace similar to:

      "input copy: java -jar C:\Windows\Temp\slave.jar" #53906 daemon prio=5 os_prio=0 tid=0x00007fab61963800 nid=0x1860 runnable [0x00007fab1288f000]
      java.lang.Thread.State: RUNNABLE
      at java.lang.Throwable.fillInStackTrace(Native Method)
      at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
      - locked <0x00000006b9e757f8> (a java.io.IOException)
      at java.lang.Throwable.<init>(Throwable.java:265)
      at java.lang.Exception.<init>(Exception.java:66)
      at java.io.IOException.<init>(IOException.java:58)
      at java.io.PipedInputStream.read(PipedInputStream.java:310)
      - locked <0x00000005583223c0> (a java.io.PipedInputStream)
      at java.io.PipedInputStream.read(PipedInputStream.java:377)
      - locked <0x00000005583223c0> (a java.io.PipedInputStream)
      at java.io.InputStream.read(InputStream.java:101)
      at hudson.plugins.ec2.win.winrm.WindowsProcess$2.run(WindowsProcess.java:124)
      

      This appears to be related to this code in the EC2 plugin.
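The stack trace shows an IOException being constructed inside PipedInputStream.read(). As an illustration only, here is a minimal sketch of one way this can happen repeatedly with the JDK pipe classes: if the writer thread exits without closing its end, every later read on an empty pipe throws immediately instead of blocking or returning EOF. The class and method names are hypothetical, and whether this is the exact condition hit inside the plugin is an assumption.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class DeadWriterPipeDemo {
    /** Returns how many of `attempts` reads threw IOException after the writer thread died. */
    static int countFailedReads(int attempts) {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);

            // The writer sends one byte, then exits WITHOUT closing its end of the pipe.
            Thread writer = new Thread(() -> {
                try {
                    out.write(42);
                } catch (IOException ignored) {
                    // not expected in this demo
                }
            });
            writer.start();
            writer.join();

            in.read(); // drain the single buffered byte so the pipe is empty

            int failures = 0;
            for (int i = 0; i < attempts; i++) {
                try {
                    in.read();
                } catch (IOException e) {
                    failures++; // thrown right away on each call: no blocking, no EOF
                }
            }
            return failures;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countFailedReads(1000) + " of 1000 reads threw IOException");
    }
}
```

A loop that catches this exception and simply retries would spin at full speed, spending most of its time filling in stack traces, which would be consistent with the thread dump above.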


          Mika Karjalainen added a comment - - edited

          Running jvmtop with --profile option gave me this analysis:

          Profiling PID 1524: /usr/share/jenkins/jenkins.war --webroot
          
            99.22% (    76.60s) hudson.plugins.ec2.win.winrm.WindowsProcess$2.run()
             0.39% (     0.30s) ....sonyericsson.jenkins.plugins.bfa.model.FailureReader()
             0.16% (     0.12s) org.kohsuke.stapler.export.NotExportableException.<init>()
             0.04% (     0.03s) hudson.remoting.ObjectInputStreamEx.resolveClass()
             0.03% (     0.02s) ....amazonaws.http.protocol.SdkHttpRequestExecutor.doRec()
             0.02% (     0.02s) hudson.remoting.ChunkedOutputStream.sendFrame()
             0.02% (     0.02s) hudson.plugins.ec2.win.winrm.WinRMClient.sendRequest()
             0.01% (     0.01s) hudson.util.IOUtils.copy()
             0.01% (     0.01s) hudson.remoting.Command.readFrom()
             0.01% (     0.01s) winstone.BoundedExecutorService$1.run()
             0.01% (     0.01s) hudson.remoting.FlightRecorderInputStream.read()
             0.01% (     0.01s) ....google.common.util.concurrent.ForwardingExecutorServ()
             0.00% (     0.00s) ....thoughtworks.xstream.io.xml.XmlFriendlyNameCoder.dec()
             0.00% (     0.00s) okio.Okio$2.read()
             0.00% (     0.00s) hudson.Util.tryOnceDeleteFile()
             0.00% (     0.00s) ....kohsuke.stapler.Function$InstanceFunction.getParamet()
             0.00% (     0.00s) com.jcraft.jzlib.InfCodes.inflate_fast()
             0.00% (     0.00s) hudson.PluginWrapper.getVersionOf()
             0.00% (     0.00s) com.thoughtworks.xstream.io.path.PathTracker.pushElement()
             0.00% (     0.00s) jenkins.util.VirtualFile$FileVF.isIllegalSymlink()
          

          During the analysis I did not have any EC2 Windows agents running, so this seems a bit weird. Running Jenkins 2.32.1 and EC2 plugin 1.36.


          Johannes Ebke added a comment -

          I observed this issue after the disconnect of Windows agents.

          Looking at the code, it seems obvious that the Jenkins thread is stuck in a busy loop at https://github.com/jenkinsci/ec2-plugin/blob/0dc221934cbd087b2819b5660e9b778208c9f2dc/src/main/java/hudson/plugins/ec2/win/winrm/WindowsProcess.java#L126, since there is no escape for broken pipes. If spurious IOExceptions are really a problem there, as the comment in the code claims, the retries should probably be limited to a sensible number.
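To make the suggestion concrete, here is a minimal sketch of a copy loop that tolerates a few consecutive IOExceptions and then gives up. It is not the plugin's code and not the merged fix; the class name, the limit of 5, and the 50 ms pause are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedRetryCopy {
    // Hypothetical limit; the issue only asks for "a sensible number".
    static final int MAX_CONSECUTIVE_FAILURES = 5;
    // Hypothetical pause between retries so a failing read cannot spin the CPU.
    static final long RETRY_PAUSE_MILLIS = 50;

    /** Copies in to out until EOF; rethrows once reads fail too many times in a row. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int failures = 0;
        long total = 0;
        while (true) {
            int n;
            try {
                n = in.read(buf);
                failures = 0; // a successful read resets the counter
            } catch (IOException e) {
                if (++failures >= MAX_CONSECUTIVE_FAILURES) {
                    throw e; // escape instead of retrying forever
                }
                try {
                    Thread.sleep(RETRY_PAUSE_MILLIS);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                continue;
            }
            if (n == -1) {
                return total; // clean end of stream
            }
            out.write(buf, 0, n);
            total += n;
        }
    }
}
```

With a limit like this, a permanently broken pipe ends the copy thread after a handful of attempts, while a genuinely transient failure still gets retried.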


          Johannes Ebke added a comment - - edited

          Looking at the GitHub forks of this plugin, I found
          https://github.com/jenkinsci/ec2-plugin/compare/master...karliszigurs:fix-intermittent-pipe-failure-in-windows
          which would probably fix this issue.

          Edit: The PR https://github.com/jenkinsci/ec2-plugin/pull/263 containing this change was already merged. Thanks!


          Mika Karjalainen added a comment -

          We are now running Jenkins 2.121.2 and ec2-plugin 1.39 and still see this issue. Two days after a Jenkins restart, CPU load has climbed to 60%, and the top 5 threads sorted by CPU time are all 'input copy: java -jar C:\Windows\Temp\slave.jar'.
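For anyone who wants to reproduce the "top threads by CPU time" check, here is a rough sketch using the standard java.lang.management API. It only sees threads of the JVM it runs in, so to inspect Jenkins it would have to execute inside the Jenkins process; the class name and output format are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopCpuThreads {
    /** Returns "<cpu ms> ms  <thread name>" for up to `limit` live threads, highest CPU time first. */
    static List<String> top(int limit) {
        List<String> out = new ArrayList<>();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return out; // this JVM cannot report per-thread CPU time
        }
        List<long[]> rows = new ArrayList<>(); // each row: {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread has already died
            if (cpuNanos >= 0) {
                rows.add(new long[] {id, cpuNanos});
            }
        }
        rows.sort(Comparator.comparingLong((long[] r) -> r[1]).reversed());
        for (long[] r : rows) {
            if (out.size() >= limit) {
                break;
            }
            ThreadInfo info = mx.getThreadInfo(r[0]); // null if the thread exited meanwhile
            if (info != null) {
                out.add((r[1] / 1_000_000) + " ms  " + info.getThreadName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        top(5).forEach(System.out::println);
    }
}
```

If the issue is present, the names at the top of such a list would be expected to start with 'input copy:'.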

          Raihaan Shouhell added a comment -

          xmiklis can you try with the latest snapshot? https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/plugins/ec2/1.45-rc1024.cafa05a12c08/

          Sharad Upadhyaya added a comment - - edited

          I am also consistently facing a similar issue with Windows EC2 instances:

          I am trying to launch 30 Windows EC2 agents using the EC2 plugin.

          While the agents are connecting, the Jenkins process on the master hits high CPU, around 155%.

          All agents get connected, but eventually some of them go offline with the following error:

          Connected with WinRM.
          Creating tmp directory if it does not exist
          remoting.jar sent remotely. Bootstrapping it
          Launching via WinRM:java -Dfile.encoding=UTF-8 -jar C:\Windows\Temp\remoting.jar -workDir C:\work
          <===[JENKINS REMOTING CAPACITY]===>Remoting version: 4.5
          This is a Windows agent
          ERROR: Connection terminated
          hudson.remoting.FastPipedInputStream$ClosedBy: The pipe was closed at...
           at hudson.remoting.FastPipedOutputStream.error(FastPipedOutputStream.java:101)
           at hudson.remoting.FastPipedOutputStream.close(FastPipedOutputStream.java:90)
           at hudson.plugins.ec2.util.Closeables.closeQuietly(Closeables.java:23)
           at hudson.plugins.ec2.win.winrm.WindowsProcess$2.run(WindowsProcess.java:146)
          Caused: java.io.IOException: Pipe is already closed
           at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:156)
           at hudson.remoting.FastPipedOutputStream.write(FastPipedOutputStream.java:140)
           at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:89)
           at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:62)
           at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46)
           at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
           at hudson.remoting.Channel.send(Channel.java:766)
           at hudson.remoting.Channel.close(Channel.java:1488)
           at hudson.remoting.Channel.close(Channel.java:1455)
           at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:874)
           at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:110)
           at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:765)
           at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
           at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
          onOnline: class org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl reported an exception: hudson.remoting.RequestAbortedException: java.io.IOException: Pipe is already closed
          
          

           

          I got the following in the Jenkins master system logs:

          ouch, STDIN exception for java -Dfile.encoding=UTF-8 -jar C:\Windows\Temp\remoting.jar -workDir C:\work
          java.net.SocketTimeoutException: connect timed out
           at java.net.PlainSocketImpl.socketConnect(Native Method)
           at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
           at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
           at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
           at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
           at java.net.Socket.connect(Socket.java:607)
           at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
           at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
          Caused: org.apache.http.conn.ConnectTimeoutException: Connect to 10.252.11.49:5986 [/10.252.11.49] failed: connect timed out
           at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
           at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374)
           at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
           at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
           at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
           at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
           at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
           at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
           at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
           at hudson.plugins.ec2.win.winrm.WinRMClient.sendRequest(WinRMClient.java:262)
          Caused: hudson.plugins.ec2.win.winrm.RuntimeIOException: I/O Exception Connect to 10.252.11.49:5986 [/10.252.11.49] failed: connect timed out
           at hudson.plugins.ec2.win.winrm.WinRMClient.sendRequest(WinRMClient.java:321)
           at hudson.plugins.ec2.win.winrm.WinRMClient.sendRequest(WinRMClient.java:237)
           at hudson.plugins.ec2.win.winrm.WinRMClient.sendInput(WinRMClient.java:127)
           at hudson.plugins.ec2.win.winrm.WindowsProcess$2.run(WindowsProcess.java:141)
          
          

           

          It works fine when connecting only 10-15 instances in one go.

          This seems to be a performance issue in the EC2 plugin that makes it hard to use with Windows instances.

          Jenkins version: 2.263.1

          EC2Plugin version: 1.56


          Sharad Upadhyaya added a comment - - edited

          raihaan, could you please suggest a workaround for this issue, or do you have a fix for it? Thanks


            raihaan Raihaan Shouhell
            jtsweet James Sweet