• Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • ec2-plugin
    • None

      We have Jenkins + EC2 plugin running both Windows and Linux instances in AWS. From time to time, instances are terminated while a job is building.
      Observations: 

      1. Happens only on Windows nodes. Linux works perfectly.
      2. Happens only outside working hours, as defined in the section
        [Only apply minimum number of instances during specific time range]

      Configuration details

      • Jenkins: 2.303.3, EC2 Plugin: 1.66
      • Auto scale, From: 06:00 To: 21:00
      • The "Minimum number of instances" is 0
      • The "Minimum number of spare instances" is 6
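To make the "off working hours" observation concrete, here is a minimal sketch that checks whether a timestamp falls inside the configured 06:00 to 21:00 range. The function `in_window` is hypothetical and is not the EC2 plugin's code; it assumes the window is start-inclusive and end-exclusive, and that the log timestamps are in the same time zone as the configured range.

```python
from datetime import time

# Values taken from the configuration details above.
WINDOW_START = time(6, 0)
WINDOW_END = time(21, 0)


def in_window(now, start=WINDOW_START, end=WINDOW_END):
    """Return True if `now` lies in [start, end).

    Hypothetical helper for illustration only. It also handles ranges
    that wrap around midnight (for example 22:00 to 06:00).
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Timestamps from the logs in this report (time zone is an assumption).
for stamp in (time(5, 0, 44), time(6, 3, 12)):
    state = "inside" if in_window(stamp) else "outside"
    print(f"{stamp} is {state} the configured range")
```

With these assumptions, the SIGTERM at 05:00:44 is outside the range, which matches the observation that failures occur only outside working hours.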

       

      Jenkins Job Log:

       

      2022-02-03 05:00:42    5: [ RUN      ] ****
      2022-02-03 05:00:42    5: [ RUN      ] ****
      2022-02-03 05:00:42    5: [       OK ] ****
      2022-02-03 05:00:44    5: [ RUN      ] ****
      2022-02-03 05:00:44    Terminating on signal SIGTERM(15)
      2022-02-03 05:00:44 FATAL: command execution failed
      2022-02-03 05:00:48 java.io.EOFException
      2022-02-03 05:00:48  at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2798).......
      2022-02-03 05:00:48 Caused: java.io.IOException: Backing channel 'EC2 (AWS) - eu-west1b-windows (i-0a6a1e55947b1e6fa)' is disconnected.
      

      Jenkins Server log : 

      --
      2022-02-03 05:17:04.095+0000 [id=40]    INFO    hudson.plugins.ec2.SlaveTemplate#logProvisionInfo: SlaveTemplate{description='eu-west1b-windows', labels='aws_win'}. checkInstance: i-03e553095dcff6c00.. false - found existing corresponding Jenkins agent: i-03e553095dcff6c00
      --
      2022-02-03 05:17:15.928+0000 [id=46]    INFO    hudson.plugins.ec2.SlaveTemplate#logProvisionInfo: SlaveTemplate{description='eu-west1b-windows', labels='aws_win'}. checkInstance: i-03e553095dcff6c00.. false - found existing corresponding Jenkins agent: i-03e553095dcff6c00
      --
      2022-02-03 05:17:38.354+0000 [id=46]    INFO    hudson.plugins.ec2.SlaveTemplate#logProvisionInfo: SlaveTemplate{description='eu-west1b-windows', labels='aws_win'}. checkInstance: i-03e553095dcff6c00.. false - found existing corresponding Jenkins agent: i-03e553095dcff6c00
      --
      2022-02-03 05:27:45.532+0000 [id=44]   INFO    hudson.plugins.ec2.SlaveTemplate#logProvisionInfo: SlaveTemplate{description='eu-west1b-windows', labels='aws_win'}. checkInstance: i-03e553095dcff6c00.. false - found existing corresponding Jenkins agent: i-03e553095dcff6c00
      --
      2022-02-03 06:03:12.419+0000 [id=6376824]      INFO    h.p.ec2.EC2RetentionStrategy#internalCheck: Idle timeout of EC2 (AWS) - eu-west1b-windows (i-03e553095dcff6c00) after 60 idle minutes, instance statusRUNNING
      2022-02-03 06:03:12.419+0000 [id=6376824]      INFO    h.plugins.ec2.EC2AbstractSlave#idleTimeout: EC2 instance idle time expired: i-03e553095dcff6c00
      2022-02-03 06:03:12.724+0000 [id=6377196]      INFO    h.plugins.ec2.EC2OndemandSlave#lambda$terminate$0: Terminated EC2 instance (terminated): i-03e553095dcff6c00
      --
      2022-02-03 06:04:13.537+0000 [id=6377017]      INFO    h.p.ec2.EC2RetentionStrategy#internalCheck: Idle timeout of EC2 (AWS) - eu-west1b-windows (i-03e553095dcff6c00) after 61 idle minutes, instance statusSHUTTING_DOWN
      2022-02-03 06:04:13.537+0000 [id=6377017]      INFO    h.plugins.ec2.EC2AbstractSlave#idleTimeout: EC2 instance idle time expired: i-03e553095dcff6c00
      --
      2022-02-03 06:06:39.956+0000 [id=6377196]      INFO    h.plugins.ec2.EC2OndemandSlave#lambda$terminate$0: Removed EC2 instance from jenkins controller: i-03e553095dcff6c00
      
      

       

       

       

          [JENKINS-67730] EC2 Plugin: Terminate Win Instances during execution.

          xZanon none added a comment -

          One observation: usually, this problem happens only during the night, when we are running automation jobs on a large number of nodes.
          We launch ~200 small instances in 20 minutes and then, after 3 hours of work, release them over a period of 60 minutes.
          The master node does not seem busy, according to CPU / RAM usage, and there are no issues with network or disk either.
          All nodes are created properly and all nodes work well. Only a random Windows node gets killed by the EC2 plugin. From the logs it looks like the node is killed due to "idle time expired", but that is not the case, because there is a running job on this node. Is there anything else that we can do to:
          a) work around this problem
          b) provide more information that can help solve the issue?


          Kalle Niemitalo added a comment -

          I don't maintain the EC2 plugin, but your log snippets seem to cover non-overlapping time spans and separate instances. The Jenkins Job Log has i-0a6a1e55947b1e6fa and events from 05:00:42 to 05:00:48. The Jenkins Server Log has i-03e553095dcff6c00 and events from 05:17:04.095 to 06:06:39.956. Although the Jenkins Server Log shows "idle time expired", that happens long after the SIGTERM in Jenkins Job Log.
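The comparison described in this comment can be sketched as a small script that pulls instance IDs and time spans out of each log. This is only an illustration: the helper `summarize` and the regular expressions are assumptions, the sample lines are copied from the logs in the report, and both logs are assumed to use the same time zone.

```python
import re
from datetime import datetime

# Two lines from each log in the report, copied verbatim.
JOB_LOG = """\
2022-02-03 05:00:44    Terminating on signal SIGTERM(15)
2022-02-03 05:00:48 Caused: java.io.IOException: Backing channel 'EC2 (AWS) - eu-west1b-windows (i-0a6a1e55947b1e6fa)' is disconnected.
"""

SERVER_LOG = """\
2022-02-03 05:17:04.095+0000 [id=40]    INFO    hudson.plugins.ec2.SlaveTemplate#logProvisionInfo: SlaveTemplate{description='eu-west1b-windows', labels='aws_win'}. checkInstance: i-03e553095dcff6c00.. false - found existing corresponding Jenkins agent: i-03e553095dcff6c00
2022-02-03 06:06:39.956+0000 [id=6377196]      INFO    h.plugins.ec2.EC2OndemandSlave#lambda$terminate$0: Removed EC2 instance from jenkins controller: i-03e553095dcff6c00
"""

# EC2 instance IDs: "i-" followed by 8 or 17 hex characters.
INSTANCE_RE = re.compile(r"\bi-[0-9a-f]{8,17}\b")
# Leading "YYYY-MM-DD HH:MM:SS"; fractional seconds and offsets are ignored.
STAMP_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", re.MULTILINE)


def summarize(log_text):
    """Return (set of instance IDs, first timestamp, last timestamp)."""
    instances = set(INSTANCE_RE.findall(log_text))
    stamps = [
        datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
        for s in STAMP_RE.findall(log_text)
    ]
    return instances, min(stamps), max(stamps)


job_ids, job_start, job_end = summarize(JOB_LOG)
srv_ids, srv_start, srv_end = summarize(SERVER_LOG)

print("shared instances:", job_ids & srv_ids)
print("time spans overlap:", job_start <= srv_end and srv_start <= job_end)
```

For a useful diagnosis, the server log excerpt would need to contain the instance ID seen in the job log and cover the moment of the SIGTERM.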

          xZanon none added a comment -

          Hello, 

          Please close this one.
          The problem was with another plugin, config-history, which was creating a few million files and trying to deduplicate new records on top of them.
          After cleaning the folders and restarting Jenkins, everything works as intended.

          Regards, 

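For anyone checking whether a similar file build-up affects their controller, a minimal sketch for counting files under a directory tree follows. The helpers `count_files` and `largest_subdirs` are hypothetical, and the `config-history` directory under `JENKINS_HOME` (with `/var/lib/jenkins` as a fallback) is an assumption about where the plugin stores its records; adjust the path to your installation.

```python
import os
from pathlib import Path


def count_files(root):
    """Count regular files below `root`, recursively."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def largest_subdirs(root, top=5):
    """Return the `top` immediate subdirectories of `root` by file count."""
    counts = {}
    for entry in Path(root).iterdir():
        if entry.is_dir():
            counts[entry.name] = count_files(entry)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Assumed location; not confirmed by the report.
    jenkins_home = Path(os.environ.get("JENKINS_HOME", "/var/lib/jenkins"))
    history_dir = jenkins_home / "config-history"
    if history_dir.is_dir():
        print("total files:", count_files(history_dir))
        for name, n in largest_subdirs(history_dir):
            print(f"{n:>10}  {name}")
    else:
        print("directory not found:", history_dir)
```

A count in the millions would be consistent with the situation described in this comment.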

          xZanon none added a comment -

          Does not require a fix.
          The problem is with another plugin creating a huge number of files and trying to keep references to them.


            thoulen FABRIZIO MANFREDI
            xzanon xZanon none
            Votes: 0
            Watchers: 2