Stopped but not suspended Azure VM Agent is not restarted

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • _unsorted
    • Jenkins ver. 2.89.2 with Azure VM Agents Plugin ver. 0.6.0

      We're making use of the Azure VM Agents Plugin to create and maintain agents based on a custom .vhd file.
      The key info for that custom image is:

      • Windows OS
      • JNLP connection to Jenkins Master
      • "Shutdown Only (Do Not Delete) After Retention Time" option is enabled

      Such an agent node can have two flags shown in the Jenkins sidebar: "offline" and "suspended". As long as everything runs as expected, we do not start or stop the agent VMs manually.
      However, the stop (i.e. deallocate) command that the Azure VM Agents Plugin triggers once the retention time is up sometimes closes the JNLP connection before Jenkins marks the node as "suspended". When a job that is supposed to be built on that agent is then triggered, the node (which is shown as "offline", but not "suspended") is never started by the plugin. The job waits indefinitely until it is cancelled or the agent is started manually (i.e. via the Azure Portal or CLI).

      This doesn't happen every time. Sometimes the node is marked as "suspended" before the JNLP connection is closed, and the agent is started the next time it is required, i.e. as expected.
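The race described above can be illustrated with a toy model. This is only a sketch of the observed behaviour, not the plugin's actual code: the `AgentNode` class, both functions, and the rule that only "suspended" nodes are restarted are assumptions inferred from the symptoms in this report.

```python
from dataclasses import dataclass


@dataclass
class AgentNode:
    """Hypothetical stand-in for a Jenkins agent node backed by an Azure VM."""
    name: str
    online: bool = True       # JNLP connection is up
    suspended: bool = False   # the "suspended" flag shown in the sidebar
    vm_running: bool = True


def retention_shutdown(node: AgentNode, flag_set_before_disconnect: bool) -> None:
    """Model the deallocation once the retention time is up.

    flag_set_before_disconnect says which side of the race wins:
    True  -> the node is marked "suspended" before JNLP drops,
    False -> JNLP drops first and the flag is never set.
    """
    if flag_set_before_disconnect:
        node.suspended = True
    node.vm_running = False
    node.online = False  # deallocating the VM closes the JNLP connection


def try_restart_for_job(node: AgentNode) -> bool:
    """Assumed rule: only nodes flagged "suspended" are restarted on demand."""
    if node.suspended and not node.vm_running:
        node.vm_running = True
        node.online = True
        node.suspended = False
        return True
    return False  # offline but not suspended: the job keeps waiting


ok_node = AgentNode("agent-a")
retention_shutdown(ok_node, flag_set_before_disconnect=True)
print(try_restart_for_job(ok_node))     # True: restarted as expected

stuck_node = AgentNode("agent-b")
retention_shutdown(stuck_node, flag_set_before_disconnect=False)
print(try_restart_for_job(stuck_node))  # False: VM stays deallocated
```

In this model the second node ends up "offline" without "suspended", which matches the state in which the job waits indefinitely.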

          [JENKINS-49021] Stopped but not suspended Azure VM Agent is not restarted

          Carsten Wickner created issue -

          Chenyang Liu added a comment -

          Does this issue happen only in the new version (0.6.0)?

          Carsten Wickner added a comment -

          The "Shutdown Only (Do Not Delete) After Retention Time" option didn't deallocate the VMs prior to version 0.6.0 (but only triggered an OS shutdown).
          Since that didn't help in avoiding costs, I never used the option with an older version of the plugin.
          Azure DevOps made changes -
          Assignee Original: Azure DevOps [ azure_devops ] New: Chenyang Liu [ zackliu ]
          Pui Chee Chan made changes -
          Labels New: azure-jenkins in-progress

          Jakub Michalec added a comment -

          As we add more Azure VMs, this issue is starting to become a blocker for our CI.

          Jakub Michalec added a comment -

          AzureVMCloudRetensionStrategy: check: Idle timeout reached for agent: jenkins-27ba90, action: shutdown
          Mar 19, 2018 8:43:09 AM INFO com.microsoft.azure.vmagent.AzureVMCloudRetensionStrategy check
          AzureVMCloudRetensionStrategy: check: Idle timeout reached for agent: jenkins-d971b0, action: shutdown
          Mar 19, 2018 8:43:09 AM INFO com.microsoft.azure.vmagent.AzureVMCloudRetensionStrategy check
          AzureVMCloudRetensionStrategy: check: Idle timeout reached for agent: jenkins-dccd40, action: shutdown
          Mar 19, 2018 8:43:09 AM INFO com.microsoft.azure.vmagent.AzureVMCloudRetensionStrategy$1 call
          AzureVMCloudRetensionStrategy: going to idleTimeout agent: jenkins-27ba90
          Mar 19, 2018 8:43:09 AM INFO com.microsoft.azure.vmagent.AzureVMAgent shutdown
          AzureVMAgent: shutdown: agent jenkins-27ba90 is always shut down
          Mar 19, 2018 8:43:09 AM INFO com.microsoft.azure.vmagent.AzureVMCloudRetensionStrategy$1 call
          AzureVMCloudRetensionStrategy: going to idleTimeout agent: jenkins-dccd40
          Mar 19, 2018 8:43:09 AM INFO com.microsoft.azure.vmagent.AzureVMCloudRetensionStrategy$1 call
          AzureVMCloudRetensionStrategy: going to idleTimeout agent: jenkins-d971b0
          Mar 19, 2018 8:43:09 AM INFO com.microsoft.azure.vmagent.AzureVMAgent shutdown
          AzureVMAgent: shutdown: agent jenkins-dccd40 is always shut down
          Mar 19, 2018 8:43:09 AM INFO com.microsoft.azure.vmagent.AzureVMAgent shutdown
          AzureVMAgent: shutdown: agent jenkins-d971b0 is always shut down

          But all VMs keep running; I need to shut them down manually.
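When the plugin log claims a shutdown but the VMs keep running, it can help to list which agents the log says were shut down and compare that list with the power state Azure reports (for example in the Azure Portal). The following is a minimal sketch: the `summarize` helper and its regular expressions are hypothetical and were written only against the message formats visible in the excerpt above.

```python
import re

# Patterns derived from the log lines quoted in this issue (an assumption:
# other plugin versions may word these messages differently).
IDLE_RE = re.compile(r"Idle timeout reached for agent: ([\w-]+), action: (\w+)")
SHUTDOWN_RE = re.compile(r"AzureVMAgent: shutdown: agent ([\w-]+)")


def summarize(log_text):
    """Map each agent name to the retention action and whether a
    shutdown message was logged for it."""
    agents = {}
    for line in log_text.splitlines():
        idle = IDLE_RE.search(line)
        if idle:
            entry = agents.setdefault(
                idle.group(1), {"action": None, "shutdown_logged": False}
            )
            entry["action"] = idle.group(2)
            continue
        shut = SHUTDOWN_RE.search(line)
        if shut:
            entry = agents.setdefault(
                shut.group(1), {"action": None, "shutdown_logged": False}
            )
            entry["shutdown_logged"] = True
    return agents


sample = """\
AzureVMCloudRetensionStrategy: check: Idle timeout reached for agent: jenkins-27ba90, action: shutdown
Mar 19, 2018 8:43:09 AM INFO com.microsoft.azure.vmagent.AzureVMAgent shutdown
AzureVMAgent: shutdown: agent jenkins-27ba90 is always shut down
AzureVMCloudRetensionStrategy: check: Idle timeout reached for agent: jenkins-d971b0, action: shutdown
"""

for name, info in summarize(sample).items():
    print(name, info)
```

Agents that appear here with a logged shutdown but still show as running in Azure are the ones affected by the behaviour described in this comment.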

          Chenyang Liu added a comment -

          carstenenglert We are trying to fix this issue, but although I have tried many times, it always works well for me. So please provide more details. Does the issue happen more frequently when there are more nodes? Does the issue also happen when using SSH?

            Assignee: Chenyang Liu (zackliu)
            Reporter: Carsten Wickner (carstenenglert)