  1. Infrastructure
  2. INFRA-1695

Azure VM agents aren't provisioned

      Description

      ci.jenkins.io doesn't provision agents anymore.
      This seems to be caused by the fact that agents sometimes aren't correctly deleted and get stuck in a 'failed' state.
      Those 'failed' agents count toward the limit on Jenkins agent VMs (configured by the plugin).
      This morning we had 65 broken agents, which is the hard limit configured in Jenkins.

      Remark: We also have 'broken' agents in trusted.ci
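
      The capacity problem described above can be sketched in a few lines of Python. This is only an illustrative model, not the plugin's actual logic: the `AgentVM` class, the state labels, and the helper names are hypothetical, and the only figure taken from this issue is the hard limit of 65.

```python
from dataclasses import dataclass

# Figure quoted in the issue; assumed here to apply to every VM the plugin
# knows about, whatever its state.
HARD_LIMIT = 65


@dataclass
class AgentVM:
    name: str   # hypothetical VM name
    state: str  # hypothetical state label, e.g. "running" or "failed"


def free_slots(vms, limit=HARD_LIMIT):
    """Slots left when every VM, failed or not, counts toward the limit."""
    return max(limit - len(vms), 0)


def failed_vms(vms):
    """VMs stuck in the 'failed' state, i.e. candidates for manual deletion."""
    return [vm for vm in vms if vm.state == "failed"]


# 65 failed VMs fill the whole cap, so no new agent can be provisioned.
stuck = [AgentVM(f"agent-{i}", "failed") for i in range(65)]
print(free_slots(stuck))  # 0

# Once the failed VMs are deleted, the full capacity is available again.
remaining = [vm for vm in stuck if vm.state != "failed"]
print(free_slots(remaining))  # 65
```

      Under this model, provisioning only resumes once the failed VMs are removed, since they occupy slots without doing any work.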

          Activity

          olblak Olivier Vernin created issue -
          olblak Olivier Vernin made changes -
          Status: Open [ 1 ] → In Progress [ 3 ]
          olblak Olivier Vernin added a comment (edited)

          I provided Azure support with a list of virtual machines that need to be deleted.
          Of course, this won't prevent the issue from appearing again in the future if the root cause is not identified.
          I also opened a ticket for the plugin: JENKINS-52317

          olblak Olivier Vernin added a comment -

          News from Microsoft


          Hello Olivier;

          Greetings from Microsoft.

          Thank you for your time on the call.

          Please note that the current case update is as follows: The engineering team has identified the cause of the issue as related to the storage accounts where the VHDs for the VMs are hosted. The storage account is inaccessible, and to fix the issue we will be deploying a hotfix from the backend.

          As these steps involve a lot of R&D and testing, the expected time frame for rolling out the hotfix is by Friday.

          We will keep you posted on developments as they appear.

          We thank you once again for your valuable patience and cooperation.

          Please email me if you need any further help on the case.


          olblak Olivier Vernin added a comment -

          I was able to delete all the failed VMs, and I cleaned up the node list on ci.jenkins.io.

          olblak Olivier Vernin made changes -
          Resolution: Fixed [ 1 ]
          Status: In Progress [ 3 ] → Resolved [ 5 ]
          olblak Olivier Vernin made changes -
          Status: Resolved [ 5 ] → Closed [ 6 ]

            People

            Assignee: olblak Olivier Vernin
            Reporter: olblak Olivier Vernin
