JENKINS-64112

VM agents plugin can't wake up nodes that are suspended occasionally

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: _unsorted
    • Labels:
      None
      Description

      A NullPointerException occurs, as shown in the log below:

      Unexpected exception encountered while provisioning agent ltaz-jenkins-node7ef250
      java.lang.NullPointerException
      	at com.microsoft.azure.vmagent.AzureVMCloud.getLockForAgent(AzureVMCloud.java:1017)
      	at com.microsoft.azure.vmagent.AzureVMCloud.lambda$provision$1(AzureVMCloud.java:682)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)

      https://github.com/jenkinsci/azure-vm-agents-plugin/blob/dev/src/main/java/com/microsoft/azure/vmagent/AzureVMCloud.java#L1016

      Could you please have a look at this case?

      Thank you. 

          Activity

          rakz Zoltan added a comment -

          This also happens on my side, with plugin version 1.5.1 and Jenkins 2.263.1.

          What I observed is that with a shorter retention time the plugin is able to wake up the agent.

          After a full shutdown, or with a retention time of 60 minutes, I also got an NPE:

          java.lang.NullPointerException
          	at com.microsoft.azure.vmagent.AzureVMCloud.getLockForAgent(AzureVMCloud.java:1017)
          	at com.microsoft.azure.vmagent.AzureVMCloud.lambda$provision$1(AzureVMCloud.java:682)
          	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)
          

          The only workaround is to delete the agent (which works), cancel the build, and trigger a new one, which then successfully provisions a new agent.

          It would be really good to have an automated workaround, as the retention strategy without delete is exactly what I need so that my agent keeps its local Maven and Docker caches.

          rakz Zoltan added a comment -

          Hi, I was just looking into the code. Given the transient modifier, would it be enough to add an initializer to readResolve

          https://github.com/jenkinsci/azure-vm-agents-plugin/blob/60f68e98405973d80ca0e31ee557d0ce2e26ad24/src/main/java/com/microsoft/azure/vmagent/AzureVMCloud.java#L187

           

          at the line where azureClients is also initialized

          https://github.com/jenkinsci/azure-vm-agents-plugin/blob/60f68e98405973d80ca0e31ee557d0ce2e26ad24/src/main/java/com/microsoft/azure/vmagent/AzureVMCloud.java#L197

          like:

           

          agentLocks = new HashMap<>();

          A simple synchronized call on the object fails, which is common with transient fields after deserialization.

          Is there a way for me to contribute and maybe create a preview release?
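
The suspected failure mode and the proposed fix can be sketched in isolation. This is a minimal, hypothetical illustration, not the plugin's actual code: `DemoCloud`, the `reinitialize` flag, and the helper methods are invented for the demo, and standard Java serialization is used as a stand-in for Jenkins' own persistence mechanism. It assumes only what the comment describes: a transient lock map that is null after deserialization, and a `readResolve` method that can recreate it once the field is no longer final.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class TransientLockDemo {

    /** Hypothetical stand-in for the cloud class; not the plugin's real implementation. */
    static class DemoCloud implements Serializable {
        private static final long serialVersionUID = 1L;

        // Transient fields are skipped by serialization, and the field initializer
        // does not run on deserialization, so this is null after a round trip.
        // It is deliberately not final so that readResolve can reassign it.
        private transient Map<String, Object> agentLocks = new HashMap<>();

        // Demo-only switch to compare behavior with and without the fix.
        private final boolean reinitialize;

        DemoCloud(boolean reinitialize) {
            this.reinitialize = reinitialize;
        }

        Object getLockForAgent(String agentName) {
            // Throws NullPointerException if agentLocks is null.
            synchronized (agentLocks) {
                return agentLocks.computeIfAbsent(agentName, k -> new Object());
            }
        }

        // Called by Java serialization after the object's fields are restored.
        private Object readResolve() {
            if (reinitialize) {
                agentLocks = new HashMap<>();
            }
            return this;
        }
    }

    /** Serializes and deserializes the object in memory. */
    static DemoCloud roundTrip(DemoCloud cloud) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(cloud);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DemoCloud) in.readObject();
        }
    }

    /** Returns true if a lock can be obtained after a round trip, false on NPE. */
    public static boolean locksSurviveRoundTrip(boolean reinitialize) {
        try {
            DemoCloud restored = roundTrip(new DemoCloud(reinitialize));
            return restored.getLockForAgent("demo-agent") != null;
        } catch (NullPointerException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without readResolve init: " + locksSurviveRoundTrip(false));
        System.out.println("with readResolve init:    " + locksSurviveRoundTrip(true));
    }
}
```

In this sketch the lookup only succeeds after a round trip when `readResolve` recreates the map, which matches the behavior reported in the thread.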

           

          rakz Zoltan added a comment -

          Hi, in the meantime I was so curious that I removed the final modifier and added this init line to the readResolve method.

          I can confirm it is now working. kangjin jun, if you want to test, just download the current master, add that line where the clients are also reinitialized, and run mvn package. Then take the .hpi file from the target folder and install it by uploading it on the Advanced tab of Manage Plugins in Jenkins.

          timja Tim Jacomb added a comment -

          All issues have been transferred to GitHub.

          See https://github.com/jenkinsci/azure-vm-agents-plugin/issues

          Search the issue title to find it.

          (This is a bulk comment and can't link to the specific issue)


            People

            Assignee:
            azure_devops Azure DevOps
            Reporter:
            kangjin98 kangjin jun
            Votes:
            0
            Watchers:
            3