We set the retention time for one of our images to 1 hour, but the plan itself takes up to 4 hours.
The node gets removed after the retention time even though it is neither idle nor failing. After the node is removed, the plan is left running indefinitely without doing anything (we have had plans running for days).
This is a huge problem because the nodes we spin up are very expensive, so we want to keep them alive only as long as needed to perform the build and shut them down when no more builds are queued. Our current workaround of setting the retention time higher than the build duration is therefore far from optimal.
See attached pictures for more info.
I just read a comment by Jack Chen pointing out a possible problem when a test instance of Jenkins uses the same configuration:
https://wiki.jenkins.io/display/JENKINS/Azure+VM+Agents+plugin
I will try disconnecting the second instance from Azure to see if the problem persists, and I will post my findings here for the record.