NOTE: I've added a reproducer test case, and this issue matches what I see when testing locally: the rollback works, but the instance then re-upgrades to the tainted level it had just rolled back from. I'm still not 100% sure of the diagnosis, so keep this in mind when reading what follows.
When an instance taints a given update level (UL) after a failed upgrade, it is correctly served the previous non-tainted UL.
But when the instance then asks the backend again, passing that previous UL as a parameter (i.e. the good one, the one just before the newly tainted one), the backend still answers with an upgrade to the tainted level.
From a user standpoint, this surfaces as the following behavior:
- Imagine we are on UL50.
- UL51 is pushed
- Instance updates to UL51 and is unable to restart
- Instance informs the backend UL51 was a failure (aka taints it)
- Instance goes back to UL50 from UL51
- Instance updates to UL51 again, and fails again
- The cycle repeats from the update step, i.e. the instance goes UL50->UL51->UL50->UL51, and so on.
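The loop described in the steps above can be illustrated with a small simulation. This is only a sketch of the suspected behavior, not the actual backend code: `buggy_offer` and `simulate` are hypothetical names, and the assumption that the backend simply ignores the taint record when choosing what to offer is the unconfirmed diagnosis from this report.

```python
def buggy_offer(current_ul, latest_ul, tainted):
    """Hypothetical faulty backend: `tainted` is ignored when picking the offer."""
    return max(current_ul, latest_ul)


def simulate(cycles, start_ul=50, latest_ul=51, broken=frozenset({51})):
    """Return the sequence of ULs the instance runs over `cycles` update checks."""
    tainted = set()
    current = start_ul
    history = [current]
    for _ in range(cycles):
        offered = buggy_offer(current, latest_ul, tainted)
        if offered == current:
            break  # nothing new to install
        history.append(offered)
        if offered in broken:
            tainted.add(offered)     # instance reports the failure (taints the UL)
            history.append(current)  # and rolls back to the UL it came from
        else:
            current = offered
    return history


print(simulate(2))
```

With two update checks, the history alternates between UL50 and UL51 even though UL51 was tainted on the first failure.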
Once a UL has been tainted for a given instance, it should never be proposed to that instance again.
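As a minimal sketch of that expectation, the selection could look like the function below. The name `next_update_level` and its parameters are hypothetical and do not come from the backend code; it also assumes ULs are comparable integers and that a newer, non-tainted UL may still be offered past a tainted one.

```python
def next_update_level(current_ul, available_uls, tainted_by_instance):
    """Pick the highest available UL above `current_ul` that this instance
    has not tainted; if there is none, stay on `current_ul`."""
    candidates = [
        ul for ul in available_uls
        if ul > current_ul and ul not in tainted_by_instance
    ]
    return max(candidates, default=current_ul)
```

Under these assumptions, an instance on UL50 that has tainted UL51 stays on UL50 until some later UL is pushed.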