I have a similar issue that happens occasionally, but often enough to be a nuisance.
The Jenkins pipeline log shows the message:
"Failed to get job status from Tower: Unexpected error code returned (503)"
Both AWX and Jenkins run within OpenShift.
AWX is typically not very busy; we rarely have more than 3 jobs running at the same time.
The issue seems to happen most frequently while AWX is waiting for a deployment to OpenShift to finish.
I have not found any useful logs in either AWX or Jenkins, but maybe I'm not looking in the right place. The closest thing I found is the following, and I don't know whether it is related:
[Ansible-Tower] Building GET request to https://awxserver/api/v2/jobs/57602/
[Ansible-Tower] Forcing cert trust
[Ansible-Tower] Request completed with (503)
[Ansible-Tower] Deleting oAuth token 15396 for awx
[Ansible-Tower] Forcing cert trust
[Ansible-Tower] Calling for oAuth token delete at https://awxserver/api/v2/tokens/15396/
[Ansible-Tower] Request completed with (200)
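One way to check whether the 503s line up with the OpenShift deployments is to poll an AWX endpoint independently of Jenkins and record when non-200 responses occur. Below is a minimal sketch, assuming Python 3 and bearer-token auth; `http_status` and `probe` are hypothetical helpers, not part of AWX or the Jenkins plugin. The `/api/v2/ping/` URL in the usage comment is an assumption; the job URL from the log above (`/api/v2/jobs/57602/`) would work the same way. TLS verification is left at its default here, unlike the plugin's "Forcing cert trust".

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List, Tuple


def http_status(url: str, token: str, timeout: float = 10.0) -> int:
    """Hypothetical helper: return the HTTP status code of a GET request."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; .code holds the status.
        return err.code


def probe(fetch: Callable[[], int], samples: int, interval: float,
          sleep: Callable[[float], None] = time.sleep,
          clock: Callable[[], float] = time.time) -> List[Tuple[float, int]]:
    """Call fetch() `samples` times; return (timestamp, code) for every non-200 result."""
    failures = []
    for i in range(samples):
        code = fetch()
        if code != 200:
            failures.append((clock(), code))
        if i < samples - 1:  # no need to wait after the last sample
            sleep(interval)
    return failures


# Example (not run here): sample every 5 s for an hour during a deployment.
# failures = probe(lambda: http_status("https://awxserver/api/v2/ping/", "TOKEN"),
#                  samples=720, interval=5)
```

If the recorded timestamps cluster around the deployments, that points at the AWX web pods or the OpenShift route/router rather than at the Jenkins plugin.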
We are running AWX 9.
The deployment step that AWX waits on typically takes 10-15 minutes; the rest of the AWX job takes roughly another 5 minutes.
We are using a pipeline and are not using async at this time.
I'd also appreciate a retry feature, or some advice on how to track down this issue, as it randomly fails our pipelines even though the AWX job may complete successfully.
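To illustrate the kind of retry behaviour being asked for: the log suggests the status poll gives up on the first 503. A poller that tolerates a few consecutive transient errors, with backoff, might look like the sketch below. This is a hypothetical helper written in Python for illustration, not the plugin's code or API. It assumes the AWX job detail JSON exposes a `status` field with terminal values `successful`, `failed`, `error` and `canceled`, and treating 502/504 as transient alongside 503 is also an assumption.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

FINISHED = {"successful", "failed", "error", "canceled"}  # terminal AWX job states
TRANSIENT = {502, 503, 504}  # assumption: codes worth retrying


def fetch_job(url: str, token: str) -> Tuple[int, Optional[dict]]:
    """Hypothetical helper: GET a job detail URL and return (status code, JSON body or None)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, None


def wait_for_job(fetch: Callable[[], Tuple[int, Optional[dict]]],
                 poll_interval: float = 10.0,
                 max_consecutive_errors: int = 5,
                 backoff: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reaches a terminal status, retrying transient HTTP errors."""
    errors = 0
    delay = poll_interval
    while True:
        code, body = fetch()
        if code == 200 and body is not None:
            errors, delay = 0, poll_interval  # healthy response: reset backoff
            status = body.get("status")
            if status in FINISHED:
                return status
        elif code in TRANSIENT:
            errors += 1
            if errors > max_consecutive_errors:
                raise RuntimeError(f"gave up after {errors} consecutive {code} responses")
            delay = min(delay * backoff, 300.0)  # exponential backoff, capped at 5 min
        else:
            raise RuntimeError(f"unexpected status code {code}")
        sleep(delay)


# Example (not run here):
# result = wait_for_job(lambda: fetch_job("https://awxserver/api/v2/jobs/57602/", "TOKEN"))
```

The retry sits around the status poll only, not around the job launch. Wrapping the whole Tower step in a pipeline-level retry would likely launch a second AWX job. Connection-level failures (`urllib.error.URLError`) are not retried in this sketch.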