Bug
Resolution: Unresolved
Major
None
Hello maintainers,
We have several Multibranch Pipelines defined in our Jenkins instance. These pipelines are associated with several repositories on a private GitLab server, and we have configured them to observe those repositories' Merge Requests (MRs).
Everything works most of the time, but occasionally a pipeline associated with an MR stops being triggered by events from GitLab. At the same time, new pipelines stop appearing when new MRs are created for that project.
We also found that:
1. Repositories in this state show an Exception stack trace like the following under "Scan GitLab Project Log":
Legacy code started this job. No cause information is available
[Sat Jan 11 15:52:05 UTC 2025] Starting branch indexing...
ERROR: [Sat Jan 11 15:54:16 UTC 2025] Could not update folder level actions from source REDACTED-REPO-NAME
[Sat Jan 11 15:54:16 UTC 2025] Finished branch indexing. Indexing took 2 min 11 sec
FATAL: Failed to recompute children of REDACTED » REPO » NAME
java.net.ConnectException: Connection timed out
    at java.base/sun.nio.ch.Net.connect0(Native Method)
    at java.base/sun.nio.ch.Net.connect(Net.java:589)
    at java.base/sun.nio.ch.Net.connect(Net.java:578)
    at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:583)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
    at java.base/java.net.Socket.connect(Socket.java:757)
    at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304)
    at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:178)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:531)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:636)
    at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
    at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:377)
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1255)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1141)
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1693)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1617)
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:531)
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:307)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:449)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:286)
Caused: javax.ws.rs.ProcessingException
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:288)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:300)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:674)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.JerseyInvocation.call(JerseyInvocation.java:709)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.JerseyInvocation.lambda$runInScope$3(JerseyInvocation.java:703)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.internal.Errors.process(Errors.java:205)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.JerseyInvocation.runInScope(JerseyInvocation.java:703)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:673)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:413)
    at PluginClassLoader for jersey2-api//org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:313)
    at PluginClassLoader for gitlab-api//org.gitlab4j.api.GitLabApiClient.get(GitLabApiClient.java:416)
    at PluginClassLoader for gitlab-api//org.gitlab4j.api.GitLabApiClient.get(GitLabApiClient.java:404)
    at PluginClassLoader for gitlab-api//org.gitlab4j.api.AbstractApi.get(AbstractApi.java:214)
Caused: org.gitlab4j.api.GitLabApiException: java.net.ConnectException: Connection timed out
    at PluginClassLoader for gitlab-api//org.gitlab4j.api.AbstractApi.handle(AbstractApi.java:700)
    at PluginClassLoader for gitlab-api//org.gitlab4j.api.AbstractApi.get(AbstractApi.java:216)
    at PluginClassLoader for gitlab-api//org.gitlab4j.api.ProjectApi.getProject(ProjectApi.java:748)
    at PluginClassLoader for gitlab-api//org.gitlab4j.api.ProjectApi.getProject(ProjectApi.java:680)
    at PluginClassLoader for gitlab-branch-source//io.jenkins.plugins.gitlabbranchsource.GitLabSCMSource.getGitlabProject(GitLabSCMSource.java:214)
Caused: java.lang.IllegalStateException: Failed to retrieve project redacted/repo/name
    at PluginClassLoader for gitlab-branch-source//io.jenkins.plugins.gitlabbranchsource.GitLabSCMSource.getGitlabProject(GitLabSCMSource.java:219)
    at PluginClassLoader for gitlab-branch-source//io.jenkins.plugins.gitlabbranchsource.GitLabSCMSource.getGitlabProject(GitLabSCMSource.java:206)
    at PluginClassLoader for gitlab-branch-source//io.jenkins.plugins.gitlabbranchsource.GitLabSCMSource.retrieveActions(GitLabSCMSource.java:612)
    at PluginClassLoader for scm-api//jenkins.scm.api.SCMSource.fetchActions(SCMSource.java:847)
    at PluginClassLoader for branch-api//jenkins.branch.MultiBranchProject.computeChildren(MultiBranchProject.java:611)
    at PluginClassLoader for cloudbees-folder//com.cloudbees.hudson.plugins.folder.computed.ComputedFolder.updateChildren(ComputedFolder.java:269)
    at PluginClassLoader for cloudbees-folder//com.cloudbees.hudson.plugins.folder.computed.FolderComputation.run(FolderComputation.java:167)
    at PluginClassLoader for branch-api//jenkins.branch.MultiBranchProject$BranchIndexing.run(MultiBranchProject.java:1057)
    at hudson.model.ResourceController.execute(ResourceController.java:101)
    at hudson.model.Executor.run(Executor.java:446)
Finished: FAILURE
This suggests that the "broken link" between the pipeline and the GitLab project appears after a connection error between Jenkins and GitLab. These connection timeouts are sporadic and to be expected, as our infrastructure is fairly widely distributed.
2. Comparing the config.xml file of a working pipeline with that of a non-triggering one, we see that the following elements are missing in the non-triggering one:
diff config.xml config-affected.xml
77c77,79
< <sshRemote>REDACTED-PROJECT-REMOTE-SSH</sshRemote>
< <httpRemote>REDACTED-PROJECT-REMOTE-HTTP</httpRemote>
< <projectId>999</projectId>
---
> <projectId>0</projectId>
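The symptom in this diff can be checked mechanically. The following is only a minimal sketch, not part of the plugin: the class name ConfigCheck, the method looksAffected, and the <source> wrapper element in the sample strings are hypothetical, and it assumes that a missing sshRemote or httpRemote element, or a projectId of 0, is enough to flag a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: flags a config.xml that shows the symptoms from the diff.
final class ConfigCheck {

    /** Returns true if sshRemote or httpRemote is absent, or projectId is absent or 0. */
    static boolean looksAffected(String configXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));

        boolean noSsh = doc.getElementsByTagName("sshRemote").getLength() == 0;
        boolean noHttp = doc.getElementsByTagName("httpRemote").getLength() == 0;

        NodeList ids = doc.getElementsByTagName("projectId");
        boolean zeroId = ids.getLength() == 0
                || "0".equals(ids.item(0).getTextContent().trim());

        return noSsh || noHttp || zeroId;
    }

    public static void main(String[] args) throws Exception {
        // Sample fragments only; a real config.xml has many more elements.
        String healthy = "<source><sshRemote>ssh-url</sshRemote>"
                + "<httpRemote>http-url</httpRemote><projectId>999</projectId></source>";
        String affected = "<source><projectId>0</projectId></source>";
        System.out.println("healthy flagged:  " + looksAffected(healthy));
        System.out.println("affected flagged: " + looksAffected(affected));
    }
}
```

The sketch only inspects the first projectId element, so a job with several sources would need a per-source check.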
3. We found a weak workaround for this: clicking "Scan GitLab Project Now" in the Multibranch Pipeline UI makes the missing entries reappear in the config.xml file, and the pipeline starts working as expected again.
This is not great: we have hundreds of users who keep asking what is going on when their pipelines are not triggered, and we keep having to point them to that workaround.
4. Coincidentally (or not), repositories in a "broken link" state also display apparently incomplete or inconsistent logs in the "Multibranch Pipeline Events" page. We traced these log lines through this plugin's source code and ended up in the GitLabSCMSource#retrieve(SCMSourceCriteria, SCMHeadObserver, SCMHeadEvent<?>, TaskListener) method.
Having read through this method, we are fairly confident that, when a connection error occurs anywhere during its execution:
4.1 The execution of the method is interrupted, falling into the catch block at its end.
4.2 Most suspiciously: when getGitlabProject(gitLabApi) is called at the very beginning of the retrieve() method, the sshRemote, httpRemote, and projectId members can still be unset (these are the ones we see missing or zeroed in the config.xml file). If that call fails, they stay unset. Consequently, when owner.save() is called at the end of the method (in the finally block), the pipeline's config.xml file is written with those members still unset.
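The pattern described in 4.1 and 4.2 can be modelled in isolation. This is a simplified sketch under our reading of the method, not the plugin's actual code: the class SaveInFinallySketch, its fields, and the fetchProject and save methods are hypothetical stand-ins for the real members, and the network failure is simulated with a flag.

```java
import java.io.IOException;

// Hypothetical model of "save in finally" persisting a half-initialised state.
final class SaveInFinallySketch {
    String sshRemote;       // unset until the remote lookup succeeds
    long projectId;         // Java default 0, matching <projectId>0</projectId>
    String savedSnapshot;   // stands in for what gets written to config.xml

    /** Stand-in for the remote lookup; throws when the network is down. */
    void fetchProject(boolean networkUp) throws IOException {
        if (!networkUp) {
            throw new IOException("Connection timed out");
        }
        sshRemote = "ssh-url";
        projectId = 999;
    }

    /** Stand-in for owner.save(): records the current field values. */
    void save() {
        savedSnapshot = "sshRemote=" + sshRemote + ",projectId=" + projectId;
    }

    void retrieve(boolean networkUp) {
        try {
            fetchProject(networkUp);
        } catch (IOException e) {
            // Error is reported, but the fields were never populated.
        } finally {
            save(); // runs on the failure path too
        }
    }
}
```

Calling retrieve(false) on a fresh instance leaves savedSnapshot describing an unset sshRemote and a projectId of 0, which is the shape we see in the affected config.xml.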
5. To work around this and validate a fix, we patched the gitlab-branch-source-plugin so that, at every point where it communicates with GitLab and an exception may be thrown, the exception is caught and the config.xml file is not written in an incomplete state.
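The general idea can be illustrated as follows. This is not the attached patch, only a sketch of the guard it aims for, assuming the remote values are fetched into a local object first and committed and saved only on success; GuardedSaveSketch, Project, fetchProject, and save are hypothetical names.

```java
import java.io.IOException;

// Hypothetical model: persist state only after the remote lookup succeeded.
final class GuardedSaveSketch {

    /** Hypothetical value object for what the remote lookup returns. */
    static final class Project {
        final String sshRemote;
        final long id;

        Project(String sshRemote, long id) {
            this.sshRemote = sshRemote;
            this.id = id;
        }
    }

    String sshRemote;
    long projectId;
    int saves; // counts how often the configuration was written

    /** Stand-in for the remote lookup; throws when the network is down. */
    Project fetchProject(boolean networkUp) throws IOException {
        if (!networkUp) {
            throw new IOException("Connection timed out");
        }
        return new Project("ssh-url", 999);
    }

    /** Stand-in for owner.save(). */
    void save() {
        saves++;
    }

    /** Returns true if state was refreshed and saved, false if left untouched. */
    boolean retrieve(boolean networkUp) {
        Project fetched;
        try {
            fetched = fetchProject(networkUp);
        } catch (IOException e) {
            return false; // keep the previous fields and skip the write
        }
        sshRemote = fetched.sshRemote;
        projectId = fetched.id;
        save();
        return true;
    }
}
```

With this shape, a timeout after a previously successful scan leaves both the in-memory fields and the saved configuration as they were.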
We have been running with the attached patch for one month now and have not seen the problem occur again; before the patch, we would see it happen every day.
I am also opening a pull request in the GitHub project with the patch so that you can review it properly.
All of this is the result of our own investigation: we think we are headed in the right direction and that the proposed patch is sound. We would appreciate any guidance if you find a flaw in our reasoning or consider the patch inappropriate.
Thank you!