Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-70163

GitHub Branch Source clients are caching proxy configuration DNS / IPs

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Major Major
    • None
    • github-branch-source:1696.v3a_7603564d04
    • github-branch-source:1728.v859147241f49

      When using GitHub Branch Source with a Jenkins Proxy configuration, if the IP of the proxy server changes GitHub Branch Source still uses the previous IP.

      This has been reproduced in Kubernetes with a squid proxy deployment and service deployed in k8s.

      • Set the Jenkins proxy to the Kubernetes Service DNS of the proxy and port
      • Then I do the following:
      • build a multibranch job and wait for it to complete (retrieve the Jenkinsfile and does a checkout)
      • delete the squid proxy pods and service
      • recreate the squid service
      • build the same multibranch job

      We see a SocketTimeoutException.

      java.net.SocketTimeoutException: connect timed out
      	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
      	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
      	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
      	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
      	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
      	at java.base/java.net.Socket.connect(Socket.java:609)
      	at okhttp3.internal.platform.Platform.connectSocket(Platform.kt:128)
      	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:295)
      	at okhttp3.internal.connection.RealConnection.connectTunnel(RealConnection.kt:261)
      	at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:201)
      	at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226)
      	at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106)
      	at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74)
      	at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255)
      	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
      	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
      	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
      	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
      	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
      	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
      	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
      	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
      	at org.kohsuke.github.extras.okhttp3.ObsoleteUrlFactory$UnexpectedException.lambda$static$0(ObsoleteUrlFactory.java:1365)
      	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
      	at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
      	at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)
      	at org.kohsuke.github.extras.okhttp3.ObsoleteUrlFactory$OkHttpURLConnection.getResponse(ObsoleteUrlFactory.java:671)
      	at org.kohsuke.github.extras.okhttp3.ObsoleteUrlFactory$OkHttpURLConnection.getResponseCode(ObsoleteUrlFactory.java:702)
      	at org.kohsuke.github.extras.okhttp3.ObsoleteUrlFactory$DelegatingHttpsURLConnection.getResponseCode(ObsoleteUrlFactory.java:1064)
      	at org.kohsuke.github.internal.GitHubConnectorHttpConnectorAdapter.send(GitHubConnectorHttpConnectorAdapter.java:87)
      	at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:384)
      Caused: org.kohsuke.github.HttpException: Server returned HTTP response code: -1, message: 'null' for URL: https://api.github.com/
      	at org.kohsuke.github.GitHubClient.interpretApiError(GitHubClient.java:548)
      	at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:399)
      	at org.kohsuke.github.GitHubClient.fetch(GitHubClient.java:121)
      	at org.kohsuke.github.GitHubClient.checkApiUrlValidity(GitHubClient.java:323)
      	at org.kohsuke.github.GitHub.checkApiUrlValidity(GitHub.java:1244)
      	at org.jenkinsci.plugins.github_branch_source.ApiRateLimitChecker.verifyConnection(ApiRateLimitChecker.java:199)
      	at org.jenkinsci.plugins.github_branch_source.Connector$GitHubConnection.verifyConnection(Connector.java:794)
      	at org.jenkinsci.plugins.github_branch_source.Connector.connect(Connector.java:447)
      	at org.jenkinsci.plugins.github_branch_source.GitHubSCMSource.retrieve(GitHubSCMSource.java:1679)
      	at jenkins.scm.api.SCMSource.fetch(SCMSource.java:582)
      	at org.jenkinsci.plugins.workflow.multibranch.SCMBinder.create(SCMBinder.java:101)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:311)
      	at hudson.model.ResourceController.execute(ResourceController.java:107)
      	at hudson.model.Executor.run(Executor.java:449)
      

      Restarting the controller fixes this.

      Notes: this is reproducible whether the okhttp disk cache is disabled with org.jenkinsci.plugins.github_branch_source.GitHubSCMSource.cacheSize=0 or not.

          [JENKINS-70163] GitHub Branch Source clients are caching proxy configuration DNS / IPs

          Attaching some groovy script that I used to troubleshoot this. Testing okhttpclient, github-api client and GH Connector in "isolation". I am only able to reproduce the problem in the case of the GH Branch Source Connector. That lead me to the conclusion that this was a GH Branch Source issue.
          I eventually built a snapshot where I removed the cached client maps at https://github.com/Dohbedoh/github-branch-source-plugin/commit/6901d5344c9352eaf37d90682fbf8b279c8ed5c5 and with this I am not able to reproduce anymore. Hopefully that helps narrow this down.

          Allan BURDAJEWICZ added a comment - Attaching some groovy script that I used to troubleshoot this. Testing okhttpclient, github-api client and GH Connector in "isolation". I am only able to reproduce the problem in the case of the GH Branch Source Connector. That lead me to the conclusion that this was a GH Branch Source issue. I eventually built a snapshot where I removed the cached client maps at https://github.com/Dohbedoh/github-branch-source-plugin/commit/6901d5344c9352eaf37d90682fbf8b279c8ed5c5 and with this I am not able to reproduce anymore. Hopefully that helps narrow this down.

          Further note: some users for whom I started to troubleshooted this are also seeing {{java.net.NoRouteToHostException: No route to host (Host unreachable)}. I was not able to reproduce this particular exception, maybe it related to the environment (DNS server / etc..), though the symptoms are the same (also consistently reproduced when recycling the proxy deployment that causes a change of IP) and the issue also resolved by restarting Jenkins.

          Allan BURDAJEWICZ added a comment - Further note: some users for whom I started to troubleshooted this are also seeing {{java.net.NoRouteToHostException: No route to host (Host unreachable)}. I was not able to reproduce this particular exception, maybe it related to the environment (DNS server / etc..), though the symptoms are the same (also consistently reproduced when recycling the proxy deployment that causes a change of IP) and the issue also resolved by restarting Jenkins.

          I see several issue around this:

          • we keep a base client that has a reference to the ProxyConfiguration. This is done on the JVM startup. It looks like changes to the proxy configuration would not be picked up by it. The JenkinsOkHttpClient definitely set the proxy when we create the new builder. And not dynamically in the ProxySelector. So we can’t rely on keeping a base client like this.
          • all the cached client in the cache maps are not invalidated in case the ProxyConfiguration changes in Jenkins.

          Those are easily addressable. It’s about changes or configuration in Jenkins. But not really what interest us here.

          The other side of the problem and, what the issue is about here, are environmental changes. In particular routes that the OkHttpClient stack might be caching / tracking and that have changed. And disabling the disk cache (cache of request/response) has no effect. So this is really about the okhttp client. And those are difficult to handle at higher level. I am not sure why okhttp does not deal with it in the context of the GitHub client. Because in the context of the Kubernetes plugin and the k8s client, we don’t seem to face this problem. Maybe fabric8 does something about this…

          We could disable - or allow disabling - the cache of the GitHub connection though I am not sure of the impact in large environments. The revalidation / rate api limit checks are also tremendously reduced by the current mechanism. That help to save calls against the GitHub API… Even the GitHub object also caches users / organizations for example. So just removing this is not an option.

          I was wondering if maybe we could use Okhttp Interceptors to handle this. It could help knowing when to invalidate our cache. But we would need a retry mechanism as well. Though at GH branch Source level ? really ? That seems wrong. Retries are already done in the github-api…

          Allan BURDAJEWICZ added a comment - I see several issue around this: we keep a base client that has a reference to the ProxyConfiguration . This is done on the JVM startup. It looks like changes to the proxy configuration would not be picked up by it. The JenkinsOkHttpClient definitely set the proxy when we create the new builder. And not dynamically in the ProxySelector . So we can’t rely on keeping a base client like this. all the cached client in the cache maps are not invalidated in case the ProxyConfiguration changes in Jenkins. Those are easily addressable. It’s about changes or configuration in Jenkins. But not really what interest us here. The other side of the problem and, what the issue is about here, are environmental changes. In particular routes that the OkHttpClient stack might be caching / tracking and that have changed. And disabling the disk cache (cache of request/response) has no effect. So this is really about the okhttp client. And those are difficult to handle at higher level. I am not sure why okhttp does not deal with it in the context of the GitHub client. Because in the context of the Kubernetes plugin and the k8s client, we don’t seem to face this problem. Maybe fabric8 does something about this… We could disable - or allow disabling - the cache of the GitHub connection though I am not sure of the impact in large environments. The revalidation / rate api limit checks are also tremendously reduced by the current mechanism. That help to save calls against the GitHub API… Even the GitHub object also caches users / organizations for example. So just removing this is not an option. I was wondering if maybe we could use Okhttp Interceptors to handle this. It could help knowing when to invalidate our cache. But we would need a retry mechanism as well. Though at GH branch Source level ? really ? That seems wrong. Retries are already done in the github-api…

          Testing OkHttp 3.x and 4.x in isolation, this is actually reproducible. And only happens with the DNS resolution changes for the proxy (does not happen when DNS resolution changes for the request URL and not suing proxy for example). Therefore I have created a project and instructions to reproduce this and open an issue with okhttp to understand what is going on (have them confirm the bug, or tell us what is expected from a consumer perspective): https://github.com/square/okhttp/issues/7698.

          I can see how we could workaround that problem, invalidating records in our cached clients maps when we detects such kind of exception AND a proxy is in used, probably using interceptors to detect this. But, due to the abstractions between GH Branch Source / Github API down to OkHttp, not 100% yet how we can handle this flawlessly... So that clients get recycled automatically and the GitHub Branch Source looks unimpacted.

          Allan BURDAJEWICZ added a comment - Testing OkHttp 3.x and 4.x in isolation, this is actually reproducible. And only happens with the DNS resolution changes for the proxy (does not happen when DNS resolution changes for the request URL and not suing proxy for example). Therefore I have created a project and instructions to reproduce this and open an issue with okhttp to understand what is going on (have them confirm the bug, or tell us what is expected from a consumer perspective): https://github.com/square/okhttp/issues/7698 . I can see how we could workaround that problem, invalidating records in our cached clients maps when we detects such kind of exception AND a proxy is in used, probably using interceptors to detect this. But, due to the abstractions between GH Branch Source / Github API down to OkHttp, not 100% yet how we can handle this flawlessly... So that clients get recycled automatically and the GitHub Branch Source looks unimpacted.

          As a workaround, one may use reflection to clear the cached clients:

          Class<?> innerClazz = Class.forName("org.jenkinsci.plugins.github_branch_source.Connector\$GitHubConnection");
          java.lang.reflect.Method privateMethod = innerClazz.getDeclaredMethod("removeAllUnused", long.class);
          privateMethod.setAccessible(true);
          privateMethod.invoke(null, System.currentTimeMillis());
          

          Allan BURDAJEWICZ added a comment - As a workaround, one may use reflection to clear the cached clients: Class <?> innerClazz = Class .forName( "org.jenkinsci.plugins.github_branch_source.Connector\$GitHubConnection" ); java.lang.reflect.Method privateMethod = innerClazz.getDeclaredMethod( "removeAllUnused" , long .class); privateMethod.setAccessible( true ); privateMethod.invoke( null , System .currentTimeMillis());

          The previous script would not clear the clients if they are in use. The following would force clear:

          import java.lang.reflect.Field
          
          Class<?> connectorClazz = org.jenkinsci.plugins.github_branch_source.Connector.class
          Field connectionsField = connectorClazz.getDeclaredField("connections");
          connectionsField.setAccessible(true);
          Map connections = (Map) connectionsField.get(null);
          Field reverseLookupField = connectorClazz.getDeclaredField("reverseLookup");
          reverseLookupField.setAccessible(true);
          Map reverseLookup = (Map) reverseLookupField.get(null);
          
          println "Before cleanup"
          println " * Connections cache size:" + connections.size()
          println " * Reverse Lookup cache size:" + reverseLookup.size()
          
          connections.each { entry ->
            def record = entry.getValue()
            try {
              if (record.cache != null && record.cleanupCacheFolder) {
                record.cache.delete();
                record.cache.close();
              }
            } catch (IOException e) {
              println "Exception removing cache directory for connection ${entry.getKey()}: " + e.getMessage()
            }
          }
          connections.clear()
          reverseLookup.clear()
          
          println "After cleanup"
          println "* Connections cache size:" + connections.size()
          println "* Reverse Lookup cache size:" + reverseLookup.size()
          return
          

          Allan BURDAJEWICZ added a comment - The previous script would not clear the clients if they are in use. The following would force clear: import java.lang.reflect.Field Class <?> connectorClazz = org.jenkinsci.plugins.github_branch_source.Connector.class Field connectionsField = connectorClazz.getDeclaredField( "connections" ); connectionsField.setAccessible( true ); Map connections = (Map) connectionsField.get( null ); Field reverseLookupField = connectorClazz.getDeclaredField( "reverseLookup" ); reverseLookupField.setAccessible( true ); Map reverseLookup = (Map) reverseLookupField.get( null ); println "Before cleanup" println " * Connections cache size:" + connections.size() println " * Reverse Lookup cache size:" + reverseLookup.size() connections.each { entry -> def record = entry.getValue() try { if (record.cache != null && record.cleanupCacheFolder) { record.cache.delete(); record.cache.close(); } } catch (IOException e) { println "Exception removing cache directory for connection ${entry.getKey()}: " + e.getMessage() } } connections.clear() reverseLookup.clear() println "After cleanup" println "* Connections cache size:" + connections.size() println "* Reverse Lookup cache size:" + reverseLookup.size() return

          Per https://github.com/square/okhttp/issues/7698, this is an issue with the Proxy Selector.. but also GitHub Branch Source:

          Allan BURDAJEWICZ added a comment - Per https://github.com/square/okhttp/issues/7698 , this is an issue with the Proxy Selector.. but also GitHub Branch Source: okhttp-api should not keep record of a InetSocketAddress at https://github.com/jenkinsci/okhttp-api-plugin/blob/4.10.0-132.v7a_7b_91cef39c/src/main/java/io/jenkins/plugins/okhttp/api/internals/JenkinsProxySelector.java#L28 unless it uses InetSocketAddress.createUnresolved . Okhttp recommend keeping and reusing an instance of the client so that makes sense. Another solution would be to create a new Proxy object on the fly at every call to ProxySelector#select github-branch-source: we redundantly override the proxy at https://github.com/jenkinsci/github-branch-source-plugin/blob/1703.vd5a_2b_29c6cdc/src/main/java/org/jenkinsci/plugins/github_branch_source/Connector.java#L484-L486 .

          Allan BURDAJEWICZ added a comment - https://github.com/jenkinsci/github-branch-source-plugin/pull/699 https://github.com/jenkinsci/okhttp-api-plugin/pull/159

            allan_burdajewicz Allan BURDAJEWICZ
            allan_burdajewicz Allan BURDAJEWICZ
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: