Verify behavior of timeouts, interrupts, and network disconnections in S3 storage

      svanoort reminds me that we need to examine the behavior of this plugin with respect to timeouts and network failures and the like. Specifically, we can classify anomalous events as follows:

      • Network failures, throwing an exception from some socket call typically.
      • Network hangs (perhaps due to misconfigured TCP settings), whereby a socket call just blocks indefinitely (java.io versions are typically immune to interruption except by Thread.stop, alas).
      • User-initiated interrupt: Stop button is clicked.
      • System-initiated interrupt, such as via the timeout step.

      The code which would be impacted by such events can also be classified:

      • Master-side S3 metadata calls made in the course of a build, such as for archiveArtifacts, typically inside SynchronousNonBlockingStepExecution.
      • Master-side S3 metadata calls made in the context of a build but not inside a build step:
        • artifact & stash deletion during log rotation of old builds
        • stash deletion at the end of a build
        • artifact & stash copy during checkpoint resumption
      • Master-side S3 metadata calls made completely outside the context of a build:
        • artifact browsing from classic UI
        • same but from Blue Ocean
      • Agent-side URL GET or POST calls made from a build step.

      Draft acceptance criteria:

      • Build steps may hang or fail due to network issues, but timeout or manual interrupts must be honored promptly. (retry can be used for critical builds when there is an advance expectation of problems; checkpoints can also be used for manual intervention.)
      • Operations associated with a build but outside the context of a build step must apply some reasonable timeout, and if this is exceeded, either fail or issue a warning, according to the nature of the API.
      • Operations associated with an HTTP request thread in classic UI may block on the network, though if some reasonable timeout is exceeded an HTTP error should be returned and the thread returned to the pool.
      • Blue Ocean behavior is TBD. Ideally these REST calls would be asynchronous and not block rendering of the Artifacts tab.

          [JENKINS-50597] Verify behavior of timeouts, interrupts, and network disconnections in S3 storage

          Sam Van Oort added a comment -

          jglick If you apply it at the stream level, it lets you distinguish between error cases and cases where something is slow because you're simply transferring a lot of data. Much less prone to false failures and it doesn't require user tweaking to avoid failing builds / API calls.

          Also pretty easy to implement!
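
To illustrate what a stream-level timeout means here, the following is a minimal sketch, not the plugin's code. It sets a per-read limit on a plain socket connected to a local server that accepts the connection but never sends anything. The names StreamLevelTimeoutDemo and readTimesOut are hypothetical. For an HTTP download, the analogous standard setting would be URLConnection.setReadTimeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo: a per-read (stream-level) timeout against a silent peer. */
class StreamLevelTimeoutDemo {

    /**
     * Connects to a local server socket that never writes any data, then
     * attempts a single read with the given inactivity limit.
     * Returns true if the read was abandoned with SocketTimeoutException.
     */
    static boolean readTimesOut(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            // The limit applies to each blocking read, not to the transfer as a
            // whole, so a large download that keeps making progress never trips it.
            client.setSoTimeout(readTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks: the server side never sends a byte
                return false;
            } catch (SocketTimeoutException stalled) {
                return true; // the stream was idle for longer than the limit
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Because the limit is on inactivity rather than total duration, it does not need tuning for artifact size.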

          Jesse Glick added a comment - - edited

          That would work for downloads, but not for uploads.

          Where the context is a build step running agent-side code, any whole-operation timeout is merely a final fallback in case you have neglected to use a general build timeout. The value should be chosen to be longer than any plausible legitimate operation—say, an hour.

          Where the artifact transfer happens on the master side, a shorter timeout is appropriate, to prevent accidental DoS. Anyway this case should be rare: for example, an HTTP service thread bundling a set of artifacts to handle a *zip*-format URI, which cannot generally be supported by redirecting to an external URL.
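
A whole-operation fallback timeout of the kind described above could be sketched roughly as follows. WholeOperationTimeout and callWithTimeout are hypothetical names, not plugin API, and the limit passed in is up to the caller. Cancelling only interrupts the worker thread; as the issue description notes, a blocked java.io socket call may not respond to that, so this is a last resort rather than a guarantee.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bound the total duration of one operation. */
class WholeOperationTimeout {

    /**
     * Runs the operation on a worker thread and waits at most limitMillis.
     * On expiry the worker is interrupted and TimeoutException is thrown.
     */
    static <T> T callWithTimeout(Callable<T> operation, long limitMillis)
            throws TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(operation);
            try {
                return future.get(limitMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt; may not unblock java.io calls
                throw e;
            } catch (InterruptedException e) {
                // The caller itself was interrupted (e.g. a build was stopped):
                // propagate promptly and restore the interrupt flag.
                future.cancel(true);
                Thread.currentThread().interrupt();
                throw new RuntimeException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        } finally {
            executor.shutdownNow(); // never leave the worker thread behind
        }
    }
}
```

For an agent-side build step the limit would be generous (the comment above suggests something like an hour); for a master-side HTTP thread it would be much shorter.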

          Code changed in jenkins
          User: Jesse Glick
          Path:
          pom.xml
          src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
          src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
          http://jenkins-ci.org/commit/artifact-manager-s3-plugin/073804dbe7b716c9c9265615b775b8cd80c1a9f2
          Log:
          JENKINS-50597 Retry uploads after 5xx server errors or low-level network errors.

          *NOTE:* This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

          Functionality will be removed from GitHub.com on January 31st, 2019.

          Code changed in jenkins
          User: Carlos Sanchez
          Path:
          pom.xml
          src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
          src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java
          src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
          http://jenkins-ci.org/commit/artifact-manager-s3-plugin/d4ff5756df1ca62ab808c6842e7c509a7902653d
          Log:
          Merge pull request #34 from jenkinsci/network-JENKINS-50597

          JENKINS-50597 Network behavior tuning

          Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/cb859635d11c...d4ff5756df1c

          Code changed in jenkins
          User: Carlos Sanchez
          Path:
          src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
          src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
          http://jenkins-ci.org/commit/artifact-manager-s3-plugin/bb65de81dfd57fff12f5c8be30d19cfd11763742
          Log:
          Merge pull request #40 from jenkinsci/network-JENKINS-50597

          JENKINS-50597 Network behavior tuning II

          Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/b09776a43157...bb65de81dfd5

          Code changed in jenkins
          User: Carlos Sanchez
          Path:
          pom.xml
          src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
          src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java
          src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
          http://jenkins-ci.org/commit/artifact-manager-s3-plugin/2fece887119dd8ad512aa6213ddb6079908ebe6b
          Log:
          Merge pull request #41 from jenkinsci/network-JENKINS-50597

          JENKINS-50597 Network behavior tuning III

          Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/bb65de81dfd5...2fece887119d

          Jesse Glick added a comment -

          From code inspection and such experiments as I can run, there are these basic cases:

          • Master creates a presigned URL (no network operation) and agent uploads to or downloads from it. We need to have custom code to handle network errors, hangs, and HTTP errors distinguishing 4xx (fatal) from 5xx (retryable).
          • Master makes a metadata call. jclouds itself handles timeouts and retries. While we could probably influence its strategies if we needed to, it seems to bake in reasonable defaults, so unless we observe some serious problem from the field, leave well enough alone.
          • Master downloads bits. This only happens in some relatively unusual cases from HTTP threads. Not obvious what the jclouds behavior is when there is, say, a network hang in the middle, but anyway this would at worst block one handler thread and probably the servlet container imposes some limits.
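
For the first case, the retry policy (retry on 5xx and on low-level network errors, give up immediately on 4xx) could look roughly like the sketch below. It is not the code from the commits referenced in this issue: UploadRetry, Attempt, FatalHttpException, and the backoff parameters are all hypothetical. Treating every non-2xx, non-5xx status as fatal is a simplifying assumption.

```java
import java.io.IOException;

/** Hypothetical sketch of retrying an upload to a presigned URL. */
class UploadRetry {

    /** Stand-in for one upload attempt; returns the HTTP status code. */
    interface Attempt {
        int run() throws IOException;
    }

    /** Signals a response that retrying cannot fix (4xx in the text above). */
    static class FatalHttpException extends RuntimeException {
        final int status;
        FatalHttpException(int status) {
            super("fatal HTTP status " + status);
            this.status = status;
        }
    }

    static int uploadWithRetries(Attempt attempt, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        IOException lastNetworkError = null;
        int lastStatus = -1;
        long delay = initialDelayMillis;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                int status = attempt.run();
                if (status >= 200 && status < 300) {
                    return status; // success
                }
                if (status < 500) {
                    throw new FatalHttpException(status); // not retryable
                }
                lastStatus = status; // 5xx: server-side trouble, try again
                lastNetworkError = null;
            } catch (IOException e) {
                lastNetworkError = e; // low-level network error, try again
            }
            if (i < maxAttempts) {
                // Thread.sleep throws InterruptedException, so a user abort or
                // a timeout step interrupting the thread ends the loop promptly.
                Thread.sleep(delay);
                delay *= 2; // simple exponential backoff (assumed policy)
            }
        }
        if (lastNetworkError != null) {
            throw lastNetworkError;
        }
        throw new IOException("giving up after " + maxAttempts
                + " attempts; last status " + lastStatus);
    }
}
```

An attempt that answers 503 twice and then 200 succeeds on the third call, while a 403 fails after a single call.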

          Still checking Blue Ocean behavior.

          Jesse Glick added a comment -

          Blue Ocean behavior seems less than ideal but OK. If you, say, disconnect your network before opening the main page for a build, you get a brief delay while jclouds retries the connection, and then Run.getArtifactsUpTo warns you about the error. This seems to be done by PipelineStatePreloader, but it does not seem to block the general page rendering unless I misread the Chrome timing graph. If you then go to the Artifacts tab, it tries again, this time from /blue/organizations/jenkins/smokes/detail/…/artifacts/. Actual artifact downloads use the classic URL, which does a redirect, so that is fine.

          Code changed in jenkins
          User: Carlos Sanchez
          Path:
          pom.xml
          src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
          src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsVirtualFile.java
          src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockApiMetadata.java
          src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
          src/test/java/io/jenkins/plugins/artifact_manager_s3/JCloudsArtifactManagerTest.java
          http://jenkins-ci.org/commit/artifact-manager-s3-plugin/0a012ef1c974fcde11328a5f66f6e58634f55fee
          Log:
          Merge pull request #42 from jenkinsci/network-JENKINS-50597

          JENKINS-50597 Network behavior tuning IV

          Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/2561a7ad88ee...0a012ef1c974

          Jesse Glick added a comment -

          Main work done. Cannot close without approval from ikedam.
