Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-50597

Verify behavior of timeouts, interrupts, and network disconnections in S3 storage

    XMLWordPrintable

    Details

    • Similar Issues:

      Description

      Sam Van Oort reminds me that we need to examine the behavior of this plugin with respect to timeouts and network failures and the like. Specifically, we can classify anomalous events as follows:

      • Network failures, throwing an exception from some socket call typically.
      • Network hangs (perhaps due to misconfigured TCP settings), whereby a socket call just blocks indefinitely (java.io versions are typically immune to interruption except by Thread.stop, alas).
      • User-initiated interrupt: Stop button is clicked.
      • System-initiated interrupt, such as via the timeout step.

      The code which would be impacted by such events can also be classified:

      • Master-side S3 metadata calls made in the course of a build, such as for archiveArtifacts, typically inside SynchronousNonBlockingStepExecution.
      • Master-side S3 metadata calls made in the context of a build but not inside a build step:
        • artifact & stash deletion during log rotation of old builds
        • stash deletion at the end of a build
        • artifact & stash copy during checkpoint resumption
      • Master-side S3 metadata calls made completely outside the context of a build:
        • artifact browsing from classic UI
        • same but from Blue Ocean
      • Agent-side URL GET or POST calls made from a build step.

      Draft acceptance criteria:

      • Build steps may hang or fail due to network issues, but timeout or manual interrupts must be honored promptly. (retry can be used for critical builds when there is an advance expectation of problems; checkpoints can also be used for manual intervention.)
      • Operations associated with a build but outside the context of a build step must apply some reasonable timeout, and if this is exceeded, either fail or issue a warning, according to the nature of the API.
      • Operations associated with an HTTP request thread in classic UI may block on the network, though if some reasonable timeout is exceeded an HTTP error should be returned and the thread returned to the pool.
      • Blue Ocean behavior is TBD. Ideally these REST calls would be asynchronous and not block rendering of the Artifacts tab.

        Attachments

          Issue Links

            Activity

            jglick Jesse Glick created issue -
            Hide
            svanoort Sam Van Oort added a comment -

            Huge thumbs-up for tracking this as something that needs to be part of the design, and I think the criteria sounds reasonable.

            I'd add to the Draft Acceptance criteria that

            • The non-step operations need to either have a reasonable default handling of direct network failures and throw an appropriately specific exception (hopefully not just a blind IOException, but some sort of subtype)

            This table of Java network exception types may be of assistance: http://vaibhavblogs.org/2012/12/common-java-networking-exceptions/

            Show
            svanoort Sam Van Oort added a comment - Huge thumbs-up for tracking this as something that needs to be part of the design, and I think the criteria sounds reasonable. I'd add to the Draft Acceptance criteria that The non-step operations need to either have a reasonable default handling of direct network failures and throw an appropriately specific exception (hopefully not just a blind IOException, but some sort of subtype) This table of Java network exception types may be of assistance: http://vaibhavblogs.org/2012/12/common-java-networking-exceptions/
            Hide
            svanoort Sam Van Oort added a comment -

            An optional approach for dealing with errors if we don't want to mandate a lot of Checked Exceptions may be to optionally accept a Handler object that may implement its own strategy of dealing with the various kinds of failures (i.e. it can do retries, logging, timeouts) – it lets us revise the approach in the future a bit more flexibly.

            Show
            svanoort Sam Van Oort added a comment - An optional approach for dealing with errors if we don't want to mandate a lot of Checked Exceptions may be to optionally accept a Handler object that may implement its own strategy of dealing with the various kinds of failures (i.e. it can do retries, logging, timeouts) – it lets us revise the approach in the future a bit more flexibly.
            Hide
            csanchez Carlos Sanchez added a comment - - edited

            I'll add that we need to make sure calls to AWS APIs are retried with backoff given my previous experience

            If an API request exceeds the API request rate for its category, the request returns the RequestLimitExceeded error code. To prevent this error, ensure that your application doesn't retry API requests at a high rate. You can do this by using care when polling and by using exponential backoff retries.

            https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html

            Show
            csanchez Carlos Sanchez added a comment - - edited I'll add that we need to make sure calls to AWS APIs are retried with backoff given my previous experience If an API request exceeds the API request rate for its category, the request returns the RequestLimitExceeded error code. To prevent this error, ensure that your application doesn't retry API requests at a high rate. You can do this by using care when polling and by using exponential backoff retries. https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html
            jglick Jesse Glick made changes -
            Field Original Value New Value
            Assignee Jesse Glick [ jglick ]
            Hide
            vivek Vivek Pandey added a comment -

            +1 for retries with exponential back off with circuit breaker pattern.

            Show
            vivek Vivek Pandey added a comment - +1 for retries with exponential back off with circuit breaker pattern.
            Hide
            jglick Jesse Glick added a comment -

            As mentioned in the issue description, I see no need for baking retries into the implementation for the steps invoked by the Pipeline script. You can already use retry (or waitUntil, which does exponential backoff) in any Pipeline for which the success of individual builds is so important that you prefer to keep going until AWS responds. Best to leave retry semantics (and timeout and checkpoints) in the hands of users, who are better placed to judge whether it is appropriate in a given context, and keep the build step itself simple and transparent.

            AWS calls made by the Jenkins master outside the build context are another matter. The user has no control over these, so they need to behave somehow sanely, including timeouts and perhaps also retries, depending on the criticality of the function.

            Show
            jglick Jesse Glick added a comment - As mentioned in the issue description, I see no need for baking retries into the implementation for the steps invoked by the Pipeline script. You can already use retry (or waitUntil , which does exponential backoff) in any Pipeline for which the success of individual builds is so important that you prefer to keep going until AWS responds. Best to leave retry semantics (and timeout and checkpoints) in the hands of users, who are better placed to judge whether it is appropriate in a given context, and keep the build step itself simple and transparent. AWS calls made by the Jenkins master outside the build context are another matter. The user has no control over these, so they need to behave somehow sanely, including timeouts and perhaps also retries, depending on the criticality of the function.
            Hide
            svanoort Sam Van Oort added a comment - - edited

            So... from discussion with Jesse Glick, my main concern is that:

            1. Any failures at the network layer result in a subtype of IOException rather than indefinite hangs, and that retry be supported implicitly if desired. Edit: also any failures need to be isolated to the request itself – they should not be able to impact other parts of the system, other builds, etc.
            2. Option to inject a custom handler for network-level failures that supports a Strategy for retry / timeout / fallBack

            Show
            svanoort Sam Van Oort added a comment - - edited So... from discussion with Jesse Glick , my main concern is that: 1. Any failures at the network layer result in a subtype of IOException rather than indefinite hangs, and that retry be supported implicitly if desired. Edit: also any failures need to be isolated to the request itself – they should not be able to impact other parts of the system, other builds, etc. 2. Option to inject a custom handler for network-level failures that supports a Strategy for retry / timeout / fallBack
            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 28 (Web Link)" [ 20729 ]
            jglick Jesse Glick made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            Hide
            jglick Jesse Glick added a comment -

            Master-side S3 metadata calls made in the course of a build, such as for archiveArtifacts

            Actually a bad example—this step makes no metadata calls. It only generates presigned URLs, which happens entirely offline, and then the agent makes PUT calls. Ditto stash and (with GET) unstash. The only step making master-side metadata calls is unarchive, which I think will do one cache-frame metadata operation (GetBucketLocation + ListBucket) to determine the list of files.

            Show
            jglick Jesse Glick added a comment - Master-side S3 metadata calls made in the course of a build, such as for archiveArtifacts Actually a bad example—this step makes no metadata calls. It only generates presigned URLs, which happens entirely offline, and then the agent makes PUT calls. Ditto stash and (with GET ) unstash . The only step making master-side metadata calls is unarchive , which I think will do one cache-frame metadata operation ( GetBucketLocation + ListBucket ) to determine the list of files.
            jglick Jesse Glick made changes -
            Assignee Jesse Glick [ jglick ]
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            pom.xml
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
            http://jenkins-ci.org/commit/artifact-manager-s3-plugin/6f9c2b2b0adcc678a7fc5dc30e7f95e79c5b6584
            Log:
            JENKINS-50597 First direct test of error handling: non-200 response codes from artifact upload.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: pom.xml src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java http://jenkins-ci.org/commit/artifact-manager-s3-plugin/6f9c2b2b0adcc678a7fc5dc30e7f95e79c5b6584 Log: JENKINS-50597 First direct test of error handling: non-200 response codes from artifact upload.
            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 33 (Web Link)" [ 20740 ]
            Hide
            jglick Jesse Glick added a comment -

            Carlos Sanchez notes that there are some expected and documented failure modes for S3 such as for rate limiting which certainly require handling. For now, should suffice to bake in some sort of basic exponential backoff strategy with a (long) ultimate timeout; can always introduce a tunable systemwide parameter later as more services are developed.

            Show
            jglick Jesse Glick added a comment - Carlos Sanchez notes that there are some expected and documented failure modes for S3 such as for rate limiting which certainly require handling. For now, should suffice to bake in some sort of basic exponential backoff strategy with a (long) ultimate timeout; can always introduce a tunable systemwide parameter later as more services are developed.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            pom.xml
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockApiMetadata.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockApiMetadataTest.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStoreTest.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
            http://jenkins-ci.org/commit/artifact-manager-s3-plugin/cb859635d11ca7c3a2822ee6608588bb3e9b1159
            Log:
            Merge pull request #33 from jenkinsci/jclouds-mock

            JENKINS-50597 Introduce mocking framework and start to resolve network reliability issues

            Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/333330c5cfba...cb859635d11c
            *NOTE:* This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

            Functionality will be removed from GitHub.com on January 31st, 2019.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: pom.xml src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockApiMetadata.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockApiMetadataTest.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStoreTest.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java http://jenkins-ci.org/commit/artifact-manager-s3-plugin/cb859635d11ca7c3a2822ee6608588bb3e9b1159 Log: Merge pull request #33 from jenkinsci/jclouds-mock JENKINS-50597 Introduce mocking framework and start to resolve network reliability issues Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/333330c5cfba...cb859635d11c * NOTE: * This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/ Functionality will be removed from GitHub.com on January 31st, 2019.
            Hide
            jglick Jesse Glick added a comment -

            Carlos Sanchez points me to guava-retrying which looks like a convenient way of implementing exponential backoff, if perhaps overkill.

            The official documentation for S3 appears to be here.

            Show
            jglick Jesse Glick added a comment - Carlos Sanchez points me to guava-retrying which looks like a convenient way of implementing exponential backoff, if perhaps overkill. The official documentation for S3 appears to be here .
            Hide
            jglick Jesse Glick added a comment -

            (FTR this fork is newer, but requires a newer version of Guava than Jenkins bundles.)

            Show
            jglick Jesse Glick added a comment - (FTR this fork is newer, but requires a newer version of Guava than Jenkins bundles.)
            Hide
            jglick Jesse Glick added a comment -

            A review of the recommendations as well as the error list confirms that we should retry on 5xx but not 4xx errors, which seems like generally good advice not limited to S3. Of course there can be other network errors that precede even receiving an HTTP response code.

            Show
            jglick Jesse Glick added a comment - A review of the recommendations as well as the error list confirms that we should retry on 5xx but not 4xx errors, which seems like generally good advice not limited to S3. Of course there can be other network errors that precede even receiving an HTTP response code.
            Hide
            svanoort Sam Van Oort added a comment -

            Jesse Glick You also need to be mindful of streams going silent in the middle too – not all critical network failures will result in a 4xx being returned or an IOException. A generalized Stream wrapper that catches if a single read/write operation blocks for a prolonged period without new content might be helpful for that (then you throw a new IOException subtype).

            Show
            svanoort Sam Van Oort added a comment - Jesse Glick You also need to be mindful of streams going silent in the middle too – not all critical network failures will result in a 4xx being returned or an IOException. A generalized Stream wrapper that catches if a single read/write operation blocks for a prolonged period without new content might be helpful for that (then you throw a new IOException subtype).
            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 34 (Web Link)" [ 20742 ]
            Hide
            jglick Jesse Glick added a comment -

            Probably most easily handled by just applying a timeout to the entire network operation, all the way from hostname resolution to receiving a status code and reading the response body.

            Show
            jglick Jesse Glick added a comment - Probably most easily handled by just applying a timeout to the entire network operation, all the way from hostname resolution to receiving a status code and reading the response body.
            Hide
            svanoort Sam Van Oort added a comment -

            Jesse Glick If you apply it at the stream level, it lets you distinguish between error cases and cases where something is slow because you're simply transferring a lot of data. Much less prone to false failures and it doesn't require user tweaking to avoid failing builds / API calls.

            Also pretty easy to implement!

            Show
            svanoort Sam Van Oort added a comment - Jesse Glick If you apply it at the stream level, it lets you distinguish between error cases and cases where something is slow because you're simply transferring a lot of data. Much less prone to false failures and it doesn't require user tweaking to avoid failing builds / API calls. Also pretty easy to implement!
            Hide
            jglick Jesse Glick added a comment - - edited

            That would work for downloads, but not for uploads.

            Where the context is a build step running agent-side code, any whole-operation timeout is merely a final fallback in case you have neglected to use a general build timeout. The value should be chosen to be longer than any plausible legitimate operation—say, an hour.

            For cases where the artifact transfer is happening on the master-side then a shorter timeout is appropriate, to prevent accidental DoS. Anyway this case should be rare: for example, an HTTP service thread bundling a set of artifacts to handle a *zip*-format URI, which cannot generally be supported by redirecting to an external URL.

            Show
            jglick Jesse Glick added a comment - - edited That would work for downloads, but not for uploads. Where the context is a build step running agent-side code, any whole-operation timeout is merely a final fallback in case you have neglected to use a general build timeout. The value should be chosen to be longer than any plausible legitimate operation—say, an hour. For cases where the artifact transfer is happening on the master-side then a shorter timeout is appropriate, to prevent accidental DoS. Anyway this case should be rare: for example, an HTTP service thread bundling a set of artifacts to handle a *zip* -format URI, which cannot generally be supported by redirecting to an external URL.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            pom.xml
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
            http://jenkins-ci.org/commit/artifact-manager-s3-plugin/073804dbe7b716c9c9265615b775b8cd80c1a9f2
            Log:
            JENKINS-50597 Retry uploads after 5xx server errors or low-level network errors.

            *NOTE:* This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

            Functionality will be removed from GitHub.com on January 31st, 2019.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: pom.xml src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java http://jenkins-ci.org/commit/artifact-manager-s3-plugin/073804dbe7b716c9c9265615b775b8cd80c1a9f2 Log: JENKINS-50597 Retry uploads after 5xx server errors or low-level network errors. * NOTE: * This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/ Functionality will be removed from GitHub.com on January 31st, 2019.
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Carlos Sanchez
            Path:
            pom.xml
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
            http://jenkins-ci.org/commit/artifact-manager-s3-plugin/d4ff5756df1ca62ab808c6842e7c509a7902653d
            Log:
            Merge pull request #34 from jenkinsci/network-JENKINS-50597

            JENKINS-50597 Network behavior tuning

            Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/cb859635d11c...d4ff5756df1c
            *NOTE:* This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

            Functionality will be removed from GitHub.com on January 31st, 2019.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Carlos Sanchez Path: pom.xml src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java http://jenkins-ci.org/commit/artifact-manager-s3-plugin/d4ff5756df1ca62ab808c6842e7c509a7902653d Log: Merge pull request #34 from jenkinsci/network- JENKINS-50597 JENKINS-50597 Network behavior tuning Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/cb859635d11c...d4ff5756df1c * NOTE: * This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/ Functionality will be removed from GitHub.com on January 31st, 2019.
            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 40 (Web Link)" [ 20774 ]
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Carlos Sanchez
            Path:
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
            http://jenkins-ci.org/commit/artifact-manager-s3-plugin/bb65de81dfd57fff12f5c8be30d19cfd11763742
            Log:
            Merge pull request #40 from jenkinsci/network-JENKINS-50597

            JENKINS-50597 Network behavior tuning II

            Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/b09776a43157...bb65de81dfd5
            *NOTE:* This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

            Functionality will be removed from GitHub.com on January 31st, 2019.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Carlos Sanchez Path: src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java http://jenkins-ci.org/commit/artifact-manager-s3-plugin/bb65de81dfd57fff12f5c8be30d19cfd11763742 Log: Merge pull request #40 from jenkinsci/network- JENKINS-50597 JENKINS-50597 Network behavior tuning II Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/b09776a43157...bb65de81dfd5 * NOTE: * This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/ Functionality will be removed from GitHub.com on January 31st, 2019.
            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 41 (Web Link)" [ 20778 ]
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-26148 [ JENKINS-26148 ]
            jglick Jesse Glick made changes -
            Link This issue depends on INFRA-1633 [ INFRA-1633 ]
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Carlos Sanchez
            Path:
            pom.xml
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
            http://jenkins-ci.org/commit/artifact-manager-s3-plugin/2fece887119dd8ad512aa6213ddb6079908ebe6b
            Log:
            Merge pull request #41 from jenkinsci/network-JENKINS-50597

            JENKINS-50597 Network behavior tuning III

            Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/bb65de81dfd5...2fece887119d
            *NOTE:* This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

            Functionality will be removed from GitHub.com on January 31st, 2019.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Carlos Sanchez Path: pom.xml src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockBlobStore.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java http://jenkins-ci.org/commit/artifact-manager-s3-plugin/2fece887119dd8ad512aa6213ddb6079908ebe6b Log: Merge pull request #41 from jenkinsci/network- JENKINS-50597 JENKINS-50597 Network behavior tuning III Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/bb65de81dfd5...2fece887119d * NOTE: * This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/ Functionality will be removed from GitHub.com on January 31st, 2019.
            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 42 (Web Link)" [ 20787 ]
            Hide
            jglick Jesse Glick added a comment -

            From code inspection and such experiments as I can run, there are these basic cases:

            • Master creates a presigned URL (no network operation) and agent uploads to or downloads from it. We need to have custom code to handle network errors, hangs, and HTTP errors distinguishing 4xx (fatal) from 5xx (retryable).
            • Master makes a metadata call. jclouds itself handles timeouts and retries. While we could probably influence its strategies if we needed to, it seems to bake in reasonable defaults, so unless we observe some serious problem from the field, leave well enough alone.
            • Master downloads bits. This only happens in some relatively unusual cases from HTTP threads. Not obvious what the jclouds behavior is when there is, say, a network hang in the middle, but anyway this would at worst block on handler thread and probably the servlet container imposes some limits.

            Still checking Blue Ocean behavior.

            Show
            jglick Jesse Glick added a comment - From code inspection and such experiments as I can run, there are these basic cases: Master creates a presigned URL (no network operation) and agent uploads to or downloads from it. We need to have custom code to handle network errors, hangs, and HTTP errors distinguishing 4xx (fatal) from 5xx (retryable). Master makes a metadata call. jclouds itself handles timeouts and retries. While we could probably influence its strategies if we needed to, it seems to bake in reasonable defaults, so unless we observe some serious problem from the field, leave well enough alone. Master downloads bits. This only happens in some relatively unusual cases from HTTP threads. Not obvious what the jclouds behavior is when there is, say, a network hang in the middle, but anyway this would at worst block on handler thread and probably the servlet container imposes some limits. Still checking Blue Ocean behavior.
            Hide
            jglick Jesse Glick added a comment -

            B.O. behavior seems less than ideal but OK. If you, say, disconnect your network prior to open the main page for a build, you get a brief delay while jclouds retries the connection, and then Run.getArtifactsUpTo warns you about the error. This seems to be done by PipelineStatePreloader but it does not seem to block the general page rendering unless I misread the Chrome timing graph. If you then go to the Artifacts tab, it tries again, this time from /blue/organizations/jenkins/smokes/detail/…/artifacts/. Actual artifact downloads use the classic URL which does a redirect, so that is fine.

            Show
            jglick Jesse Glick added a comment - B.O. behavior seems less than ideal but OK. If you, say, disconnect your network prior to open the main page for a build, you get a brief delay while jclouds retries the connection, and then Run.getArtifactsUpTo warns you about the error. This seems to be done by PipelineStatePreloader but it does not seem to block the general page rendering unless I misread the Chrome timing graph. If you then go to the Artifacts tab, it tries again, this time from /blue/organizations/jenkins/smokes/detail/…/artifacts/ . Actual artifact downloads use the classic URL which does a redirect, so that is fine.
            jglick Jesse Glick made changes -
            Remote Link This issue links to "apache-httpcomponents-client-4-api PR 9 (Web Link)" [ 20799 ]
            jglick Jesse Glick made changes -
            Remote Link This issue links to "workflow-api PR 67 (Web Link)" [ 20800 ]
            jglick Jesse Glick made changes -
            Remote Link This issue links to "workflow-basic-steps PR 60 (Web Link)" [ 20801 ]
            jglick Jesse Glick made changes -
            Remote Link This issue links to "copyartifact PR 104 (Web Link)" [ 20802 ]
            jglick Jesse Glick made changes -
            Remote Link This issue links to "core PR 3480 (Web Link)" [ 20803 ]
            jglick Jesse Glick made changes -
            Remote Link This issue links to "jep PR 117 (Web Link)" [ 20804 ]
            jglick Jesse Glick made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Carlos Sanchez
            Path:
            pom.xml
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java
            src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsVirtualFile.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockApiMetadata.java
            src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java
            src/test/java/io/jenkins/plugins/artifact_manager_s3/JCloudsArtifactManagerTest.java
            http://jenkins-ci.org/commit/artifact-manager-s3-plugin/0a012ef1c974fcde11328a5f66f6e58634f55fee
            Log:
            Merge pull request #42 from jenkinsci/network-JENKINS-50597

            JENKINS-50597 Network behavior tuning IV

            Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/2561a7ad88ee...0a012ef1c974
            *NOTE:* This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

            Functionality will be removed from GitHub.com on January 31st, 2019.

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Carlos Sanchez Path: pom.xml src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsVirtualFile.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/MockApiMetadata.java src/test/java/io/jenkins/plugins/artifact_manager_jclouds/NetworkTest.java src/test/java/io/jenkins/plugins/artifact_manager_s3/JCloudsArtifactManagerTest.java http://jenkins-ci.org/commit/artifact-manager-s3-plugin/0a012ef1c974fcde11328a5f66f6e58634f55fee Log: Merge pull request #42 from jenkinsci/network- JENKINS-50597 JENKINS-50597 Network behavior tuning IV Compare: https://github.com/jenkinsci/artifact-manager-s3-plugin/compare/2561a7ad88ee...0a012ef1c974 * NOTE: * This service been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/ Functionality will be removed from GitHub.com on January 31st, 2019.
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-22637 [ JENKINS-22637 ]
            Hide
            jglick Jesse Glick added a comment -

            Main work done. Cannot close without approval from ikedam.

            Show
            jglick Jesse Glick added a comment - Main work done. Cannot close without approval from ikedam .
            jglick Jesse Glick made changes -
            Resolution Fixed [ 1 ]
            Status In Review [ 10005 ] Resolved [ 5 ]

              People

              Assignee:
              jglick Jesse Glick
              Reporter:
              jglick Jesse Glick
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: