JENKINS-50597

Verify behavior of timeouts, interrupts, and network disconnections in S3 storage

      svanoort reminds me that we need to examine the behavior of this plugin with respect to timeouts and network failures and the like. Specifically, we can classify anomalous events as follows:

      • Network failures, typically throwing an exception from some socket call.
      • Network hangs (perhaps due to misconfigured TCP settings), whereby a socket call just blocks indefinitely (java.io versions are typically immune to interruption except by Thread.stop, alas); see the sketch after this list.
      • User-initiated interrupt: Stop button is clicked.
      • System-initiated interrupt, such as via the timeout step.
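
      As an illustration of the network-hang case above, a minimal sketch (not plugin code; host, ports, and timeout values are placeholders) of how socket-level timeouts turn an otherwise uninterruptible java.io read into a catchable failure:

{code:java}
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Illustration only: a plain java.io read does not respond to Thread.interrupt(),
// but connect and read timeouts bound how long it can block.
public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("example.com", 80), 5_000); // connect timeout
            socket.setSoTimeout(10_000); // read timeout: bounds an otherwise indefinite block
            InputStream in = socket.getInputStream();
            int first = in.read(); // throws SocketTimeoutException instead of hanging forever
            System.out.println("first byte: " + first);
        } catch (SocketTimeoutException e) {
            // The hang becomes a failure that a step or cleanup task can report and recover from.
            System.err.println("network call timed out: " + e.getMessage());
        }
    }
}
{code}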

      The code which would be impacted by such events can also be classified:

      • Master-side S3 metadata calls made in the course of a build, such as for archiveArtifacts, typically inside SynchronousNonBlockingStepExecution.
      • Master-side S3 metadata calls made in the context of a build but not inside a build step:
        • artifact & stash deletion during log rotation of old builds
        • stash deletion at the end of a build
        • artifact & stash copy during checkpoint resumption
      • Master-side S3 metadata calls made completely outside the context of a build:
        • artifact browsing from classic UI
        • same but from Blue Ocean
      • Agent-side URL GET or POST calls made from a build step.

      Draft acceptance criteria:

      • Build steps may hang or fail due to network issues, but timeout or manual interrupts must be honored promptly. (retry can be used for critical builds when there is an advance expectation of problems; checkpoints can also be used for manual intervention.)
      • Operations associated with a build but outside the context of a build step must apply some reasonable timeout, and if this is exceeded, either fail or issue a warning, according to the nature of the API (a sketch follows this list).
      • Operations associated with an HTTP request thread in classic UI may block on the network, though if some reasonable timeout is exceeded an HTTP error should be returned and the thread returned to the pool.
      • Blue Ocean behavior is TBD. Ideally these REST calls would be asynchronous and not block rendering of the Artifacts tab.
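
      As a sketch of the second criterion (hypothetical helper, not the plugin's actual code), a master-side metadata call could be bounded by a Future and then either fail or merely warn depending on how critical the operation is:

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bound a master-side S3 metadata call with a timeout, then fail
// (for operations a build depends on) or just warn (for best-effort cleanup such as
// artifact & stash deletion during log rotation) according to criticality.
public class BoundedMetadataCall {
    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    static <T> T callWithTimeout(Callable<T> call, long timeoutSeconds, boolean critical) throws IOException {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; the underlying socket call may not be interruptible
            if (critical) {
                throw new IOException("S3 metadata call exceeded " + timeoutSeconds + "s", e);
            }
            System.err.println("WARNING: S3 metadata call timed out; continuing");
            return null;
        } catch (ExecutionException e) {
            throw new IOException("S3 metadata call failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for S3 metadata call", e);
        }
    }
}
{code}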


          Jesse Glick created issue -

          Sam Van Oort added a comment -

          Huge thumbs-up for tracking this as something that needs to be part of the design, and I think the criteria sound reasonable.

          I'd add to the draft acceptance criteria that:

          • The non-step operations need to have reasonable default handling of direct network failures and throw an appropriately specific exception (hopefully not just a blind IOException, but some sort of subtype)

          This table of Java network exception types may be of assistance: http://vaibhavblogs.org/2012/12/common-java-networking-exceptions/
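
          For illustration, a sketch (the classify helper is hypothetical) of reacting to the specific JDK exception subtypes rather than a blanket IOException:

{code:java}
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Illustration only: the JDK already throws distinct IOException subtypes for
// common network failures, so callers can react to each case specifically.
public class NetworkFailureClassifier {
    static String classify(IOException e) {
        if (e instanceof UnknownHostException)   return "DNS lookup failed: " + e.getMessage();
        if (e instanceof ConnectException)       return "connection refused: " + e.getMessage();
        if (e instanceof NoRouteToHostException) return "no route to host: " + e.getMessage();
        if (e instanceof SocketTimeoutException) return "timed out: " + e.getMessage();
        return "generic I/O failure: " + e;
    }
}
{code}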


          Sam Van Oort added a comment -

          An alternative approach for dealing with errors, if we don't want to mandate a lot of checked exceptions, may be to accept an optional Handler object that implements its own strategy for dealing with the various kinds of failures (i.e. it can do retries, logging, timeouts) – this lets us revise the approach in the future a bit more flexibly.
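
          A minimal sketch of what such a handler could look like (the FailureHandler and RetryingHandler names are invented here for illustration; nothing like this exists in the plugin yet):

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical handler interface: callers hand it the operation and the handler
// decides how to deal with failures (retries, logging, timeouts, fallbacks).
interface FailureHandler {
    <T> T execute(Callable<T> operation) throws IOException;
}

// One possible strategy: a fixed number of retries with simple logging.
class RetryingHandler implements FailureHandler {
    private final int maxAttempts;

    RetryingHandler(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    @Override
    public <T> T execute(Callable<T> operation) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                last = e;
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
            } catch (Exception e) {
                throw new IOException(e); // non-I/O failures are not retried here
            }
        }
        throw last != null ? last : new IOException("no attempts were made");
    }
}
{code}

          An entry point could then accept an optional FailureHandler and fall back to a no-retry implementation when none is supplied.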


          Carlos Sanchez added a comment - edited

          Given my previous experience, I'll add that we need to make sure calls to AWS APIs are retried with backoff:

          If an API request exceeds the API request rate for its category, the request returns the RequestLimitExceeded error code. To prevent this error, ensure that your application doesn't retry API requests at a high rate. You can do this by using care when polling and by using exponential backoff retries.

          https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html
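
          For illustration, a generic exponential-backoff-with-jitter wrapper along those lines (isThrottlingError is a placeholder predicate; real code would inspect the AWS SDK's service error code rather than the message text):

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of retrying throttled AWS calls with exponential backoff and full jitter,
// as the AWS documentation recommends. Not taken from the plugin.
public class BackoffRetry {
    static <T> T withBackoff(Callable<T> call, int maxAttempts) throws Exception {
        long delayCapMillis = 200;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isThrottlingError(e)) {
                    throw e; // give up: out of attempts, or not a throttling error at all
                }
                // full jitter: sleep a random amount up to the current cap, then double the cap
                Thread.sleep(ThreadLocalRandom.current().nextLong(delayCapMillis));
                delayCapMillis = Math.min(delayCapMillis * 2, 20_000);
            }
        }
    }

    // Placeholder: a real implementation would check the error code reported by the SDK
    // (e.g. "RequestLimitExceeded") instead of the exception message.
    private static boolean isThrottlingError(Exception e) {
        return e.getMessage() != null && e.getMessage().contains("RequestLimitExceeded");
    }
}
{code}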

          Jesse Glick made changes -
          Assignee Original: Jesse Glick [ jglick ]

          Vivek Pandey added a comment -

          +1 for retries with exponential backoff and a circuit breaker pattern.


          Jesse Glick added a comment -

          As mentioned in the issue description, I see no need for baking retries into the implementation for the steps invoked by the Pipeline script. You can already use retry (or waitUntil, which does exponential backoff) in any Pipeline for which the success of individual builds is so important that you prefer to keep going until AWS responds. Best to leave retry semantics (and timeout and checkpoints) in the hands of users, who are better placed to judge whether it is appropriate in a given context, and keep the build step itself simple and transparent.

          AWS calls made by the Jenkins master outside the build context are another matter. The user has no control over these, so they need to behave sanely on their own, including timeouts and perhaps also retries, depending on the criticality of the function.


          Sam Van Oort added a comment - edited

          So... from discussion with jglick, my main concern is that:

          1. Any failures at the network layer result in a subtype of IOException rather than indefinite hangs, and that retry be supported implicitly if desired. Edit: also, any failures need to be isolated to the request itself – they should not be able to impact other parts of the system, other builds, etc.
          2. An option to inject a custom handler for network-level failures that supports a Strategy for retry / timeout / fallback.

          Jesse Glick made changes -
          Remote Link New: This issue links to "PR 28 (Web Link)" [ 20729 ]
          Jesse Glick made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]

            Votes: 1
            Watchers: 6