Type: Improvement
Resolution: Fixed
Priority: Minor
The plugin is robust: most network outages and difficulties do not affect uploads, or only increase their duration. Downloads are somewhat weaker, and network outages can break a download.
Network outages test
Scenario: we have a Jenkins instance with the S3 Artifact Manager Plugin installed, and a ToxiProxy service connected to a Squid HTTP proxy. We configured a rule on ToxiProxy that redirects port 8888 to the Squid service port 3128, and we configured the Jenkins instance to use port 8888 as its proxy (with the Java properties -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=8888 -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=8888).
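For reference, a minimal sketch of how a Jenkins controller could be started with those proxy properties (the jenkins.war path and HTTP port are assumptions, not taken from this report):
# Hypothetical launch command; adjust the jenkins.war path and port to your installation.
# The proxy properties are the ones used in this test (ToxiProxy published on 127.0.0.1:8888).
java -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=8888 \
     -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=8888 \
     -jar jenkins.war --httpPort=8080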
Test Scripts
Big-file test
def file = "test.bin" timestamps { node() { stage('Generating ${file}') { sh "[ -f ${file} ] || dd if=/dev/urandom of=${file} bs=10240 count=102400" } stage('Archive') { archiveArtifacts file } stage('Unarchive') { unarchive mapping: ["${file}": 'test.bin'] } } }
Small-files Test
timestamps {
    node() {
        stage('Setup') {
            // Write a batch of small test files
            for (def i = 1; i < 100; i++) {
                writeFile file: "test/test-${i}.txt", text: "test ${i}"
            }
        }
        stage('Archive') {
            archiveArtifacts "test/*"
        }
        stage('Unarchive') {
            dir('unarch') {
                deleteDir()
                unarchive mapping: ["test/": '.']
            }
        }
    }
}
Stash Test
timestamps {
    node() {
        stage('Setup') {
            // Write a batch of small test files
            for (def i = 1; i < 100; i++) {
                writeFile file: "test/test-${i}.txt", text: "test ${i}"
            }
        }
        stage('Archive') {
            stash name: 'stuff', includes: 'test/'
        }
        stage('Unarchive') {
            dir('unarch') {
                deleteDir()
                unstash name: 'stuff'
            }
        }
    }
}
Prepare the environment
export NET=172.18.5.0/24
export squid=172.18.5.10
export toxiproxy=172.18.5.11
export HOSTS="--add-host squid.example.com:${squid} \
  --add-host toxiproxy.example.com:${toxiproxy}"
docker network create --subnet=${NET} toxiNetwork
start the toxiproxy docker container
docker pull shopify/toxiproxy
docker run -it -p 8474:8474 -p 8888:8888 --rm --ip ${toxiproxy} --net toxiNetwork ${HOSTS} \
  --name toxiproxy shopify/toxiproxy
start squid
docker run -d -p 3128:3128 --rm --ip ${squid} --net toxiNetwork ${HOSTS} --name squid minimum2scp/squid
create a configuration file with two redirection rules
cat <<EOF > toxiproxy.json
[{
  "name": "squid",
  "listen": "${toxiproxy}:8888",
  "upstream": "${squid}:3128"
}]
EOF
load the configuration in toxiproxy
curl -X POST http://127.0.0.1:8474/populate -d"@toxiproxy.json" && echo
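To confirm that the proxy was registered, you can list the configured proxies through ToxiProxy's admin API on port 8474 (an optional sanity check, not part of the original procedure):
# List the proxies currently registered in ToxiProxy
curl http://127.0.0.1:8474/proxies && echo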
This command enables or disables the proxy; we will use it to simulate network outages.
curl -X POST http://127.0.0.1:8474/proxies/squid -d '{"enabled": true}' && echo
Simulate a TIME-second continuous network outage
Launch the jobs, then execute this script; change the TIME variable to test 1, 5, 10, and 30 seconds of outage per 1 second of connectivity.
export TIME=1
while true; do
  curl -X POST http://127.0.0.1:8474/proxies/squid -d '{"enabled": true}' && echo
  sleep 1
  curl -X POST http://127.0.0.1:8474/proxies/squid -d '{"enabled": false}' && echo
  sleep ${TIME}
done
Simulate a TIME-second isolated network outage
Launch the jobs, then execute this script once during each job stage; change the TIME variable to test 1, 5, 10, and 30 seconds of outage.
export TIME=1
curl -X POST http://127.0.0.1:8474/proxies/squid -d '{"enabled": true}' && echo
sleep 1
curl -X POST http://127.0.0.1:8474/proxies/squid -d '{"enabled": false}' && echo
sleep ${TIME}
curl -X POST http://127.0.0.1:8474/proxies/squid -d '{"enabled": true}' && echo
sleep 1
curl -X POST http://127.0.0.1:8474/proxies/squid -d '{"enabled": false}' && echo
sleep ${TIME}
curl -X POST http://127.0.0.1:8474/proxies/squid -d '{"enabled": true}' && echo
Simulate latency
Create the toxic and execute the test jobs.
curl -X POST http://127.0.0.1:8474/proxies/squid/toxics -d '{"name": "latency_squid", "type": "latency", "stream": "upstream", "toxicity": 1.0, "attributes": {"latency": 1000, "jitter": 1000} }' && echo
When you have finished, delete the toxic.
curl -X DELETE http://127.0.0.1:8474/proxies/squid/toxics/latency_squid
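The latency results below also cover 10000 ms and 30000 ms. A minimal sketch of testing another latency value by recreating the toxic with the same endpoints shown above (the jitter value for the larger latencies is an assumption; the report only states the latency):
# Assumed example: remove the old toxic and recreate it with 10000 ms latency
curl -X DELETE http://127.0.0.1:8474/proxies/squid/toxics/latency_squid
curl -X POST http://127.0.0.1:8474/proxies/squid/toxics -d '{"name": "latency_squid", "type": "latency", "stream": "upstream", "toxicity": 1.0, "attributes": {"latency": 10000, "jitter": 1000} }' && echo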
Simulate bandwidth limitations
Create the toxic and execute the test jobs.
curl -X POST http://127.0.0.1:8474/proxies/squid/toxics -d '{"name": "bandwidth_squid", "type": "bandwidth", "stream": "upstream", "toxicity": 1.0, "attributes": {"rate": 1024} }' && echo
When you have finished, delete the toxic.
curl -X DELETE http://127.0.0.1:8474/proxies/squid/toxics/bandwidth_squid
Simulate slow_close
Create the toxic and execute the test jobs.
curl -X POST http://127.0.0.1:8474/proxies/squid/toxics -d '{"name": "slowclose_squid", "type": "slow_close", "stream": "upstream", "toxicity": 1.0, "attributes": {"delay": 1000} }' && echo
When you have finished, delete the toxic.
curl -X DELETE http://127.0.0.1:8474/proxies/squid/toxics/slowclose_squid
Test Results
Continuous network outages
We simulate a network outage of N seconds, then restore the network for one second, and start a new outage; we repeat this process until the job finishes.
Big files
We run a test job that archives a 1GB file, and we try different network outage times:
- 1 second - it fails consistently
- 5 seconds - it fails consistently
- 10 seconds - it fails consistently
- 30 seconds - it fails consistently
Small files
We run a test job that archives and unarchives a few files, and we try different network outage times:
- 1 second - archive is not affected, unarchive fails 90% of the time
- 5 seconds - archive is not affected, unarchive fails 90% of the time
- 10 seconds - archive is not affected, unarchive fails 90% of the time
- 30 seconds - archive is not affected, unarchive fails 90% of the time
Stash
We run a test job that stashes and unstashes a few files, and we try different network outage times:
- 1 second - stash is not affected, unstash fails 70% of the time
- 5 seconds - stash is not affected, unstash fails 70% of the time
- 10 seconds - stash is not affected, unstash fails 80% of the time
- 30 seconds - stash is not affected, unstash fails 90% of the time
Isolated network outages
We simulate a network outage of N seconds, then restore the network for one second, start a new outage of N seconds, and finally restore the network again.
Big files
We run a test job that archives a 1GB file, and we try different network outage times:
- 1 second - it is not affected
- 5 seconds - it is not affected
- 10 seconds - it is not affected; we can see retry messages in the logs
- 30 seconds - it is not affected; we can see retry messages in the logs
Small files
We run a test job that archives and unarchives a few files, and we try different network outage times:
- 1 second - it is not affected
- 5 seconds - archive is not affected and we can see retry messages in the logs; unarchive fails consistently
- 10 seconds - archive is not affected and we can see retry messages in the logs; unarchive fails consistently
- 30 seconds - archive is not affected and we can see retry messages in the logs; unarchive fails consistently
Stash
We run a test job that stashes and unstashes a few files, and we try different network outage times:
- 1 second - stash is not affected and we can see retry messages in the logs; unstash fails consistently
- 5 seconds - stash is not affected and we can see retry messages in the logs; unstash fails consistently
- 10 seconds - stash is not affected and we can see retry messages in the logs; unstash fails consistently
- 30 seconds - stash is not affected and we can see retry messages in the logs; unstash fails consistently
Latency and jitter
We create a toxic with latency and jitter:
- 1000 ms - not affected
- 10000 ms - increases job times
- 30000 ms - fails consistently
Limited bandwidth
We create a toxic to limit the bandwidth (the rate is in KB/s):
- 1 KB/s - increases job times
- 100 KB/s - increases job times
- 1024 KB/s - increases job times
- 10240 KB/s - increases job times
Slow close
We create a toxic that delays the TCP socket from closing until the configured delay has elapsed:
- 1000 ms - not affected
- 10000 ms - increases job times
- 30000 ms - increases job times
shanexpert28: Just to make sure, are you really working on this? If not, please avoid changing issues. This JIRA instance is not a playground.