JENKINS-43236

Spot instance termination leaves executors in temporary broken state

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • ec2-plugin
    • None
    • 1.36 ec2 plugin

      Sometimes when AWS terminates one of our spot instances, the instance gets into a state where Jenkins still thinks it's a valid, available executor, even though it is in the process of shutting down and therefore cannot fulfill any requests. When this occurs, our entire backlog of tests rapidly flushes through that executor, and every one of them fails.

      Sometimes the executor is completely broken and the build fails before it even gets a workspace, like so:

      00:00:00.002 Started by remote host 140.211.10.27
      00:00:00.002 [EnvInject] - Loading node environment variables.
      00:00:27.357 FATAL: java.io.IOException: Unexpected termination of the channel
      00:00:27.358 hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      00:00:27.359 	at hudson.remoting.Request.abort(Request.java:303)
      00:00:27.360 	at hudson.remoting.Channel.terminate(Channel.java:863)
      00:00:27.360 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:92)
      00:00:27.360 	at ......remote call to Testrunner (sir-tdd89gzm)(Native Method)
      00:00:27.361 	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1433)
      00:00:27.361 	at hudson.remoting.Request.call(Request.java:172)
      00:00:27.361 	at hudson.remoting.Channel.call(Channel.java:796)
      00:00:27.362 	at hudson.FilePath.act(FilePath.java:1102)
      00:00:27.362 	at org.jenkinsci.plugins.envinject.service.EnvironmentVariablesNodeLoader.gatherEnvironmentVariablesNode(EnvironmentVariablesNodeLoader.java:48)
      00:00:27.363 	at org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:80)
      00:00:27.363 	at org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:42)
      00:00:27.364 	at hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:572)
      00:00:27.364 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:492)
      00:00:27.365 	at hudson.model.Run.execute(Run.java:1720)
      00:00:27.365 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      00:00:27.365 	at hudson.model.ResourceController.execute(ResourceController.java:98)
      00:00:27.365 	at hudson.model.Executor.run(Executor.java:404)
      00:00:27.366 Caused by: java.io.IOException: Unexpected termination of the channel
      00:00:27.366 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
      00:00:27.367 Caused by: java.io.EOFException
      00:00:27.367 	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2335)
      00:00:27.367 	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2804)
      00:00:27.368 	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802)
      00:00:27.368 	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
      00:00:27.368 	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
      00:00:27.369 	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      00:00:27.369 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
      00:00:27.370 ERROR: Step ‘Publish Checkstyle analysis results’ failed: no workspace for drupal_patches #8845
      00:00:27.371 ERROR: Step ‘Archive the artifacts’ failed: no workspace for drupal_patches #8845
      00:00:27.372 Checking console output
      00:00:27.373 ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for drupal_patches #8845
      00:00:27.396 Finished: FAILURE
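
One way to tell these builds apart from genuine test failures is to look for the channel error in the console log. The following is only a sketch, not part of the ec2 plugin or of our job configuration: the function name `is_dead_channel_log` and the idea of scanning a saved console log file are assumptions made for illustration, while the matched string is copied from the output above.

```bash
#!/bin/bash
# Hypothetical helper: returns 0 when a saved console log contains the
# "Unexpected termination of the channel" signature seen above, which
# suggests the executor died rather than the tests themselves failing.
is_dead_channel_log() {
  local log_file="$1"
  # -q: no output, only an exit status; -F: match the text literally.
  grep -qF 'java.io.IOException: Unexpected termination of the channel' "$log_file"
}
```

A dispatcher could use a check like this to decide whether a failed build should be requeued instead of reported as a test failure.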
      

      Other times it seems the instance is already shutting down, and the shutdown kills the Docker daemon before Jenkins stops treating the executor as available:

      00:00:00.001 Started by remote host 140.211.10.27
      00:00:00.001 [EnvInject] - Loading node environment variables.
      00:00:00.007 Building remotely on Testrunner (sir-yjpgbk4n) (testrunner) in workspace /var/lib/drupalci/workspace
      00:00:00.018 [workspace] $ /bin/bash /tmp/hudson6328886945950906962.sh
      00:00:00.124 Cannot connect to the Docker daemon. Is the docker daemon running on this host?
      00:00:00.141 Cannot connect to the Docker daemon. Is the docker daemon running on this host?
      00:00:00.151 ++ id
      00:00:00.152 uid=1001(testbot) gid=1001(testbot) groups=1001(testbot),27(sudo),999(docker)
      00:00:00.153 ++ export COMPOSER_CACHE_DIR=/opt/drupalci/composer-cache
      00:00:00.153 ++ COMPOSER_CACHE_DIR=/opt/drupalci/composer-cache
      00:00:00.153 ++ echo https://www.drupal.org/pift-ci-job/635848
      00:00:00.153 https://www.drupal.org/pift-ci-job/635848
      00:00:00.154 ++ curl -w '\n' -s http://169.254.169.254/latest/meta-data/instance-type
      00:00:00.159 cc2.8xlarge
      00:00:00.159 ++ curl -w '\n' -s http://169.254.169.254/latest/meta-data/ami-id
      00:00:00.165 ami-3c42c35c
      00:00:00.166 ++ curl -w '\n' -s http://169.254.169.254/latest/meta-data/public-ipv4
      00:00:00.172 54.212.244.41
      00:00:00.172 ++ env
      00:00:00.172 ++ grep DCI
      00:00:00.173 DCI_CS_CoderVersion=8.2.8
      00:00:00.173 DCI_PHPVersion=php-5.6-apache:production
      00:00:00.173 DCI_JobType=simpletest
      00:00:00.174 DCI_CoreBranch=8.4.x
      00:00:00.174 DCI_Patch=rpc_endpoint_to_reset-2847708-24.patch,.
      00:00:00.174 DCI_Debug=FALSE
      00:00:00.174 DCI_ES_LintFailsTest=TRUE
      00:00:00.174 DCI_Fetch=https://www.drupal.org/files/issues/rpc_endpoint_to_reset-2847708-24.patch,.
      00:00:00.175 DCI_Concurrency=31
      00:00:00.175 DCI_CoreRepository=git://git.drupal.org/project/drupal.git
      00:00:00.175 DCI_DBVersion=mysql-5.5
      00:00:00.175 ++ env
      00:00:00.175 ++ grep -v DCI
      00:00:00.175 BUILD_URL=http://dispatcher-origin.drupalci.aws:8080/job/drupal_patches/8786/
      00:00:00.176 SHELL=/bin/bash
      00:00:00.176 HUDSON_SERVER_COOKIE=f9f94f9baaa33b04
      00:00:00.176 SSH_CLIENT=172.31.42.62 35896 22
      00:00:00.176 BUILD_TAG=jenkins-drupal_patches-8786
      00:00:00.177 ROOT_BUILD_CAUSE=REMOTECAUSE
      00:00:00.177 JOB_URL=http://dispatcher-origin.drupalci.aws:8080/job/drupal_patches/
      00:00:00.177 WORKSPACE=/var/lib/drupalci/workspace
      00:00:00.177 USER=testbot
      00:00:00.177 ROOT_BUILD_CAUSE_REMOTECAUSE=true
      00:00:00.178 COMPOSER_CACHE_DIR=/opt/drupalci/composer-cache
      00:00:00.178 JENKINS_HOME=/usr/local/jenkins
      00:00:00.178 MAIL=/var/mail/testbot
      00:00:00.178 PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
      00:00:00.178 PWD=/var/lib/drupalci/workspace
      00:00:00.178 HUDSON_URL=http://dispatcher-origin.drupalci.aws:8080/
      00:00:00.179 LANG=en_US.UTF-8
      00:00:00.179 JOB_NAME=drupal_patches
      00:00:00.179 BUILD_CAUSE_REMOTECAUSE=true
      00:00:00.179 BUILD_DISPLAY_NAME=#8786
      00:00:00.179 BUILD_ID=8786
      00:00:00.179 BUILD_CAUSE=REMOTECAUSE
      00:00:00.179 JENKINS_URL=http://dispatcher-origin.drupalci.aws:8080/
      00:00:00.180 Drupal_JobID=https://www.drupal.org:635848
      00:00:00.180 JOB_BASE_NAME=drupal_patches
      00:00:00.180 SHLVL=3
      00:00:00.180 HOME=/home/testbot
      00:00:00.180 EXECUTOR_NUMBER=0
      00:00:00.180 JENKINS_SERVER_COOKIE=f9f94f9baaa33b04
      00:00:00.181 NODE_LABELS=Testrunner (sir-yjpgbk4n) testrunner
      00:00:00.181 LOGNAME=testbot
      00:00:00.181 SSH_CONNECTION=172.31.42.62 35896 172.31.0.168 22
      00:00:00.181 HUDSON_HOME=/usr/local/jenkins
      00:00:00.181 NODE_NAME=Testrunner (sir-yjpgbk4n)
      00:00:00.181 BUILD_NUMBER=8786
      00:00:00.182 Testrunner_Branch=production
      00:00:00.182 HUDSON_COOKIE=90c8c0be-3081-4c6c-a8c6-e79cc8e16f5c
      00:00:00.182 _=/usr/bin/env
      00:00:00.182 ++ cd /opt/drupalci/testrunner
      00:00:00.182 ++ git fetch --all --tags
      00:00:00.183 Fetching origin
      00:00:00.275 ++ git checkout production
      00:00:00.278 Already on 'production'
      00:00:00.278 Your branch is up-to-date with 'origin/production'.
      00:00:00.278 ++ git pull --rebase
      00:00:00.377 Current branch production is up to date.
      00:00:00.379 ++ docker pull drupalci/php-5.6-apache:production
      00:00:00.387 Warning: failed to get default registry endpoint from daemon (Cannot connect to the Docker daemon. Is the docker daemon running on this host?). Using system default: https://index.docker.io/v1/
      00:00:00.388 Cannot connect to the Docker daemon. Is the docker daemon running on this host?
      00:00:00.394 Build step 'Execute shell' marked build as failure
      00:00:00.458 [CHECKSTYLE] Collecting checkstyle analysis files...
      00:00:00.501 [CHECKSTYLE] Finding all files that match the pattern jenkins-drupal_patches-8786/artifacts/*/checkstyle.xml
      00:00:00.504 [CHECKSTYLE] Computing warning deltas based on reference build #8777
      00:00:00.504 Archiving artifacts
      00:00:00.507 Checking console output
      00:00:00.507 Recording test results
      00:00:00.510 ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
      00:00:00.533 Finished: FAILURE
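
For this second case, the job script could refuse to run when the instance is clearly on its way out. The following is a rough sketch of such a guard, under several assumptions: the function names are made up for illustration, it relies on the EC2 instance metadata path `spot/termination-time` answering with HTTP 200 only once a termination has been scheduled, it assumes the metadata service is reachable without a session token (as the plain `curl` calls in the log above suggest), and the return code 75 (`EX_TEMPFAIL` from sysexits.h) is an arbitrary choice. It does not change how the ec2 plugin tracks executor availability.

```bash
#!/bin/bash
# Hypothetical pre-flight guard for the job script; names are illustrative.
METADATA_URL="${METADATA_URL:-http://169.254.169.254/latest/meta-data/spot/termination-time}"

# Returns 0 when a spot termination notice is present (HTTP 200).
spot_termination_pending() {
  local code
  # -m 2 caps the request at two seconds; -w prints only the status code.
  code="$(curl -s -o /dev/null -m 2 -w '%{http_code}' "$METADATA_URL")" || true
  [ "$code" = "200" ]
}

# Returns 0 when the Docker daemon answers a basic query.
docker_daemon_up() {
  docker info >/dev/null 2>&1
}

# Returns 75 (temporary failure) if the host looks unusable, 0 otherwise.
preflight() {
  if spot_termination_pending; then
    echo "preflight: spot termination notice present, refusing to run" >&2
    return 75
  fi
  if ! docker_daemon_up; then
    echo "preflight: Docker daemon unreachable, refusing to run" >&2
    return 75
  fi
  return 0
}
```

The job script would call `preflight || exit 75` before its first `docker` command. The build still fails, but it fails with a distinct exit code that can be told apart from a real test failure.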

       

       


          There are no comments yet on this issue.

            francisu Francis Upton
            mixologic Ryan Aslett
            Votes: 1
            Watchers: 2