  1. Jenkins
  2. JENKINS-55078

No longer able to start ECS slaves

Details

    • Released As: v1.19

    Description

      We use the ECS plugin to start dynamic slaves when jobs start. This worked fine until it suddenly stopped working.
      In the UI, all we get is: ‘Jenkins’ doesn’t have label <label>
      However, the labels are still present in the Jenkins settings.
      The log shows the following, but it gives no reason or explanation that would help me understand what could be causing this.
      Dec 07, 2018 1:02:29 PM com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher runECSTask WARNING: [ECS-wlxkv]: Failure to run task with definition arn:aws:ecs:eu-central-1:x:task-definition/ECS-mgmt:7281 on ECS cluster arn:aws:ecs:eu-central-1:x:cluster/default
      Dec 07, 2018 1:02:29 PM com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher runECSTask WARNING: [ECS-wlxkv]: Failure reason=ATTRIBUTE, arn=arn:aws:ecs:eu-central-1:x:container-instance/7b1c9cc6-7eb8-43fc-81cc-5a9d59d09d84
      Dec 07, 2018 1:02:29 PM com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher runECSTask WARNING: [ECS-wlxkv]: Failure reason=ATTRIBUTE, arn=arn:aws:ecs:eu-central-1:x:container-instance/8f76d62a-398f-47b0-a883-50f329b41e8c
      Dec 07, 2018 1:02:29 PM com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher runECSTask WARNING: [ECS-wlxkv]: Failure reason=ATTRIBUTE, arn=arn:aws:ecs:eu-central-1:x:container-instance/f3e2fddb-17ff-4fe4-9446-eaba4b05b298
      Dec 07, 2018 1:02:29 PM com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher launch
      WARNING: [ECS-wlxkv]: Error in provisioning; agent=com.cloudbees.jenkins.plugins.amazonecs.ECSSlave[ECS-wlxkv]
      hudson.AbortException: Failed to run agent container ECS-wlxkv
      at com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher.runECSTask(ECSLauncher.java:227)
      at com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher.launch(ECSLauncher.java:108)
      at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
      at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Dec 07, 2018 1:02:29 PM com.cloudbees.jenkins.plugins.amazonecs.ECSSlave _terminate INFO: [ECS-wlxkv]: Terminating ECS Task: null
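      For anyone triaging a longer log, the "Failure reason=..." warnings can be grouped per reason with a few lines of Python. This is only a rough sketch: the regular expression is an assumption based on the warning lines quoted above, and `summarize_failures` and `SAMPLE_LOG` are hypothetical names, not part of the plugin. As far as I understand the ECS RunTask documentation, reason ATTRIBUTE means the task definition requires a container instance attribute that the listed instances do not offer.

```python
import re
from collections import defaultdict

# Assumed pattern, modelled on the warning lines quoted in this issue:
# "WARNING: [ECS-wlxkv]: Failure reason=ATTRIBUTE, arn=arn:aws:ecs:..."
FAILURE_RE = re.compile(r"Failure reason=(?P<reason>[^,\s]+), arn=(?P<arn>\S+)")


def summarize_failures(log_text):
    """Group container-instance ARNs by the failure reason logged for them."""
    by_reason = defaultdict(list)
    for match in FAILURE_RE.finditer(log_text):
        by_reason[match.group("reason")].append(match.group("arn"))
    return dict(by_reason)


# Three of the warning lines from the log above.
SAMPLE_LOG = """
WARNING: [ECS-wlxkv]: Failure reason=ATTRIBUTE, arn=arn:aws:ecs:eu-central-1:x:container-instance/7b1c9cc6-7eb8-43fc-81cc-5a9d59d09d84
WARNING: [ECS-wlxkv]: Failure reason=ATTRIBUTE, arn=arn:aws:ecs:eu-central-1:x:container-instance/8f76d62a-398f-47b0-a883-50f329b41e8c
WARNING: [ECS-wlxkv]: Failure reason=ATTRIBUTE, arn=arn:aws:ecs:eu-central-1:x:container-instance/f3e2fddb-17ff-4fe4-9446-eaba4b05b298
"""

for reason, arns in summarize_failures(SAMPLE_LOG).items():
    print(f"{reason}: {len(arns)} container instance(s)")
```

      If every instance in the cluster reports the same reason, the task definition is the more likely culprit than any single instance.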

          Activity

            autarchprinceps autarch princeps created issue -
            autarchprinceps autarch princeps made changes -
            Field Original Value New Value
            Comment [ This is what I think the corresponding slave log says:
            13:07:48 Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
            13:07:50 Dec 07, 2018 1:07:50 PM hudson.remoting.jnlp.Main createEngine
            13:07:50 INFO: Setting up agent: ECS-ea9fff4b4f18
            13:07:50 Dec 07, 2018 1:07:50 PM hudson.remoting.jnlp.Main$CuiListener <init>
            13:07:50 INFO: Jenkins agent is running in headless mode.
            13:07:50 Dec 07, 2018 1:07:50 PM hudson.remoting.Engine startEngine
            13:07:50 INFO: Using Remoting version: 3.27
            13:07:50 Dec 07, 2018 1:07:50 PM hudson.remoting.Engine startEngine
            13:07:50 WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
            13:07:50 Dec 07, 2018 1:07:50 PM hudson.remoting.jnlp.Main$CuiListener status
            13:07:50 INFO: Locating server among [https://jenkins.prod.mgmt.aws-x.com/]
            13:07:52 Dec 07, 2018 1:07:52 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
            13:07:52 INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
            13:07:52 Dec 07, 2018 1:07:52 PM hudson.remoting.jnlp.Main$CuiListener status
            13:07:52 INFO: Agent discovery successful
            13:07:52 Agent address: jenkins.prod.mgmt.aws-x.com
            13:07:52 Agent port: 50000
            13:07:52 Identity: 65:c7:02:f6:43:aa:38:15:a0:6d:f6:79:89:64:62:4c
            13:07:52 Dec 07, 2018 1:07:52 PM hudson.remoting.jnlp.Main$CuiListener status
            13:07:52 INFO: Handshaking
            13:07:52 Dec 07, 2018 1:07:52 PM hudson.remoting.jnlp.Main$CuiListener status
            13:07:52 INFO: Connecting to jenkins.prod.mgmt.aws-x.com:50000
            13:07:52 Dec 07, 2018 1:07:52 PM hudson.remoting.jnlp.Main$CuiListener status
            13:07:52 INFO: Trying protocol: JNLP4-connect
            13:07:53 Dec 07, 2018 1:07:53 PM hudson.remoting.jnlp.Main$CuiListener status
            13:07:53 INFO: Remote identity confirmed: 65:c7:02:f6:43:aa:38:15:a0:6d:f6:79:89:64:62:4c
            13:08:02 Dec 07, 2018 1:08:02 PM hudson.remoting.jnlp.Main$CuiListener status
            13:08:02 INFO: Connected
            13:08:26 Dec 07, 2018 1:08:26 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
            13:08:26 WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.envinject.EnvInjectComputerListener$2; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
            13:08:27 Dec 07, 2018 1:08:27 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
            13:08:27 WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.Git$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
            13:08:47 Dec 07, 2018 1:08:47 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
            13:08:47 WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/ ]

            autarchprinceps autarch princeps added a comment -

            I was able to fix the issue on my side by downgrading the plugin, which suggests a bug in the newer version.

            autarchprinceps autarch princeps added a comment -

            I spoke with AWS Support, and they said the problem is the newly added attribute requirement ecs.capability.execution-role-awslogs. Previously the task definition only required com.amazonaws.ecs.capability.logging-driver.awslogs, which was sufficient to report to CloudWatch Logs successfully.
            The reason is that the plugin previously did not set a task execution role. Now, if you specify none, it assumes a role named ecsTaskExecutionRole. If you don't have that role set up, which is completely unnecessary if you work with EC2 and not Fargate, the task fails like this.
            I will try to create this role and update the plugin again, but making incompatible changes like this is a terrible idea. Supporting Fargate, and therefore adding a task execution role, is great, but the default shouldn't just change for the rest of us.
            Also, it is now impossible to force a task to run without a task execution role, even if that is what you want.
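
            To check which required attribute a container instance is missing, something like the following Python sketch may help. It assumes the input shapes I understand the ECS API to return: a task definition's `requiresAttributes` list and a container instance's `attributes` list, each a list of objects with a `name` key. The function name `missing_attributes` is hypothetical, the sample data is made up for illustration (only the two attribute names mentioned above come from this issue), and the comparison looks at names only, ignoring attribute values.

```python
def missing_attributes(required, available):
    """Return the names of required attributes a container instance lacks.

    required:  assumed shape of a task definition's `requiresAttributes` list
    available: assumed shape of a container instance's `attributes` list
    Both are lists of dicts that carry at least a "name" key.
    """
    offered = {attr["name"] for attr in available}
    return [attr["name"] for attr in required if attr["name"] not in offered]


# Illustrative data only: what the newer task definition appears to require...
required = [
    {"name": "com.amazonaws.ecs.capability.logging-driver.awslogs"},
    {"name": "ecs.capability.execution-role-awslogs"},
]
# ...versus a made-up EC2 container instance that lacks the new capability.
available = [
    {"name": "com.amazonaws.ecs.capability.logging-driver.awslogs"},
    {"name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"},
]

print(missing_attributes(required, available))
# prints ['ecs.capability.execution-role-awslogs']
```

            An empty result would mean the instance offers every attribute the task definition asks for, so the placement failure would have to come from somewhere else.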

            autarchprinceps autarch princeps added a comment -

            Even creating the role doesn't make the new task definition work. It still fails on the required capabilities.
            sgraber Stefan Graber added a comment -

            Same here... starting tasks with ecs-plugin 1.18 does not work anymore. Downgrading to 1.17 works.
            pgarbe Philipp Garbe made changes -
            Assignee Jan Roehrich [ roehrijn2 ] Philipp Garbe [ pgarbe ]

            webrat Andreas Sieferlinger added a comment -

            Hey, I also have this issue and have almost traced it down. I have a working workaround but not yet a fix. I hope I can provide a PR soon.
            pgarbe Philipp Garbe added a comment - See https://github.com/jenkinsci/amazon-ecs-plugin/pull/78
            pgarbe Philipp Garbe made changes -
            Released As v1.19
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Fixed but Unreleased [ 10203 ]
            pgarbe Philipp Garbe made changes -
            Link This issue is duplicated by JENKINS-54893 [ JENKINS-54893 ]
            pgarbe Philipp Garbe made changes -
            Status Fixed but Unreleased [ 10203 ] Resolved [ 5 ]
            pgarbe Philipp Garbe made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

            People

              Assignee: pgarbe Philipp Garbe
              Reporter: autarchprinceps autarch princeps
              Votes: 0
              Watchers: 4
