
Kubernetes plugin shows failing templates to only admins

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Component/s: kubernetes-plugin
    • Labels: None
    • Released As: kubernetes 1.24.0

      Background:
      We started leveraging the Kubernetes plugin to define agents using Kubernetes pod templates. This is a great new feature, but it allows non-admins to generate new templates, even within their pipelines. Since these non-admins do not have access to the Kubernetes backend or to the logging within Jenkins, they cannot see when or why one of these templates fails.

      Issue:
      When a non-admin user creates a k8s template that is badly formed, they are unable to see that the container/pod is failing; the build just sits "waiting on $LABEL".

      Steps to reproduce:

      1. Create a pipeline job
      2. Create a pod template in that job with a badly defined docker image name (a minimal example is sketched below)
      3. Watch the job fail to start because it cannot find its label
      4. If you are not an admin, you cannot see why the container/pod is failing to start, because you cannot access the k8s logs or the `Manage Jenkins > System Log` area of Jenkins to create a custom logger and find the cause of the failure
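
      For illustration, here is a minimal scripted-pipeline sketch that triggers the symptom; the label and image name are placeholders rather than values from the report:

      podTemplate(label: 'bad-image-test', containers: [
          // This image intentionally does not exist, so the pod ends up in ImagePullBackOff
          containerTemplate(name: 'build', image: 'example.invalid/does-not-exist:latest', ttyEnabled: true, command: 'cat')
      ]) {
          node('bad-image-test') {
              // Never reached: the build just keeps waiting on the 'bad-image-test' label
              sh 'echo hello'
          }
      }

      A non-admin running this sees only the waiting message in the build log; the ImagePullBackOff detail is visible only to someone with access to the cluster or to the Jenkins system log.
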
      Resolution:
      We need a way, in the job or somewhere similar, to see why the container is failing to start, perhaps just a return code from Kubernetes. Alternatively, do not allow templates to be defined at the job level, so that non-admins cannot create templates at all.

          [JENKINS-53790] Kubernetes plugin shows failing templates to only admins

          Alex Taylor added a comment -

          Right now the only workaround is to do some strangeness within a pipeline: you create a pipeline and run the template in parallel with a Groovy script (in my case I included it within a global shared library).

          So here are the steps I used:
          1. Create a system log called "KubernetesLog" with the `org.csanchez.jenkins.plugins.kubernetes` logger set to ALL
          2. Create a pipeline job with a parallel statement that runs the template and loops through the following script until the template spins up:

          import java.util.logging.LogRecord
          import hudson.logging.LogRecorder
          import hudson.logging.LogRecorderManager
          import jenkins.model.Jenkins
          
          
          // searchString is bound by the shared-library step; it should be the
          // name of the pod template (agent) whose log messages you want to see
          def agentName = searchString
          List<LogRecord> records = new ArrayList<LogRecord>()
          
          //Grabs the log manager
          LogRecorderManager mgr = Jenkins.instance.getLog();
          
          //Grabs the records
          mgr.logRecorders.each{
             if (it.getValue().getName() == "KubernetesLog")
             {
               records = it.getValue().getLogRecords()
             }
           }
          
          //Iterates over the record messages looking for the agent name
          for (LogRecord r : records) {
            if (r.getMessage().contains(agentName)) {
              println(r.getMessage().toString())
            }
          }
          
          //Clears the logger
          mgr.logRecorders.each{
             if (it.getValue().getName() == "KubernetesLog")
             {
               it.getValue().doClear()
             }
           }
          

          This will show the messages relating to `searchString` which should be the name of your template.
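
          For context, here is a hypothetical sketch of how the two parallel branches might be wired together. `printK8sLog` is a made-up name for a global shared library step wrapping the script above, and `mypod` is a placeholder template label:

          // Hypothetical wiring only; step name, image, and label are placeholders
          def agentOnline = false
          parallel(
            'run-template': {
              // Branch 1: define and use the (possibly broken) pod template
              podTemplate(label: 'mypod', containers: [
                containerTemplate(name: 'build', image: 'myregistry/maybe-broken:latest', ttyEnabled: true, command: 'cat')
              ]) {
                node('mypod') {
                  agentOnline = true
                  sh 'echo agent came online'
                }
              }
            },
            'watch-logs': {
              // Branch 2: until the agent comes online, keep dumping any
              // KubernetesLog records that mention the template name
              while (!agentOnline) {
                printK8sLog('mypod')
                sleep 30
              }
            }
          )

          The watch branch stops once the template's node block actually starts; in practice you would also add a timeout so the loop does not run forever when the template never comes up.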


          Pierson Yieh added a comment - - edited

          We implemented a change to the kubernetes-plugin that checks the message of containers in the Waiting state. The message contains the string "Back-off pulling image" when Kubernetes cannot locate the docker image (e.g. a badly defined docker image). We then grab the corresponding build job from the Jenkins queue, print a message to the build's console output to notify users that they've specified a bad docker image, and cancel the build. Canceling the job rather than simply marking it as failed was necessary; otherwise Jenkins would continuously retry creating the Kubernetes pod using the bad docker image and fail again.
          Our solution addresses both problems: customers not knowing why their job is stuck in perpetual limbo due to a bad docker image (and not having permission to view the Kubernetes logs), and jobs being stuck in that perpetual waiting state because of a malformed docker image.

          We are currently in the process of refining it and will submit a formal PR once that's ready. Any suggestions and comments would be appreciated.
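
          For readers curious what such a check looks like, here is a rough Groovy sketch (not the actual code from the PR) of inspecting a pod's waiting containers with the fabric8 client that the plugin uses; `cloud`, `namespace`, and `podName` are placeholders for the plugin's own objects:

          import io.fabric8.kubernetes.api.model.ContainerStatus
          import io.fabric8.kubernetes.api.model.Pod

          // Look up the agent pod and examine each container stuck in the Waiting state
          Pod pod = cloud.connect().pods().inNamespace(namespace).withName(podName).get()
          for (ContainerStatus cs : pod.getStatus().getContainerStatuses()) {
            def waiting = cs.getState().getWaiting()
            if (waiting != null && waiting.getMessage()?.contains('Back-off pulling image')) {
              // Surface the reason in the build log, then stop Jenkins from retrying,
              // e.g. by cancelling the queue item waiting on this template's label
              println "Container ${cs.getName()} cannot pull its image: ${waiting.getMessage()}"
            }
          }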


          Pierson Yieh added a comment - - edited

          UPDATE

          Here is the PR with our suggested changes: https://github.com/jenkinsci/kubernetes-plugin/pull/440


            Assignee: Pierson Yieh (pyieh)
            Reporter: Alex Taylor (ataylor)
            Votes: 1
            Watchers: 5
