
Kubernetes plugin shows failing templates to only admins

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Component/s: kubernetes-plugin
    • Labels: None
    • Released As: kubernetes 1.24.0

      Background:
      We started leveraging the Kubernetes plugin to define agents using Kubernetes pod templates. This is a great new feature, but it allows non-admins to generate new templates, even within their pipelines. Since these non-admins do not have access to the Kubernetes backend or to the logging within Jenkins, they cannot see when or why one of these templates fails.

      Issue:
      When a non-admin user creates a k8s template that is badly formed, they are unable to see that the container/pod is failing; the build just sits "waiting on $LABEL".

      Steps to reproduce:

      1. Create a pipeline job
      2. Create a pod template in that job with a badly defined docker image name (a minimal example is sketched below)
      3. Watch the job fail to start because it cannot find its label
      4. If you are not an admin, you cannot see why the container/pod is failing to start, because you cannot access the k8s logs or the `Manage Jenkins > System Log` area of Jenkins to create a custom logger and find the cause of the failure
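
      For illustration, here is a minimal scripted-pipeline sketch that triggers the symptom; the label and image name are placeholders rather than values from the report:

      podTemplate(label: 'bad-image-test', containers: [
          // This image intentionally does not exist, so the pod ends up in ImagePullBackOff
          containerTemplate(name: 'build', image: 'example.invalid/does-not-exist:latest', ttyEnabled: true, command: 'cat')
      ]) {
          node('bad-image-test') {
              // Never reached: the build just keeps waiting on the 'bad-image-test' label
              sh 'echo hello'
          }
      }

      A non-admin running this sees only the waiting message in the build log; the ImagePullBackOff detail is visible only to someone with access to the cluster or to the Jenkins system log.
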
      Resolution:
      We need a way, in the job or somewhere similar, to see why the container is failing to start, perhaps just a return code from Kubernetes. Alternatively, do not allow templates to be defined at the job level, so that non-admins cannot create templates at all.

          [JENKINS-53790] Kubernetes plugin shows failing templates to only admins

          Alex Taylor added a comment -

          Right now the only workaround is to do some strangeness within a pipeline: you create a pipeline and run the template in parallel with a Groovy script (in my case I included it within a global shared library).

          So here are the steps I used:
          1. Create a system log called "KubernetesLog" with the `org.csanchez.jenkins.plugins.kubernetes` logger set to ALL
          2. Create a pipeline job with a parallel statement that runs the template and loops through the following script until the template spins up:

          import java.util.logging.LogRecord
          import hudson.logging.LogRecorder
          import hudson.logging.LogRecorderManager
          import jenkins.model.Jenkins
          
          
          // searchString is bound by the shared-library step; it should be the
          // name of the pod template (agent) whose log messages you want to see
          def agentName = searchString
          List<LogRecord> records = new ArrayList<LogRecord>()
          
          //Grabs the log manager
          LogRecorderManager mgr = Jenkins.instance.getLog();
          
          //Grabs the records
          mgr.logRecorders.each{
             if (it.getValue().getName() == "KubernetesLog")
             {
               records = it.getValue().getLogRecords()
             }
           }
          
          //Iterates over the record messages looking for the agent name
          for (LogRecord r : records) {
            if (r.getMessage().contains(agentName)) {
              println(r.getMessage().toString())
            }
          }
          
          //Clears the logger
          mgr.logRecorders.each{
             if (it.getValue().getName() == "KubernetesLog")
             {
               it.getValue().doClear()
             }
           }
          

          This will show the messages relating to `searchString` which should be the name of your template.
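
          For context, here is a hypothetical sketch of how the two parallel branches might be wired together. `printK8sLog` is a made-up name for a global shared library step wrapping the script above, and `mypod` is a placeholder template label:

          // Hypothetical wiring only; step name, image, and label are placeholders
          def agentOnline = false
          parallel(
            'run-template': {
              // Branch 1: define and use the (possibly broken) pod template
              podTemplate(label: 'mypod', containers: [
                containerTemplate(name: 'build', image: 'myregistry/maybe-broken:latest', ttyEnabled: true, command: 'cat')
              ]) {
                node('mypod') {
                  agentOnline = true
                  sh 'echo agent came online'
                }
              }
            },
            'watch-logs': {
              // Branch 2: until the agent comes online, keep dumping any
              // KubernetesLog records that mention the template name
              while (!agentOnline) {
                printK8sLog('mypod')
                sleep 30
              }
            }
          )

          The watch branch stops once the template's node block actually starts; in practice you would also add a timeout so the loop does not run forever when the template never comes up.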


          Pierson Yieh added a comment - - edited

          We implemented a change to the kubernetes-plugin that checks the message of containers in the Waiting state. The message contains the string "Back-off pulling image" when Kubernetes cannot locate the docker image (e.g. a badly defined docker image). We then grab the corresponding build job from the Jenkins queue, print a message to the build's console output to notify users that they've specified a bad docker image, and cancel the build. Canceling the job rather than simply marking it as failed was necessary; otherwise Jenkins would continuously retry creating the Kubernetes pod using the bad docker image and fail again.
          Our solution addresses both problems: customers not knowing why their job is stuck in perpetual limbo due to a bad docker image (and not having permission to view the Kubernetes logs), and jobs being stuck in that perpetual waiting state because of a malformed docker image.

          We are currently in the process of refining it and will submit a formal PR once that's ready. Any suggestions and comments would be appreciated.
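
          For readers curious what such a check looks like, here is a rough Groovy sketch (not the actual code from the PR) of inspecting a pod's waiting containers with the fabric8 client that the plugin uses; `cloud`, `namespace`, and `podName` are placeholders for the plugin's own objects:

          import io.fabric8.kubernetes.api.model.ContainerStatus
          import io.fabric8.kubernetes.api.model.Pod

          // Look up the agent pod and examine each container stuck in the Waiting state
          Pod pod = cloud.connect().pods().inNamespace(namespace).withName(podName).get()
          for (ContainerStatus cs : pod.getStatus().getContainerStatuses()) {
            def waiting = cs.getState().getWaiting()
            if (waiting != null && waiting.getMessage()?.contains('Back-off pulling image')) {
              // Surface the reason in the build log, then stop Jenkins from retrying,
              // e.g. by cancelling the queue item waiting on this template's label
              println "Container ${cs.getName()} cannot pull its image: ${waiting.getMessage()}"
            }
          }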


          Pierson Yieh added a comment - - edited

          UPDATE

          Here is the PR with our suggested changes: https://github.com/jenkinsci/kubernetes-plugin/pull/440


            Assignee: Pierson Yieh (pyieh)
            Reporter: Alex Taylor (ataylor)
            Votes: 1
            Watchers: 5
