[JENKINS-65930] Feature to utilise build number from a job instead of agent ID when creating Cloudwatch logs

      As an engineer

      I want an option to use the build number from the job instead of the agent ID in the creation of logs

      So that the agent logs are more easily identifiable from a build.

       

      Background:

      When using k8s slaves through the k8s cloud plugin, the agent ID is ephemeral and not durable across restarts. Agents are created on demand by the k8s plugin and are given an ordinal number by Jenkins, which resets to 1 on each restart. This means the more durable log state in Cloudwatch becomes confused: the next agent number may be lower than the previous one if the Jenkins master has been restarted.

       

      Adding a feature to use the BUILD_NUMBER instead of the agent ID makes the logs predictable and easily referable back to Jenkins and across logs created by the master.

       

      This is a major benefit if staff are not given direct access to the Jenkins server, for example when it is used for GitOps kinds of operations. It then becomes trivial to generate a predictable URL in the log based upon the job name and the build ID, which can be sent to a message channel to gather feedback from users who only have access to the message channel and the Cloudwatch logs.


          Jesse Glick added a comment -

          I am not quite sure what you are talking about here. Can you give an example to make this more concrete?


          Andy added a comment - edited

          A new job is created called problem.demo:

          podTemplate(
            containers: [ containerTemplate(name: 'busybox', image: 'busybox:latest', ttyEnabled: true, command: 'cat') ],
            cloud: 'EKS',
            namespace: 'default'
          ) {
            node(POD_LABEL) {
              stage('First') {
                container('busybox') {
                  stage('Test') {
                    sh 'whoami'
                  }
                }
              }
            }
          } 

           

          RUN 1

          The job is run for the first time and creates an event stream called problem.demo@master, with each event tagged with the build ID for the first run (1).

          The k8s plugin generates an ephemeral slave for this task: problem-demo-1-8q8rr-pchxh-2vgbm

          Additionally, within the log it is recorded that build 1 was performed on node 5 (note this is not a newly booted Jenkins instance):

          { "message": "", "build": "1", "node": "5" }

          We also see at this point a newly created log stream called problem.demo@agent1 (noting that this doesn't actually match the node ID we saw in the master log).

          RUN 2

          The job is simply run again, resulting in a new set of events within the problem.demo@master log stream tagged with build 2, and we also see this event:

          The k8s plugin generates a slave for this task: problem-demo-2-1jh4b-cxqvm-w5w0q

          { "message": "", "build": "2", "node": "5" }

          The agent log stream at this point is created as problem.demo@agent2, so it would APPEAR at first glance that this was following the build number.

          RESTART JENKINS MASTER at this point.

          RUN 3

          The job is run again now and appears in the master log with events tagged with build 3, logging this message too

          { "message": "", "build": "3", "node": "5" }

          The k8s plugin generates a slave for this task: problem-demo-3-5x943-m9fx9-lq0lx

          At this point one notes that rather than an agent log problem.demo@agent2 being used, the log stream problem.demo@agent1 is used again for events.

          It appears that the index added to the agent is based upon some internal state of the plugin which is somewhat unpredictable from any information available in the build, so simply knowing the job ID and build number is not enough to generate a direct link to a log stream based upon the build number itself.


          Jesse Glick added a comment -

          node in the JSON is totally unrelated to agents. This is the flow graph node, representing Pipeline structure. It is meaningless for direct inspection, but is used e.g. for Blue Ocean display and for rendering structure annotations in the classic build log.

          The @agent* log stream names are just based on counting the number of agents attached to Jenkins that were used for logging by this plugin. (LogStreamState.MasterState.agentLogStreamNames) It has nothing to do with the build number. In fact when a static agent pool is in use, this number will not grow much even if there are many builds; but on the other hand, if more than one node block is used in a given build in parallel, there will certainly be more than one stream. The constraint is that a given agent used for a given job needs to be given exclusive access to some log stream.
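          The counter behaviour described here can be illustrated with a deliberately simplified sketch (the real logic lives in the plugin's Java class LogStreamState.MasterState; the Python below is a hypothetical model, not the plugin's code). Because the counter is held in controller memory, a restart discards it and the next agent starts again at @agent1, which reproduces what Andy observed in RUN 3:

```python
# Hypothetical, simplified model of how @agentN suffixes are assigned by
# counting agents attached since startup (cf. the plugin's
# LogStreamState.MasterState.agentLogStreamNames). Not the real code.

class MasterState:
    def __init__(self):
        self.agent_log_stream_names = []  # held in controller memory only

    def stream_for_agent(self, job_name):
        # Each newly attached agent gets the next ordinal suffix.
        name = f"{job_name}@agent{len(self.agent_log_stream_names) + 1}"
        self.agent_log_stream_names.append(name)
        return name

state = MasterState()
run1 = state.stream_for_agent("problem.demo")  # problem.demo@agent1
run2 = state.stream_for_agent("problem.demo")  # problem.demo@agent2

state = MasterState()  # controller restart: in-memory counter is lost
run3 = state.stream_for_agent("problem.demo")  # problem.demo@agent1 again
```

          The model shows why the suffix tracks agent attachment order, not build numbers, and why it cannot be reconstructed from the job name and build number alone.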

          The plugin does use a NotifyShutdown event which is supposed to recycle stream names, I think at the end of a build. Maybe that is not being called aggressively enough, so the number keeps going up even when builds complete and ephemeral agents are deleted?

          Anyway, this is all basically supposed to be an implementation detail: the effective log is jobName@<anything> filtered by the build field in JSON. It uses a DescribeLogStreamsRequest with a logStreamNamePrefix to find the relevant streams, then a FilterLogEventsRequest using interleaved to aggregate them logically. The suffix on the log stream name is there for debugging, not something to be relied on in any way.
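          The two-call lookup described above can be sketched in a few lines of Python with boto3 (the log group name is whatever the plugin was configured with; the filter pattern uses CloudWatch Logs JSON pattern syntax to match the build field). This is an illustrative sketch of the queries, not code from the plugin:

```python
# Sketch of the DescribeLogStreams + FilterLogEvents lookup. The log group
# name is an assumption; adjust it to the plugin's configuration.

def build_filter_pattern(build_number):
    # CloudWatch Logs JSON filter pattern matching the "build" field.
    return '{ $.build = "%s" }' % build_number

def fetch_build_events(log_group, job_name, build_number):
    import boto3  # imported lazily so the pattern helper works without AWS
    logs = boto3.client("logs")

    # DescribeLogStreams with a prefix finds every stream for the job,
    # whatever @master/@agentN suffix the plugin happened to pick.
    streams = []
    for page in logs.get_paginator("describe_log_streams").paginate(
        logGroupName=log_group, logStreamNamePrefix=job_name + "@"
    ):
        streams += [s["logStreamName"] for s in page["logStreams"]]

    # FilterLogEvents aggregates those streams interleaved by timestamp
    # and keeps only the events belonging to this build.
    events = []
    for page in logs.get_paginator("filter_log_events").paginate(
        logGroupName=log_group,
        logStreamNames=streams,
        filterPattern=build_filter_pattern(build_number),
    ):
        events += page["events"]
    return events
```

          For example, fetch_build_events("jenkins-logs", "problem.demo", 3) would return every event from RUN 3, regardless of which @agentN stream the controller assigned.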

          At the top of the classic build log the plugin will display a https://console.aws.amazon.com/cloudwatch/home#logEventViewer:… URL which for now is limited to @master events because I could not find a way to create a GUI permalink that works as nicely as the APIs do. It does filter by build, it just does not show agent-originated messages for now.


          Jesse Glick added a comment -

          Since the AWS console view is not flexible enough to show all logs from a given build with a single query (last I checked), and anyway shows lots of visual noise, if you care about creating permalinks that would bypass Jenkins I would recommend creating some sort of microservice that checks read permission to a log stream and makes the above queries, or even doing this client-side with some manner of authentication TBD.


          Andy added a comment - edited

          So, in this case, where we know there is some re-use (and possibly more re-use) of an agent* suffix, which makes it difficult to correlate from the master job (since the master says running on agent1, for example, which could be re-used), would it not be more sensible to choose a scheme where agent* didn't rely on an ordinal but on something unique like a UUID? In this manner one could interrogate the predictable master log to get the unique agent* value and use that to discern the specific agent log.


          Jesse Glick added a comment -

          one could interrogate the predictable log for the master to get the more unique agent* value

          You could, but if you are going to be building custom tools to display the log anyway, just use DescribeLogStreamsRequest + FilterLogEventsRequest to get the full log with any number of agents displayed in the proper context. Probably something that could be developed in a few lines of <language of your choice> in Lambda, I am just not sure offhand what the right way is to authenticate a user (maybe varies depending on your local needs).
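          A Lambda fronting the DescribeLogStreamsRequest + FilterLogEventsRequest queries might look like the sketch below; the query parameter names are assumptions, and, as noted, the authentication layer is site-specific and left out entirely:

```python
# Hypothetical AWS Lambda entry point for a log-viewing microservice.
# Parameter names ("job", "build") are assumptions; authentication and the
# actual DescribeLogStreams/FilterLogEvents calls are omitted here.
import json

def lambda_handler(event, context):
    params = event.get("queryStringParameters") or {}
    job = params.get("job", "")
    build = params.get("build", "")
    if not job or not build.isdigit():
        return {"statusCode": 400, "body": "job and numeric build required"}

    # ...check the caller's read permission, then run DescribeLogStreams
    # with the job-name prefix and FilterLogEvents on the build field...
    return {"statusCode": 200, "body": json.dumps({"job": job, "build": build})}
```

          An API Gateway URL in front of such a handler would give the predictable, shareable per-build link the original request asks for, without exposing Jenkins itself.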


            jglick Jesse Glick
            iamasmith Andy