
Detect changes by label generates excessive log

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component: clearcase-plugin
    • Labels: None

      When change detection by label is enabled, the plugin adds the -minor option to lshistory so that changes made with mklabel are detected.

      However, the resulting output can be excessive and slow to produce. In our case some of these builds run only about once a month; the check alone takes 2-3 hours and generates a very large log (150-250 MB).

      The time probably can't be reduced, but the log size can: filter the output so that only the 'mklabel' and 'rmlabel' entries remain, since those are the only ones actually considered for the changelog.
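
A minimal sketch of the filtering idea described above, assuming each history event sits on a single line and carries its operation name as a quoted field. The sample lines, user names, and the helper name `filter_label_events` are invented for illustration; real lshistory output depends on the format string the plugin passes.

```shell
#!/usr/bin/env bash

# Hypothetical helper: keep only label events from lshistory-style output.
# Assumes one event per line with the operation as a quoted field.
filter_label_events() {
    grep -E '"(mklabel|rmlabel)"'
}

# Invented sample data (date, user, element, version, event, operation).
sample_history() {
    cat <<'EOF'
"20110301.101500" "alice" "/Vobs/BETS/a.c" "/main/3" "create version" "checkin"
"20110302.090000" "bob" "/Vobs/BETS/a.c" "/main/3" "make label" "mklabel"
"20110305.140000" "alice" "/Vobs/BETS/b.c" "/main/7" "create version" "checkin"
"20110310.120000" "bob" "/Vobs/BETS/a.c" "/main/3" "remove label" "rmlabel"
EOF
}

# Only the mklabel and rmlabel lines survive the filter.
sample_history | filter_label_events
```

In this toy input the two checkin lines are dropped, which is the kind of reduction that would shrink the build log without losing anything the changelog needs.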

          [JENKINS-9215] Detect changes by label generates excessive log

          Jose Sa added a comment -

          I found a query that finds all changes between two timestamps, filtered by multiple branches and multiple labels.

          It runs much faster (minutes instead of hours), and the log shows only what is needed to build the changelog, which fixes the problem introduced by the -minor implementation.

          This is a sample command:

          time cleartool find /Vobs/BETS /Vobs/Spots_3ta \
          -all -follow \
          -version "(created_since(22-feb-11.12:45:33utc+0000) && ! created_since(21-apr-11.17:45:22utc+0000))
                  && (brtype(SPOTS_V14-MNT) || brtype(main))
                  && (lbtype(SPOTS_V14W_BASE_READY) || lbtype(SPOTS_V14M_BASE_READY))" \
          -exec 'cleartool desc -fmt "\"%Nd\" \"%u\" \"%En\" \"%Vn\" \"%e\" \"%o\" \n%c\n" $CLEARCASE_XPN'
          


          Krzysztof Malinowski added a comment -

          Hi,

          I like your approach, but it misses a scenario. cleartool find checks only for versions created since the timestamp, not for labels applied since the timestamp. If you track changes by label, a history like this is common:

          T=0 create version 1
          T=1 make label A on version 1
          <Jenkins polls by find and makes a build>
          T=2 create version 2
          <Jenkins polls by find and does not make a build, since the label was not applied>
          T=3 remove label A from version 1
          T=4 make label A on version 2
          <Jenkins polls by find since T=2 and does not make a build, because the created_since predicate is false>

          Moreover, when tracking changes by label, a label can be moved from one existing version to another existing version. No new version is created, yet the build should still start because of the label move. The created_since predicate fails to detect this case as well.
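
The timeline above can be replayed as a toy model to show why a creation-time filter misses the label move. This is only a sketch: the event table and both function names are hypothetical, and nothing here calls cleartool.

```shell
#!/usr/bin/env bash

# Toy model of the timeline: <time> <event> <version>. Not real cleartool output.
timeline() {
    cat <<'EOF'
0 create 1
1 mklabel 1
2 create 2
3 rmlabel 1
4 mklabel 2
EOF
}

# What a created_since-style check sees: versions created after the last poll.
versions_created_since() {
    timeline | awk -v since="$1" '$1 + 0 > since + 0 && $2 == "create" { print $3 }'
}

# What label tracking needs: label events after the last poll.
label_events_since() {
    timeline | awk -v since="$1" \
        '$1 + 0 > since + 0 && ($2 == "mklabel" || $2 == "rmlabel") { print $2, $3 }'
}

echo "versions created since T=2: [$(versions_created_since 2)]"
echo "label events since T=2:"
label_events_since 2
```

With the last poll at T=2, the first query returns nothing while the second still reports the removal and re-application of the label, which is the case a poll based only on version creation time would skip.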

          Jose Sa added a comment - - edited

          I guess we will have to live with lshistory... if only it allowed some additional filtering at the source, as the find command does.

          The current implementation would then need to ignore everything in the lshistory output except events of type 'rmlabel' and 'mklabel', and keep only the entries that refer to the specific labels being tracked.

          As a workaround, I set up a cron job on my server that "edits" the existing build log files and performs this pruning offline. In some cases the logs shrank by 80%, and in total I recovered 30 GB of disk space.

          Here is my bash script, in case anyone else has the same problem before this is fixed in the plugin.

          #!/usr/bin/bash
          
          # Takes a log file as argument and applies lshistory filtering 
          # based on job specific configured labels
          function cc_lshistory_prune_log() {
              local log_file=$1
              local log_new=${log_file}_new
              local log_bak=${log_file}_bak
              local job_dir=$(cd "$(dirname "${log_file}")/../.." && pwd)
              local job_name=$(basename "${job_dir}")
              local config=${job_dir}/config.xml
              local label_line=$(grep "<label>" "${config}" | grep -v "<label></label>")
              # Keep the pattern in a variable: bash 3.2+ treats a quoted
              # right-hand side of =~ as a literal string, not a regex
              local label_re='<label>(.*)</label>'
              [[ ${label_line} =~ ${label_re} ]]
              local labels=${BASH_REMATCH[1]}
              # Space-separated labels become a regex alternation: A B -> A|B
              local labels_re=${labels// /|}
          
              # Check if already executed and abort
              if [ -f "${log_bak}" ]; then
                  echo "Aborted. Backup still exists: ${log_bak}"
                  return 0
              fi
          
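              # The job name and label regex are spliced into the gawk program;
              # they are double-quoted so that spaces do not split the argument
              gawk '
              # Start of an lshistory section: remember it and keep the command line
              /cleartool lshistory/ {
                  in_lshistory = 1
                  print
              }
              # Next line tagged with [job name] ends the lshistory section
              in_lshistory == 1 && /\['"${job_name}"'\]/ {
                  in_lshistory = 0
              }
              # Inside the section keep only label events for the configured labels
              in_lshistory == 1 && /rmlabel|mklabel/ && /'"${labels_re}"'/ {
                  print
                  next
              }
              in_lshistory == 0 {print}
              ' "${log_file}" > "${log_new}"
              # Preserve the original modification time, then swap the files
              touch -r "${log_file}" "${log_new}"
              mv "${log_file}" "${log_bak}"
              mv "${log_new}" "${log_file}"
              ls -lh "${log_file}"*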
          }
          
          # Processes all logs that may need pruning, selecting them
          # by modification time
          function process_all_logs() {
              local mtime=$1
              local config job_dir job_name label_line label
              # Unquoted variable so that =~ treats the pattern as a regex
              local label_re='<label>(.*)</label>'
              for config in /opt/hudson/jobs/*/config.xml; do
                  job_dir=$(dirname "$config")
                  job_name=$(basename "${job_dir}")
                  label_line=$(grep "<label>" "$config" | grep -v "<label></label>")
                  [[ $label_line =~ $label_re ]]
                  label=${BASH_REMATCH[1]}
                  # Only jobs with a non-empty label configuration are pruned
                  if [[ "${label}" != "" ]]; then
                      find "${job_dir}" -name log -mtime "${mtime}" -print
                  fi
              done | while IFS= read -r logfile; do
                  cc_lshistory_prune_log "${logfile}"
              done
          }
          }
          
          ## Main
          export PATH=$PATH:/opt/csw/bin
          #echo $PATH
          if [ -f "$1" ]; then
              cc_lshistory_prune_log "$1"
          else
              # Searches in all possible logs from yesterday
              process_all_logs 1
          fi
          

          EDIT: Updated the script to the version that now runs every day at 22:00 and cleans all of yesterday's logs.
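
One detail of the script worth checking on newer systems is the label extraction with `=~`. From bash 3.2 onward, a quoted right-hand side is matched as a literal string, so the capture group only works when the pattern is expanded unquoted, for example from a variable. The sketch below shows that form in isolation; `extract_label` and the sample config line are hypothetical.

```shell
#!/usr/bin/env bash

# Extract the text between <label> and </label> from one config line.
# The pattern is held in a variable and expanded unquoted, so bash
# treats it as a regular expression rather than a literal string.
extract_label() {
    local line=$1
    local re='<label>(.*)</label>'
    if [[ ${line} =~ ${re} ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

# Hypothetical config.xml line with two space-separated labels.
labels=$(extract_label '      <label>LABEL_A LABEL_B</label>')
echo "labels: ${labels}"
# Same transformation the script applies: spaces become regex alternation.
echo "regex:  ${labels// /|}"
```

Running this against a copy of a real job's config line before scheduling the cron job is a cheap way to confirm that the labels are picked up as expected.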


          Jose Sa added a comment -

          I've filed an RFE with IBM that will hopefully give us faster feedback when polling by label:
          http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10850


            Assignee: Unassigned
            Reporter: Jose Sa (josesa)