When change detection by label is enabled, the lshistory call adds the -minor option in order to detect changes based on mklabel.
However, the resulting output can be excessive and take a long time to generate. In our case some of these builds only run after a month; the check alone takes 2-3 hours and generates a large log (150-250 MB).
The time probably can't be reduced, but the size can be if the output is filtered to leave only the 'mklabel' and 'rmlabel' entries, which are the only ones actually considered for the changelog.
Jose Sa
added a comment - I found a way to build a query that finds all changes between two timestamps, filtered by multiple branches and multiple labels.
This produces output much faster (minutes instead of hours), and the log shows only what is needed to obtain the changelog (fixing the problem introduced with the -minor implementation).
This is a sample command:
time cleartool find /Vobs/BETS /Vobs/Spots_3ta \
-all -follow \
-version "(created_since(22-feb-11.12:45:33utc+0000) && ! created_since(21-apr-11.17:45:22utc+0000))
&& (brtype(SPOTS_V14-MNT) || brtype(main))
&& (lbtype(SPOTS_V14W_BASE_READY) || lbtype(SPOTS_V14M_BASE_READY))" \
-exec 'cleartool desc -fmt "\"%Nd\" \"%u\" \"%En\" \"%Vn\" \"%e\" \"%o\" \n%c\n" $CLEARCASE_XPN'
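If the branch and label lists differ per job, the -version query can be assembled from them instead of being typed by hand. This is only a minimal sketch, assuming the lists are passed as space-separated strings; join_predicates and build_query are hypothetical helper names, not part of cleartool or the plugin:

```shell
# join_predicates NAME ITEM... prints "(NAME(ITEM1) || NAME(ITEM2) ...)"
join_predicates() {
    jp_name=$1
    shift
    jp_out=""
    for jp_item in "$@"; do
        if [ -n "$jp_out" ]; then
            jp_out="$jp_out || "
        fi
        jp_out="${jp_out}${jp_name}(${jp_item})"
    done
    printf '(%s)' "$jp_out"
}

# build_query FROM TO "BRANCH..." "LABEL..." prints the full -version query
build_query() {
    q_from=$1
    q_to=$2
    q_branches=$3   # space-separated branch types (assumed input format)
    q_labels=$4     # space-separated label types (assumed input format)
    # The lists are left unquoted on purpose so that they split into words
    printf '(created_since(%s) && ! created_since(%s)) && %s && %s' \
        "$q_from" "$q_to" \
        "$(join_predicates brtype $q_branches)" \
        "$(join_predicates lbtype $q_labels)"
}

# Rebuild the query from the sample command above
query=$(build_query 22-feb-11.12:45:33utc+0000 21-apr-11.17:45:22utc+0000 \
    "SPOTS_V14-MNT main" "SPOTS_V14W_BASE_READY SPOTS_V14M_BASE_READY")
echo "$query"
```

The resulting string could then be passed as -version "$query" to the cleartool find call shown above.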
Krzysztof Malinowski
added a comment - Hi,
I like your approach, but you will miss a scenario this way. cleartool find checks only for versions created since the timestamp, not for labels applied since the timestamp. If you track changes by label, it is common to have a history like this:
T=0 create version 1
T=1 make label A on version 1
<Jenkins polls by find and makes a build>
T=2 create version 2
<Jenkins polls by find and does not make a build, since the label was not applied>
T=3 remove label A from version 1
T=4 make label A on version 2
<Jenkins polls by find since T=2 and does not make a build, because created_since predicate is false>
Moreover, when tracking changes by a label, it is quite possible that the label is moved from one already existing version to another already existing version. In that case no new version is created, yet the build should still be started because the label moved. The created_since predicate fails to detect this as well.
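The timeline above can be replayed with a small simulation. This is only a sketch: the event list is a made-up stand-in for the VOB history, and versions_created_after and label_events_after are hypothetical helpers that mimic a created_since-style query and a label-event query respectively.

```shell
# Made-up event list mirroring the timeline: "<time> <event> <version> [label]"
events() {
    cat <<'EOF'
0 mkversion 1
1 mklabel 1 A
2 mkversion 2
3 rmlabel 1 A
4 mklabel 2 A
EOF
}

# What a created_since-style query sees: versions created after the last poll
versions_created_after() {
    events | awk -v t="$1" '$2 == "mkversion" && $1 > t { print $3 }'
}

# What label tracking needs: label events after the last poll
label_events_after() {
    events | awk -v t="$1" '($2 == "mklabel" || $2 == "rmlabel") && $1 > t { print $2, $3 }'
}

# The last poll happened at T=2, when version 2 existed but carried no label
echo "versions created after T=2:"
versions_created_after 2
echo "label events after T=2:"
label_events_after 2
```

The first query comes back empty although label A has moved to version 2, which is exactly the build that would be missed.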
Jose Sa
added a comment (edited) - I guess we will have to live with lshistory... if only it allowed some additional filtering at the source, like the find command does.
Then it will be necessary to change the current implementation to ignore everything in the lshistory output except events of type 'rmlabel' and 'mklabel', and to consider only the entries that refer to the specific labels you want to track.
As a workaround I set up a cron job on my server that edits the existing build log files, performing this pruning offline. In some cases this reduced log size by 80%, recovering a total of 30 GB of disk space.
Here is my bash script, as a workaround for anyone having the same problem, until this is fixed in the plugin.
#!/usr/bin/bash
# Takes a log file as argument and applies lshistory filtering
# based on job specific configured labels
function cc_lshistory_prune_log() {
    local log_file=$1
    local log_new=${log_file}_new
    local log_bak=${log_file}_bak
    local job_dir=$(cd "$(dirname "${log_file}")/../.." && pwd)
    local job_name=$(basename "${job_dir}")
    local config=${job_dir}/config.xml
    local label_line=$(grep "<label>" "${config}" | grep -v "<label></label>")
    # Regex kept unquoted in a variable: bash 3.2+ matches a quoted pattern literally
    local label_re='<label>(.*)</label>'
    [[ ${label_line} =~ ${label_re} ]]
    local labels=${BASH_REMATCH[1]}
    # Space-separated labels become a regex alternation: "A B" -> "A|B"
    local labels_re=${labels//\ /\|}
    # Check if already executed and abort
    if [ -f "${log_bak}" ]; then
        echo "Aborted. Backup still exists: ${log_bak}"
        return 0
    fi
    gawk -v job="${job_name}" -v labels="${labels_re}" '
        # The lshistory command line opens the section to prune
        /cleartool lshistory/ {
            in_lshistory = 1
            print
        }
        # The first line tagged with [job_name] closes it
        in_lshistory == 1 && index($0, "[" job "]") {
            in_lshistory = 0
        }
        # Inside the section keep only label events for the configured labels
        in_lshistory == 1 && /rmlabel|mklabel/ && $0 ~ labels {
            print
            next
        }
        in_lshistory == 0 { print }
    ' "${log_file}" > "${log_new}"
    # Preserve the original modification time, then swap the files
    touch -r "${log_file}" "${log_new}"
    mv "${log_file}" "${log_bak}"
    mv "${log_new}" "${log_file}"
    ls -lh "${log_file}"*
}
# Processes all logs that may need pruning, searching
# by specific modification time
function process_all_logs() {
    local mtime=$1
    local config job_dir label_line label logfile
    # Regex kept unquoted in a variable: bash 3.2+ matches a quoted pattern literally
    local label_re='<label>(.*)</label>'
    for config in /opt/hudson/jobs/*/config.xml; do
        job_dir=$(dirname "${config}")
        label_line=$(grep "<label>" "${config}" | grep -v "<label></label>")
        [[ ${label_line} =~ ${label_re} ]]
        label=${BASH_REMATCH[1]}
        # Only jobs with a non-empty <label> are candidates for pruning
        if [[ "${label}" != "" ]]; then
            find "${job_dir}" -name log -mtime "${mtime}" -print
        fi
    done | while IFS= read -r logfile; do
        cc_lshistory_prune_log "${logfile}"
    done
}
## Main
export PATH=$PATH:/opt/csw/bin
#echo $PATH
if [ -f "$1" ]; then
    cc_lshistory_prune_log "$1"
else
    # Searches in all possible logs from yesterday
    process_all_logs 1
fi
EDIT: Updated the script content to the version that currently runs every day at 22h00, cleaning all of yesterday's logs.
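To see what the pruning keeps, here is a reduced version of the same state machine run against a tiny synthetic log. The sample lines are invented for illustration and are not real plugin output; sample_log and prune_log are hypothetical names, and demo_job, LABEL_A and LABEL_B are placeholders.

```shell
# Synthetic log excerpt (invented for illustration, not real plugin output)
sample_log() {
    cat <<'EOF'
build started
$ cleartool lshistory <args elided>
event-1 checkin /vob/a.c
event-2 mklabel LABEL_A /vob/a.c
event-3 mklabel OTHER /vob/b.c
event-4 rmlabel LABEL_B /vob/c.c
[demo_job] next build step
build finished
EOF
}

# prune_log JOB LABELS_RE reads a log on stdin and writes the pruned log
prune_log() {
    awk -v job="$1" -v labels="$2" '
        # The lshistory command line opens the section to prune
        /cleartool lshistory/ { in_lshistory = 1; print }
        # The first line tagged with [JOB] closes it
        in_lshistory == 1 && index($0, "[" job "]") { in_lshistory = 0 }
        # Inside the section keep only label events for the given labels
        in_lshistory == 1 && /rmlabel|mklabel/ && $0 ~ labels { print; next }
        # Outside the section every line is kept
        in_lshistory == 0 { print }
    '
}

sample_log | prune_log demo_job 'LABEL_A|LABEL_B'
```

Only the checkin event and the label event for the unrelated label are dropped; everything outside the lshistory section passes through unchanged.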
Jose Sa
added a comment - I've filed an RFE with IBM that will hopefully give us faster feedback when polling with labels:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=10850