Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-61383

Current new messages detection mechanism is not working for simple line insertion in case of many similar adjacent warning

      Summary

      1)  Warning-ng  marks messages as new although they are outstanding in some cases.

      2)  When it happens more then once it also might mess the age information.

      It seems that adding git forensics might help to improve the detection algorithm. Some thoughts about what can be done will be added as a comment.

       

      The problem 

      As you can see the message at line 9264 was marked as new therefore it's age was reset to 1.

      but all lines were committed at the same time (only part of the line is shown) and were not changed in the last years. 

       

      Moreover every build that the detection fails gets new referenceId for that message and as a result the Age of the warning is reset to 1. 

      As a result after some builds with changes on other lines the age of lines that were committed together might be totally messed up as can be seen below. 

      you can see that for the same commit we get different ages. 

       

      The reason is that those messages were identified (incorrectly) as new along the time and their age was reset to 1 at the time.

       

      see the 502_diff.xlsx attachment for more details 

       

        1. 502_diff.xlsx
          26 kB
        2. blame_cmd2.JPG
          blame_cmd2.JPG
          191 kB
        3. blames_screen.JPG
          blames_screen.JPG
          159 kB
        4. new_report_head.JPG
          new_report_head.JPG
          43 kB

          [JENKINS-61383] Current new messages detection mechanism is not working for simple line insertion in case of many similar adjacent warning

          Why Name added a comment - - edited

          Well, some more info.

          We tried an alternative algorithm.

          All we do is we do diff for each reference file of sorted (by lines) warning to current warning file

          1) The source_file_name, error id, error message, and the source line contents (NOT the line number). 

          This simple algorithm is giving quite excellent results for our case (currently as POC).

          The only 'flaws' that we found

          1) Someone changed the source code line  but did not resolved the issue

          • This will be shown in our algorithm as resolved on one hand and a new error will be reported for the same line on the other hand.
          • For our case this is perfectly ok. It is true that the real error was not resolved. However since someone bothered to touch the line, then from our point of view we can mark it as new, since the code is active it is better to show the error.
          • BUT this triggered a new idea that will be described below.

           2) Reordering content the source file without real change might cause real noise.

          • Well in the POC we did not find such a case in our code base, and we tend to think that this is rare enough and will be accepted by users as known limitation.
          • However once we will find such a case and assuming users will treat it as a bug we will enhance the simple algorithm to overcome it.

           

          New idea:

          The case that the line was changed but the problem exists although acceptable by users is still interesting to be identified for data mining.

          It seems that we would like to add another category to the [new, resolved, outstanding]: touched.

          So resolved means that the problem really solved and touched means that the line was changed therefore the full comparison fails but since the same error type still being reported for the line we will tag it as touched.

           

          So in the case of changed line instead of getting 2 messages: resolved + new. We will get only one : touched 

          We still did not implement it, but soon will.

          What do you think?

           

          Why Name added a comment - - edited Well, some more info. We tried an alternative algorithm. All we do is we do diff for each reference file of sorted (by lines) warning to current warning file 1) The source_file_name, error id, error message, and the source line contents (NOT the line number).  This simple algorithm is giving quite excellent results for our case (currently as POC). The only 'flaws' that we found 1) Someone changed the source code line  but did not resolved the issue This will be shown in our algorithm as resolved on one hand and a new error will be reported for the same line on the other hand. For our case this is perfectly ok. It is true that the real error was not resolved. However since someone bothered to touch the line, then from our point of view we can mark it as new, since the code is active it is better to show the error. BUT this triggered a new idea that will be described below.  2) Reordering content the source file without real change might cause real noise. Well in the POC we did not find such a case in our code base, and we tend to think that this is rare enough and will be accepted by users as known limitation. However once we will find such a case and assuming users will treat it as a bug we will enhance the simple algorithm to overcome it.   New idea: The case that the line was changed but the problem exists although acceptable by users is still interesting to be identified for data mining. It seems that we would like to add another category to the [new, resolved, outstanding] : touched. So resolved means that the problem really solved and touched means that the line was changed therefore the full comparison fails but since the same error type still being reported for the line we will tag it as touched.   So in the case of changed line instead of getting 2 messages: resolved + new. We will get only one : touched  We still did not implement it, but soon will. What do you think?  

          Ulli Hafner added a comment -

          Just a few comments:

          1) The source_file_name, error id, error message, and the source line contents (NOT the line number).

          Isn't that almost the same algorithm I am using? Just without using a multi-line context - just the line with the warning. I think we can make that property configurable.

          Another thing to consider: currently there is no API to access Git (or any other version control system) to get diffs. So if you plan to extend that part it should be a generic approach that still works if nobody is using Git.

          Ulli Hafner added a comment - Just a few comments: 1) The source_file_name, error id, error message, and the source line contents (NOT the line number). Isn't that almost the same algorithm I am using? Just without using a multi-line context - just the line with the warning. I think we can make that property configurable. Another thing to consider: currently there is no API to access Git (or any other version control system) to get diffs. So if you plan to extend that part it should be a generic approach that still works if nobody is using Git.

          Why Name added a comment -

          1) Yes I agree,
               the main difference is that we compare

          • the line and not a range of line
          • we omit the line number from the comparison.  

          It will be really great if you could make it configurable. 

          2) Since we need it quite urgently I will appreciate your estimation when you can release such a fix/ improvement ?

          3) I did not understood your comment about diff. There are many diff tools. We currently use a python lib for diff, but as you mentioned git diff is very strong tool for diff and can be used as command line even if the  source repository  is not  git, see https://stackoverflow.com/a/17194704 for example. 

          Why Name added a comment - 1) Yes I agree,      the main difference is that we compare the line and not a range of line we omit the line number from the comparison.   It will be really great if you could make it configurable.  2) Since we need it quite urgently I will appreciate your estimation when you can release such a fix/ improvement ? 3) I did not understood your comment about diff. There are many diff tools. We currently use a python lib for diff, but as you mentioned git diff is very strong tool for diff and can be used as command line even if the  source repository  is not  git, see https://stackoverflow.com/a/17194704  for example. 

          Ulli Hafner added a comment -

          I created JENKINS-61607 to track the improvement.

          3) I did not understood your comment about diff. There are many diff tools. We currently use a python lib for diff, but as you mentioned git diff is very strong tool for diff and can be used as command line even if the source repository is not git, see https://stackoverflow.com/a/17194704 for example.

          Ah now I understand your idea. This is not possible since there is no previous code available! All we have is the source code of the current build (and the fingerprint hashes of the warnings before).

          Ulli Hafner added a comment - I created JENKINS-61607 to track the improvement. 3) I did not understood your comment about diff. There are many diff tools. We currently use a python lib for diff, but as you mentioned git diff is very strong tool for diff and can be used as command line even if the source repository is not git, see https://stackoverflow.com/a/17194704 for example. Ah now I understand your idea. This is not possible since there is no previous code available! All we have is the source code of the current build (and the fingerprint hashes of the warnings before).

          Why Name added a comment -

          I am not comparing the source code neither the fingerprint.

          I am comparing  the full report of the current warning to the full report of the reference.

           

          Why Name added a comment - I am not comparing the source code neither the fingerprint. I am comparing  the full report of the current warning to the full report of the reference.  

          Ulli Hafner added a comment -

          But how do you then find a warning if you add new lines?

          Example:
          old.file: line 1 and line 3 have the same warning type
          new.file: one empty line is added in line 1, i.e. line 2 and line 4 now have the same warning type

          How should we find the new warnings without inspecting the source code?

          Ulli Hafner added a comment - But how do you then find a warning if you add new lines? Example: old.file: line 1 and line 3 have the same warning type new.file: one empty line is added in line 1, i.e. line 2 and line 4 now have the same warning type How should we find the new warnings without inspecting the source code?

          Why Name added a comment -

          Here is extract from a real example

          new build # category Error id File Line Code
          991100167 Outstanding useInitializationList collector.cpp 64  m_szHotfixName = "";
          991100167 Outstanding useInitializationList collector.cpp 65  m_szBuildTake = "";
          991100167 Outstanding useInitializationList collector.cpp 70  m_szHotfixName = i_szHotfixName;
          991100167 Outstanding useInitializationList collector.cpp 71  m_szBuildTake = i_szBuildTake;
          991100167 Outstanding useInitializationList collector.cpp 125  m_szProductName = "";
          991100167 Outstanding useInitializationList collector.cpp 130  m_szProductName = i_szProductName;
          991100168 Outstanding useInitializationList collector.cpp 64  m_szHotfixName = "";
          991100168 Outstanding useInitializationList collector.cpp 65  m_szBuildTake = "";
          991100168 Outstanding useInitializationList collector.cpp 70  m_szHotfixName = i_szHotfixName;
          991100168 Outstanding useInitializationList collector.cpp 71  m_szBuildTake = i_szBuildTake;
          991100168 Outstanding useInitializationList collector.cpp 129  m_szProductName = "";
          991100168 Outstanding useInitializationList collector.cpp 134  m_szProductName = i_szProductName;

           

          As you can see in build 168 4 lines were added between line 71 to 125 as a result the last 2 warning moved 4 lines down. However it is marked as outstanding.

          What we do currently (in the mean time till we will have a real solution) is the following:
          1) For every build we keep 2 report files - one with the line number and another one without line number
          2) we use diff (since it does smart comparison) between the current and reference report file that does not contain the line number. As you can see since without the line number all the details match the diff will know to handle it (even if in line 128 there will be a new warning that was not in last build).
          [3) Then in the second pass we compare all the details with the line number but without the code itself in order to find "touched" as was explained above]

          Let me know if this is clear enough. if needed we can do a Zoom meeting (It is popular nowadays...

           

           

          Why Name added a comment - Here is extract from a real example new build # category Error id File Line Code 991100167 Outstanding useInitializationList collector.cpp 64  m_szHotfixName = ""; 991100167 Outstanding useInitializationList collector.cpp 65  m_szBuildTake = ""; 991100167 Outstanding useInitializationList collector.cpp 70  m_szHotfixName = i_szHotfixName; 991100167 Outstanding useInitializationList collector.cpp 71  m_szBuildTake = i_szBuildTake; 991100167 Outstanding useInitializationList collector.cpp 125  m_szProductName = ""; 991100167 Outstanding useInitializationList collector.cpp 130  m_szProductName = i_szProductName; 991100168 Outstanding useInitializationList collector.cpp 64  m_szHotfixName = ""; 991100168 Outstanding useInitializationList collector.cpp 65  m_szBuildTake = ""; 991100168 Outstanding useInitializationList collector.cpp 70  m_szHotfixName = i_szHotfixName; 991100168 Outstanding useInitializationList collector.cpp 71  m_szBuildTake = i_szBuildTake; 991100168 Outstanding useInitializationList collector.cpp 129  m_szProductName = ""; 991100168 Outstanding useInitializationList collector.cpp 134  m_szProductName = i_szProductName;   As you can see in build 168 4 lines were added between line 71 to 125 as a result the last 2 warning moved 4 lines down. However it is marked as outstanding. What we do currently (in the mean time till we will have a real solution) is the following: 1) For every build we keep 2 report files - one with the line number and another one without line number 2) we use diff (since it does smart comparison) between the current and reference report file that does not contain the line number. As you can see since without the line number all the details match the diff will know to handle it (even if in line 128 there will be a new warning that was not in last build). [3) Then in the second pass we compare all the details with the line number but without the code itself in order to find "touched" as was explained above] Let me know if this is clear enough. if needed we can do a Zoom meeting (It is popular nowadays... )     

          Ulli Hafner added a comment -

          I think we are mixing two many things in one issue (warnings that are marked as new but are not and the case you are describing in the last issue): from my understanding the last two warnings are actually outstanding, so the current algorithm is working just fine: nothing changed for the two warnings. They are just moved down 4 lines. In your understanding they are not outstanding?

          Maybe it would makes sense if we create some scenarios (maybe based on a unit test) where we can actually discuss the current behavior and the expected behavior. Otherwise I think we've been going in circles... I'm using such a approach to show that using an AST is better to detect new warnings. I think I need to generalize this approach in order to get a good test fixture...

          Ulli Hafner added a comment - I think we are mixing two many things in one issue (warnings that are marked as new but are not and the case you are describing in the last issue): from my understanding the last two warnings are actually outstanding, so the current algorithm is working just fine: nothing changed for the two warnings. They are just moved down 4 lines. In your understanding they are not outstanding? Maybe it would makes sense if we create some scenarios (maybe based on a unit test) where we can actually discuss the current behavior and the expected behavior. Otherwise I think we've been going in circles... I'm using such a approach to show that using an AST is better to detect new warnings. I think I need to generalize this approach in order to get a good test fixture...

          Why Name added a comment - - edited

          Well again if needed we can talk.

          1) I agree that a good test test case is a good idea. Generally I am in favor of tests and TDD.

          2) All this bug report started since I run cppcheck on many build and looked at the new and resolved of the warning-ng. I found many times that the warning ng was confused due to line insertions and reported new on many cases it should not. For example see above at the beginning of the thread.
          3) Since I am not fluent in Java and Jenkins development and I don't have a working setup. We wrote python tool to see if we can prove that the algorithm works kinds of POC.

          4) It seems that it works quite well and does not report the amount of 'false' new as the warning ng. The problem with what we did is that it is not integrated with Jenkins and the nice graph the warning ng shows.

          In other words.

           from my understanding the last two warnings are actually outstanding, so the current algorithm is working just fine

          The output above is from OUR tool that why it shows outstanding.  I am quite sure that the current algorithm will mark it as new (I did not tested this particular example, but I do have a lot of examples that the current algorithm fails and mine works OK).

          The test scenario:

          Have a file with same error type but different code for 10 lines.

          Do a report 

          Add a 10 lines in the middle with 2 lines with the same error type   

          Add one line with different error type

          Do another report 

           

          Why Name added a comment - - edited Well again if needed we can talk. 1) I agree that a good test test case is a good idea. Generally I am in favor of tests and TDD. 2) All this bug report started since I run cppcheck on many build and looked at the new and resolved of the warning-ng. I found many times that the warning ng was confused due to line insertions and reported new on many cases it should not. For example see above at the beginning of the thread. 3) Since I am not fluent in Java and Jenkins development and I don't have a working setup. We wrote python tool to see if we can prove that the algorithm works kinds of POC. 4) It seems that it works quite well and does not report the amount of 'false' new as the warning ng. The problem with what we did is that it is not integrated with Jenkins and the nice graph the warning ng shows. In other words.  from my understanding the last two warnings are actually outstanding, so the current algorithm is working just fine The output above is from OUR tool that why it shows outstanding.  I am quite sure that the current algorithm will mark it as new (I did not tested this particular example, but I do have a lot of examples that the current algorithm fails and mine works OK ). The test scenario: Have a file with same error type but different code for 10 lines. Do a report  Add a 10 lines in the middle with 2 lines with the same error type    Add one line with different error type Do another report   

          Why Name added a comment - - edited

          I looked today and saw that you linked JENKINS-6675 Invalid number of new warnings detected - to this issue.

          I think that the example mentioned there can be a good test, (BTW my proposed algorithm  will act there correctly).

           

          Why Name added a comment - - edited I looked today and saw that you linked  JENKINS-6675  Invalid number of new warnings detected - to this issue. I think that the example mentioned there can be a good test, (BTW my proposed algorithm  will act there correctly).  

            drulli Ulli Hafner
            gainful_gary Why Name
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated: