JENKINS-5185

Improve performance of parsers for long log files

Details

    Description

      Currently, some parsers perform poorly when parsing log files. Maybe we should use another regexp library, or strip some text from the log before running the parser.

      Here are some performance results on a log file of 840 lines:

      AcuCobol Compiler: 2439ms
      Ada Compiler (gnat): 21ms
      Buckminster Compiler: 12ms
      Coolflux DSP Compiler: 959ms
      Doxygen: 18ms
      Eclipse Java Compiler: 19ms
      Erlang Compiler: 11ms
      Flex SDK Compilers (compc & mxmlc): 113ms
      GNU compiler (gcc): 710ms
      GNU compiler 4 (gcc): 19ms
      GNU compiler 4 (ld): 700ms
      IAR compiler (C/C++): 14ms
      Intel compiler: 977ms
      Java Compiler: 137ms
      JavaDoc: 17ms
      MSBuild: 4491ms
      Oracle Invalids: 53ms
      PC-Lint: 4678ms
      PHP Runtime Warning: 1569ms
      Perforce Compiler: 2531ms
      Robocopy (please use /V in your commands!): 1733ms
      SUN C++ Compiler: 14ms
      Texas Instruments Code Composer Studio (C/C++): 3ms

      Here are the results after the optimization:

      AcuCobol Compiler: 15ms
      Ada Compiler (gnat): 27ms
      Buckminster Compiler: 19ms
      Coolflux DSP Compiler: 2ms
      Doxygen: 18ms
      Eclipse Java Compiler: 29ms
      Erlang Compiler: 24ms
      Flex SDK Compilers (compc & mxmlc): 6ms
      GNU compiler (gcc): 96ms
      GNU compiler 4 (gcc): 20ms
      GNU compiler 4 (ld): 68ms
      IAR compiler (C/C++): 19ms
      Intel compiler: 5ms
      Java Compiler: 53ms
      JavaDoc: 13ms
      MSBuild: 72ms
      Oracle Invalids: 6ms
      PC-Lint: 99ms
      PHP Runtime Warning: 2ms
      Perforce Compiler: 3ms
      Robocopy (please use /V in your commands!): 2ms
      SUN C++ Compiler: 4ms
      Texas Instruments Code Composer Studio (C/C++): 3ms
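      The numbers above are wall-clock timings per parser. Below is a minimal sketch of how such a measurement could be reproduced. It assumes that a parser boils down to one java.util.regex.Pattern applied with find() to each line. The class ParserTiming, the synthetic log, and the sample pattern are hypothetical and are not the plugin's actual test harness.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical timing harness: applies one regular expression to every line of a log. */
class ParserTiming {

    /** Counts the lines on which the pattern finds a match. */
    static int countMatches(Pattern pattern, List<String> lines) {
        int count = 0;
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                count++;
            }
        }
        return count;
    }

    /** Runs countMatches once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Pattern pattern, List<String> lines) {
        long start = System.nanoTime();
        countMatches(pattern, lines);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Synthetic 840-line log: every tenth line looks like a javac warning.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 840; i++) {
            lines.add(i % 10 == 0
                    ? "Foo.java:" + i + ": warning: [deprecation] bar() is deprecated"
                    : "compiling module " + i + " ...");
        }
        // Illustrative pattern only, not the plugin's real Java compiler regex.
        Pattern pattern = Pattern.compile("^(.*\\.java):(\\d+): warning: (.*)$");
        System.out.println("matches: " + countMatches(pattern, lines)); // matches: 84
        System.out.println("elapsed: " + timeMillis(pattern, lines) + "ms");
    }
}
```

      A single run includes class loading and JIT warm-up. For comparisons like the ones above, one would typically repeat the measurement and discard the first runs.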

        Activity

          gorrus gorrus added a comment -

          We have a Java compile log of ~10 MB and look for warnings only in certain packages (via "Warnings to include"), but it still takes ~3 minutes to scan. Can this be sped up somehow?

          drulli Ulli Hafner added a comment -

          Which parser are you using?

          aleksas aleksas added a comment -

          What about parsing MSBuild logs? Is it possible to speed that up somehow? My logs are over 100 MB, so they take a while to parse.

          drulli Ulli Hafner added a comment -

          I think there are two possible optimizations: improve the regular expressions, or reduce the size of the file to parse.

          Possible solutions for the second approach:

          • Add a kind of 'grep' step to the parser that filters the log. The interesting question is which expression to use. E.g., if only lines containing the strings 'error' or 'warning' are scanned, scanning should be much faster. However, some warnings currently don't contain these strings, so they would no longer be picked up.
          • Alternatively, I could introduce start and end tags, so that only the part of the log between them is scanned.
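          The start/end tag idea from the second bullet can be sketched as a pre-filter that hands only the enclosed lines to the parsers. The marker strings, the class MarkerRegion, and its method are hypothetical. Nothing like this is confirmed to exist in the plugin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-filter: keeps only the log lines between a start and an end marker. */
class MarkerRegion {

    /** Returns the lines strictly between each start marker and the following end marker. */
    static List<String> between(List<String> lines, String startMarker, String endMarker) {
        List<String> region = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            if (!inside && line.contains(startMarker)) {
                inside = true;   // start collecting with the next line
            } else if (inside && line.contains(endMarker)) {
                inside = false;  // stop; a later start marker reopens the region
            } else if (inside) {
                region.add(line);
            }
        }
        return region;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "checkout ...",
                "WARNINGS-START",
                "foo.c:12: warning: unused variable 'x'",
                "WARNINGS-END",
                "archiving artifacts ...");
        // Only the enclosed line would be handed to the parsers.
        System.out.println(between(log, "WARNINGS-START", "WARNINGS-END"));
        // prints [foo.c:12: warning: unused variable 'x']
    }
}
```

          Lines outside the region are never scanned. If an end marker is missing, everything after the last start marker is kept, which errs on the side of not losing warnings.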

          Any other ideas?

          For the first approach:

          • Maybe a different regexp library would help.
          sdirector Monty Taylor added a comment -

          I'm experiencing this as well, using the gcc 4 parser on the output of building Drizzle with all warnings on. (It's been running for about 4 hours now and still isn't done.)

          Feel free to check out: http://hudson.drizzle.org/job/drizzle-build-all-warnings/

          If you want to poke further, I'd be happy to give you whatever access you'd like.

          sdirector Monty Taylor added a comment -

          I should add: the build log being parsed ends up at 41 MB.

          Code changed in hudson
          User: drulli
          Path:
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/AntJavacParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/CoolfluxChessccParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/FlexSDKParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/IntelCParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/JavacParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/PhpParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/RegexpLineParser.java
          http://jenkins-ci.org/commit/31225
          Log:
          JENKINS-5185 Added a string matcher before a line is parsed by regular expression.

          drulli Ulli Hafner added a comment (edited) -

          Speedup for the following parsers:

          AcuCobol Compiler: 154ms
          Coolflux DSP Compiler: 45ms
          Flex SDK Compilers (compc & mxmlc): 4ms
          Intel compiler: 231ms
          Java Compiler: 37ms
          PHP Runtime Warning: 2ms

          Code changed in hudson
          User: drulli
          Path:
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/AcuCobolParser.java
          http://jenkins-ci.org/commit/31226
          Log:
          JENKINS-5185 Added a string matcher before a line is parsed by regular expression.

          drulli Ulli Hafner added a comment (edited) -

          The Java parser could still be split into a Maven and a non-Maven parser to cut the time in half.

          The following parsers are still critical:

          GNU compiler (gcc): 754ms
          GNU compiler 4 (ld): 740ms
          MSBuild: 4936ms
          Perforce Compiler: 2175ms
          Robocopy (please use /V in your commands!): 2251ms

          If someone finds keywords (e.g., "Warning") or file extensions (e.g., ".java") that must appear on every line that contains a warning, please let me know. Then I can add such a string comparison before running the time-intensive regular expression scan.
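          The string comparison described above can be sketched as a cheap String.contains guard in front of the regular expression. GuardedLineParser, the keyword, and the regex are illustrative assumptions, not the plugin's RegexpLineParser code. The guard is lossless only if every line the regex can match really contains the keyword, which is exactly the property being asked about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical line parser that runs a cheap substring check before the regular expression. */
class GuardedLineParser {
    private final String keyword;
    private final Pattern pattern;

    GuardedLineParser(String keyword, String regex) {
        this.keyword = keyword;
        this.pattern = Pattern.compile(regex);
    }

    /** Returns capture group 1 of every line that passes the guard and matches the regex. */
    List<String> parse(List<String> lines) {
        List<String> results = new ArrayList<>();
        for (String line : lines) {
            if (!line.contains(keyword)) {
                continue; // cheap rejection: the regex is never evaluated for this line
            }
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                results.add(matcher.group(1));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Every line this illustrative regex can match necessarily contains "warning",
        // so the guard never drops a real match.
        GuardedLineParser parser = new GuardedLineParser("warning",
                "^(.+\\.java):\\d+: warning: .*$");
        List<String> lines = Arrays.asList(
                "Foo.java:10: warning: [unchecked] unchecked call",
                "[INFO] Building module core",
                "Bar.java:5: error: cannot find symbol");
        System.out.println(parser.parse(lines)); // [Foo.java]
    }
}
```

          In a large build log, most lines fail the contains check, so the regex runs on only a small fraction of them.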

          deccico deccico added a comment -

          Hi, I will take a look at the Perforce and Robocopy parsers.

          Code changed in hudson
          User: deccico
          Path:
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/P4Parser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/RobocopyParser.java
          trunk/hudson/plugins/warnings/src/test/java/hudson/plugins/warnings/parser/P4ParserTest.java
          trunk/hudson/plugins/warnings/src/test/resources/hudson/plugins/warnings/parser/all.txt
          trunk/hudson/plugins/warnings/src/test/resources/hudson/plugins/warnings/parser/perforce.txt
          http://jenkins-ci.org/commit/31227
          Log:
          Improving test cases for Perforce parser and trying to speed Perforce and Robocopy parsers as per #JENKINS-5185

          Code changed in hudson
          User: fchateau
          Path:
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/AcuCobolParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/AntJavacParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/CoolfluxChessccParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/Gcc4LinkerParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/GccParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/IntelCParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/InvalidsParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/JavaDocParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/JavacParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/MsBuildParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/P4Parser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/PhpParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/RobocopyParser.java
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/SunCParser.java
          http://jenkins-ci.org/commit/31229
          Log:
          JENKINS-5185: Fixed. Added beginning-of-line '^' and end-of-line '$' anchors to all regular expressions. Just putting these marks achieves a tremendous speedup (which is logical because it decrease algorithm complexity by one degree).

          fchateau fchateau added a comment -

          I just fixed this issue.
          The problem occurred because most regular expressions were far too permissive: the matched substring could begin anywhere in the line!
          By anchoring the regular expression to the beginning of the line, the cost of regular expression matching is reduced by up to a factor of n (n being the length of the line). In other words, the regexp engine no longer has to try matching the regexp starting at each character of the line, one after another.
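          To illustrate the effect of the anchor with java.util.regex: Matcher.find() on an unanchored pattern may start a match at any offset and retries each position in turn. A pattern that begins with ^ (without the MULTILINE flag) can only match at the start of the input. The class AnchorDemo and the sample line are hypothetical. Note that anchoring also changes which lines match, so any prefix the tool prints before the file name (here an Ant-style [javac] tag, used only as an example) has to be written into the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrates how a leading '^' restricts where Matcher.find() may start a match. */
class AnchorDemo {

    /** Returns the start offset of the first match found in the line, or -1 if none. */
    static int firstMatchStart(String regex, String line) {
        Matcher matcher = Pattern.compile(regex).matcher(line);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        String line = "[javac] Foo.java:42: warning: [deprecation] bar() is deprecated";
        // Unanchored: find() keeps retrying at later offsets until a match starts at 8.
        System.out.println(firstMatchStart("(\\w+\\.java):(\\d+): warning", line)); // 8
        // Anchored (no MULTILINE flag): a match may only begin at offset 0, so it fails.
        System.out.println(firstMatchStart("^(\\w+\\.java):(\\d+): warning", line)); // -1
        // Anchored patterns must therefore spell out any expected prefix explicitly.
        System.out.println(firstMatchStart(
                "^\\s*\\[javac\\] (\\w+\\.java):(\\d+): warning", line)); // 0
    }
}
```

          A line without any warning is the common case in a large build log. On such a line, the unanchored version roughly pays for one failed attempt per offset, while the anchored one gives up after the first.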

          In the future, we should check that every pattern begins with ^ and ends with $, and that there is no pipe '|' at the top level: the anchors do not apply to all alternatives unless the alternatives are enclosed in a group.
          In other words:
          ^a|b$ is wrong
          ^a$|^b$ is good, but redundant
          ^(?:a|b)$ is better
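          The proposed check can be sketched as a small lint over pattern strings. AnchorLint is hypothetical and only a rough scan: it skips escaped characters and character classes and tracks parenthesis depth. It ignores \Q...\E quoting and nested character classes, and its $ test does not notice an escaped \$. Because it is strict, it also flags the redundant ^a$|^b$ form, which is consistent with preferring ^(?:a|b)$.

```java
import java.util.regex.Pattern;

/** Hypothetical lint for the rule above: start with '^', end with '$', no top-level '|'. */
class AnchorLint {

    /** True if the regex has an unescaped '|' outside every group and character class. */
    static boolean hasTopLevelAlternation(String regex) {
        int depth = 0;
        boolean inClass = false;
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (inClass) {
                if (c == ']') {
                    inClass = false;
                }
            } else if (c == '[') {
                inClass = true;
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == '|' && depth == 0) {
                return true;
            }
        }
        return false;
    }

    /** Applies the proposed rule; the '$' test does not recognize an escaped dollar sign. */
    static boolean isWellAnchored(String regex) {
        return regex.startsWith("^") && regex.endsWith("$") && !hasTopLevelAlternation(regex);
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"^a|b$", "^a$|^b$", "^(?:a|b)$"}) {
            System.out.println(regex + " -> well anchored: " + isWellAnchored(regex));
        }
        // Why "^a|b$" is wrong: it means (^a)|(b$), so any line merely ending in "b" matches.
        System.out.println(Pattern.compile("^a|b$").matcher("xxb").find());     // true
        System.out.println(Pattern.compile("^(?:a|b)$").matcher("xxb").find()); // false
    }
}
```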

          drulli Ulli Hafner added a comment -

          Thanks for improving the regular expressions!

          drulli Ulli Hafner added a comment -

          Integrated in Hudson Plug-ins #70
          JENKINS-5185: Fixed. Added beginning-of-line '^' and end-of-line '$' anchors to all regular expressions. Just putting these marks achieves a tremendous speedup (which is logical because it decrease algorithm complexity by one degree).

          Code changed in hudson
          User: fchateau
          Path:
          trunk/hudson/plugins/warnings/src/main/java/hudson/plugins/warnings/parser/DoxygenParser.java
          trunk/hudson/plugins/warnings/src/test/java/hudson/plugins/warnings/parser/DoxygenParserTest.java
          http://jenkins-ci.org/commit/31336
          Log:
          JENKINS-5185: Fixed CheckStyle warnings introduced by [31231].

          drulli Ulli Hafner added a comment -

          Integrated in Hudson Plug-ins #71
          JENKINS-5185: Fixed CheckStyle warnings introduced by [31231].

          People

            drulli Ulli Hafner
            Votes: 5
            Watchers: 6