Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
Currently, the performance when parsing the log files is poor for some parsers. Maybe we should use another regexp library or strip off some text from the log before starting the parser.
Here are some performance results on a log file of 840 lines:
AcuCobol Compiler: 2439ms
Ada Compiler (gnat): 21ms
Buckminster Compiler: 12ms
Coolflux DSP Compiler: 959ms
Doxygen: 18ms
Eclipse Java Compiler: 19ms
Erlang Compiler: 11ms
Flex SDK Compilers (compc & mxmlc): 113ms
GNU compiler (gcc): 710ms
GNU compiler 4 (gcc): 19ms
GNU compiler 4 (ld): 700ms
IAR compiler (C/C++): 14ms
Intel compiler: 977ms
Java Compiler: 137ms
JavaDoc: 17ms
MSBuild: 4491ms
Oracle Invalids: 53ms
PC-Lint: 4678ms
PHP Runtime Warning: 1569ms
Perforce Compiler: 2531ms
Robocopy (please use /V in your commands!): 1733ms
SUN C++ Compiler: 14ms
Texas Instruments Code Composer Studio (C/C++): 3ms
Here are the results after the optimization:
AcuCobol Compiler: 15ms
Ada Compiler (gnat): 27ms
Buckminster Compiler: 19ms
Coolflux DSP Compiler: 2ms
Doxygen: 18ms
Eclipse Java Compiler: 29ms
Erlang Compiler: 24ms
Flex SDK Compilers (compc & mxmlc): 6ms
GNU compiler (gcc): 96ms
GNU compiler 4 (gcc): 20ms
GNU compiler 4 (ld): 68ms
IAR compiler (C/C++): 19ms
Intel compiler: 5ms
Java Compiler: 53ms
JavaDoc: 13ms
MSBuild: 72ms
Oracle Invalids: 6ms
PC-Lint: 99ms
PHP Runtime Warning: 2ms
Perforce Compiler: 3ms
Robocopy (please use /V in your commands!): 2ms
SUN C++ Compiler: 4ms
Texas Instruments Code Composer Studio (C/C++): 3ms
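To put the two lists side by side, the speedup per parser can be computed directly from the numbers above. This is only a small sketch over a subset of the parsers; the variable names are made up for illustration, and single-run timings of a few milliseconds are too noisy to read much into (some of the fast parsers even look slightly slower afterwards).

```python
# Timings in milliseconds, copied from the two lists above (subset of parsers).
before_ms = {
    "AcuCobol Compiler": 2439,
    "GNU compiler (gcc)": 710,
    "MSBuild": 4491,
    "PC-Lint": 4678,
}
after_ms = {
    "AcuCobol Compiler": 15,
    "GNU compiler (gcc)": 96,
    "MSBuild": 72,
    "PC-Lint": 99,
}

# Speedup factor = time before / time after.
speedup = {name: before_ms[name] / after_ms[name] for name in before_ms}

# Print the parsers from largest to smallest improvement.
for name, factor in sorted(speedup.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {factor:.1f}x faster")
```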
I just fixed this issue.
The problem occurred because most regular expressions were far too permissive: the matched substring could begin anywhere in the line.
Anchoring the regular expression to the beginning of the line cuts the cost of a failed match by roughly a factor of n (n being the length of the line). In other words: the regexp engine no longer has to retry the match starting at each character of the line, one after another.
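A rough way to picture this is to count the start offsets a naive matcher would try on a line that does not match. The sketch below is not the plugin's code: the warning pattern, the sample log line and the helper name are all made up for illustration, and real regexp engines apply optimizations that this simple loop ignores.

```python
import re

# Hypothetical warning pattern and log line, for illustration only.
WARNING = re.compile(r"(\w+\.c):(\d+): warning: (.*)")
LINE = "    [javac] Compiling 12 source files to /build/classes"


def count_start_offsets(compiled, line, anchored):
    """Count the start offsets tried before a match is found or we give up.

    anchored=True tries offset 0 only; anchored=False tries every offset,
    which is what an unanchored search has to do in the worst case.
    """
    offsets = [0] if anchored else range(len(line) + 1)
    attempts = 0
    for offset in offsets:
        attempts += 1
        # Pattern.match(string, pos) only matches starting exactly at pos.
        if compiled.match(line, offset):
            return attempts, True
    return attempts, False


unanchored_attempts, _ = count_start_offsets(WARNING, LINE, anchored=False)
anchored_attempts, _ = count_start_offsets(WARNING, LINE, anchored=True)
print(unanchored_attempts, "start offsets without anchor,", anchored_attempts, "with anchor")
```

Most lines in a build log are not warnings, so the failing case shown here is the one that dominates the parsing time.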
In the future we should check that every pattern begins with ^ and ends with $, and that there is no pipe '|' at the topmost level. Alternation has lower precedence than the anchors, so the anchors do not apply to all alternatives unless you put the alternatives into a group.
In other words:
^a|b$ is wrong (it matches anything that starts with a or ends with b)
^a$|^b$ is good, but redundant
^(?:a|b)$ is better
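The difference between the three patterns is easy to verify, and the proposed check can be sketched as a small helper. This is an illustration, not the plugin's implementation: the function names are hypothetical, and the scan is deliberately simplified (it handles backslash escapes, groups and character classes, but not every corner of the regexp syntax, e.g. a literal ']' placed first in a class or a pattern ending in an escaped '\$').

```python
import re

PATTERNS = [r"^a|b$", r"^a$|^b$", r"^(?:a|b)$"]
SAMPLES = ["a", "b", "ab", "xb", "ax"]

# Which samples does each pattern accept?
matched = {p: [s for s in SAMPLES if re.search(p, s)] for p in PATTERNS}
for pattern, hits in matched.items():
    print(pattern, "matches", hits)


def has_top_level_alternation(pattern):
    """Return True if '|' appears outside any group or character class."""
    depth = 0
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            return True
        i += 1
    return False


def is_fully_anchored(pattern):
    """Check the rule above: starts with ^, ends with $, no top-level pipe."""
    return (
        pattern.startswith("^")
        and pattern.endswith("$")
        and not has_top_level_alternation(pattern)
    )
```

Only the grouped form passes is_fully_anchored; the redundant form ^a$|^b$ matches correctly but is still flagged because of its top-level pipe.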