Wednesday, March 4, 2009

Some code metrics

I recently received an email with Jack Ganssle's last Embedded Muse, mentioning two tools to calculate code metrics. I wanted to try one of them so I downloaded SourceMonitor.

I ran SourceMonitor and computed the statistics for one of the largest projects I'm working on (embedded ARM7 processor, ethernet, USB along with live audio recording/playing among other functions). I got impressed by some numbers, here are the results:


NOTE:SourceMonitor processed my code only, excluding the TCP/IP stack and the FreeRTOS code which is part of the firmware too.







































Files191
Lines34,019
Statements12,765
Percent Branch Statements19.9
Percent Lines with Comments26.1
Functions538
Average Statements per
Function
25.7
Average Block Depth1.54

One of the first things that impressed me was the fact that 26% of the lines are comments, which is something I'm glad of, considering how hard it can be to maintain such a large project, specially if someone else who never got in touch with this code needs to change or add some functionality or worse: correct a bug. However, there is a trick here which may help SourceMonitor to show a percentage way high from an intuitive value: I usually comment functions in Doxygen style, so one to two lines are 'wasted' in spite of code clarity.

Regarding real statements, 12,765 lines mean about 37% of the total line count. This may look like an unbelievable lie, but actually it's due to many blank lines and comments that improve code readability. SourceMonitor's help clearly explains what 'statements' mean for C coding: "Statements: in C, computational statements are terminated with a semicolon character. Branches such as if, for, while, and goto are also counted as statements. Preprocessor directives #include, #define, and #undef are counted as statements. All other preprocessor directives are ignored. In addition all statements between an #else or #elif statement and its closing #endif statement are ignored, to eliminate fractured block"

Another interesting value is the one regarding the average of statements per function. Modularization and splitting code into relatively is a well known good practice that helps to ease code understanding and extension and also bug solving.

There are other metrics not calculated by SourceMonitor which would be nice to investigate, like preprocessor usage (#define, #else, #if, etc) which is something I often abuse of (in a good sense). It would be nice to be able to count the number of defined macros and macro usage too.

It is also important to remember that code metrics is just one side of the code. Lines can be counted and many statistics can be plotted but code organization is not something easy to measure. I could now say that I spent 26% of the time writing comments, which would sound scary to some people. I could also say I spent 37% of the time coding real statements. Both are big lies. Most of the time is spent thinking on how to implement this or that functionality and probably testing it does right (debugging takes time too). Coding is usually quite straight forward once one's brain is organized: that is, of course, not measurable with SourceMonitor.

UPDATE: I downloaded cloc and ran it over the same code, obtaining nearly 7200 blank lines which is about 21%, which I guess should have it's own post ;) The linux kernel 2.6.26 had about 13% of blank lines (see here), I guess I might be overcoding for beauty...

No comments:

Post a Comment