This article is part of a series of blog posts reviewing academic studies into open source software quality.
Exploring the Effects of Process Characteristics on Product Quality in Open Source Software Development, Koch and Neumann, 2008
Koch and Neumann from Vienna University published this paper in 2008. It built on prior published research into open source development processes and was a landmark study in terms of scale, extracting metrics from over 2 million lines of code. Earlier research in this area had analysed data from code repositories and bug trackers, but Koch and Neumann were concerned that the resulting metrics focused too much on product rather than process. Their study would review both product AND process, with the particular aim of assessing the impact of software processes on product quality.
Twelve Java presentation frameworks were selected as the focus of the research, including Cocoon, Struts, Maverick and others. Product metrics were taken by running code analysis tools on downloaded stable releases, while process metrics were extracted from the CVS versioning logs for those same releases. As every metric was tied to a filename, the product and process metrics for each file could be merged into a single database record, and queries could then be run on the merged data. It was the largest study of its kind to date and analysed over 6,000 Java classes containing 2 million lines of code. No prior study had reached more than 700 classes at that time, so it’s a significant study in its field.
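To make the merge step concrete, here is a minimal sketch of joining product and process metrics on filename. It is not the authors' actual tooling: the table names, column names and sample rows are all invented for illustration, and an in-memory SQLite database stands in for whatever database the study used.

```python
import sqlite3

# Hypothetical sample rows; the file names and numbers are invented.
product_metrics = [
    # (filename, lines_of_code, complexity) from static code analysis
    ("src/Action.java", 120, 8),
    ("src/Controller.java", 940, 41),
]
process_metrics = [
    # (filename, commits, programmers) from the version control log
    ("src/Action.java", 5, 2),
    ("src/Controller.java", 63, 9),
]


def merge_metrics(product_rows, process_rows):
    """Join product and process metrics into one record per filename."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (filename TEXT PRIMARY KEY, loc INTEGER, complexity INTEGER)"
    )
    conn.execute(
        "CREATE TABLE process (filename TEXT PRIMARY KEY, commits INTEGER, programmers INTEGER)"
    )
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", product_rows)
    conn.executemany("INSERT INTO process VALUES (?, ?, ?)", process_rows)
    # The filename is the shared key that ties the two kinds of metric together.
    rows = conn.execute(
        "SELECT p.filename, p.loc, p.complexity, c.commits, c.programmers "
        "FROM product p JOIN process c ON p.filename = c.filename "
        "ORDER BY p.filename"
    ).fetchall()
    conn.close()
    return rows


merged = merge_metrics(product_metrics, process_metrics)
for row in merged:
    print(row)
```

Once the data sits in one record per file, a question such as "do files touched by more programmers tend to be more complex?" becomes an ordinary query over the merged rows.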
Cutting to the chase (and through twenty pages of detailed methodology and analysis), some of the interesting results from a software practitioner’s perspective included:
- A high number of programmers and commits is associated with problems in quality at class level
- The most important negative impact was on code complexity and class size
- Open source projects often bypass important aspects of design; common violations include failing to refactor designs and large class size
The authors suggested that some of these pitfalls can be avoided by setting up a design that will cope well with increasing numbers of classes and complexity. They also suggested striving for a more equal distribution of commits by organising programmers into small teams to keep quality high. Of course, that’s easier said than done in open source development, where large numbers of programmers may be active on a project but tend to choose what they work on rather than be allocated tasks.
What is particularly interesting is that Koch and Neumann also proposed thresholds for a set of metrics relating to software quality, and suggested that ranking software projects against these thresholds would help in comparing quality between projects. Such a thing would be a really useful tool. We already have tools such as Ohloh which publish various metrics through codebase analysis, but none of the Ohloh metrics relate to quality. Using some of the methods applied in this research, a tool like Ohloh could be extended to reveal some really insightful data on open source code quality.
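As a rough idea of what threshold-based ranking might look like, the sketch below scores each project by the share of its classes that exceed at least one threshold, then sorts projects from best to worst. The metric names, threshold values and project data are all hypothetical and are not the ones proposed in the paper; the scoring rule is likewise just one simple possibility.

```python
# Hypothetical thresholds: NOT the values proposed by Koch and Neumann.
THRESHOLDS = {"loc_per_class": 500, "complexity_per_class": 20}


def violation_rate(classes, thresholds):
    """Return the fraction of classes that exceed at least one threshold."""
    if not classes:
        return 0.0
    violating = sum(
        1
        for cls in classes
        if any(cls[metric] > limit for metric, limit in thresholds.items())
    )
    return violating / len(classes)


def rank_projects(projects, thresholds):
    """Sort projects by violation rate, lowest (best) first."""
    scored = [
        (name, violation_rate(classes, thresholds))
        for name, classes in projects.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Invented example data: one dict of metrics per class.
projects = {
    "project_a": [
        {"loc_per_class": 120, "complexity_per_class": 8},
        {"loc_per_class": 940, "complexity_per_class": 41},
    ],
    "project_b": [
        {"loc_per_class": 200, "complexity_per_class": 12},
        {"loc_per_class": 310, "complexity_per_class": 15},
    ],
}

ranking = rank_projects(projects, THRESHOLDS)
for name, rate in ranking:
    print(f"{name}: {rate:.0%} of classes exceed a threshold")
```

A real tool would need agreed thresholds and a far richer set of metrics, but the basic mechanics of comparing projects this way are not complicated.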