Nicolas Gold introduces an EPSRC-funded project entitled Hypothesis-Based Conceptual Mapping of Source Code.
Program comprehension is the acquisition of knowledge about programs. A key issue is understanding the link between business concepts and their implementation in source code. Concept recovery is the re-establishment of those links, which is necessary because initial design knowledge is typically lost as software systems evolve.
Hypothesis-Based Concept Assignment (HB-CA) is a successful knowledge-based technique for assigning concepts (actions/objects of interest to the maintainer) to regions of source code that implement those concepts. It undertakes its analysis using informal indicators such as variable names, procedure names and comments.
The process begins by creating a conceptual map of the program being analysed: a hypothesis for a concept is generated whenever an appropriate indicator is found in the code. Matching can be flexible, using sub-strings and synonyms.
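The hypothesis-generation step can be illustrated with a minimal sketch. It is not the HB-CA implementation: the knowledge base contents, the `Hypothesis` record and the `generate_hypotheses` function are all hypothetical names chosen for illustration, and flexible matching is approximated here by case-insensitive sub-string search over a list of synonyms for each concept.

```python
from dataclasses import dataclass

# Hypothetical knowledge base mapping each concept to its indicator words,
# synonyms included. The entries are illustrative, not taken from HB-CA.
KNOWLEDGE_BASE = {
    "Read": ["read", "get", "fetch"],
    "Customer": ["customer", "cust", "client"],
    "Write": ["write", "put", "store"],
}


@dataclass(frozen=True)
class Hypothesis:
    line: int        # 1-based source line where the indicator was found
    concept: str     # concept the indicator suggests
    indicator: str   # indicator text that triggered the hypothesis


def generate_hypotheses(source_lines):
    """Scan source lines in order and emit one hypothesis per concept per line."""
    hypotheses = []
    for number, text in enumerate(source_lines, start=1):
        lowered = text.lower()
        for concept, indicators in KNOWLEDGE_BASE.items():
            for indicator in indicators:
                # Sub-string match, so "cust" is found inside "CUST-NAME".
                if indicator in lowered:
                    hypotheses.append(Hypothesis(number, concept, indicator))
                    break  # first matching indicator is enough for this concept
    return hypotheses


# A made-up COBOL-like fragment, used only to exercise the sketch.
program = [
    "READ-CUST-RECORD.",
    "    MOVE CUST-NAME TO WS-NAME.",
    "* store the client details",
    "    WRITE OUT-REC.",
]

for hypothesis in generate_hypotheses(program):
    print(hypothesis.line, hypothesis.concept, hypothesis.indicator)
```

Plain sub-string search is deliberately permissive, so a short indicator can match inside an unrelated identifier. The hypotheses are only candidates, and any real use would need later stages to filter them.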
The resulting list is divided into segments, initially using subroutine boundaries. A particular feature of the method is its use of a self-organising map (an unsupervised neural network) to undertake segmentation flexibly using the conceptual structure of the code in programs with large subroutines or no subroutine structure at all.
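The initial, boundary-based segmentation can be sketched as follows. This is an illustrative assumption about the data layout rather than the project's code: hypotheses are represented as hypothetical `(line, concept)` pairs, `boundaries` holds the starting line of each subroutine, and the self-organising map used for flexible segmentation is not shown.

```python
import bisect


def segment_by_boundaries(hypotheses, boundaries):
    """Group (line, concept) pairs by the subroutine they fall in.

    `boundaries` is a sorted list of the line numbers on which subroutines
    start. Hypotheses before the first boundary form their own segment.
    """
    segments = {}
    for line, concept in hypotheses:
        # Number of subroutine starts at or before this line.
        index = bisect.bisect_right(boundaries, line)
        segments.setdefault(index, []).append((line, concept))
    return [segments[key] for key in sorted(segments)]


# Made-up hypothesis list and subroutine start lines, for illustration only.
hypothesis_list = [
    (1, "Read"), (2, "Customer"),
    (5, "Write"), (6, "Customer"),
    (9, "Print"),
]
subroutine_starts = [1, 5, 9]

for segment in segment_by_boundaries(hypothesis_list, subroutine_starts):
    print(segment)
```

When a program has very large subroutines, or none at all, this approach gives a single oversized segment. That is the case the flexible, map-based segmentation is intended to address.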
HB-CA has been evaluated on real-world COBOL II code from a system provided by a financial services organisation. It shows high recognition accuracy, and the flexible segmentation approach works well in most cases.
However, the approach becomes more difficult to apply as the code evolves, since the informal indicators on which it relies can degrade under maintenance.
The aim of this new project is to understand the relationship between software evolution, the degradation of indicators and their structures, and the effectiveness of the concept recovery technique. This will allow the limits of the method to be established and, where appropriate, extended to maintain concept recovery performance on heavily maintained code.
This will be achieved through empirical study of the method's performance on multiple versions of programs. It is hypothesised that improvements can be made by increasing the richness of the conceptual map produced by the Hypothesis-Based Concept Assignment technique.
The project is funded by EPSRC, will last three years and includes a Ph.D. studentship.